pyarrow.compute.extract_regex_span
- pyarrow.compute.extract_regex_span(strings, /, pattern, *, options=None, memory_pool=None)
Extract string spans captured by a regex pattern.
For each string in strings, match the regular expression and, if it matches, emit a struct whose field names and values come from the regular expression's named capture groups. Each struct field value is a fixed_size_list(offset_type, 2), where offset_type is int32 or int64 depending on the input string type (for example, int64 for large_string input). The two elements of each fixed-size list are the starting index and the length of the substring matched by the corresponding named capture group.
If the input is null or the regular expression fails to match, a null output value is emitted.
Regular expression matching is done using the Google RE2 library.
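To make the [start, length] encoding concrete, here is a minimal pure-Python model of the per-row output, built on the standard `re` module rather than RE2. It is a sketch, not the Arrow implementation: the helper name `extract_regex_span_model` is hypothetical, and it assumes (a) an unanchored, leftmost search, (b) that every capture group must be named, and (c) that offsets are byte positions in the UTF-8 data. Python's `re` syntax also differs from RE2 in places, so use it only to reason about the output layout.

```python
import re
from typing import Dict, List, Optional, Sequence


def extract_regex_span_model(
    strings: Sequence[Optional[str]], pattern: str
) -> List[Optional[Dict[str, List[int]]]]:
    """Hypothetical model of extract_regex_span's per-row output.

    Returns one entry per input: None for null inputs or non-matches,
    otherwise a dict mapping each named group to [start, length].
    """
    # Assumption: offsets are byte positions, so match on UTF-8 bytes.
    regex = re.compile(pattern.encode("utf-8"))
    # Assumption: unnamed capture groups are rejected.
    if regex.groups != len(regex.groupindex):
        raise ValueError("pattern must use only named capture groups")

    results: List[Optional[Dict[str, List[int]]]] = []
    for s in strings:
        if s is None:
            results.append(None)  # null input -> null output
            continue
        m = regex.search(s.encode("utf-8"))  # leftmost, unanchored search
        if m is None:
            results.append(None)  # no match -> null output
            continue
        row: Dict[str, List[int]] = {}
        for name in regex.groupindex:  # groups in order of appearance
            start, end = m.span(name)
            row[name] = [start, end - start]  # [start index, length]
        results.append(row)
    return results


print(extract_regex_span_model(["a1", "xb2", None, "zz"],
                               r"(?P<letter>[ab])(?P<digit>\d)"))
# [{'letter': [0, 1], 'digit': [1, 1]}, {'letter': [1, 1], 'digit': [2, 1]}, None, None]
```

Optional groups that do not participate in a match are not modeled here, since their encoding is not specified above.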
- Parameters:
- strings : Array-like or scalar-like
Argument to compute function: the strings to match against.
- pattern : str
Regular expression with named capture groups.
- options : pyarrow.compute.ExtractRegexSpanOptions, optional
Alternative way of passing options.
- memory_pool : pyarrow.MemoryPool, optional
If not passed, memory is allocated from the default memory pool.