pyarrow.compute.extract_regex_span

pyarrow.compute.extract_regex_span(strings, /, pattern, *, options=None, memory_pool=None)

Extract string spans captured by a regex pattern.

For each string in strings, match the regular expression and, if successful, emit a struct whose field names and values come from the regular expression's named capture groups. Each struct field value is a fixed_size_list(offset_type, 2), where offset_type is int32 or int64 depending on the input string type. The two elements of each fixed-size list are the starting index and the length of the substring matched by the corresponding named capture group.

If the input is null or the regular expression does not match, a null output value is emitted.

Regular expression matching is done using the Google RE2 library.

Parameters:
strings : Array-like or scalar-like

Argument to compute function.

pattern : str

Regular expression with named capture fields.

options : pyarrow.compute.ExtractRegexSpanOptions, optional

Alternative way of passing options.

memory_pool : pyarrow.MemoryPool, optional

If not passed, will allocate memory from the default memory pool.