spark_auto_mapper.data_types.regex_extract

Module Contents

Classes

AutoMapperRegExExtractDataType

Extracts the part of a column value that matches a regular expression. Returns an empty string if no match is found.

class spark_auto_mapper.data_types.regex_extract.AutoMapperRegExExtractDataType(column, pattern, index)

Bases: spark_auto_mapper.data_types.text_like_base.AutoMapperTextLikeBase

Extracts the part of a column value that matches a regular expression. Returns an empty string if no match is found.

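Note that Spark's regexp_extract does not require the pattern to match the entire string: it returns the first match found anywhere in the value, which is the equivalent of Python's re.search, not re.match. Anchor the pattern with ^ and $ if the whole string must match.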
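The difference between re.match and re.search that the note refers to can be seen in plain Python. This is only an illustration of the Python functions; it does not run Spark, and the sample value and pattern are made up for the example.

```python
import re

value = "order-12345-US"   # made-up sample value
pattern = r"(\d+)"         # one capture group: a run of digits

# re.match only succeeds if the pattern matches at the start of the string.
at_start = re.match(pattern, value)

# re.search scans the string and returns the first match found anywhere.
anywhere = re.search(pattern, value)

# re.fullmatch succeeds only if the pattern covers the entire string.
whole = re.fullmatch(pattern, value)

print(at_start)            # None: the string starts with "order", not a digit
print(anywhere.group(1))   # 12345
print(whole)               # None: the pattern does not cover the whole string
```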

Parameters
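  • column (spark_auto_mapper.type_definitions.wrapper_types.AutoMapperColumnOrColumnLikeType) – column to extract the match from

  • pattern (str) – regular expression to match against the column value

  • index (int) – index of the capture group to return (0 is the whole match)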
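As a rough sketch of how pattern and index work together, the plain-Python helper below mimics the documented behaviour: index 0 selects the whole match, index 1 the first capture group, and an empty string comes back when nothing matches. The helper name extract_group is hypothetical and not part of spark_auto_mapper. It uses Python's re module rather than Spark, so regex dialect details may differ. The sample values match from the start of the string, so the result does not depend on how the match is anchored.

```python
import re


def extract_group(value: str, pattern: str, index: int) -> str:
    """Hypothetical stand-in for the documented behaviour (not library code)."""
    found = re.search(pattern, value)
    if found is None:
        # No match at all: the documentation says an empty string is returned.
        return ""
    # A group that did not take part in the match is None in Python;
    # normalise that to an empty string as well.
    return found.group(index) or ""


date_pattern = r"(\d{4})-(\d{2})-(\d{2})"   # made-up example pattern

print(extract_group("2021-03-15", date_pattern, 0))   # 2021-03-15
print(extract_group("2021-03-15", date_pattern, 2))   # 03
print(repr(extract_group("n/a", date_pattern, 1)))    # ''
```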

get_column_spec(self, source_df, current_column)

Gets the column spec for this automapper data type.

Parameters
  • source_df (Optional[pyspark.sql.DataFrame]) – source data frame in case the automapper type needs that data to decide what to do

  • current_column (Optional[pyspark.sql.Column]) – (Optional) this is set when we are inside an array

Return type

pyspark.sql.Column