spark_auto_mapper.data_types.first_valid_column

Module Contents

Classes

AutoMapperFirstValidColumnType

Accepts any number of column definitions and will return the first valid column definition, similar to how coalesce works, but based on the existence of columns rather than null values inside them.

class spark_auto_mapper.data_types.first_valid_column.AutoMapperFirstValidColumnType(*columns)

Bases: spark_auto_mapper.data_types.data_type_base.AutoMapperDataTypeBase, Generic[_TAutoMapperDataType]

Accepts any number of column definitions and will return the first valid column definition, similar to how coalesce works, but with the existence of columns rather than null values inside the columns.

Useful for data sources where a column has been renamed at some point and you need to process files from both before and after the change, or where a column was added later and is missing from earlier files.

Parameters

columns (spark_auto_mapper.type_definitions.wrapper_types.AutoMapperColumnOrColumnLikeType) – column definitions to check, in order; the first one that exists in the source data frame is used
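For illustration, a minimal sketch of how this type might be used inside a mapper. The view names, key, and the "member_id" / "patient_id" column names are hypothetical assumptions; adjust them to your own schema and spark_auto_mapper version.

```python
from spark_auto_mapper.automappers.automapper import AutoMapper
from spark_auto_mapper.helpers.automapper_helpers import AutoMapperHelpers as A
from spark_auto_mapper.data_types.first_valid_column import (
    AutoMapperFirstValidColumnType,
)

# Map to whichever source column exists: newer files use "member_id",
# older files still carry "patient_id".
mapper = AutoMapper(
    view="members",          # hypothetical destination view
    source_view="patients",  # hypothetical source view
    keys=["id"],
).columns(
    member_id=AutoMapperFirstValidColumnType(
        A.column("member_id"),   # preferred, present in newer files
        A.column("patient_id"),  # fallback, present in older files
    )
)
```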

get_column_spec(self, source_df, current_column)

Gets the column spec for this automapper data type

Parameters
  • source_df (Optional[pyspark.sql.DataFrame]) – source data frame in case the automapper type needs that data to decide what to do

  • current_column (Optional[pyspark.sql.Column]) – (Optional) set when the mapper is operating inside an array

Return type

pyspark.sql.Column
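To make the "existence of columns" behaviour concrete, here is a plain PySpark sketch of the underlying idea, not the library's actual implementation: check the source data frame's columns in order and return the first candidate that exists.

```python
from typing import List, Optional

from pyspark.sql import Column, DataFrame
from pyspark.sql import functions as F


def first_existing_column(
    source_df: Optional[DataFrame], candidate_names: List[str]
) -> Column:
    """Return a Column for the first candidate name present in source_df."""
    if source_df is not None:
        for name in candidate_names:
            if name in source_df.columns:
                return F.col(name)
    # None of the candidates exist (or no source data frame was given):
    # fall back to a null literal.
    return F.lit(None)
```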