spark_auto_mapper.automappers.with_column_base

Module Contents

Classes

AutoMapperWithColumnBase

Abstract base class for AutoMappers

class spark_auto_mapper.automappers.with_column_base.AutoMapperWithColumnBase(dst_column, value, column_schema, include_null_properties, skip_if_columns_null_or_empty=None)

Bases: spark_auto_mapper.automappers.automapper_base.AutoMapperBase

Abstract base class for AutoMappers

Parameters
  • dst_column (str) –

  • value (spark_auto_mapper.type_definitions.defined_types.AutoMapperAnyDataType) –

  • column_schema (Optional[pyspark.sql.types.StructField]) –

  • include_null_properties (bool) –

  • skip_if_columns_null_or_empty (Optional[List[str]]) –

get_column_spec(self, source_df)

Parameters

source_df (Optional[pyspark.sql.DataFrame]) – source data frame

Return type

pyspark.sql.Column

get_column_specs(self, source_df)

Gets column specs (Spark expressions)

Parameters

source_df (Optional[pyspark.sql.DataFrame]) – source data frame

Returns

dictionary of column name, column expression

Return type

Dict[str, pyspark.sql.Column]
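
The relationship between get_column_spec and get_column_specs can be illustrated with a minimal sketch. It is not the library's implementation: ColumnExpr is a hypothetical stand-in for pyspark.sql.Column so the example runs without Spark, WithColumnSketch is a hypothetical class that only mirrors the documented method names, and keying the dictionary by dst_column is an assumption based on the "column name, column expression" description above.

```python
from typing import Dict, Optional


class ColumnExpr:
    """Hypothetical stand-in for pyspark.sql.Column; it only stores an expression string."""

    def __init__(self, expr: str) -> None:
        self.expr = expr


class WithColumnSketch:
    """Hypothetical class mirroring the documented method names (not the library's code)."""

    def __init__(self, dst_column: str, value: str) -> None:
        self.dst_column = dst_column
        self.value = value

    def get_column_spec(self, source_df: Optional[object]) -> ColumnExpr:
        # Assumption: a mapper produces one expression, aliased to its destination column.
        return ColumnExpr(f"{self.value} AS {self.dst_column}")

    def get_column_specs(self, source_df: Optional[object]) -> Dict[str, ColumnExpr]:
        # Assumption: the result maps the destination column name to that single expression.
        return {self.dst_column: self.get_column_spec(source_df)}


mapper = WithColumnSketch(dst_column="full_name", value="concat(first, last)")
specs = mapper.get_column_specs(source_df=None)
```

Under these assumptions, specs holds a single entry whose key is the destination column name, which is the shape a caller would iterate over to add columns to a data frame.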

transform_with_data_frame(self, df, source_df, keys)

Internal function called by base class to transform the data frame

Parameters
  • df (pyspark.sql.DataFrame) – destination data frame

  • source_df (Optional[pyspark.sql.DataFrame]) – source data frame

  • keys (List[str]) – key columns

Returns

data frame after the transform

Return type

pyspark.sql.DataFrame

check_schema(self, parent_column, source_df)

Checks the schema

Parameters
  • parent_column (Optional[str]) – parent column

  • source_df (Optional[pyspark.sql.DataFrame]) – source data frame

Returns

result of checking schema

Return type

Optional[spark_auto_mapper.automappers.check_schema_result.CheckSchemaResult]