spark_auto_mapper.automappers.automapper

Module Contents

Classes

AutoMapper

Main AutoMapper Class

class spark_auto_mapper.automappers.automapper.AutoMapper(keys=None, view=None, source_view=None, keep_duplicates=False, drop_key_columns=True, checkpoint_after_columns=None, checkpoint_path=None, reuse_existing_view=False, use_schema=True, include_extension=False, include_null_properties=False, use_single_select=True, verify_row_count=True, skip_schema_validation=['extension'], skip_if_columns_null_or_empty=None, keep_null_rows=False, filter_by=None, logger=None, check_schema_for_all_columns=False, copy_all_unmapped_properties=False, copy_all_unmapped_properties_exclude=None, log_level=None)

Bases: spark_auto_mapper.automappers.container.AutoMapperContainer

Main AutoMapper Class

Creates an AutoMapper

Parameters
  • keys (Optional[List[str]]) – joining keys

  • view (Optional[str]) – view to return

  • source_view (Optional[str]) – where to load the data from

  • keep_duplicates (bool) – whether to keep duplicate rows in the result (duplicates are dropped by default)

  • drop_key_columns (bool) – whether to drop the key columns at the end

  • checkpoint_after_columns (Optional[int]) – checkpoint the data frame after this many columns have been processed

  • checkpoint_path (Optional[Union[str, pathlib.Path]]) – path where checkpoints are stored

  • reuse_existing_view (bool) – If view already exists, whether to reuse it or create a new one

  • use_schema (bool) – apply schema to columns

  • include_extension (bool) – whether to include extension elements. These are excluded by default because they add a lot of schema; set this to True if you use extensions

  • include_null_properties (bool) – whether to include properties whose values are null

  • use_single_select (bool) – run the AutoMapper as a single select of all the columns at once. This is faster, but harder to debug since a failure is not attributed to a specific column

  • verify_row_count (bool) – verify that the row count is the same before and after the transformation

  • skip_schema_validation (List[str]) – skip schema checks on these columns

  • skip_if_columns_null_or_empty (Optional[List[str]]) – skip creating the record if any of these columns are null or empty

  • keep_null_rows (bool) – whether to keep the null rows instead of removing them

  • filter_by (Optional[str]) – SQL expression used to filter the source data frame

  • copy_all_unmapped_properties (bool) – copy any property that is not explicitly mapped

  • copy_all_unmapped_properties_exclude (Optional[List[str]]) – exclude these columns when copy_all_unmapped_properties is set

  • logger (Optional[logging.Logger]) – logger used to log informational messages

  • check_schema_for_all_columns (bool) – whether to run the schema check on every mapped column

  • log_level (Optional[Union[int, str]]) – logging level to use (e.g., logging.INFO or "INFO")
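Example

A minimal construction sketch; the view names and the filter expression below are hypothetical:

from spark_auto_mapper.automappers.automapper import AutoMapper

mapper = AutoMapper(
    view="members",             # view to return
    source_view="patients",     # where to load the data from
    keys=["member_id"],         # joining keys
    drop_key_columns=False,     # keep member_id in the result
    filter_by="active = true",  # SQL expression used to filter the source rows
)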

transform_with_data_frame(self, df, source_df, keys)

Internal function called by the base class to transform the data frame

Parameters
  • df (pyspark.sql.DataFrame) – destination data frame

  • source_df (Optional[pyspark.sql.DataFrame]) – source data frame

  • keys (List[str]) – key columns

Returns

data frame after the transform

Return type

pyspark.sql.DataFrame

transform(self, df)

Uses this AutoMapper to transform the specified data frame and return the new data frame

Parameters

df (pyspark.sql.DataFrame) – source data frame

Returns

destination data frame

Return type

pyspark.sql.DataFrame
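For example, a minimal end-to-end sketch. The view, column, and sample data are hypothetical, the local SparkSession setup is only for illustration, and the import path for the A helpers follows the library's published examples:

from pyspark.sql import SparkSession, DataFrame
from spark_auto_mapper.automappers.automapper import AutoMapper
from spark_auto_mapper.helpers.automapper_helpers import AutoMapperHelpers as A

spark = SparkSession.builder.master("local[*]").getOrCreate()

# Register the source view that the mapper loads from
spark.createDataFrame(
    [(1, "Qureshi"), (2, "Vidal")], ["member_id", "last_name"]
).createOrReplaceTempView("patients")

mapper = AutoMapper(
    view="members", source_view="patients", keys=["member_id"]
).columns(dst1=A.column("last_name"))

# Run the mapping against the source data frame
result_df: DataFrame = mapper.transform(df=spark.table("patients"))
result_df.show()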

columns(self, **kwargs)

Adds mappings for columns

Example
mapper = AutoMapper(
    view="members",
    source_view="patients",
    keys=["member_id"],
    drop_key_columns=False,
).columns(
    dst1="src1",
    dst2=AutoMapperList(["address1"]),
    dst3=AutoMapperList(["address1", "address2"]),
    dst4=AutoMapperList([A.complex(use="usual", family=A.column("last_name"))]),
)

Parameters

kwargs (spark_auto_mapper.type_definitions.defined_types.AutoMapperAnyDataType) – keyword arguments where each key is a destination column name and each value is its mapping

Returns

The same AutoMapper

Return type

AutoMapper

complex(self, entity)

Adds mappings for an entity

Example
mapper = AutoMapper(
    view="members",
    source_view="patients",
    keys=["member_id"],
    drop_key_columns=False,
).complex(
    MyClass(
        name=A.column("last_name"),
        age=A.number(A.column("my_age")),
    )
)

Here MyClass stands for a user-defined complex type deriving from AutoMapperDataTypeComplexBase.

Parameters

entity (spark_auto_mapper.data_types.complex.complex_base.AutoMapperDataTypeComplexBase) – An AutoMapper type

Returns

The same AutoMapper

Return type

AutoMapper

__repr__(self)

Display for the debugger

Returns

string representation for debugger

Return type

str

to_debug_string(self, source_df=None)

Returns a string representation of the AutoMapper

Parameters

source_df (Optional[pyspark.sql.DataFrame]) – (Optional) source data frame

Returns

string representation

Return type

str
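For example (a short sketch, reusing the hypothetical mapper and SparkSession from the transform example above):

# Render the mapping as text without running it;
# the source data frame argument is optional
print(mapper.to_debug_string())
print(mapper.to_debug_string(source_df=spark.table("patients")))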

property column_specs(self)

Useful to show in debugger

Returns

dictionary of column specs

Return type

Dict[str, pyspark.sql.Column]
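For example, one might dump each generated Spark Column expression while debugging (a sketch, reusing the hypothetical mapper from the examples above):

# Print the Spark Column expression built for each destination column
for name, col in mapper.column_specs.items():
    print(f"{name}: {col}")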

__str__(self)

Return str(self).

Return type

str