spark_auto_mapper.automappers.automapper
¶
Module Contents¶
Classes¶
AutoMapper – Main AutoMapper Class
- class spark_auto_mapper.automappers.automapper.AutoMapper(keys=None, view=None, source_view=None, keep_duplicates=False, drop_key_columns=True, checkpoint_after_columns=None, checkpoint_path=None, reuse_existing_view=False, use_schema=True, include_extension=False, include_null_properties=False, use_single_select=True, verify_row_count=True, skip_schema_validation=['extension'], skip_if_columns_null_or_empty=None, keep_null_rows=False, filter_by=None, logger=None, check_schema_for_all_columns=False, copy_all_unmapped_properties=False, copy_all_unmapped_properties_exclude=None, log_level=None)¶
Bases:
spark_auto_mapper.automappers.container.AutoMapperContainer
Main AutoMapper Class
Creates an AutoMapper
- Parameters
keys (Optional[List[str]]) – joining keys
view (Optional[str]) – view to return
source_view (Optional[str]) – where to load the data from
keep_duplicates (bool) – whether to leave duplicates at the end
drop_key_columns (bool) – whether to drop the key columns at the end
checkpoint_after_columns (Optional[int]) – create a checkpoint after this many columns have been processed
checkpoint_path (Optional[Union[str, pathlib.Path]]) – Path where to store the checkpoints
reuse_existing_view (bool) – If view already exists, whether to reuse it or create a new one
use_schema (bool) – apply schema to columns
include_extension (bool) – By default, extension elements are not included since they take up a lot of schema. Set this to True if you use extensions
include_null_properties (bool) – If you want to include null properties
use_single_select (bool) – A faster way to run the AutoMapper, since it selects all the columns at once. However, this makes debugging harder because you cannot tell which column failed
verify_row_count (bool) – verifies that the count of rows remains the same before and after the transformation
skip_schema_validation (List[str]) – skip schema checks on these columns
skip_if_columns_null_or_empty (Optional[List[str]]) – skip creating the record if any of these columns are null or empty
keep_null_rows (bool) – whether to keep the null rows instead of removing them
filter_by (Optional[str]) – SQL expression used to filter the rows
copy_all_unmapped_properties (bool) – copy any property that is not explicitly mapped
copy_all_unmapped_properties_exclude (Optional[List[str]]) – exclude these columns when copy_all_unmapped_properties is set
logger (Optional[logging.Logger]) – logger used to log informational messages
check_schema_for_all_columns (bool) –
log_level (Optional[Union[int, str]]) –
- transform_with_data_frame(self, df, source_df, keys)¶
Internal function called by base class to transform the data frame
- Parameters
df (pyspark.sql.DataFrame) – destination data frame
source_df (Optional[pyspark.sql.DataFrame]) – source data frame
keys (List[str]) – key columns
- Returns
data frame after the transform
- Return type
pyspark.sql.DataFrame
- transform(self, df)¶
Uses this AutoMapper to transform the specified data frame and return the new data frame
- Parameters
df (pyspark.sql.DataFrame) – source data frame
- Returns
destination data frame
- Return type
pyspark.sql.DataFrame
- columns(self, **kwargs)¶
Adds mappings for columns
- Example
mapper = AutoMapper(
    view="members",
    source_view="patients",
    keys=["member_id"],
    drop_key_columns=False,
).columns(
    dst1="src1",
    dst2=AutoMapperList(["address1"]),
    dst3=AutoMapperList(["address1", "address2"]),
    dst4=AutoMapperList([A.complex(use="usual", family=A.column("last_name"))]),
)
- Parameters
kwargs (spark_auto_mapper.type_definitions.defined_types.AutoMapperAnyDataType) – A dictionary of mappings
- Returns
The same AutoMapper
- Return type
AutoMapper
- complex(self, entity)¶
Adds mappings for an entity
- Example
mapper = AutoMapper(
    view="members",
    source_view="patients",
    keys=["member_id"],
    drop_key_columns=False,
).complex(
    MyClass(
        name=A.column("last_name"),
        age=A.number(A.column("my_age")),
    )
)
- Parameters
entity (spark_auto_mapper.data_types.complex.complex_base.AutoMapperDataTypeComplexBase) – An AutoMapper type
- Returns
The same AutoMapper
- Return type
AutoMapper
- __repr__(self)¶
Display for debugger
- Returns
string representation for debugger
- Return type
str
- to_debug_string(self, source_df=None)¶
Displays the automapper as a string
- Parameters
source_df (Optional[pyspark.sql.DataFrame]) – (Optional) source data frame
- Returns
string representation
- Return type
str
- property column_specs(self)¶
Useful to show in the debugger
- Returns
dictionary of column specs
- Return type
Dict[str, pyspark.sql.Column]
- __str__(self)¶
Return str(self).
- Return type
str