spark_pipeline_framework.pipelines.v2.framework_pipeline

Module Contents

Classes

FrameworkPipeline

Abstract class for transformers that transform one dataset into another.

class spark_pipeline_framework.pipelines.v2.framework_pipeline.FrameworkPipeline(parameters: Dict[str, Any], progress_logger: spark_pipeline_framework.progress_logger.progress_logger.ProgressLogger, run_id: Optional[str], client_name: Optional[str] = None, vendor_name: Optional[str] = None, data_lake_path: Optional[str] = None, validation_output_path: Optional[str] = None)

Bases: pyspark.ml.base.Transformer

Abstract class for transformers that transform one dataset into another.

New in version 1.3.0.

property parameters(self) → Dict[str, Any]
property run_id(self) → str
fit(self, df: pyspark.sql.dataframe.DataFrame) → FrameworkPipeline
_transform(self, df: pyspark.sql.dataframe.DataFrame) → pyspark.sql.dataframe.DataFrame

Transforms the input dataset.

Parameters:
    df (pyspark.sql.DataFrame): the input dataset. The inherited docstring names this parameter dataset.

Returns:
    pyspark.sql.DataFrame: the transformed dataset.
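
This page does not show how _transform is implemented. As a minimal sketch, assuming the pipeline applies its steps in order and passes each step's output to the next, the idea can be illustrated without Spark. FakeStep and run_steps are hypothetical stand-ins, not part of the library, and a plain list of dicts stands in for a DataFrame.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]
Dataset = List[Row]  # stand-in for pyspark.sql.DataFrame (assumption)


class FakeStep:
    """Hypothetical stand-in for a Transformer: wraps a row-level function."""

    def __init__(self, fn: Callable[[Row], Row]) -> None:
        self.fn = fn

    def transform(self, dataset: Dataset) -> Dataset:
        # Build a new dataset; the input rows are copied, not mutated.
        return [self.fn(dict(row)) for row in dataset]


def run_steps(steps: List[FakeStep], dataset: Dataset) -> Dataset:
    """Apply each step in order, feeding the result into the next step."""
    for step in steps:
        dataset = step.transform(dataset)
    return dataset


steps = [
    FakeStep(lambda r: {**r, "b": r["a"] + 1}),  # add column b
    FakeStep(lambda r: {**r, "c": r["b"] * 2}),  # add column c from b
]
result = run_steps(steps, [{"a": 1}, {"a": 5}])
print(result)
```

The second step reads a column produced by the first, which is why the order of the steps matters in this sketch.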

_check_validation(self, df: pyspark.sql.dataframe.DataFrame) → None
create_steps(self, my_list: Union[List[pyspark.ml.base.Transformer], List[spark_pipeline_framework.transformers.framework_transformer.v1.framework_transformer.FrameworkTransformer], List[Union[pyspark.ml.base.Transformer, List[pyspark.ml.base.Transformer]]], List[Union[spark_pipeline_framework.transformers.framework_transformer.v1.framework_transformer.FrameworkTransformer, List[spark_pipeline_framework.transformers.framework_transformer.v1.framework_transformer.FrameworkTransformer]]]]) → List[pyspark.ml.base.Transformer]
finalize(self) → None
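
The create_steps signature accepts a list whose items are either single transformers or lists of transformers, and returns a flat list of transformers. The body is not shown on this page, so the following is only a sketch of what such a one-level flattening could look like. flatten_steps is a hypothetical helper, and strings stand in for transformer objects.

```python
from typing import Any, List


def flatten_steps(my_list: List[Any]) -> List[Any]:
    """Flatten one level of nesting, keeping the original order (hypothetical helper)."""
    flat: List[Any] = []
    for item in my_list:
        if isinstance(item, list):
            # A nested group of steps: splice its members in place.
            flat.extend(item)
        else:
            # A single step: keep it as is.
            flat.append(item)
    return flat


# Strings stand in for Transformer instances (assumption for illustration).
nested = ["load", ["clean", "dedupe"], "save"]
flat = flatten_steps(nested)
print(flat)
```

Accepting nested lists in this way lets a caller group related steps, for example the output of a helper that returns several transformers, without flattening them by hand.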