spark_pipeline_framework.utilities.spark_data_frame_helpers

Module Contents

Functions

convert_to_row(d: Dict[Any, Any]) → pyspark.Row

create_view_from_dictionary(view: str, data: List[Dict[str, Any]], spark_session: pyspark.sql.SparkSession, schema: Optional[pyspark.sql.types.StructType] = None) → pyspark.sql.DataFrame

Parses the list of dictionaries, converts it to a data frame, and creates a view

create_dataframe_from_dictionary(data: List[Dict[str, Any]], spark_session: pyspark.sql.SparkSession, schema: Optional[pyspark.sql.types.StructType] = None) → pyspark.sql.DataFrame

Creates a data frame from a list of dictionaries

create_empty_dataframe(spark_session: pyspark.sql.SparkSession) → pyspark.sql.DataFrame

create_dataframe_from_json(spark_session: pyspark.sql.SparkSession, schema: pyspark.sql.types.StructType, json: str) → pyspark.sql.DataFrame

spark_is_data_frame_empty(df: pyspark.sql.DataFrame) → bool

Efficiently checks whether the data frame is empty without counting all of its rows

spark_get_execution_plan(df: pyspark.sql.DataFrame, extended: bool = False) → Any

spark_table_exists(sql_ctx: pyspark.SQLContext, view: str) → bool

sc(df: pyspark.sql.DataFrame) → pyspark.SparkContext

add_metadata_to_column(df: pyspark.sql.DataFrame, column: str, metadata: Any) → pyspark.sql.DataFrame

get_metadata_of_column(df: pyspark.sql.DataFrame, column: str) → Any

to_dicts(df: pyspark.sql.DataFrame, limit: int) → List[Dict[str, Any]]

Converts a data frame into a list of dictionaries

spark_pipeline_framework.utilities.spark_data_frame_helpers.convert_to_row(d: Dict[Any, Any]) pyspark.Row
spark_pipeline_framework.utilities.spark_data_frame_helpers.create_view_from_dictionary(view: str, data: List[Dict[str, Any]], spark_session: pyspark.sql.SparkSession, schema: Optional[pyspark.sql.types.StructType] = None) pyspark.sql.DataFrame

Parses the list of dictionaries, converts it to a data frame, and creates a view.

Parameters:
view -- name of the view to create
data -- list of dictionaries to load
spark_session -- Spark session to use
schema -- optional schema; inferred from the data when not given

Returns:
the new data frame

spark_pipeline_framework.utilities.spark_data_frame_helpers.create_dataframe_from_dictionary(data: List[Dict[str, Any]], spark_session: pyspark.sql.SparkSession, schema: Optional[pyspark.sql.types.StructType] = None) pyspark.sql.DataFrame

Creates a data frame from a list of dictionaries.

Parameters:
data -- list of dictionaries to load
spark_session -- Spark session to use
schema -- optional schema; inferred from the data when not given

Returns:
the data frame

spark_pipeline_framework.utilities.spark_data_frame_helpers.create_empty_dataframe(spark_session: pyspark.sql.SparkSession) pyspark.sql.DataFrame
spark_pipeline_framework.utilities.spark_data_frame_helpers.create_dataframe_from_json(spark_session: pyspark.sql.SparkSession, schema: pyspark.sql.types.StructType, json: str) pyspark.sql.DataFrame
spark_pipeline_framework.utilities.spark_data_frame_helpers.spark_is_data_frame_empty(df: pyspark.sql.DataFrame) bool

Efficiently checks whether the data frame is empty without counting all of its rows

spark_pipeline_framework.utilities.spark_data_frame_helpers.spark_get_execution_plan(df: pyspark.sql.DataFrame, extended: bool = False) Any
spark_pipeline_framework.utilities.spark_data_frame_helpers.spark_table_exists(sql_ctx: pyspark.SQLContext, view: str) bool
spark_pipeline_framework.utilities.spark_data_frame_helpers.sc(df: pyspark.sql.DataFrame) pyspark.SparkContext
spark_pipeline_framework.utilities.spark_data_frame_helpers.add_metadata_to_column(df: pyspark.sql.DataFrame, column: str, metadata: Any) pyspark.sql.DataFrame
spark_pipeline_framework.utilities.spark_data_frame_helpers.get_metadata_of_column(df: pyspark.sql.DataFrame, column: str) Any
spark_pipeline_framework.utilities.spark_data_frame_helpers.to_dicts(df: pyspark.sql.DataFrame, limit: int) List[Dict[str, Any]]

Converts a data frame into a list of dictionaries.

Parameters:
df -- data frame to convert
limit -- maximum number of rows to convert

Returns:
the list of dictionaries