spark_auto_mapper.helpers.automapper_helpers

Module Contents

Classes

AutoMapperHelpers

class spark_auto_mapper.helpers.automapper_helpers.AutoMapperHelpers
static struct(value)

Creates a struct

Parameters

value (Dict[str, Any]) – A dictionary to be converted to a struct

Returns

A struct automapper type

Return type

spark_auto_mapper.data_types.complex.struct_type.AutoMapperDataTypeStruct

static complex(**kwargs)

Creates a complex type.

Parameters

kwargs (spark_auto_mapper.type_definitions.defined_types.AutoMapperAnyDataType) – parameters to be used to create the complex type

Returns

A complex automapper type

Return type

spark_auto_mapper.data_types.complex.complex.AutoMapperDataTypeComplex

static column(value)

Specifies that the value parameter should be used as a column name

Parameters

value (str) – name of column

Returns

A column automapper type

Return type

spark_auto_mapper.data_types.array_base.AutoMapperArrayLikeBase

static text(value)

Specifies that the value parameter should be used as literal text

Parameters

value (Union[spark_auto_mapper.type_definitions.native_types.AutoMapperNativeSimpleType, spark_auto_mapper.type_definitions.defined_types.AutoMapperTextInputType]) – text value

Returns

a text automapper type

Return type

spark_auto_mapper.data_types.text_like_base.AutoMapperTextLikeBase

static expression(value)

Specifies that the value parameter should be executed as a SQL expression in Spark

Parameters

value (str) – the SQL expression

Returns

an expression automapper type

Example

A.expression(
    """
    CASE
        WHEN Member Sex = 'F' THEN 'female'
        WHEN Member Sex = 'M' THEN 'male'
        ELSE 'other'
    END
    """
)

Return type

spark_auto_mapper.data_types.array_base.AutoMapperArrayLikeBase

static date(value, formats=None)

Converts a value to a date only. For datetime, use the datetime mapper type.

Parameters
  • value (spark_auto_mapper.type_definitions.defined_types.AutoMapperDateInputType) – value

  • formats (Optional[List[str]]) – (Optional) formats to try when parsing the value; otherwise uses: y-M-d, yyyyMMdd, M/d/y

Return type

spark_auto_mapper.data_types.date.AutoMapperDateDataType
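The default fallback patterns are Java date-format strings. The try-each-format-in-order strategy can be sketched in plain Python like this (an illustrative stand-in, not the library's implementation; `parse_date` and `DEFAULT_FORMATS` are hypothetical names, with the Java patterns translated to their `strptime` equivalents):

```python
from datetime import date, datetime

# Python strptime equivalents of the default patterns y-M-d, yyyyMMdd, M/d/y
DEFAULT_FORMATS = ["%Y-%m-%d", "%Y%m%d", "%m/%d/%Y"]

def parse_date(value: str, formats=None) -> date:
    """Try each format in order; return the first successful parse."""
    for fmt in formats or DEFAULT_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Could not parse date: {value!r}")
```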

static datetime(value, formats=None)

Converts the value to a timestamp type in Spark

Parameters
  • value (spark_auto_mapper.type_definitions.defined_types.AutoMapperDateInputType) – value

  • formats (Optional[List[str]]) – (Optional) formats to try when parsing the value; otherwise uses Spark defaults

Return type

spark_auto_mapper.data_types.datetime.AutoMapperDateTimeDataType

static decimal(value, precision, scale)

Specifies the value should be used as a decimal

Parameters
  • value (spark_auto_mapper.type_definitions.defined_types.AutoMapperAmountInputType) –

  • precision (int) – the maximum total number of digits (on both sides of the decimal point)

  • scale (int) – the number of digits to the right of the decimal point

Returns

a decimal automapper type

Return type

spark_auto_mapper.data_types.decimal.AutoMapperDecimalDataType
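The precision/scale relationship matches the usual SQL DECIMAL(precision, scale) convention. A quick sketch of the idea with Python's decimal module (`to_decimal` is a hypothetical helper, not part of the library):

```python
from decimal import Decimal, ROUND_HALF_UP

def to_decimal(value, precision: int, scale: int) -> Decimal:
    """Round to `scale` fractional digits; `precision` caps total digits."""
    quantized = Decimal(str(value)).quantize(
        Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP
    )
    if len(quantized.as_tuple().digits) > precision:
        raise ValueError(f"{quantized} exceeds precision {precision}")
    return quantized
```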

static amount(value)

Specifies the value should be used as an amount

Parameters

value (spark_auto_mapper.type_definitions.defined_types.AutoMapperAmountInputType) –

Returns

an amount automapper type

Return type

spark_auto_mapper.data_types.amount.AutoMapperAmountDataType

static boolean(value)

Specifies the value should be used as a boolean

Parameters

value (spark_auto_mapper.type_definitions.defined_types.AutoMapperBooleanInputType) –

Returns

a boolean automapper type

Return type

spark_auto_mapper.data_types.boolean.AutoMapperBooleanDataType

static number(value)

Specifies the value should be used as a number

Parameters

value (spark_auto_mapper.type_definitions.defined_types.AutoMapperNumberInputType) –

Returns

a number automapper type

Return type

spark_auto_mapper.data_types.number.AutoMapperNumberDataType

static concat(*args)

Concatenates a list of values. Each value can be a string or a column.

Parameters

args (Union[spark_auto_mapper.type_definitions.native_types.AutoMapperNativeTextType, spark_auto_mapper.type_definitions.wrapper_types.AutoMapperWrapperType, spark_auto_mapper.data_types.text_like_base.AutoMapperTextLikeBase, spark_auto_mapper.data_types.data_type_base.AutoMapperDataTypeBase]) – string or column

Returns

a concat automapper type

Return type

spark_auto_mapper.data_types.concat.AutoMapperConcatDataType

static if_(column, check, value, else_=None)

Checks if column matches check. Returns value if it matches, otherwise else_.

Parameters
  • column (spark_auto_mapper.type_definitions.wrapper_types.AutoMapperColumnOrColumnLikeType) – column to check

  • check (Union[spark_auto_mapper.type_definitions.defined_types.AutoMapperAnyDataType, List[spark_auto_mapper.type_definitions.defined_types.AutoMapperAnyDataType]]) – value to compare the column to

  • value (_TAutoMapperDataType) – what to return if the value matches

  • else_ (Optional[_TAutoMapperDataType]) – what value to assign if the check fails

Returns

an if automapper type

Return type

_TAutoMapperDataType

static if_not(column, check, value)

Checks if column matches check. Returns value if it does not match.

Parameters
  • column (spark_auto_mapper.type_definitions.wrapper_types.AutoMapperColumnOrColumnLikeType) – column to check

  • check (Union[spark_auto_mapper.type_definitions.defined_types.AutoMapperAnyDataType, List[spark_auto_mapper.type_definitions.defined_types.AutoMapperAnyDataType]]) – value to compare the column to

  • value (_TAutoMapperDataType) – what to return if the value does not match

Returns

an if automapper type

Return type

_TAutoMapperDataType

static if_not_null(check, value, when_null=None)

Checks whether check is null. Returns value if it is not null, otherwise when_null.

Parameters
  • check (spark_auto_mapper.type_definitions.wrapper_types.AutoMapperColumnOrColumnLikeType) – column to check for null

  • value (_TAutoMapperDataType) – what to return if the value is not null

  • when_null (Optional[_TAutoMapperDataType]) – what value to assign if check is null

Returns

an if_not_null automapper type

Return type

_TAutoMapperDataType

static if_not_null_or_empty(check, value, when_null_or_empty=None)

Checks whether check is null or empty. Returns value if it is not, otherwise when_null_or_empty.

Parameters
  • check (spark_auto_mapper.type_definitions.wrapper_types.AutoMapperColumnOrColumnLikeType) – column to check for null

  • value (_TAutoMapperDataType) – what to return if the value is not null or empty

  • when_null_or_empty (Optional[_TAutoMapperDataType]) – what value to assign if check is null or empty

Returns

an if_not_null_or_empty automapper type

Return type

_TAutoMapperDataType

static map(column, mapping, default=None)

Maps the contents of a column to other values

Parameters
  • column (spark_auto_mapper.type_definitions.wrapper_types.AutoMapperColumnOrColumnLikeType) – column

  • mapping (Dict[Optional[spark_auto_mapper.type_definitions.defined_types.AutoMapperTextInputType], spark_auto_mapper.type_definitions.defined_types.AutoMapperAnyDataType]) – A dictionary mapping the contents of the column to other values, e.g., {"Y": "Yes", "N": "No"}

  • default (Optional[spark_auto_mapper.type_definitions.defined_types.AutoMapperAnyDataType]) – the value to assign if no value matches

Returns

a map automapper type

Return type

spark_auto_mapper.data_types.expression.AutoMapperDataTypeExpression
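The semantics are those of a dictionary lookup with a fallback. A minimal plain-Python sketch of the behaviour (`map_value` is a hypothetical name, not the library's code):

```python
def map_value(value, mapping, default=None):
    """Look up `value` in `mapping`; fall back to `default` when absent."""
    return mapping.get(value, default)

# e.g. translating a single-letter code to a label
sex = map_value("F", {"F": "female", "M": "male"}, default="other")
```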

static left(column, length)

Takes the specified number of characters from the start of a string

Parameters
  • column (spark_auto_mapper.type_definitions.wrapper_types.AutoMapperColumnOrColumnLikeType) – column whose contents to use

  • length (int) – number of characters to take from left

Returns

a substring automapper type

Return type

spark_auto_mapper.data_types.substring.AutoMapperSubstringDataType

static right(column, length)

Takes the specified number of characters from the end of a string

Parameters
  • column (spark_auto_mapper.type_definitions.wrapper_types.AutoMapperColumnOrColumnLikeType) – column whose contents to use

  • length (int) – number of characters to take from right

Returns

a substring automapper type

Return type

spark_auto_mapper.data_types.substring.AutoMapperSubstringDataType
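left and right behave like simple string slicing. A plain-Python sketch of the two operations (hypothetical helper names, not the library's code):

```python
def left(s: str, length: int) -> str:
    """First `length` characters of the string."""
    return s[:length]

def right(s: str, length: int) -> str:
    """Last `length` characters of the string."""
    return s[-length:] if length > 0 else ""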

static substring(column, start, length)

Extracts a substring from the specified string.

Parameters
  • column (spark_auto_mapper.type_definitions.wrapper_types.AutoMapperColumnOrColumnLikeType) – column whose contents to use

  • start (int) – position to start from (1-based, following Spark's convention)

  • length (int) – number of characters to take

Returns

a substring automapper type

Return type

spark_auto_mapper.data_types.substring.AutoMapperSubstringDataType
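Spark SQL's substring is 1-based rather than 0-based, so the indexing can be sketched in Python as (illustrative helper, assuming a positive start position):

```python
def substring(s: str, start: int, length: int) -> str:
    """Spark-style substring: `start` is a 1-based position."""
    return s[start - 1 : start - 1 + length]
```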

static string_before_delimiter(column, delimiter)

Takes the part of the string before the delimiter

Parameters
  • column (spark_auto_mapper.type_definitions.wrapper_types.AutoMapperColumnOrColumnLikeType) – column whose contents to use

  • delimiter (str) – string to use as delimiter

Returns

a substring_by_delimiter automapper type

Return type

spark_auto_mapper.data_types.substring_by_delimiter.AutoMapperSubstringByDelimiterDataType

static string_after_delimiter(column, delimiter)

Takes the part of the string after the delimiter

Parameters
  • column (spark_auto_mapper.type_definitions.wrapper_types.AutoMapperColumnOrColumnLikeType) – column whose contents to use

  • delimiter (str) – string to use as delimiter

Returns

a substring_by_delimiter automapper type

Return type

spark_auto_mapper.data_types.substring_by_delimiter.AutoMapperSubstringByDelimiterDataType
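One plausible reading of these two helpers, mirroring Spark's substring_index(col, delim, 1) and substring_index(col, delim, -1), can be sketched in Python as follows (hypothetical names; whether the library splits on the first or last occurrence is an assumption here):

```python
def string_before_delimiter(s: str, delimiter: str) -> str:
    """Everything before the first delimiter (whole string if absent)."""
    return s.split(delimiter, 1)[0]

def string_after_delimiter(s: str, delimiter: str) -> str:
    """Everything after the last delimiter (whole string if absent)."""
    return s.rsplit(delimiter, 1)[-1]
```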

static substring_by_delimiter(column, delimiter, delimiter_count)

Returns the substring from the string before count occurrences of the delimiter. substring_by_delimiter performs a case-sensitive match when searching for the delimiter.

Parameters
  • column (spark_auto_mapper.type_definitions.wrapper_types.AutoMapperColumnOrColumnLikeType) – column whose contents to use

  • delimiter (str) – string to use as delimiter; can be a regex

  • delimiter_count (int) – if positive, everything to the left of the final delimiter (counting from the left) is returned; if negative, everything to the right of the final delimiter (counting from the right) is returned

Returns

a substring_by_delimiter automapper type

Return type

spark_auto_mapper.data_types.substring_by_delimiter.AutoMapperSubstringByDelimiterDataType
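The counting behaviour matches Spark's substring_index and can be sketched in Python as follows (a plain-string sketch that ignores the regex-delimiter capability; `substring_by_delimiter` here is an illustrative stand-in, not the library's code):

```python
def substring_by_delimiter(s: str, delimiter: str, count: int) -> str:
    """Keep everything before the `count`-th delimiter: counted from the
    left if `count` is positive, from the right if negative."""
    parts = s.split(delimiter)
    if count > 0:
        return delimiter.join(parts[:count])
    if count < 0:
        return delimiter.join(parts[count:])
    return ""
```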

static regex_replace(column, pattern, replacement)

Replaces all substrings of the string value that match pattern with replacement.

Parameters
  • column (spark_auto_mapper.type_definitions.wrapper_types.AutoMapperColumnOrColumnLikeType) – column whose contents to replace

  • pattern (str) – pattern to search for

  • replacement (str) – string to replace with

Returns

a regex_replace automapper type

Return type

spark_auto_mapper.data_types.regex_replace.AutoMapperRegExReplaceDataType

static regex_extract(column, pattern, index)

Extracts a specific group matched by a regex from a specified column. If there was no match or the requested group does not exist, an empty string is returned.

Parameters
  • column (spark_auto_mapper.type_definitions.wrapper_types.AutoMapperColumnOrColumnLikeType) – column whose contents to replace

  • pattern (str) – pattern containing groups to match

  • index (int) – index of the group to return (1-indexed, use 0 to return the whole matched string)

Returns

a regex_extract automapper type

Return type

spark_auto_mapper.data_types.regex_extract.AutoMapperRegExExtractDataType
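The documented empty-string fallback can be sketched with Python's re module (an illustrative stand-in, not the library's implementation):

```python
import re

def regex_extract(s: str, pattern: str, index: int) -> str:
    """Group `index` of the first match; '' when there is no match or
    the requested group does not exist (per the documented behaviour)."""
    match = re.search(pattern, s)
    if match is None or index > len(match.groups()):
        return ""
    return match.group(index) or ""
```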

static trim(column)

Trim the spaces from both ends for the specified string column.

Parameters

column (spark_auto_mapper.type_definitions.wrapper_types.AutoMapperColumnOrColumnLikeType) – column whose contents to trim

Returns

a trim automapper type

Return type

spark_auto_mapper.data_types.trim.AutoMapperTrimDataType

static lpad(column, length, pad)

Returns column value, left-padded with pad to a length of length. If column value is longer than length, the return value is shortened to length characters.

Parameters
  • column (spark_auto_mapper.type_definitions.wrapper_types.AutoMapperColumnOrColumnLikeType) – column whose contents to left pad

  • length (int) – the desired length of the final string

  • pad (str) – the character to use to pad the string to the desired length

Return type

spark_auto_mapper.data_types.lpad.AutoMapperLPadDataType
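The pad-then-truncate behaviour can be sketched in Python as (hypothetical helper, assuming `pad` is a single character):

```python
def lpad(s: str, length: int, pad: str) -> str:
    """Left-pad to `length` with `pad`; truncate if already longer."""
    return s.rjust(length, pad)[:length]
```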

static hash(*args)

Calculates the hash code of given columns, and returns the result as an int column.

Parameters

args (Union[spark_auto_mapper.type_definitions.native_types.AutoMapperNativeTextType, spark_auto_mapper.type_definitions.wrapper_types.AutoMapperWrapperType, spark_auto_mapper.data_types.text_like_base.AutoMapperTextLikeBase]) – string or column

Returns

a hash automapper type

Return type

spark_auto_mapper.data_types.hash.AutoMapperHashDataType

static coalesce(*args)

Returns the first value that is not null.

Parameters

args (_TAutoMapperDataType) –

Returns

a coalesce automapper type

Return type

_TAutoMapperDataType
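The first-non-null semantics match SQL's COALESCE and can be sketched in one line of Python (illustrative helper name):

```python
def coalesce(*args):
    """Return the first argument that is not None (None if all are)."""
    return next((a for a in args if a is not None), None)
```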

static array_max(*args)

Returns the maximum value of the array.

Parameters

args (_TAutoMapperDataType) –

Returns

an array_max automapper type

Return type

_TAutoMapperDataType

static array_distinct(*args)

Returns the distinct items in the array.

Parameters

args (_TAutoMapperDataType) –

Returns

an array_distinct automapper type

Return type

_TAutoMapperDataType
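Deduplication that keeps first-occurrence order, as Spark's array_distinct does, can be sketched in Python as (illustrative helper name):

```python
def array_distinct(items):
    """Distinct items, preserving first-occurrence order."""
    return list(dict.fromkeys(items))
```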

static if_regex(column, check, value, else_=None)

Checks if column matches the check regex (or any of a list of regexes). Returns value if it matches, otherwise else_.

Parameters
  • column (spark_auto_mapper.type_definitions.wrapper_types.AutoMapperColumnOrColumnLikeType) – column to check

  • check (Union[str, List[str]]) – value to compare the column to. Has to be a string or list of strings

  • value (_TAutoMapperDataType) – what to return if the value matches

  • else_ (Optional[_TAutoMapperDataType]) – what value to assign if the check fails

Returns

an if automapper type

Return type

_TAutoMapperDataType

static filter(column, func)

Filters a column by a function

Parameters
  • column (spark_auto_mapper.type_definitions.wrapper_types.AutoMapperColumnOrColumnLikeType) – column to check

  • func (Callable[[pyspark.sql.Column], pyspark.sql.Column]) – function to filter by

Returns

a filter automapper type

Return type

spark_auto_mapper.data_types.filter.AutoMapperFilterDataType

static transform(column, value)

Transforms a column into another type or struct

Parameters
  • column (spark_auto_mapper.type_definitions.wrapper_types.AutoMapperColumnOrColumnLikeType) – column to check

  • value (_TAutoMapperDataType) – function to create the type or struct

Returns

a transform automapper type

Return type

List[_TAutoMapperDataType]

static field(value)

Specifies that the value parameter should be used as a field name

Parameters

value (str) – name of field

Returns

A column automapper type

Return type

spark_auto_mapper.data_types.text_like_base.AutoMapperTextLikeBase

static current()

Specifies to use the current item

Returns

A column automapper type

Return type

spark_auto_mapper.data_types.text_like_base.AutoMapperTextLikeBase

static split_by_delimiter(column, delimiter)

Split a string into an array using the delimiter

Parameters
  • column (spark_auto_mapper.type_definitions.wrapper_types.AutoMapperColumnOrColumnLikeType) – column whose contents to use

  • delimiter (str) – string to use as delimiter

Returns

a split_by_delimiter automapper type

Return type

spark_auto_mapper.data_types.split_by_delimiter.AutoMapperSplitByDelimiterDataType

static float(value)

Converts the value to a float

Parameters

value (spark_auto_mapper.data_types.data_type_base.AutoMapperDataTypeBase) – value to convert

Returns

a float automapper type

static flatten(column)

Creates a single array from an array of arrays. If a structure of nested arrays is deeper than two levels, only one level of nesting is removed. Source: http://spark.apache.org/docs/latest/api/python/_modules/pyspark/sql/functions.html#flatten

Parameters

column (spark_auto_mapper.type_definitions.wrapper_types.AutoMapperColumnOrColumnLikeType) –

Returns

a flatten automapper type

Return type

spark_auto_mapper.data_types.data_type_base.AutoMapperDataTypeBase
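The one-level-only behaviour can be sketched with the standard library (illustrative helper name):

```python
from itertools import chain

def flatten_one_level(arrays):
    """Remove exactly one level of nesting, as flatten does."""
    return list(chain.from_iterable(arrays))
```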

static first_valid_column(*columns)
Allows a column to be defined from a list of candidate columns where a source column may not exist. If the optional source column does not exist, the "default" column definition is used instead.

Parameters

columns (spark_auto_mapper.type_definitions.wrapper_types.AutoMapperColumnOrColumnLikeType) –

Returns

an optional automapper type

Return type

spark_auto_mapper.data_types.data_type_base.AutoMapperDataTypeBase

static if_column_exists(column, if_exists, if_not_exists)

Checks if the column exists; returns if_exists if it does, otherwise if_not_exists.

Parameters
  • column (spark_auto_mapper.type_definitions.wrapper_types.AutoMapperColumnOrColumnLikeType) –

  • if_exists (Optional[_TAutoMapperDataType]) –

  • if_not_exists (Optional[_TAutoMapperDataType]) –

Returns

an optional automapper type

Return type

spark_auto_mapper.data_types.data_type_base.AutoMapperDataTypeBase

static array(value)

Creates an array from a single item. Source: http://spark.apache.org/docs/latest/api/python/_modules/pyspark/sql/functions.html#array

Parameters

value (spark_auto_mapper.data_types.data_type_base.AutoMapperDataTypeBase) –

Returns

an array automapper type

Return type

spark_auto_mapper.data_types.data_type_base.AutoMapperDataTypeBase

static join_using_delimiter(column, delimiter)

Joins an array and forms a string using the delimiter

Parameters
  • column (spark_auto_mapper.type_definitions.wrapper_types.AutoMapperColumnOrColumnLikeType) – column whose contents to use

  • delimiter (str) – string to use as delimiter

Returns

a join automapper type

Return type

spark_auto_mapper.data_types.join_using_delimiter.AutoMapperJoinUsingDelimiterDataType

static unix_timestamp(value)

Converts the value to a Unix timestamp

Parameters

value (spark_auto_mapper.type_definitions.defined_types.AutoMapperNumberInputType) – value to convert to unix timestamp

Returns

a unix_timestamp automapper type

Return type

spark_auto_mapper.data_types.unix_timestamp.AutoMapperUnixTimestampType
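A Unix timestamp is the number of seconds since 1970-01-01 00:00:00 UTC. A Python sketch of the conversion (illustrative helper; Spark's unix_timestamp actually interprets the input in the session time zone, so UTC is an assumption here):

```python
from datetime import datetime, timezone

def unix_timestamp(value: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> int:
    """Seconds since the Unix epoch, interpreting the input as UTC."""
    dt = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
    return int(dt.timestamp())
```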