spark_auto_mapper.data_types.data_type_base

Module Contents

Classes

AutoMapperDataTypeBase

Base class for all Automapper data types

class spark_auto_mapper.data_types.data_type_base.AutoMapperDataTypeBase

Base class for all Automapper data types

abstract get_column_spec(self, source_df, current_column)

Gets the column spec for this automapper data type

Parameters
  • source_df (Optional[pyspark.sql.DataFrame]) – source data frame in case the automapper type needs that data to decide what to do

  • current_column (Optional[pyspark.sql.Column]) – (Optional) this is set when we are inside an array

Return type

pyspark.sql.Column

get_value(self, value, source_df, current_column)

Gets the value for this automapper

Parameters
  • value (AutoMapperDataTypeBase) – current value

  • source_df (Optional[pyspark.sql.DataFrame]) – source data frame in case the automapper type needs that data to decide what to do

  • current_column (Optional[pyspark.sql.Column]) – (Optional) this is set when we are inside an array

Return type

pyspark.sql.Column

include_null_properties(self, include_null_properties)
Parameters

include_null_properties (bool) –

Return type

None

transform(self, value)

transforms each item in an array column into another type or struct

Parameters
  • self (AutoMapperDataTypeBase) – Set by Python. No need to pass.

  • value (_TAutoMapperDataType) – Complex or Simple Type to create for each item in the array

Returns

a transform automapper type

Example

A.column("last_name").transform(A.complex(bar=A.field("value"), bar2=A.field("system")))

Return type

List[_TAutoMapperDataType]
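The per-element semantics can be sketched in plain Python (hypothetical data, not the spark_auto_mapper API itself): each item of the array is mapped to a new struct, as `A.complex(bar=A.field("value"), bar2=A.field("system"))` would do.

```python
# Plain-Python sketch of transform's per-element semantics
# (hypothetical data; not the spark_auto_mapper API itself).
rows = [
    {"value": "123", "system": "ssn"},
    {"value": "456", "system": "mrn"},
]

# Analogous to A.complex(bar=A.field("value"), bar2=A.field("system")):
# each array item becomes a new struct.
transformed = [{"bar": r["value"], "bar2": r["system"]} for r in rows]
```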

select(self, value)

transforms each item in an array column into another type or struct

Parameters
  • self – Set by Python. No need to pass.

  • value (_TAutoMapperDataType) – Complex or Simple Type to create for each item in the array

Returns

a transform automapper type

Example

A.column("last_name").select(A.complex(bar=A.field("value"), bar2=A.field("system")))

Return type

_TAutoMapperDataType

filter(self, func)

filters an array column

Parameters
  • self (_TAutoMapperDataType) – Set by Python. No need to pass.

  • func (Callable[[pyspark.sql.Column], pyspark.sql.Column]) – func to create type or struct

Returns

a filter automapper type

Example

A.column("last_name").filter(lambda x: x["use"] == lit("usual"))

Return type

_TAutoMapperDataType
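The semantics mirror Spark's higher-order `filter` on an array column; a plain-Python sketch (hypothetical data, not the spark_auto_mapper API itself):

```python
# Plain-Python sketch of filter's semantics on an array of structs
# (hypothetical data; not the spark_auto_mapper API itself).
names = [
    {"use": "usual", "text": "Smith"},
    {"use": "maiden", "text": "Jones"},
]

# Analogous to .filter(lambda x: x["use"] == lit("usual")):
usual_names = [x for x in names if x["use"] == "usual"]
```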

split_by_delimiter(self, delimiter)

splits a text column by the delimiter to create an array

Parameters
  • self (_TAutoMapperDataType) – Set by Python. No need to pass.

  • delimiter (str) – delimiter

Returns

a split_by_delimiter automapper type

Example

A.column("last_name").split_by_delimiter("|")

Return type

_TAutoMapperDataType
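The column-level behavior mirrors Python's `str.split` (a sketch with a hypothetical input value, not the library API):

```python
# str.split mirrors the column-level behavior: a delimited string
# becomes an array of its parts (hypothetical input value).
last_name = "Smith|Smyth|Smithe"
parts = last_name.split("|")
```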

select_one(self, value)

selects first item from array

Parameters
  • self – Set by Python. No need to pass.

  • value (_TAutoMapperDataType) – Complex or Simple Type to create for each item in the array

Returns

a transform automapper type

Example

A.column("identifier").select_one(A.field("_.value"))

Return type

_TAutoMapperDataType

first(self)

returns the first element in array

Parameters

self (_TAutoMapperDataType) – Set by Python. No need to pass.

Returns

an automapper type

Example

A.column("identifier").select(A.field("_.value")).first()

Return type

_TAutoMapperDataType

expression(self, value)

Specifies that the value parameter should be executed as a SQL expression in Spark

Parameters
  • self (_TAutoMapperDataType) – Set by Python. No need to pass.

  • value (str) – sql to run

Returns

an expression automapper type

Example

A.column("identifier").expression("CASE WHEN `Member Sex` = 'F' THEN 'female' WHEN `Member Sex` = 'M' THEN 'male' ELSE 'other' END")

Return type

_TAutoMapperDataType
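The CASE expression above maps a code to a label with a fallback; its logic can be sketched in plain Python (illustrative only, not how Spark evaluates the SQL):

```python
# Plain-Python sketch of the CASE expression above: map a sex code
# to a label, falling back to 'other'.
def map_sex(code: str) -> str:
    if code == "F":
        return "female"
    if code == "M":
        return "male"
    return "other"
```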

current(self)

Specifies to use the current item

Parameters

self – Set by Python. No need to pass.

Returns

A column automapper type

Example

A.column("last_name").current()

Return type

_TAutoMapperDataType

field(self, value)

Specifies that the value parameter should be used as a field name

Parameters
  • self – Set by Python. No need to pass.

  • value (str) – name of field

Returns

A column automapper type

Example

A.column("identifier").select_one(A.field("type.coding[0].code"))

Return type

_TAutoMapperDataType

flatten(self)

creates a single array from an array of arrays. If a structure of nested arrays is deeper than two levels, only one level of nesting is removed. source: http://spark.apache.org/docs/latest/api/python/_modules/pyspark/sql/functions.html#flatten

Parameters

self – Set by Python. No need to pass.

Returns

a flatten automapper type

Example

A.flatten(A.column("column"))

Return type

AutoMapperDataTypeBase
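The one-level-only behavior matches Spark's `flatten()` and can be sketched with `itertools.chain` (hypothetical data):

```python
from itertools import chain

# Only one level of nesting is removed, matching Spark's flatten():
# the inner [4, 5] survives (hypothetical data).
nested = [[1, 2], [3, [4, 5]]]
flattened = list(chain.from_iterable(nested))  # [1, 2, 3, [4, 5]]
```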

to_array(self)

converts single element into an array

Parameters

self – Set by Python. No need to pass.

Returns

an automapper type

Example

A.column("identifier").to_array()

Return type

spark_auto_mapper.data_types.array_base.AutoMapperArrayLikeBase

concat(self, list2)

concatenates two arrays or strings

Parameters
  • self (_TAutoMapperDataType) – Set by Python. No need to pass.

  • list2 (_TAutoMapperDataType) – list to concat into the current column

Returns

a concat automapper type

Example

A.column("identifier").concat(A.text("foo").to_array())

Return type

_TAutoMapperDataType
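For arrays, the semantics are plain list concatenation (a sketch with hypothetical data, not the library API):

```python
# Plain-Python sketch: concat appends one array to another, like
# list concatenation (hypothetical data).
identifiers = ["mrn-1", "mrn-2"]
combined = identifiers + ["foo"]
```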

to_float(self)

Converts column to float

Parameters

self (_TAutoMapperDataType) – Set by Python. No need to pass.

Returns

a float automapper type

Example

A.column("identifier").to_float()

Return type

spark_auto_mapper.data_types.float.AutoMapperFloatDataType

to_date(self, formats=None)

Converts a value to a date only. For datetimes, use the datetime mapper type

Parameters
  • self (_TAutoMapperDataType) – Set by Python. No need to pass.

  • formats (Optional[List[str]]) – (Optional) formats to use for trying to parse the value otherwise uses: y-M-d, yyyyMMdd, M/d/y

Returns

a date type

Example

A.column("date_of_birth").to_date()

Return type

spark_auto_mapper.data_types.date.AutoMapperDateDataType
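The "try each format until one parses" behavior can be sketched with `datetime.strptime`. The Spark patterns y-M-d, yyyyMMdd and M/d/y roughly correspond to the strptime formats below; this mapping is an approximation, not the library's exact implementation.

```python
from datetime import date, datetime
from typing import Optional

# Plain-Python sketch of "try each format until one parses".
# The strptime formats approximate Spark's y-M-d, yyyyMMdd, M/d/y.
def parse_date(
    value: str,
    formats=("%Y-%m-%d", "%Y%m%d", "%m/%d/%Y"),
) -> Optional[date]:
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    return None
```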

to_datetime(self, formats=None)

Converts the value to a timestamp type in Spark

Parameters
  • self (_TAutoMapperDataType) – Set by Python. No need to pass.

  • formats (Optional[List[str]]) – (Optional) formats to use for trying to parse the value otherwise uses Spark defaults

Example

A.column("date_of_birth").to_datetime()

Return type

spark_auto_mapper.data_types.datetime.AutoMapperDateTimeDataType

to_amount(self)

Specifies the value should be used as an amount

Parameters

self (_TAutoMapperDataType) – Set by Python. No need to pass.

Returns

an amount automapper type

Example

A.column("payment").to_amount()

Return type

spark_auto_mapper.data_types.amount.AutoMapperAmountDataType

to_boolean(self)

Specifies the value should be used as a boolean

Parameters

self (_TAutoMapperDataType) – Set by Python. No need to pass.

Returns

a boolean automapper type

Example

A.column("paid").to_boolean()

Return type

spark_auto_mapper.data_types.boolean.AutoMapperBooleanDataType

to_number(self)

Specifies the value should be used as a number

Parameters

self (_TAutoMapperDataType) – Set by Python. No need to pass.

Returns

a number automapper type

Example

A.column("paid").to_number()

Return type

spark_auto_mapper.data_types.number.AutoMapperNumberDataType

to_text(self)

Specifies that the value parameter should be used as literal text

Parameters

self (_TAutoMapperDataType) – Set by Python. No need to pass.

Returns

a text automapper type

Example

A.column("paid").to_text()

Return type

spark_auto_mapper.data_types.text_like_base.AutoMapperTextLikeBase

join_using_delimiter(self, delimiter)

Joins an array and forms a string using the delimiter

Parameters
  • self (_TAutoMapperDataType) – Set by Python. No need to pass.

  • delimiter (str) – string to use as delimiter

Returns

a join_using_delimiter automapper type

Example

A.column("suffix").join_using_delimiter(", ")

Return type

_TAutoMapperDataType
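The column-level behavior mirrors Python's `str.join` over an array (a sketch with a hypothetical array value, not the library API):

```python
# str.join mirrors the column-level behavior: array elements are
# concatenated with the delimiter between them (hypothetical data).
suffixes = ["Jr", "MD", "PhD"]
joined = ", ".join(suffixes)
```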

get_schema(self, include_extension)
Parameters

include_extension (bool) –

Return type

Optional[Union[pyspark.sql.types.StructType, pyspark.sql.types.DataType]]

to_date_format(self, format_)

Converts a date or time into a string

Parameters
  • self (_TAutoMapperDataType) – Set by Python. No need to pass.

  • format_ (str) – format to use when converting the date to a string, e.g. y-M-d, yyyyMMdd, M/d/y

Example

A.column("birth_date").to_date_format("y-M-d")

Return type

spark_auto_mapper.data_types.date_format.AutoMapperFormatDateTimeDataType
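The formatting direction can be sketched with `strftime`; the Spark pattern y-M-d roughly corresponds to `%Y-%m-%d` (an approximation, not an exact pattern mapping):

```python
from datetime import date

# Plain-Python sketch: Spark's y-M-d roughly maps to strftime's
# %Y-%m-%d (hypothetical date value).
birth_date = date(1990, 1, 2)
formatted = birth_date.strftime("%Y-%m-%d")
```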

to_null_if_empty(self)

returns null if the column is an empty string

Parameters

self (_TAutoMapperDataType) – Set by Python. No need to pass.

Returns

an automapper type

Example

A.column("my_age").to_null_if_empty()

Return type

_TAutoMapperDataType

regex_replace(self, pattern, replacement)

Replace all substrings of the specified string value that match regexp with replacement.

Parameters
  • self (_TAutoMapperDataType) – Set by Python. No need to pass.

  • pattern (str) – pattern to search for

  • replacement (str) – string to replace with

Returns

a regex_replace automapper type

Example

A.column("last_name").regex_replace("first", "second")

Example

A.column("last_name").regex_replace(r"[^ _.,!\"'/$-]", ".")

Return type

_TAutoMapperDataType
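The behavior mirrors `re.sub` (and Spark's `regexp_replace`): every match of the pattern is replaced. A sketch with a hypothetical value:

```python
import re

# Every match of the pattern is replaced, as with Spark's
# regexp_replace (hypothetical value).
replaced = re.sub("first", "second", "first_name")
```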

sanitize(self, pattern='[^\\w\\r\\n\\t _.,!\\"\'/$-]', replacement=' ')

Replaces all "non-normal" characters with the specified replacement

By default, we use the FHIR definition of a valid string (except \S does not seem to work properly in Spark): https://www.hl7.org/fhir/datatypes.html#string. Valid characters are (regex='[ \r\n\t\S]'):

\S - any character that is not a whitespace character

  • space

  • carriage return

  • line feed

  • tab

Parameters
  • self (_TAutoMapperDataType) – Set by Python. No need to pass.

  • pattern (str) – regex pattern of characters to replace

  • replacement (str) – (Optional) string to replace with. Defaults to space.

Returns

a regex_replace automapper type

Example

A.column("last_name").sanitize(replacement=".")

Return type

_TAutoMapperDataType
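Using the default pattern from the signature above, any character outside the allowed set becomes the replacement; a plain-Python sketch with a hypothetical value:

```python
import re

# Default pattern from the signature above: any character outside
# the allowed set is replaced ('&' is not allowed here).
pattern = r"[^\w\r\n\t _.,!\"'/$-]"
sanitized = re.sub(pattern, ".", "ab&cd")
```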

if_exists(self, if_exists=None, if_not_exists=None)

returns the column if it exists, else returns null

Parameters
  • self (_TAutoMapperDataType) – Set by Python. No need to pass.

  • if_exists (Optional[_TAutoMapperDataType]) – value to return if column exists

  • if_not_exists (Optional[_TAutoMapperDataType]) – value to return if column does not exist

Returns

an automapper type

Example

A.column("foo").if_exists(A.text("exists"), A.text("not exists"))

Return type

_TAutoMapperDataType

cast(self, type_)

casts the column to the given type

Parameters
  • self (_TAutoMapperDataType) – Set by Python. No need to pass.

  • type_ (Type[_TAutoMapperDataType2]) – type to cast to

Returns

an automapper type

Example

A.column("my_age").cast(AutoMapperNumberDataType)

Return type

_TAutoMapperDataType2

__add__(self, other)

Allows adding items in an array using the + operation

Parameters
  • self (_TAutoMapperDataType) – Set by Python. No need to pass.

  • other (_TAutoMapperDataType) – array to add to the current array

Example

A.column("array1") + ["foo"]

Return type

_TAutoMapperDataType