spark_auto_mapper.data_types.data_type_base
Module Contents¶
Classes¶
Base class for all Automapper data types
- class spark_auto_mapper.data_types.data_type_base.AutoMapperDataTypeBase¶
Base class for all Automapper data types
- abstract get_column_spec(self, source_df, current_column)¶
Gets the column spec for this automapper data type
- Parameters
source_df (Optional[pyspark.sql.DataFrame]) – source data frame in case the automapper type needs that data to decide what to do
current_column (Optional[pyspark.sql.Column]) – (Optional) this is set when we are inside an array
- Return type
pyspark.sql.Column
- get_value(self, value, source_df, current_column)¶
Gets the value for this automapper
- Parameters
value (AutoMapperDataTypeBase) – current value
source_df (Optional[pyspark.sql.DataFrame]) – source data frame in case the automapper type needs that data to decide what to do
current_column (Optional[pyspark.sql.Column]) – (Optional) this is set when we are inside an array
- Return type
pyspark.sql.Column
- include_null_properties(self, include_null_properties)¶
- Parameters
include_null_properties (bool) –
- Return type
None
- transform(self, value)¶
transforms a column into another type or struct
- Parameters
self (AutoMapperDataTypeBase) – Set by Python. No need to pass.
value (_TAutoMapperDataType) – Complex or Simple Type to create for each item in the array
- Returns
a transform automapper type
- Example
A.column("last_name").transform(A.complex(bar=A.field("value"), bar2=A.field("system")))
- Return type
List[_TAutoMapperDataType]
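The semantics of transform() can be sketched in pure Python: every element of an array column is rebuilt through the supplied mapper, much like Spark's transform() higher-order function. This is an illustration only (the real method builds a pyspark Column spec); the sample data is hypothetical.

```python
# Pure-Python sketch of transform() semantics on an array column.
# Illustration only; the real method returns a pyspark.sql.Column spec.

def transform_semantics(array, build):
    """Apply `build` to each item, as transform() does per array element."""
    return [build(item) for item in array]

rows = [{"value": "123", "system": "ssn"}, {"value": "456", "system": "mrn"}]
result = transform_semantics(
    rows, lambda item: {"bar": item["value"], "bar2": item["system"]}
)
print(result)
```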
- select(self, value)¶
transforms a column into another type or struct
- Parameters
self – Set by Python. No need to pass.
value (_TAutoMapperDataType) – Complex or Simple Type to create for each item in the array
- Returns
a transform automapper type
- Example
A.column("last_name").select(A.complex(bar=A.field("value"), bar2=A.field("system")))
- Return type
_TAutoMapperDataType
- filter(self, func)¶
filters an array column
- Parameters
self (_TAutoMapperDataType) – Set by Python. No need to pass.
func (Callable[[pyspark.sql.Column], pyspark.sql.Column]) – func to create type or struct
- Returns
a filter automapper type
- Example
A.column("last_name").filter(lambda x: x["use"] == lit("usual"))
- Return type
_TAutoMapperDataType
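The filter() semantics can be sketched in pure Python: only the array elements for which the predicate holds are kept, mirroring pyspark.sql.functions.filter. The dicts below stand in for struct elements and are illustrative only.

```python
# Sketch of filter() semantics on an array column. Illustration only;
# the real predicate operates on pyspark Columns, not Python dicts.

names = [{"use": "usual", "family": "Smith"}, {"use": "old", "family": "Smyth"}]
kept = [x for x in names if x["use"] == "usual"]
print(kept)
```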
- split_by_delimiter(self, delimiter)¶
splits a text column by the delimiter to create an array
- Parameters
self (_TAutoMapperDataType) – Set by Python. No need to pass.
delimiter (str) – delimiter
- Returns
a split_by_delimiter automapper type
- Example
A.column("last_name").split_by_delimiter("|")
- Return type
_TAutoMapperDataType
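For a literal delimiter the result matches Python's str.split, which makes the semantics easy to sketch (keep in mind that Spark's own split() function takes a regex pattern, so metacharacters like "|" behave differently at the Spark level). Illustration only:

```python
# Sketch of split_by_delimiter() semantics: one text value becomes an array.
parts = "Smith|Smyth|Smythe".split("|")
print(parts)
```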
- select_one(self, value)¶
selects first item from array
- Parameters
self – Set by Python. No need to pass.
value (_TAutoMapperDataType) – Complex or Simple Type to create for each item in the array
- Returns
a select_one automapper type
- Example
A.column("identifier").select_one(A.field("_.value"))
- Return type
_TAutoMapperDataType
- first(self)¶
returns the first element in array
- Parameters
self (_TAutoMapperDataType) – Set by Python. No need to pass.
- Returns
a first automapper type
- Example
A.column("identifier").select(A.field("_.value")).first()
- Return type
_TAutoMapperDataType
- expression(self, value)¶
Specifies that the value parameter should be executed as a SQL expression in Spark
- Parameters
self (_TAutoMapperDataType) – Set by Python. No need to pass.
value (str) – sql to run
- Returns
an expression automapper type
- Example
A.column("identifier").expression(
"CASE WHEN `Member Sex` = 'F' THEN 'female' WHEN `Member Sex` = 'M' THEN 'male' ELSE 'other' END"
)
- Return type
_TAutoMapperDataType
- current(self)¶
Specifies to use the current item
- Parameters
self – Set by Python. No need to pass.
- Returns
A column automapper type
- Example
A.column("last_name").current()
- Return type
_TAutoMapperDataType
- field(self, value)¶
Specifies that the value parameter should be used as a field name
- Parameters
self – Set by Python. No need to pass.
value (str) – name of field
- Returns
A column automapper type
- Example
A.column("identifier").select_one(A.field("type.coding[0].code"))
- Return type
_TAutoMapperDataType
- flatten(self)¶
creates a single array from an array of arrays. If a structure of nested arrays is deeper than two levels, only one level of nesting is removed. source: http://spark.apache.org/docs/latest/api/python/_modules/pyspark/sql/functions.html#flatten
- Parameters
self – Set by Python. No need to pass.
- Returns
a flatten automapper type
- Example
A.flatten(A.column("column"))
- Return type
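The one-level-only behavior described above can be sketched with itertools.chain, which has the same semantics for a single level of nesting. Illustration only:

```python
from itertools import chain

# Sketch of flatten() semantics: one array built from an array of arrays;
# only one level of nesting is removed, as in Spark's flatten() function.
nested = [[1, 2], [3, 4], [[5], [6]]]
flat = list(chain.from_iterable(nested))
print(flat)  # deeper nesting survives: [1, 2, 3, 4, [5], [6]]
```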
- to_array(self)¶
converts single element into an array
- Parameters
self – Set by Python. No need to pass.
- Returns
an automapper type
- Example
A.column("identifier").to_array()
- Return type
spark_auto_mapper.data_types.array_base.AutoMapperArrayLikeBase
- concat(self, list2)¶
concatenates two arrays or strings
- Parameters
self (_TAutoMapperDataType) – Set by Python. No need to pass.
list2 (_TAutoMapperDataType) – list to concat into the current column
- Returns
a concat automapper type
- Example
A.column("identifier").concat(A.text("foo").to_array())
- Return type
_TAutoMapperDataType
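Since concat() accepts either two arrays or two strings, its semantics can be sketched with plain Python concatenation. Illustration only; the sample values are hypothetical:

```python
# Sketch of concat() semantics: arrays (or strings) joined end to end.
identifiers = ["mrn-1", "mrn-2"]
combined = identifiers + ["foo"]   # array + array
as_text = "mrn-1" + "foo"          # string + string
print(combined, as_text)
```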
- to_float(self)¶
Converts column to float
- Parameters
self (_TAutoMapperDataType) – Set by Python. No need to pass.
- Returns
a float automapper type
- Example
A.column("identifier").to_float()
- Return type
- to_date(self, formats=None)¶
Converts a value to a date only. For datetime, use the datetime mapper type.
- Parameters
self (_TAutoMapperDataType) – Set by Python. No need to pass.
formats (Optional[List[str]]) – (Optional) formats to use for trying to parse the value otherwise uses: y-M-d, yyyyMMdd, M/d/y
- Returns
a date type
- Example
A.column("date_of_birth").to_date()
- Return type
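The try-each-format behavior of to_date() can be sketched in pure Python. The documented defaults use Spark pattern letters (y-M-d, yyyyMMdd, M/d/y); the strptime patterns below are their approximate Python equivalents, and this function is an illustration, not the library's implementation:

```python
from datetime import datetime

def to_date_semantics(value, formats=("%Y-%m-%d", "%Y%m%d", "%m/%d/%Y")):
    """Try each format in turn until one parses, as to_date() does."""
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    return None

print(to_date_semantics("19870319"))  # matched by the yyyyMMdd-style format
```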
- to_datetime(self, formats=None)¶
Converts the value to a timestamp type in Spark
- Parameters
self (_TAutoMapperDataType) – Set by Python. No need to pass.
formats (Optional[List[str]]) – (Optional) formats to use for trying to parse the value otherwise uses Spark defaults
- Example
A.column("date_of_birth").to_datetime()
- Return type
spark_auto_mapper.data_types.datetime.AutoMapperDateTimeDataType
- to_amount(self)¶
Specifies the value should be used as an amount
- Parameters
self (_TAutoMapperDataType) – Set by Python. No need to pass.
- Returns
an amount automapper type
- Example
A.column("payment").to_amount()
- Return type
spark_auto_mapper.data_types.amount.AutoMapperAmountDataType
- to_boolean(self)¶
Specifies the value should be used as a boolean
- Parameters
self (_TAutoMapperDataType) – Set by Python. No need to pass.
- Returns
a boolean automapper type
- Example
A.column("paid").to_boolean()
- Return type
spark_auto_mapper.data_types.boolean.AutoMapperBooleanDataType
- to_number(self)¶
Specifies value should be used as a number
- Parameters
self (_TAutoMapperDataType) – Set by Python. No need to pass.
- Returns
a number automapper type
- Example
A.column("paid").to_number()
- Return type
spark_auto_mapper.data_types.number.AutoMapperNumberDataType
- to_text(self)¶
Specifies that the value parameter should be used as a literal text
- Parameters
self (_TAutoMapperDataType) – Set by Python. No need to pass.
- Returns
a text automapper type
- Example
A.column("paid").to_text()
- Return type
spark_auto_mapper.data_types.text_like_base.AutoMapperTextLikeBase
- join_using_delimiter(self, delimiter)¶
Joins an array and forms a string using the delimiter
- Parameters
self (_TAutoMapperDataType) – Set by Python. No need to pass.
delimiter (str) – string to use as delimiter
- Returns
a join_using_delimiter automapper type
- Example
A.column("suffix").join_using_delimiter(", ")
- Return type
_TAutoMapperDataType
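The join_using_delimiter() semantics mirror Spark's array_join and Python's str.join: the array elements are concatenated into one string with the delimiter between them. Illustration only:

```python
# Sketch of join_using_delimiter() semantics: array -> delimited string.
suffixes = ["Jr", "MD"]
joined = ", ".join(suffixes)
print(joined)
```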
- get_schema(self, include_extension)¶
- Parameters
include_extension (bool) –
- Return type
Optional[Union[pyspark.sql.types.StructType, pyspark.sql.types.DataType]]
- to_date_format(self, format_)¶
Converts a date or time into a string
- Parameters
self (_TAutoMapperDataType) – Set by Python. No need to pass.
format_ (str) – format to use for formatting the value; otherwise uses: y-M-d, yyyyMMdd, M/d/y
- Example
A.column("birth_date").to_date_format("y-M-d")
- Return type
spark_auto_mapper.data_types.date_format.AutoMapperFormatDateTimeDataType
- to_null_if_empty(self)¶
returns null if the column is an empty string
- Parameters
self (_TAutoMapperDataType) – Set by Python. No need to pass.
- Returns
an automapper type
- Example
A.column("my_age").to_null_if_empty()
- Return type
_TAutoMapperDataType
- regex_replace(self, pattern, replacement)¶
Replace all substrings of the specified string value that match regexp with replacement.
- Parameters
self (_TAutoMapperDataType) – Set by Python. No need to pass.
pattern (str) – pattern to search for
replacement (str) – string to replace with
- Returns
a regex_replace automapper type
- Example
A.column("last_name").regex_replace("first", "second")
- Example
A.column("last_name").regex_replace(r'[^ _.,!"\'/$-]', ".")
- Return type
_TAutoMapperDataType
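The behavior mirrors Spark's regexp_replace(), which in turn behaves like Python's re.sub for these patterns: every substring matching the pattern is replaced. A pure-Python sketch with hypothetical input:

```python
import re

# Sketch of regex_replace() semantics: all matches of the pattern are
# replaced, as in pyspark.sql.functions.regexp_replace.
result = re.sub("first", "second", "first_name, first_initial")
print(result)  # second_name, second_initial
```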
- sanitize(self, pattern='[^\\w\\r\\n\\t _.,!\\"\'/$-]', replacement=' ')¶
Replaces all “non-normal” characters with specified replacement
By default, we're using the FHIR definition of a valid string (except \S does not seem to work properly in Spark): https://www.hl7.org/fhir/datatypes.html#string. Valid characters are (regex='[ \r\n\t\S]'):
\S - any character that is not a whitespace character
space
carriage return
line feed
tab
- Parameters
self (_TAutoMapperDataType) – Set by Python. No need to pass.
pattern (str) – regex pattern of characters to replace
replacement (str) – (Optional) string to replace with. Defaults to space.
- Returns
a regex_replace automapper type
- Example
A.column("last_name").sanitize(replacement=".")
- Return type
_TAutoMapperDataType
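The effect of the default pattern can be sketched with re.sub: word characters, \r, \n, \t, space, and the punctuation listed in the pattern are kept, and everything else is replaced. The input string is hypothetical:

```python
import re

# Sketch of sanitize() semantics using its documented default pattern.
# Characters outside the allowed set are replaced (here with ".").
pattern = r'[^\w\r\n\t _.,!"\'/$-]'
print(re.sub(pattern, ".", "O@Brien#"))
```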
- if_exists(self, if_exists=None, if_not_exists=None)¶
returns column if it exists else returns null
- Parameters
self (_TAutoMapperDataType) – Set by Python. No need to pass.
if_exists (Optional[_TAutoMapperDataType]) – value to return if column exists
if_not_exists (Optional[_TAutoMapperDataType]) – value to return if column does not exist
- Returns
an automapper type
- Example
A.column("foo").if_exists(A.text("exists"), A.text("not exists"))
- Return type
_TAutoMapperDataType
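The branching can be sketched in pure Python: the returned value depends on whether the named column is present in the source schema. Illustration only; the real method inspects the source DataFrame, and the helper below is hypothetical:

```python
# Sketch of if_exists() semantics: return one value when the column exists
# in the source schema, the other when it does not.

def if_exists_semantics(source_columns, name, if_exists, if_not_exists=None):
    return if_exists if name in source_columns else if_not_exists

print(if_exists_semantics(["foo", "bar"], "foo", "exists", "not exists"))
print(if_exists_semantics(["bar"], "foo", "exists", "not exists"))
```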
- cast(self, type_)¶
casts the column to the given type
- Parameters
self (_TAutoMapperDataType) – Set by Python. No need to pass.
type_ (Type[_TAutoMapperDataType2]) – type to cast to
- Returns
an automapper type
- Example
A.column("my_age").cast(AutoMapperNumberDataType)
- Return type
_TAutoMapperDataType2
- __add__(self, other)¶
Allows adding items to an array using the + operator
- Parameters
self (_TAutoMapperDataType) – Set by Python. No need to pass.
other (_TAutoMapperDataType) – array to add to the current array
- Example
A.column("array1") + ["foo"]
- Return type
_TAutoMapperDataType