:py:mod:`spark_auto_mapper.data_types.data_type_base` ===================================================== .. py:module:: spark_auto_mapper.data_types.data_type_base Module Contents --------------- Classes ~~~~~~~ .. autoapisummary:: spark_auto_mapper.data_types.data_type_base.AutoMapperDataTypeBase .. py:class:: AutoMapperDataTypeBase Base class for all Automapper data types .. py:method:: get_column_spec(self, source_df, current_column) :abstractmethod: Gets the column spec for this automapper data type :param source_df: source data frame in case the automapper type needs that data to decide what to do :param current_column: (Optional) this is set when we are inside an array .. py:method:: get_value(self, value, source_df, current_column) Gets the value for this automapper :param value: current value :param source_df: source data frame in case the automapper type needs that data to decide what to do :param current_column: (Optional) this is set when we are inside an array .. py:method:: include_null_properties(self, include_null_properties) .. py:method:: transform(self, value) transforms a column into another type or struct :param self: Set by Python. No need to pass. :param value: Complex or Simple Type to create for each item in the array :return: a transform automapper type :example: A.column("last_name").transform(A.complex(bar=A.field("value"), bar2=A.field("system"))) .. py:method:: select(self, value) transforms a column into another type or struct :param self: Set by Python. No need to pass. :param value: Complex or Simple Type to create for each item in the array :return: a transform automapper type :example: A.column("last_name").select(A.complex(bar=A.field("value"), bar2=A.field("system"))) .. py:method:: filter(self, func) filters an array column :param self: Set by Python. No need to pass. :param func: func to create type or struct :return: a filter automapper type :example: A.column("last_name").filter(lambda x: x["use"] == lit("usual") ) .. py:method:: split_by_delimiter(self, delimiter) splits a text column by the delimiter to create an array :param self: Set by Python. No need to pass. :param delimiter: delimiter :return: a split_by_delimiter automapper type :example: A.column("last_name").split_by_delimiter("|") .. py:method:: select_one(self, value) selects first item from array :param self: Set by Python. No need to pass. :param value: Complex or Simple Type to create for each item in the array :return: a transform automapper type :example: A.column("identifier").select_one(A.field("_.value")) .. py:method:: first(self) returns the first element in array :param self: Set by Python. No need to pass. :return: a filter automapper type :example: A.column("identifier").select(A.field("_.value")).first() .. py:method:: expression(self, value) Specifies that the value parameter should be executed as a sql expression in Spark :param self: Set by Python. No need to pass. :param value: sql to run :return: an expression automapper type :example: A.column("identifier").expression( " CASE WHEN `Member Sex` = 'F' THEN 'female' WHEN `Member Sex` = 'M' THEN 'male' ELSE 'other' END " ) .. py:method:: current(self) Specifies to use the current item :param self: Set by Python. No need to pass. :return: A column automapper type :example: A.column("last_name").current() .. py:method:: field(self, value) Specifies that the value parameter should be used as a field name :param self: Set by Python. No need to pass. :param value: name of field :return: A column automapper type :example: A.column("identifier").select_one(A.field("type.coding[0].code")) .. py:method:: flatten(self) creates a single array from an array of arrays. If a structure of nested arrays is deeper than two levels, only one level of nesting is removed. source: http://spark.apache.org/docs/latest/api/python/_modules/pyspark/sql/functions.html#flatten :param self: Set by Python. No need to pass. :return: a flatten automapper type :example: A.flatten(A.column("column")) .. py:method:: to_array(self) converts single element into an array :param self: Set by Python. No need to pass. :return: an automapper type :example: A.column("identifier").to_array() .. py:method:: concat(self, list2) concatenates two arrays or strings :param self: Set by Python. No need to pass. :param list2: list to concat into the current column :return: a filter automapper type :example: A.column("identifier").concat(A.text("foo").to_array())) .. py:method:: to_float(self) Converts column to float :param self: Set by Python. No need to pass. :return: a float automapper type :example: A.column("identifier").to_float() .. py:method:: to_date(self, formats = None) Converts a value to date only For datetime use the datetime mapper type :param self: Set by Python. No need to pass. :param formats: (Optional) formats to use for trying to parse the value otherwise uses: y-M-d, yyyyMMdd, M/d/y :return: a date type :example: A.column("date_of_birth").to_date() .. py:method:: to_datetime(self, formats = None) Converts the value to a timestamp type in Spark :param self: Set by Python. No need to pass. :param formats: (Optional) formats to use for trying to parse the value otherwise uses Spark defaults :example: A.column("date_of_birth").to_datetime() .. py:method:: to_amount(self) Specifies the value should be used as an amount :param self: Set by Python. No need to pass. :return: an amount automapper type :example: A.column("payment").to_amount() .. py:method:: to_boolean(self) Specifies the value should be used as a boolean :param self: Set by Python. No need to pass. :return: a boolean automapper type :example: A.column("paid").to_boolean() .. py:method:: to_number(self) Specifies value should be used as a number :param self: Set by Python. No need to pass. :return: a number automapper type :example: A.column("paid").to_number() .. py:method:: to_text(self) Specifies that the value parameter should be used as a literal text :param self: Set by Python. No need to pass. :return: a text automapper type :example: A.column("paid").to_text() .. py:method:: join_using_delimiter(self, delimiter) Joins an array and forms a string using the delimiter :param self: Set by Python. No need to pass. :param delimiter: string to use as delimiter :return: a join_using_delimiter automapper type :example: A.column("suffix").join_using_delimiter(", ") .. py:method:: get_schema(self, include_extension) .. py:method:: to_date_format(self, format_) Converts a date or time into string :param self: Set by Python. No need to pass. :param format_: format to use for trying to parse the value otherwise uses: y-M-d yyyyMMdd M/d/y :example: A.column("birth_date").to_date_format("y-M-d") .. py:method:: to_null_if_empty(self) returns null if the column is an empty string :param self: Set by Python. No need to pass. :return: an automapper type :example: A.column("my_age").to_null_if_empty() .. py:method:: regex_replace(self, pattern, replacement) Replace all substrings of the specified string value that match regexp with replacement. :param self: Set by Python. No need to pass. :param pattern: pattern to search for :param replacement: string to replace with :return: a regex_replace automapper type :example: A.column("last_name").regex_replace("first", "second") :example: A.column("last_name").regex_replace(r"[^ _.,!"'/$-]", ".") .. py:method:: sanitize(self, pattern = '[^\\w\\r\\n\\t _.,!\\"\'/$-]', replacement = ' ') Replaces all "non-normal" characters with specified replacement By default, We're using the FHIR definition of valid string (except /S does not seem to work properly in Spark) https://www.hl7.org/fhir/datatypes.html#string Valid characters are (regex='[ \S]'): \S - Any character that is not a whitespace character - space - carriage return - line feed - tab :param self: Set by Python. No need to pass. :param pattern: regex pattern of characters to replace :param replacement: (Optional) string to replace with. Defaults to space. :return: a regex_replace automapper type :example: A.column("last_name").sanitize(replacement=".") .. py:method:: if_exists(self, if_exists = None, if_not_exists = None) returns column if it exists else returns null :param self: Set by Python. No need to pass. :param if_exists: value to return if column exists :param if_not_exists: value to return if column does not exist :return: an automapper type :example: A.column("foo").if_exists(A.text("exists"), A.text("not exists")) .. py:method:: cast(self, type_) casts columns to type :param self: Set by Python. No need to pass. :param type_: type to cast to :return: an automapper type :example: A.column("my_age").cast(AutoMapperNumberDataType) .. py:method:: __add__(self, other) Allows adding items in an array using the + operation :param self: Set by Python. No need to pass. :param other: array to add to the current array :example: A.column("array1") + [ "foo" ]