universal_transfer_operator.data_providers.database.snowflake

Module Contents

Classes

SnowflakeDataProvider

SnowflakeDataProvider represents all the DataProvider interactions with Snowflake databases.

class universal_transfer_operator.data_providers.database.snowflake.SnowflakeDataProvider(dataset, transfer_mode, transfer_params=attr.field(factory=TransferIntegrationOptions, converter=lambda val: ...))

Bases: universal_transfer_operator.data_providers.database.base.DatabaseDataProvider

SnowflakeDataProvider represents all the DataProvider interactions with Snowflake databases.

Parameters:

  • dataset – Dataset the provider interacts with.

  • transfer_mode – Transfer mode to use for the operation.

  • transfer_params (TransferIntegrationOptions) – Options for the transfer; defaults to TransferIntegrationOptions().
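A minimal construction sketch (TransferMode.NONNATIVE and the connection id "snowflake_conn" are illustrative assumptions, not values prescribed by this page):

    from universal_transfer_operator.constants import TransferMode
    from universal_transfer_operator.data_providers.database.snowflake import (
        SnowflakeDataProvider,
    )
    from universal_transfer_operator.datasets.table import Metadata, Table

    # "snowflake_conn" is a placeholder Airflow connection id.
    table = Table(
        name="employees",
        conn_id="snowflake_conn",
        metadata=Metadata(schema="PUBLIC"),
    )
    provider = SnowflakeDataProvider(dataset=table, transfer_mode=TransferMode.NONNATIVE)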
property sql_type: str
Return type:

str

property hook: airflow.providers.snowflake.hooks.snowflake.SnowflakeHook

Retrieve Airflow hook to interface with the Snowflake database.

Return type:

airflow.providers.snowflake.hooks.snowflake.SnowflakeHook
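Because hook is a regular SnowflakeHook, the usual Airflow hook methods are available; for example (a sketch):

    # Run an ad-hoc statement through the underlying Airflow hook.
    provider.hook.run("SELECT CURRENT_VERSION()")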

property default_metadata: universal_transfer_operator.datasets.table.Metadata

Fill in default metadata values for table objects addressing Snowflake databases.

Return type:

universal_transfer_operator.datasets.table.Metadata

property openlineage_dataset_name: str

Returns the OpenLineage dataset name as per https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md. Example: db_name.schema_name.table_name

Return type:

str

property openlineage_dataset_namespace: str

Returns the OpenLineage dataset namespace as per https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md. Example: snowflake://ACCOUNT

Return type:

str

property openlineage_dataset_uri: str

Returns the OpenLineage dataset URI as per https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md

Return type:

str
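Together these three properties identify the dataset for lineage purposes; a sketch of inspecting them on an instantiated provider:

    print(provider.openlineage_dataset_namespace)  # e.g. snowflake://ACCOUNT
    print(provider.openlineage_dataset_name)       # e.g. db_name.schema_name.table_name
    print(provider.openlineage_dataset_uri)        # typically derived from the values above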

DEFAULT_SCHEMA
static get_table_qualified_name(table)

Return the qualified table name. In Snowflake, this is composed of the database, schema, and table name.

Parameters:

table (universal_transfer_operator.datasets.table.Table) – The table we want to retrieve the qualified name for.

Return type:

str
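A sketch of the expected shape of the result (the exact value depends on the table's metadata and on quoting rules):

    from universal_transfer_operator.datasets.table import Metadata, Table

    table = Table(name="employees", metadata=Metadata(database="SANDBOX", schema="PUBLIC"))
    # Expected to be of the form "SANDBOX.PUBLIC.employees".
    qualified_name = SnowflakeDataProvider.get_table_qualified_name(table)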

schema_exists(schema)

Checks if a schema exists in the database.

Parameters:

schema (str) – DB schema, a namespace that contains named objects (tables, functions, etc.)

Return type:

bool
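A common pattern is to guard schema creation with this check; a sketch (the CREATE SCHEMA statement and schema name are illustrative):

    # Create the schema only if it does not exist yet.
    if not provider.schema_exists("REPORTING"):
        provider.hook.run("CREATE SCHEMA REPORTING")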

create_table_using_schema_autodetection(table, file=None, dataframe=None, columns_names_capitalization='original')

Create a SQL table, automatically inferring the schema from the given file (or dataframe). This overrides the default behaviour and does not use prep_table, since prep_table does not allow adding quotes.

Parameters:
  • table (universal_transfer_operator.datasets.table.Table) – The table to be created.

  • file (File | None) – File used to infer the new table columns.

  • dataframe (pd.DataFrame | None) – Dataframe used to infer the new table columns if there is no file.

  • columns_names_capitalization (universal_transfer_operator.constants.ColumnCapitalization) – Determines whether to convert all column names to lowercase or uppercase in the resulting dataframe.

Return type:

None
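A sketch of creating a table from a dataframe's inferred schema (the dataframe contents are illustrative; the mixed-case "Name" column will be quoted, see use_quotes() below):

    import pandas as pd

    df = pd.DataFrame({"id": [1, 2], "Name": ["a", "b"]})
    provider.create_table_using_schema_autodetection(
        table=table,
        dataframe=df,
        columns_names_capitalization="original",
    )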

load_pandas_dataframe_to_table(source_dataframe, target_table, if_exists='replace', chunk_size=DEFAULT_CHUNK_SIZE)

Create a table with the dataframe’s contents. If the table already exists, append or replace the content, depending on the value of if_exists.

Parameters:
  • source_dataframe (pandas.DataFrame) – Dataframe containing the data to be loaded into the table.

  • target_table (universal_transfer_operator.datasets.table.Table) – Table into which the dataframe will be loaded.

  • if_exists (universal_transfer_operator.constants.LoadExistStrategy) – Strategy to be used in case the target table already exists.

  • chunk_size (int) – Specify the number of rows in each batch to be written at a time.

Return type:

None
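A usage sketch (the dataframe, strategy, and batch size are illustrative):

    import pandas as pd

    df = pd.DataFrame({"id": [1, 2, 3]})
    provider.load_pandas_dataframe_to_table(
        source_dataframe=df,
        target_table=table,
        if_exists="append",
        chunk_size=1000,
    )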

truncate_table(table)

Truncate the given table.

classmethod use_quotes(cols)

With Snowflake identifiers there are two cases:

  1. When upper- or mixed-case column names are used

    We must preserve the text casing of the column names by adding quotes around the identifiers.

  2. When lowercase column names are used

    They can be used as-is.

This is done to stay in sync with the Snowflake SQLAlchemy dialect: https://docs.snowflake.com/en/user-guide/sqlalchemy.html#object-name-case-handling

Snowflake stores all case-insensitive object names in uppercase text. In contrast, SQLAlchemy considers all lowercase object names to be case-insensitive. Snowflake SQLAlchemy converts the object name case during schema-level communication (i.e. during table and index reflection). If you use uppercase object names, SQLAlchemy assumes they are case-sensitive and encloses the names with quotes. This behavior will cause mismatches against data dictionary data received from Snowflake, so unless identifier names have been truly created as case sensitive using quotes (e.g. “TestDb”), all lowercase names should be used on the SQLAlchemy side.

Parameters:

cols (Sequence[str]) – List of column names

Return type:

bool
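Based on the two cases above, the expected behaviour is (a sketch; the return values follow from the description rather than from tested output):

    # All-lowercase names need no quoting.
    SnowflakeDataProvider.use_quotes(["id", "name"])  # False
    # Any upper- or mixed-case name requires quoting.
    SnowflakeDataProvider.use_quotes(["id", "Name"])  # True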