universal_transfer_operator.data_providers.base

Module Contents

Classes

DataStream

DataProviders

Base class representing all DataProviders interactions with a Dataset.

Attributes

DatasetType

universal_transfer_operator.data_providers.base.DatasetType

class universal_transfer_operator.data_providers.base.DataStream

remote_obj_buffer: io.IOBase

actual_filename: pathlib.Path

actual_file: universal_transfer_operator.datasets.file.base.File

class universal_transfer_operator.data_providers.base.DataProviders(dataset, transfer_mode, transfer_params=attr.field(factory=TransferParameters, converter=lambda val: ...))

Bases: abc.ABC, Generic[DatasetType]

Base class representing all DataProviders interactions with a Dataset.

The goal is to support a new dataset type by adding a new module to the uto/data_providers directory, without changing other modules or classes.

Parameters:
abstract property hook: airflow.hooks.base.BaseHook

Return an instance of the Airflow hook.

Return type:

airflow.hooks.base.BaseHook

abstract property openlineage_dataset_namespace: str

Returns the OpenLineage dataset namespace as per https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md

Return type:

str

abstract property openlineage_dataset_name: str

Returns the OpenLineage dataset name as per https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md

Return type:

str

property openlineage_dataset_uri: str

Returns the OpenLineage dataset URI as per https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md

Return type:

str
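This reference does not state how the URI is derived from the namespace and the name. As a minimal sketch, assuming the URI is the namespace joined to the name with a single slash (an assumption made for illustration, not confirmed here), the three properties could relate as follows. `ExampleProvider` and the sample values are hypothetical and not part of universal_transfer_operator.

```python
class ExampleProvider:
    """Hypothetical stand-in; not part of universal_transfer_operator."""

    def __init__(self, namespace: str, name: str) -> None:
        self._namespace = namespace
        self._name = name

    @property
    def openlineage_dataset_namespace(self) -> str:
        return self._namespace

    @property
    def openlineage_dataset_name(self) -> str:
        return self._name

    @property
    def openlineage_dataset_uri(self) -> str:
        # Assumption: join namespace and name with exactly one slash.
        return f"{self._namespace.rstrip('/')}/{self._name.lstrip('/')}"


provider = ExampleProvider("s3://example-bucket", "/data/file.csv")
print(provider.openlineage_dataset_uri)
```

Keeping the namespace and name as separate properties lets each provider follow the naming rules for its own storage system while the URI stays a derived value.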

abstract check_if_exists()

Return True if the dataset exists.

Return type:

bool

check_if_transfer_supported(source_dataset)

Checks if the transfer is supported from source to destination based on source_dataset.

Parameters:

source_dataset (DatasetType) –

Return type:

bool

abstract read()

Read from a filesystem or database dataset and write the data to local reference locations or dataframes.

Return type:

Iterator[pd.DataFrame] | Iterator[DataStream]

abstract write(source_ref)

Write the data from a local reference location or a dataframe to the database dataset or filesystem dataset.

Parameters:

source_ref (pd.DataFrame | DataStream) – Stream of data to be loaded into output table or a pandas dataframe.

Return type:

str
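The read() and write() signatures suggest a chunk-by-chunk transfer: a source yields items and a destination consumes each one. The sketch below illustrates that pattern with plain lists of rows in place of pd.DataFrame or DataStream objects. `InMemorySource`, `InMemoryDestination`, and `transfer` are hypothetical names. The meaning of the string returned by write() is not stated in this reference, so the sketch treats it as a reference to what was written.

```python
from typing import Iterator, List


class InMemorySource:
    """Hypothetical source whose read() yields chunks of rows."""

    def __init__(self, chunks: List[List[dict]]) -> None:
        self._chunks = chunks

    def read(self) -> Iterator[List[dict]]:
        yield from self._chunks


class InMemoryDestination:
    """Hypothetical destination whose write() stores one chunk."""

    def __init__(self) -> None:
        self.rows: List[dict] = []

    def write(self, source_ref: List[dict]) -> str:
        self.rows.extend(source_ref)
        # Assumption: the returned string identifies what has been written.
        return f"memory://rows/{len(self.rows)}"


def transfer(source, destination) -> List[str]:
    """Pass every chunk produced by source.read() to destination.write()."""
    return [destination.write(chunk) for chunk in source.read()]


source = InMemorySource([[{"id": 1}, {"id": 2}], [{"id": 3}]])
destination = InMemoryDestination()
written = transfer(source, destination)
print(written)
```

Because read() returns an iterator, the destination can start writing before the source has produced all of its data, which keeps memory use bounded by the chunk size.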

abstract populate_metadata()

Given a dataset, check if the dataset has metadata.
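Since DataProviders derives from abc.ABC, a subclass can only be instantiated once every abstract member is implemented. The sketch below shows that mechanism on a simplified, hypothetical mirror of the interface. `ProviderSketch`, `ListProvider`, and `IncompleteProvider` are illustrative names, the `hook` property and the constructor arguments are omitted, and the method bodies are placeholders, not the library's behavior.

```python
from abc import ABC, abstractmethod
from typing import Iterator, List


class ProviderSketch(ABC):
    """Hypothetical, simplified mirror of the abstract members listed above."""

    @property
    @abstractmethod
    def openlineage_dataset_namespace(self) -> str: ...

    @property
    @abstractmethod
    def openlineage_dataset_name(self) -> str: ...

    @abstractmethod
    def check_if_exists(self) -> bool: ...

    @abstractmethod
    def read(self) -> Iterator[List[str]]: ...

    @abstractmethod
    def write(self, source_ref: List[str]) -> str: ...

    @abstractmethod
    def populate_metadata(self) -> None: ...


class ListProvider(ProviderSketch):
    """Hypothetical provider that keeps its data in a Python list."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.lines: List[str] = []

    @property
    def openlineage_dataset_namespace(self) -> str:
        return "memory://"

    @property
    def openlineage_dataset_name(self) -> str:
        return self.name

    def check_if_exists(self) -> bool:
        return bool(self.lines)

    def read(self) -> Iterator[List[str]]:
        yield list(self.lines)

    def write(self, source_ref: List[str]) -> str:
        self.lines.extend(source_ref)
        return self.name

    def populate_metadata(self) -> None:
        pass  # Placeholder: this sketch has no metadata to fill in.


class IncompleteProvider(ProviderSketch):
    """Implements only one abstract member, so it cannot be instantiated."""

    def check_if_exists(self) -> bool:
        return False


complete = ListProvider("example")
complete.write(["a", "b"])

try:
    IncompleteProvider()
    instantiation_failed = False
except TypeError:
    instantiation_failed = True
```

Here `complete.check_if_exists()` is True after the write, while `IncompleteProvider()` raises TypeError because abstract members remain unimplemented.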