universal_transfer_operator.data_providers.base
Module Contents
Classes
DataProviders: Base class to represent all the DataProviders interactions with Dataset.
Attributes
- universal_transfer_operator.data_providers.base.DatasetType
- class universal_transfer_operator.data_providers.base.DataStream
- remote_obj_buffer: io.IOBase
- actual_filename: pathlib.Path
- actual_file: universal_transfer_operator.datasets.file.base.File
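As a rough illustration of how the three DataStream attributes fit together, here is a minimal sketch. It does not import the library: `DataStreamSketch` and `FileStandIn` are hypothetical stand-ins built with `dataclasses`, and the paths and payload are made up for the example.

```python
import io
import pathlib
from dataclasses import dataclass


@dataclass
class FileStandIn:
    """Hypothetical placeholder for universal_transfer_operator.datasets.file.base.File."""

    path: str


@dataclass
class DataStreamSketch:
    """Stand-in that mirrors the three documented DataStream attributes."""

    remote_obj_buffer: io.IOBase  # open buffer holding the remote object's bytes
    actual_filename: pathlib.Path  # name of the file the buffer was read from
    actual_file: FileStandIn  # dataset-level description of that file


stream = DataStreamSketch(
    remote_obj_buffer=io.BytesIO(b"id,name\n1,a\n"),
    actual_filename=pathlib.Path("/tmp/example.csv"),
    actual_file=FileStandIn(path="s3://bucket/example.csv"),
)

# A consumer reads bytes from the buffer and uses the filename for bookkeeping.
header = stream.remote_obj_buffer.readline()
```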
- class universal_transfer_operator.data_providers.base.DataProviders(dataset, transfer_mode, transfer_params=attr.field(factory=TransferParameters, converter=lambda val: ...))
Bases: abc.ABC, Generic[DatasetType]
Base class to represent all the DataProviders interactions with Dataset.
The goal is to be able to support a new dataset by adding a new module to the uto/data_providers directory, without needing to change other modules and classes.
- Parameters:
dataset (DatasetType) –
transfer_params (universal_transfer_operator.utils.TransferParameters) –
- abstract property hook: airflow.hooks.base.BaseHook
Return an instance of the Airflow hook.
- Return type:
airflow.hooks.base.BaseHook
- abstract property openlineage_dataset_namespace: str
Returns the OpenLineage dataset namespace as per https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md
- Return type:
str
- abstract property openlineage_dataset_name: str
Returns the OpenLineage dataset name as per https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md
- Return type:
str
- property openlineage_dataset_uri: str
Returns the OpenLineage dataset URI as per https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md
- Return type:
str
- abstract check_if_exists()
Return True if the dataset exists.
- Return type:
bool
- check_if_transfer_supported(source_dataset)
Checks if the transfer is supported from source to destination based on source_dataset.
- Parameters:
source_dataset (DatasetType) –
- Return type:
bool
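One way such a check could work is an allow-list lookup on the destination provider. The sketch below is an assumption, not the library's implementation: `DatasetStandIn`, its `location` field, and `supported_source_locations` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetStandIn:
    """Hypothetical dataset carrying only a location label."""

    location: str  # e.g. "s3", "gs", "local"


class ProviderSketch:
    # Hypothetical allow-list of source locations this destination accepts.
    supported_source_locations = frozenset({"s3", "gs"})

    def __init__(self, dataset):
        self.dataset = dataset

    def check_if_transfer_supported(self, source_dataset) -> bool:
        # Supported only when the source's location is on the allow-list.
        return source_dataset.location in self.supported_source_locations


destination = ProviderSketch(DatasetStandIn(location="local"))
s3_ok = destination.check_if_transfer_supported(DatasetStandIn(location="s3"))
sftp_ok = destination.check_if_transfer_supported(DatasetStandIn(location="sftp"))
```

Keeping the allow-list on the destination means that supporting a new source only touches the destination's own module, which matches the stated goal of not changing other modules.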
- abstract read()
Read from a filesystem or database dataset and write to local reference locations or dataframes.
- Return type:
Iterator[pd.DataFrame] | Iterator[DataStream]
- abstract write(source_ref)
Write the data from a local reference location or a dataframe to the database or filesystem dataset.
- Parameters:
source_ref (pd.DataFrame | DataStream) – Stream of data, or a pandas dataframe, to be loaded into the output table.
- Return type:
str
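Because read() returns an iterator and write() takes a single source_ref, a transfer can be pictured as a loop that passes each item from the source to the destination. The sketch below uses stdlib stand-ins rather than the real classes: `StreamStandIn`, `SourceSketch`, and `DestinationSketch` are hypothetical, and treating the returned string as the destination path is an assumption, since the reference above only gives the return type.

```python
import io
from dataclasses import dataclass
from typing import Iterator


@dataclass
class StreamStandIn:
    """Hypothetical, reduced version of DataStream."""

    remote_obj_buffer: io.IOBase
    actual_filename: str


class SourceSketch:
    def __init__(self, files):
        self.files = files  # mapping of filename -> raw bytes

    def read(self) -> Iterator[StreamStandIn]:
        # Yield one stream per file instead of loading everything at once.
        for name, payload in self.files.items():
            yield StreamStandIn(io.BytesIO(payload), name)


class DestinationSketch:
    def __init__(self):
        self.stored = {}

    def write(self, source_ref: StreamStandIn) -> str:
        # Assumption: the returned string identifies where the data landed.
        target = f"dest/{source_ref.actual_filename}"
        self.stored[target] = source_ref.remote_obj_buffer.read()
        return target


source = SourceSketch({"a.csv": b"1,2\n", "b.csv": b"3,4\n"})
destination = DestinationSketch()
written = [destination.write(ref) for ref in source.read()]
```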
- abstract populate_metadata()
Given a dataset, check if the dataset has metadata.
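Since DataProviders derives from abc.ABC and Generic[DatasetType], a new provider has to implement the abstract members before it can be instantiated. The following is a minimal sketch of that pattern with a reduced stand-in base class. `ProviderBaseSketch` and `InMemoryProvider` are hypothetical, and only three of the documented abstract members are mirrored to keep the example short.

```python
from abc import ABC, abstractmethod
from typing import Generic, Iterator, TypeVar

DatasetType = TypeVar("DatasetType")


class ProviderBaseSketch(ABC, Generic[DatasetType]):
    """Reduced stand-in for the documented base class (not the real one)."""

    def __init__(self, dataset: DatasetType):
        self.dataset = dataset

    @abstractmethod
    def check_if_exists(self) -> bool: ...

    @abstractmethod
    def read(self) -> Iterator[bytes]: ...

    @abstractmethod
    def write(self, source_ref: bytes) -> str: ...


class InMemoryProvider(ProviderBaseSketch[dict]):
    """Hypothetical provider whose dataset is a plain dict."""

    def check_if_exists(self) -> bool:
        return bool(self.dataset)

    def read(self) -> Iterator[bytes]:
        yield from self.dataset.values()

    def write(self, source_ref: bytes) -> str:
        key = f"item-{len(self.dataset)}"
        self.dataset[key] = source_ref
        return key


# The base class cannot be instantiated while abstract members remain.
try:
    ProviderBaseSketch({})
    base_instantiable = True
except TypeError:
    base_instantiable = False

provider = InMemoryProvider({})
```

A subclass that omits any abstract member fails in the same way at instantiation time, so a missing implementation surfaces early rather than in the middle of a transfer.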