universal_transfer_operator.data_providers.filesystem.base
Module Contents
Classes
BaseFilesystemProviders represents all DataProviders interactions with the file system.
- class universal_transfer_operator.data_providers.filesystem.base.TempFile
- tmp_file: Path | None
- actual_filename: pathlib.Path
- class universal_transfer_operator.data_providers.filesystem.base.BaseFilesystemProviders(dataset, transfer_mode, transfer_params=attr.field(factory=TransferIntegrationOptions, converter=lambda val: ...))
Bases:
universal_transfer_operator.data_providers.base.DataProviders[universal_transfer_operator.datasets.file.base.File]
BaseFilesystemProviders represents all DataProviders interactions with the file system.
- Parameters:
dataset (universal_transfer_operator.datasets.file.base.File) –
transfer_params (universal_transfer_operator.universal_transfer_operator.TransferIntegrationOptions) –
- abstract property hook: airflow.hooks.base.BaseHook
Return an instance of the database-specific Airflow hook.
- Return type:
airflow.hooks.base.BaseHook
- abstract property paths: list[str]
Resolve patterns in path
- Return type:
list[str]
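As an illustration of what resolving patterns can look like on a local filesystem, here is a minimal sketch built on the standard library `glob` module. The helper name `resolve_paths` is hypothetical and is not part of universal_transfer_operator. Concrete providers may resolve patterns differently, for example by listing object-store prefixes.

```python
import glob
import os
import tempfile
from typing import List


def resolve_paths(pattern: str) -> List[str]:
    """Hypothetical helper: expand a glob pattern into a sorted list of paths."""
    return sorted(glob.glob(pattern))


# Build a throwaway directory so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("a.csv", "b.csv", "notes.txt"):
        with open(os.path.join(tmp_dir, name), "w") as handle:
            handle.write("x\n")
    # Only the two CSV files match the pattern.
    matched = [
        os.path.basename(path)
        for path in resolve_paths(os.path.join(tmp_dir, "*.csv"))
    ]
```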
- property transport_params: dict | None
Get credentials required by smart open to access files
- Return type:
dict | None
- abstract property openlineage_dataset_namespace: str
Return the OpenLineage dataset namespace, as per https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md
- Return type:
str
- abstract property openlineage_dataset_name: str
Return the OpenLineage dataset name, as per https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md
- Return type:
str
- abstract property openlineage_dataset_uri: str
Return the OpenLineage dataset URI, as per https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md
- Return type:
str
- abstract property size: int
Return the size in bytes of the given file
- Return type:
int
- abstract delete(path)
Delete a file/object if it exists
- Parameters:
path (str) –
- abstract check_if_exists()
Return true if the dataset exists
- Return type:
bool
- exists()
- Return type:
bool
- check_if_transfer_supported(source_dataset)
Checks if the transfer is supported from source to destination based on source_dataset.
- Parameters:
source_dataset (universal_transfer_operator.datasets.file.base.File) –
- Return type:
bool
- read()
Read the remote or local file dataset and return I/O buffers
- Return type:
Iterator[universal_transfer_operator.data_providers.base.DataStream]
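Because the return type is an iterator, callers can consume one stream at a time instead of loading every file up front. The sketch below shows that generator pattern with in-memory data. `StreamSketch` is a hypothetical stand-in for DataStream, and its field names are assumptions for illustration, not the library's actual attributes.

```python
import io
from dataclasses import dataclass
from typing import Dict, Iterator


@dataclass
class StreamSketch:
    """Hypothetical stand-in for DataStream; the field names are assumptions."""
    buffer: io.BytesIO
    filename: str


def read_sketch(contents: Dict[str, bytes]) -> Iterator[StreamSketch]:
    """Lazily yield one in-memory stream per file."""
    for filename, payload in contents.items():
        yield StreamSketch(buffer=io.BytesIO(payload), filename=filename)


# Nothing is materialized until the iterator is advanced.
streams = read_sketch({"a.csv": b"id\n1\n", "b.csv": b"id\n2\n"})
first = next(streams)
first_bytes = first.buffer.read()
```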
- read_using_smart_open()
Read the file dataset using smart open and return an I/O buffer
- Return type:
Iterator[universal_transfer_operator.data_providers.base.DataStream]
- write(source_ref)
Write data from a local reference location or a dataframe to the filesystem dataset or database dataset
- Parameters:
source_ref (DataStream | pd.DataFrame) – Source DataStream object which will be used to read data
- Return type:
str
- write_using_smart_open(source_ref)
Write the source data from remote object i/o buffer to the dataset using smart open
- Parameters:
source_ref (DataStream | pd.DataFrame) –
- Return type:
str
- write_from_file(source_ref)
Write the remote object I/O buffer to the dataset using smart open
- Returns:
File path that is used for the write pattern
- Parameters:
source_ref (universal_transfer_operator.data_providers.base.DataStream) – DataStream object of the source dataset
- Return type:
str
- write_from_dataframe(source_ref)
Write the dataframe to the dataset using smart open
- Returns:
File path that is used for the write pattern
- Parameters:
source_ref (pandas.DataFrame) – DataFrame holding the source data
- Return type:
str
- read_as_binary(file)
Checks if file has to be read as binary or as string i/o.
- Returns:
True or False
- Parameters:
file (str) –
- Return type:
bool
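One plausible way to make this decision is by file extension. The sketch below is an assumption for illustration, not the library's implementation: both the function name `should_read_as_binary` and the set of extensions treated as text are hypothetical.

```python
from pathlib import Path

# Assumption for illustration: these extensions are treated as text formats.
TEXT_EXTENSIONS = {".csv", ".json", ".ndjson"}


def should_read_as_binary(file: str) -> bool:
    """Hypothetical stand-in: False for known text formats, True otherwise."""
    return Path(file).suffix.lower() not in TEXT_EXTENSIONS


# The result can then select the mode passed to an open() call.
mode = "rb" if should_read_as_binary("data.parquet") else "r"
```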
- static cleanup(file_list)
Cleans up the temporary files created
- Parameters:
file_list (list[TempFile]) –
- Return type:
None
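A cleanup step of this kind can be sketched as follows. `TempFileSketch` mirrors the two attributes documented for TempFile, but it and `cleanup_sketch` are hypothetical stand-ins; the actual method may handle missing files or errors differently.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class TempFileSketch:
    """Stand-in mirroring the documented TempFile attributes."""
    tmp_file: Optional[Path]
    actual_filename: Path


def cleanup_sketch(file_list: List[TempFileSketch]) -> None:
    """Remove each temporary file that is set and still present on disk."""
    for entry in file_list:
        if entry.tmp_file is not None and entry.tmp_file.exists():
            entry.tmp_file.unlink()


# One entry backed by a real temporary file, one with no temporary file.
handle = tempfile.NamedTemporaryFile(delete=False)
handle.close()
tmp_path = Path(handle.name)
files = [
    TempFileSketch(tmp_file=tmp_path, actual_filename=Path("report.csv")),
    TempFileSketch(tmp_file=None, actual_filename=Path("other.csv")),
]
cleanup_sketch(files)
```

Skipping entries whose `tmp_file` is unset or already gone keeps the sketch safe to call more than once on the same list.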
- populate_metadata()
Given a dataset, check if the dataset has metadata.