universal_transfer_operator.data_providers.filesystem.base

Module Contents

Classes

TempFile

BaseFilesystemProviders

BaseFilesystemProviders represents all DataProviders interactions with the filesystem.

class universal_transfer_operator.data_providers.filesystem.base.TempFile
tmp_file: Path | None
actual_filename: pathlib.Path
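TempFile pairs a temporary working copy with the original filename. A minimal sketch of an equivalent structure (using the standard-library dataclasses module here in place of attrs, which the package itself uses):

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class TempFileSketch:
    """Illustrative stand-in for TempFile: a temporary copy plus its source name."""
    tmp_file: Path | None      # path of the temporary copy, if one was created
    actual_filename: Path      # original filename the copy was made from


tf = TempFileSketch(tmp_file=Path("/tmp/uto_abc123"), actual_filename=Path("data/input.csv"))
```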
class universal_transfer_operator.data_providers.filesystem.base.BaseFilesystemProviders(dataset, transfer_mode, transfer_params=attr.field(factory=TransferIntegrationOptions, converter=lambda val: ...))

Bases: universal_transfer_operator.data_providers.base.DataProviders[universal_transfer_operator.datasets.file.base.File]

BaseFilesystemProviders represents all DataProviders interactions with the filesystem.

Parameters:

dataset (universal_transfer_operator.datasets.file.base.File) –

transfer_mode –

transfer_params (TransferIntegrationOptions) –
abstract property hook: airflow.hooks.base.BaseHook

Return an instance of the filesystem-specific Airflow hook.

Return type:

airflow.hooks.base.BaseHook

abstract property paths: list[str]

Resolve patterns in the dataset path and return the matching paths.

Return type:

list[str]
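Concrete subclasses resolve wildcard patterns into explicit file paths. A hedged sketch of how a local-filesystem implementation might do this with glob (resolve_paths is an illustrative name, not the library's):

```python
import glob
import os
import tempfile


def resolve_paths(pattern: str) -> list[str]:
    # Expand a wildcard pattern into concrete file paths, sorted for determinism.
    return sorted(glob.glob(pattern))


# Demonstration against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("a.csv", "b.csv", "notes.txt"):
        open(os.path.join(tmp_dir, name), "w").close()
    csv_paths = resolve_paths(os.path.join(tmp_dir, "*.csv"))
```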

property transport_params: dict | None

Get the credentials required by smart_open to access files

Return type:

dict | None

abstract property openlineage_dataset_namespace: str

Returns the OpenLineage dataset namespace as per https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md

Return type:

str

abstract property openlineage_dataset_name: str

Returns the OpenLineage dataset name as per https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md

Return type:

str

abstract property openlineage_dataset_uri: str

Returns the OpenLineage dataset URI as per https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md

Return type:

str
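Per the OpenLineage naming spec linked above, a file dataset's namespace is typically the scheme plus authority (e.g. s3://bucket) and its name is the path. A hedged sketch of how the three properties might be derived from a dataset URI (the helper name is illustrative, not the library's):

```python
from urllib.parse import urlparse


def openlineage_parts(uri: str) -> tuple[str, str, str]:
    # Split a dataset URI into (namespace, name, uri) following the
    # scheme://authority + path convention of the OpenLineage naming spec.
    parsed = urlparse(uri)
    namespace = f"{parsed.scheme}://{parsed.netloc}"
    name = parsed.path.lstrip("/")
    return namespace, name, uri


namespace, name, uri = openlineage_parts("s3://my-bucket/raw/events.csv")
```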

abstract property size: int

Return the size in bytes of the given file

Return type:

int

abstract delete(path)

Delete a file/object if it exists

Parameters:

path (str) –

abstract check_if_exists()

Return True if the dataset exists

Return type:

bool

exists()
Return type:

bool

check_if_transfer_supported(source_dataset)

Checks if the transfer is supported from source to destination based on source_dataset.

Parameters:

source_dataset (universal_transfer_operator.datasets.file.base.File) –

Return type:

bool

read()

Read the remote or local file dataset and return i/o buffers

Return type:

Iterator[universal_transfer_operator.data_providers.base.DataStream]

read_using_smart_open()

Read the file dataset using smart_open and return i/o buffers

Return type:

Iterator[universal_transfer_operator.data_providers.base.DataStream]
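read() and read_using_smart_open() yield DataStream objects wrapping i/o buffers, one per resolved path. A simplified stdlib sketch of that pattern (DataStreamSketch stands in for the real DataStream in data_providers.base, and the contents dict stands in for what smart_open would fetch from the store):

```python
import io
from dataclasses import dataclass
from typing import Iterator


@dataclass
class DataStreamSketch:
    remote_obj_buffer: io.BytesIO  # buffered file content
    actual_filename: str           # path the buffer was read from


def read_files(paths: list[str], contents: dict[str, bytes]) -> Iterator[DataStreamSketch]:
    # Yield one buffered stream per resolved path.
    for path in paths:
        yield DataStreamSketch(io.BytesIO(contents[path]), path)


streams = list(read_files(["a.bin"], {"a.bin": b"hello"}))
```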

write(source_ref)

Write the data from a local reference location or a dataframe to the filesystem or database dataset

Parameters:

source_ref (DataStream | pd.DataFrame) – source object from which the data is read

Return type:

str

write_using_smart_open(source_ref)

Write the source data from the remote object i/o buffer to the dataset using smart_open

Parameters:

source_ref (DataStream | pd.DataFrame) –

Return type:

str

write_from_file(source_ref)

Write the remote object i/o buffer to the dataset using smart_open.

Returns:

File path used for the write pattern

Parameters:

source_ref (universal_transfer_operator.data_providers.base.DataStream) –

Return type:

str
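A hedged sketch of the write_from_file pattern against the local filesystem (the real method writes via smart_open, so it also covers remote destinations; the helper name is illustrative):

```python
import io
import tempfile
from pathlib import Path


def write_from_buffer(buffer: io.BytesIO, destination: Path) -> str:
    # Drain the source i/o buffer into the destination file and
    # return the written path, mirroring write_from_file's contract.
    destination.write_bytes(buffer.read())
    return str(destination)


with tempfile.TemporaryDirectory() as tmp_dir:
    dest = Path(tmp_dir) / "out.bin"
    written = write_from_buffer(io.BytesIO(b"payload"), dest)
    round_tripped = dest.read_bytes()
```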

write_from_dataframe(source_ref)

Write the dataframe to the filesystem dataset using smart_open.

Returns:

File path used for the write pattern

Parameters:

source_ref (pandas.DataFrame) –

Return type:

str

read_as_binary(file)

Check whether the file has to be read as binary or as string i/o.

Returns:

True or False

Parameters:

file (str) –

Return type:

bool
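A plausible extension-based heuristic for the binary-vs-string decision (the extension set here is an assumption for illustration, not the library's actual list):

```python
# Assumed set of text-like extensions; illustrative only.
TEXT_EXTENSIONS = {".csv", ".json", ".ndjson", ".txt"}


def read_as_binary(file: str) -> bool:
    # Read anything without a known text extension as binary i/o.
    dot = file.rfind(".")
    suffix = file[dot:].lower() if dot != -1 else ""
    return suffix not in TEXT_EXTENSIONS
```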

static cleanup(file_list)

Clean up the temporary files created

Parameters:

file_list (list[TempFile]) –

Return type:

None
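The cleanup step unlinks each temporary file that was created. A hedged sketch operating on plain Path objects rather than the TempFile records the real method receives:

```python
import os
import tempfile
from pathlib import Path


def cleanup(file_list: list[Path]) -> None:
    # Remove each temporary file that still exists; missing files are skipped.
    for path in file_list:
        if path.exists():
            path.unlink()


fd, raw_name = tempfile.mkstemp()
os.close(fd)
tmp_path = Path(raw_name)
cleanup([tmp_path, Path("/nonexistent/never-created")])
```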

populate_metadata()

Given a dataset, check if the dataset has metadata.