universal_transfer_operator.data_providers.filesystem.sftp

Module Contents

Classes

SFTPDataProvider

Data provider for interacting with an SFTP dataset.

class universal_transfer_operator.data_providers.filesystem.sftp.SFTPDataProvider(dataset, transfer_params=attr.field(factory=TransferIntegrationOptions, converter=lambda val: ...), transfer_mode=TransferMode.NONNATIVE)

Bases: universal_transfer_operator.data_providers.filesystem.base.BaseFilesystemProviders

Data provider for interacting with an SFTP dataset.

property paths: list[str]

Resolve SFTP file paths with the netloc of self.dataset.path as the prefix. A path is added only if it starts with that prefix.

Example: suppose the following paths exist:
  • sftp://upload/test.csv

  • sftp://upload/test.json

  • sftp://upload/home.parquet

  • sftp://upload/sample.ndjson

If self.dataset.path is “sftp://upload/test”, the property returns sftp://upload/test.csv and sftp://upload/test.json.

Return type:

list[str]
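
The prefix matching described for this property can be sketched as follows. This is a minimal illustration only, not the provider's actual implementation: the helper name filter_paths_by_prefix is hypothetical, and the candidate list is passed in directly, whereas the real property would presumably list files on the SFTP server through the hook.

```python
from urllib.parse import urlparse


def filter_paths_by_prefix(dataset_path: str, candidates: list[str]) -> list[str]:
    """Hypothetical helper: keep candidates whose netloc + path starts with
    the netloc + path of dataset_path."""
    parsed = urlparse(dataset_path)
    prefix = parsed.netloc + parsed.path  # e.g. "upload/test"
    matches = []
    for candidate in candidates:
        parts = urlparse(candidate)
        if (parts.netloc + parts.path).startswith(prefix):
            matches.append(candidate)
    return matches


candidates = [
    "sftp://upload/test.csv",
    "sftp://upload/test.json",
    "sftp://upload/home.parquet",
    "sftp://upload/sample.ndjson",
]
result = filter_paths_by_prefix("sftp://upload/test", candidates)
print(result)
```

With the four example paths, only the two entries beginning with sftp://upload/test are kept.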

property transport_params: dict

Get the SFTP credentials for storage.

Return type:

dict

abstract property openlineage_dataset_namespace: str

Returns the open lineage dataset namespace as per https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md

Return type:

str

abstract property openlineage_dataset_name: str

Returns the open lineage dataset name as per https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md

Return type:

str

property size: int

Return the file size for the SFTP location.

Return type:

int

hook()

Return an instance of the SFTPHook Airflow hook.

Return type:

airflow.providers.sftp.hooks.sftp.SFTPHook

delete(path=None)

Delete a file/object if it exists.

Parameters:

path (str | None) –

check_if_exists(path=None)

Return True if the dataset exists.

Parameters:

path (str | None) –

get_uri()

get_complete_url(dst_url, src_url)

Get the complete URL, filling in the host, port, username, and password if they are not provided in dst_url.

Parameters:
  • dst_url (str) –

  • src_url (str) –

Return type:

str
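
One way to picture the completion of a destination URL is the sketch below. It is an assumption-laden illustration, not the library's code: the function fill_missing_netloc and its host, port, username and password arguments are hypothetical stand-ins for whatever connection details the provider has available, and because this reference does not describe how src_url is used, the sketch leaves it out.

```python
from urllib.parse import urlparse, urlunparse


def fill_missing_netloc(dst_url, host, port, username, password):
    """Hypothetical helper: add any netloc component missing from dst_url.

    Values already present in dst_url win over the supplied defaults.
    Credentials are inserted as-is; special characters would need
    percent-encoding in real use.
    """
    parsed = urlparse(dst_url)
    final_user = parsed.username or username
    final_password = parsed.password or password
    final_host = parsed.hostname or host
    final_port = parsed.port or port

    credentials = ""
    if final_user:
        credentials = final_user
        if final_password:
            credentials += f":{final_password}"
        credentials += "@"

    netloc = f"{credentials}{final_host}"
    if final_port:
        netloc += f":{final_port}"
    return urlunparse(parsed._replace(netloc=netloc))


completed = fill_missing_netloc(
    "sftp://example.com/upload/test.csv",
    host="fallback.example.com",
    port=22,
    username="user",
    password="secret",
)
print(completed)
```

A URL that already carries all four components passes through unchanged, since each default is used only when the corresponding part is absent.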

write_using_smart_open(source_ref)

Write the source data from the remote object I/O buffer to the dataset using smart open.

Parameters:

source_ref (DataStream | pd.DataFrame) –

write_from_file(source_ref)

Write the remote object I/O buffer to the dataset using smart open. Here source_ref is the DataStream object of the source dataset, and the return value is the file path that is used for the write pattern.

Parameters:

source_ref (universal_transfer_operator.data_providers.base.DataStream) –

Return type:

str

write_from_dataframe(source_ref)

Write the dataframe to the SFTP dataset using smart open. Here source_ref is the dataframe holding the source data, and the return value is the file path that is used for the write pattern.

Parameters:

source_ref (pandas.DataFrame) –

Return type:

str