universal_transfer_operator.data_providers.filesystem.aws.s3

Module Contents

Classes

S3DataProvider

Data provider for interacting with S3 datasets.

class universal_transfer_operator.data_providers.filesystem.aws.s3.S3DataProvider(dataset, transfer_params=attr.field(factory=TransferIntegrationOptions, converter=lambda val: ...), transfer_mode=TransferMode.NONNATIVE)

Bases: universal_transfer_operator.data_providers.filesystem.base.BaseFilesystemProviders

Data provider for interacting with S3 datasets.

Parameters:

dataset –

transfer_params –

transfer_mode –

property transport_params: dict

Structure the s3fs credentials from the Airflow connection. s3fs enables pandas to write to S3.

Return type:

dict
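As a rough illustration of what structuring s3fs credentials can look like, the sketch below maps Airflow-style connection fields onto keyword arguments that s3fs.S3FileSystem accepts (key, secret, token, client_kwargs), which pandas forwards through its storage_options argument. ConnectionLike and to_s3fs_options are hypothetical names, and the exact mapping S3DataProvider performs is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ConnectionLike:
    """Hypothetical stand-in for the Airflow connection fields used here."""

    login: Optional[str] = None  # AWS access key ID
    password: Optional[str] = None  # AWS secret access key
    extra: dict[str, Any] = field(default_factory=dict)


def to_s3fs_options(conn: ConnectionLike) -> dict[str, Any]:
    """Map connection credentials onto s3fs.S3FileSystem keyword arguments."""
    options: dict[str, Any] = {}
    if conn.login and conn.password:
        options["key"] = conn.login
        options["secret"] = conn.password
    # Optional session token for temporary credentials.
    if "aws_session_token" in conn.extra:
        options["token"] = conn.extra["aws_session_token"]
    # Custom endpoints (e.g. MinIO) are passed through to the botocore client.
    if "endpoint_url" in conn.extra:
        options["client_kwargs"] = {"endpoint_url": conn.extra["endpoint_url"]}
    return options
```

With pandas and s3fs installed, the result could be passed as pandas.read_csv("s3://my-bucket/data.csv", storage_options=to_s3fs_options(conn)); the bucket and file names are placeholders.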

property paths: list[str]

Resolve the S3 file paths, taking the prefix into account.

Return type:

list[str]
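A minimal sketch of prefix-based path resolution, assuming the object keys have already been listed from the bucket (for example with Airflow's S3Hook.list_keys) and that resolved paths are reported as s3:// URIs. resolve_paths is a hypothetical helper, not part of this class.

```python
def resolve_paths(bucket: str, prefix: str, keys: list[str]) -> list[str]:
    """Turn a bucket listing into sorted s3:// URIs for keys under ``prefix``."""
    # startswith("") is always True, so an empty prefix keeps every key.
    return [f"s3://{bucket}/{key}" for key in sorted(keys) if key.startswith(prefix)]
```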

property verify: Any

Return type:

Any

property transfer_config_args: Any

Return type:

Any

property s3_extra_args: Any

Return type:

Any

property bucket_name: str

Return type:

str

property s3_key: str

Return type:

str
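bucket_name and s3_key presumably come from splitting an s3://bucket/key path. Airflow's S3Hook.parse_s3_url does this; the standard-library sketch below shows the same idea without Airflow. split_s3_path is a hypothetical helper, and the assumption that the dataset path has this shape is not stated in the source.

```python
from urllib.parse import urlparse


def split_s3_path(path: str) -> tuple[str, str]:
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(path)
    if parsed.scheme != "s3":
        raise ValueError(f"Not an S3 path: {path!r}")
    # netloc holds the bucket; path holds '/key', so drop the leading slash.
    # Note: urlparse treats '?' and '#' specially, so keys containing them
    # would need a plain string split instead.
    return parsed.netloc, parsed.path.lstrip("/")
```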

property s3_acl_policy: Any

Return type:

Any

property prefix: Any

Return type:

Any

property keep_directory_structure: Any

Return type:

Any

property delimiter: Any

Return type:

Any
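The prefix and delimiter properties presumably follow S3's listing semantics: keys are filtered by prefix, and when a delimiter is set, keys that contain it after the prefix are rolled up into "common prefixes" (pseudo-directories). The pure-Python sketch below emulates that ListObjectsV2 behavior on an in-memory key list; list_with_delimiter is a hypothetical name, and how S3DataProvider actually uses these values is an assumption.

```python
def list_with_delimiter(
    keys: list[str], prefix: str = "", delimiter: str = ""
) -> tuple[list[str], list[str]]:
    """Emulate S3 ListObjectsV2 grouping: return (contents, common_prefixes)."""
    contents: list[str] = []
    common_prefixes: set[str] = set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to and including the first delimiter after the
            # prefix becomes a single common prefix.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            contents.append(key)
    return contents, sorted(common_prefixes)
```

With prefix "data/" and delimiter "/", keys such as data/2023/b.csv collapse into the common prefix data/2023/, while data/a.csv is returned directly.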

abstract property openlineage_dataset_namespace: str

Returns the OpenLineage dataset namespace, as defined in https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md

Return type:

str

abstract property openlineage_dataset_name: str

Returns the OpenLineage dataset name, as defined in https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md

Return type:

str

abstract property openlineage_dataset_uri: str

Returns the OpenLineage dataset URI, as defined in https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md

Return type:

str
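For S3, the OpenLineage naming spec linked above uses s3://{bucket name} as the namespace and the object key as the name. The sketch below assumes the URI is simply the namespace and name joined with a slash; that joining rule and the helper name openlineage_s3_identifiers are assumptions, not taken from this class.

```python
def openlineage_s3_identifiers(bucket: str, key: str) -> dict[str, str]:
    """Build OpenLineage namespace/name for an S3 object, plus an assumed URI."""
    namespace = f"s3://{bucket}"  # per the Naming spec: s3://{bucket name}
    name = key  # per the Naming spec: {object key}
    uri = f"{namespace}/{name}"  # assumption: URI = namespace + "/" + name
    return {"namespace": namespace, "name": name, "uri": uri}
```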

property size: int

Return the file size of the S3 location.

Return type:

int
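One common way to get an object's size without downloading it is S3's HeadObject call, whose response includes ContentLength in bytes. The sketch assumes a boto3-style client (anything with head_object(Bucket=..., Key=...)); whether this provider sums sizes when a location resolves to several keys is an assumption, so total_size is a hypothetical helper.

```python
from typing import Any, Iterable, Protocol


class HeadObjectClient(Protocol):
    """Anything with a boto3-style head_object method (e.g. a boto3 S3 client)."""

    def head_object(self, *, Bucket: str, Key: str) -> dict[str, Any]: ...


def object_size(client: HeadObjectClient, bucket: str, key: str) -> int:
    """Size of one object in bytes, read from HeadObject metadata (no download)."""
    return int(client.head_object(Bucket=bucket, Key=key)["ContentLength"])


def total_size(client: HeadObjectClient, bucket: str, keys: Iterable[str]) -> int:
    """Hypothetical helper: sum object sizes when a location spans several keys."""
    return sum(object_size(client, bucket, key) for key in keys)
```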

hook()

Return an instance of the Airflow S3 hook for this dataset.

Return type:

airflow.providers.amazon.aws.hooks.s3.S3Hook
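The verify, transfer_config_args and s3_extra_args properties look like they correspond to keyword arguments of Airflow's S3Hook (verify is inherited from AwsBaseHook; transfer_config_args and extra_args belong to S3Hook itself). The sketch below only assembles such a kwargs dict, so it runs without Airflow installed; the mapping from these properties to S3Hook arguments, and the helper s3_hook_kwargs, are assumptions.

```python
from typing import Any, Optional


def s3_hook_kwargs(
    aws_conn_id: str,
    verify: Optional[Any] = None,  # bool or CA-bundle path, as botocore accepts
    transfer_config_args: Optional[dict] = None,  # forwarded to boto3 TransferConfig
    extra_args: Optional[dict] = None,  # e.g. {"ServerSideEncryption": "AES256"}
) -> dict[str, Any]:
    """Collect only the options that were actually set, ready for S3Hook(**kwargs)."""
    kwargs: dict[str, Any] = {"aws_conn_id": aws_conn_id}
    optional = {
        "verify": verify,
        "transfer_config_args": transfer_config_args,
        "extra_args": extra_args,
    }
    # Skip unset values so S3Hook keeps its own defaults.
    kwargs.update({name: value for name, value in optional.items() if value is not None})
    return kwargs
```

With Airflow's Amazon provider installed, the result could be used as S3Hook(**s3_hook_kwargs("aws_default", verify=False)). Filtering on None rather than truthiness keeps an explicit verify=False.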

delete(path=None)

Delete a file/object if it exists.

Parameters:

path (str | None) –
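S3's DeleteObjects API accepts at most 1,000 keys per request, so deleting many objects means batching. Airflow's S3Hook.delete_objects already handles this; the sketch below only shows the batching itself, producing payloads shaped for boto3's client.delete_objects(Bucket=..., Delete=payload). delete_batches is a hypothetical helper and says nothing about how this provider's delete() is actually implemented.

```python
from typing import Any, Iterator

MAX_KEYS_PER_DELETE = 1000  # S3 DeleteObjects limit per request


def delete_batches(
    keys: list[str], batch_size: int = MAX_KEYS_PER_DELETE
) -> Iterator[dict[str, Any]]:
    """Yield DeleteObjects payloads covering ``keys`` in chunks of ``batch_size``."""
    for start in range(0, len(keys), batch_size):
        chunk = keys[start : start + batch_size]
        # Quiet=True asks S3 to report only failures in the response.
        yield {"Objects": [{"Key": key} for key in chunk], "Quiet": True}
```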

check_if_exists(path=None)

Return True if the dataset exists.

Parameters:

path (str | None) –

Return type:

bool

read_using_hook()

Read the files from the dataset and write them to local file locations.

Return type:

Iterator[list[universal_transfer_operator.data_providers.filesystem.base.TempFile]]
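The Iterator[list[TempFile]] return type suggests that local copies are produced in groups rather than all at once. The sketch below shows that generator pattern under stated assumptions: fetch is a hypothetical callable returning an object's bytes, LocalTempFile is a hypothetical stand-in for TempFile (whose fields are not documented here), and the batch_size policy is invented for illustration.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class LocalTempFile:
    """Hypothetical stand-in for base.TempFile: local copy plus original file name."""

    tmp_path: str
    actual_filename: str


def read_in_batches(
    paths: list[str],
    fetch: Callable[[str], bytes],  # hypothetical: returns an object's bytes
    batch_size: int = 10,
) -> Iterator[list[LocalTempFile]]:
    """Copy each remote path to a local temp file, yielding lists of at most batch_size."""
    batch: list[LocalTempFile] = []
    for path in paths:
        fd, tmp_path = tempfile.mkstemp()
        with os.fdopen(fd, "wb") as fh:
            fh.write(fetch(path))
        batch.append(LocalTempFile(tmp_path, os.path.basename(path)))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch
```

Because the files are created with mkstemp, the caller is responsible for deleting them once the batch has been written to the destination.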

write_using_hook(source_ref)

Write the files from local file locations to the dataset.

Parameters:

source_ref (list[universal_transfer_operator.data_providers.filesystem.base.TempFile]) –

download_file(file)

Download the file and save it to a temporary path.

Return type:

universal_transfer_operator.data_providers.filesystem.base.TempFile
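A sketch of downloading one object to a temporary path, assuming a boto3 S3 client, whose download_fileobj(Bucket, Key, Fileobj) streams an object into an open binary file. download_to_temp is a hypothetical helper that returns a plain path rather than a TempFile, since TempFile's fields are not documented here.

```python
import os
import tempfile
from typing import BinaryIO, Protocol


class DownloadClient(Protocol):
    """Anything with a boto3-style download_fileobj method."""

    def download_fileobj(self, Bucket: str, Key: str, Fileobj: BinaryIO) -> None: ...


def download_to_temp(client: DownloadClient, bucket: str, key: str) -> str:
    """Stream one S3 object into a new temporary file and return its local path."""
    # Keep the key's extension so later file-type detection can still use it.
    suffix = os.path.splitext(key)[1]
    # delete=False so the file survives closing; the caller must remove it.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as fh:
        client.download_fileobj(Bucket=bucket, Key=key, Fileobj=fh)
    return fh.name
```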