universal_transfer_operator.datasets.file.base
Module Contents
Classes
File – Represents all file datasets.
- class universal_transfer_operator.datasets.file.base.File
Bases:
universal_transfer_operator.datasets.base.Dataset
Represents all file datasets.
- Parameters:
path – Path to a file in the filesystem or in an object store
conn_id – Airflow connection ID
filetype – Constant that sets the file type explicitly
normalize_config – Parameters for the pandas json_normalize() function, given as a dict
is_bytes – Whether the file content is bytes (defaults to False)
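To illustrate what a normalize_config dict might contain, here is a minimal sketch that calls pandas directly rather than this library. It assumes the dict is forwarded to json_normalize() as keyword arguments; the sample records and the chosen `sep` value are made up for illustration.

```python
import pandas as pd

# Hypothetical nested records, standing in for the contents of a JSON file.
records = [
    {"id": 1, "owner": {"name": "Ada", "city": "London"}},
    {"id": 2, "owner": {"name": "Linus", "city": "Helsinki"}},
]

# Hypothetical config: every key must be a valid json_normalize() keyword argument.
normalize_config = {"sep": "_"}

# Assumption: the dict is unpacked into json_normalize() as keyword arguments.
df = pd.json_normalize(records, **normalize_config)
print(sorted(df.columns))
```

With `sep` set to an underscore, nested keys such as `owner.name` are flattened into column names joined by `_` instead of the default `.`.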
- property location
- property size: int
Return the size in bytes of the given file.
- Returns:
File size in bytes
- Return type:
int
- path: str
- conn_id: str
- filetype: FileTypeConstant | None
- normalize_config: dict | None
- is_bytes: bool = False
- uri: str
- extra: dict
- is_dataframe: bool = False
- is_binary()
Return whether the file is binary. Uses a naive strategy based on the file extension.
- Returns:
True if the file is binary, False otherwise
- Return type:
bool
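As a rough illustration of an extension-based check, the standalone sketch below does not use the library. The helper `guess_is_binary` and the set `BINARY_EXTENSIONS` are hypothetical, and which file types the library actually treats as binary is an assumption here.

```python
from pathlib import PurePosixPath

# Assumption: extensions treated as binary in this sketch only;
# the library's own rule may differ.
BINARY_EXTENSIONS = {".parquet"}


def guess_is_binary(path: str) -> bool:
    """Naively classify a path as binary by looking only at its extension."""
    suffix = PurePosixPath(path).suffix.lower()
    return suffix in BINARY_EXTENSIONS


print(guess_is_binary("s3://bucket/data/sales.parquet"))
print(guess_is_binary("/tmp/sales.csv"))
```

A check like this never opens the file, so it is cheap but can be fooled by a misleading extension.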
- is_pattern()
Returns True when the file path is a pattern (e.g. s3://bucket/folder or /folder/sample_*).
- Returns:
True or False
- Return type:
bool
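One way to read the two examples above is that a path counts as a pattern when it contains a wildcard or has no file extension. The sketch below encodes that reading as an assumption; `looks_like_pattern` is a hypothetical helper, not the library's implementation.

```python
from pathlib import PurePosixPath

# Characters that commonly mark a glob-style wildcard.
GLOB_CHARS = set("*?[")


def looks_like_pattern(path: str) -> bool:
    """Return True if the path has a wildcard or no file extension (assumed rule)."""
    has_wildcard = any(ch in GLOB_CHARS for ch in path)
    has_no_extension = PurePosixPath(path).suffix == ""
    return has_wildcard or has_no_extension


for candidate in ("s3://bucket/folder", "/folder/sample_*", "/folder/sample_1.csv"):
    print(candidate, looks_like_pattern(candidate))
```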
- create_from_dataframe(df, store_as_dataframe=True)
Create a file in the desired location using the values of a dataframe.
- Parameters:
df (pandas.DataFrame) – pandas dataframe
store_as_dataframe (bool) – Whether the data should later be deserialized as a dataframe or as a file containing delimited data (e.g. csv, parquet, etc.).
- Return type:
None
- export_to_dataframe(**kwargs)
Read files from all supported locations and convert them into dataframes.
- Return type:
pandas.DataFrame
- export_to_dataframe_via_byte_stream(**kwargs)
Read files from all supported locations and convert them into dataframes. Due to noted issues with using smart_open with pandas (like https://github.com/RaRe-Technologies/smart_open/issues/524), we create a BytesIO or StringIO buffer before exporting to a dataframe. We’ve found a sizable speed improvement with this optimization.
- Return type:
pandas.DataFrame
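The buffering idea described above can be sketched without the library: read the whole stream into an in-memory buffer, then hand that buffer to pandas. The function `read_csv_via_buffer` is hypothetical, the CSV format is chosen only for illustration, and a BytesIO object stands in for a remote file handle such as one returned by smart_open.

```python
import io

import pandas as pd


def read_csv_via_buffer(stream) -> pd.DataFrame:
    """Copy the stream into memory first, then let pandas parse the buffer."""
    buffer = io.BytesIO(stream.read())
    return pd.read_csv(buffer)


# Assumption: this in-memory stream stands in for a remote file handle.
remote_like_stream = io.BytesIO(b"id,name\n1,alpha\n2,beta\n")
df = read_csv_via_buffer(remote_like_stream)
print(df)
```

The trade-off is memory: the whole file is held in the buffer before parsing, so this approach suits files that fit comfortably in RAM.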
- exists()
Check whether the file exists.
- Return type:
bool
- classmethod from_json(serialized_object)
- Parameters:
serialized_object (dict) –