spark_pipeline_framework.utilities.file_downloader
Module Contents
Classes
| ThrowOnErrorOpener | Derived class with handlers for errors we can handle (perhaps). |
| FileDownloader | Module to download files from URLs. |
- class spark_pipeline_framework.utilities.file_downloader.ThrowOnErrorOpener(*args, **kwargs)
Bases: urllib.request.FancyURLopener
Derived class with handlers for errors we can handle (perhaps).
- http_error_default(self, url: str, fp: Any, errcode: int, errmsg: str, headers: Any) → Any
Default error handling – don’t raise an exception.
- class spark_pipeline_framework.utilities.file_downloader.FileDownloader(url: str, download_path: str, extract_archives: Optional[bool] = False)
Module to download files from URLs. Inspired by the Python wget module: https://github.com/steveeJ/python-wget
- static check_if_path_exists(path: str) → bool
- static get_filename_from_url(url: str) → str
- static rename_filename_if_exists(filename: str) → str
Expands the name portion of filename with a numeric ‘ (x)’ suffix to return a filename that does not already exist.
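The suffixing behaviour described for rename_filename_if_exists can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name `rename_if_exists` is hypothetical, and the choice of the smallest unused counter starting at 1 is an assumption.

```python
import os


def rename_if_exists(filename: str) -> str:
    """Hypothetical sketch: return a path that does not exist yet.

    If `filename` is free, it is returned unchanged. Otherwise a
    ' (x)' suffix is added to the name portion (before the extension),
    using the smallest x >= 1 that yields an unused path (assumption).
    """
    if not os.path.exists(filename):
        return filename

    stem, ext = os.path.splitext(filename)
    counter = 1
    while os.path.exists(f"{stem} ({counter}){ext}"):
        counter += 1
    return f"{stem} ({counter}){ext}"
```

With `report.csv` already on disk, this sketch would return `report (1).csv`; the real method may pick the suffix differently.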
- extract_zip_files(self, filename: str, path: str, out_path: Optional[str] = None) → Optional[str]
Extracts the archive file and returns the corresponding file path. Returns None if the specified path does not exist.
:param filename: str: Name of the archive file
:param path: str: File path of the archive file
:param out_path: Optional[str]: Destination file path where the extracted files should be stored (default is the same as path)
:returns filename: str: Output file path
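The documented contract of extract_zip_files (return None for a missing path, default the destination to path) might look roughly like the sketch below. It uses only the standard zipfile module and is not the library's code. The function name `extract_zip` is hypothetical, and two points are assumptions: that `path` is the directory containing the archive, and that the returned value is the destination directory.

```python
import os
import zipfile
from typing import Optional


def extract_zip(
    filename: str, path: str, out_path: Optional[str] = None
) -> Optional[str]:
    """Hypothetical sketch of the documented extraction behaviour."""
    # Assumption: `path` is the directory that holds the archive.
    archive = os.path.join(path, filename)
    if not os.path.exists(archive):
        return None

    # Default the destination to the archive's own directory.
    destination = out_path or path
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(destination)
    return destination
```

Returning None instead of raising lets a caller treat a missing archive as a soft failure, which matches the Optional[str] return type in the signature above.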
- download_files_locally(self, filename: str) → str
- download_files_to_s3(self, filename: str, bucket: str, download_path: str) → str
- download_files_from_url(self) → Optional[str]
Function to download files from a URL
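For orientation, a URL-to-local-file download can be sketched with the standard library alone. This is not how FileDownloader is implemented (its internals are not shown on this page). The function `download_to_directory`, the `"download"` fallback name, and the use of urllib.request.urlretrieve are all assumptions made for illustration.

```python
import os
import urllib.request
from urllib.parse import unquote, urlparse


def download_to_directory(url: str, download_path: str) -> str:
    """Hypothetical sketch: fetch `url` into `download_path`."""
    os.makedirs(download_path, exist_ok=True)

    # Derive a local file name from the last segment of the URL path;
    # fall back to a placeholder name when the path has no segment.
    name = os.path.basename(unquote(urlparse(url).path)) or "download"
    target = os.path.join(download_path, name)

    # urlretrieve copies the resource at `url` to the `target` file.
    urllib.request.urlretrieve(url, target)
    return target
```

The sketch overwrites an existing file with the same name; a renaming step like the one documented for rename_filename_if_exists would avoid that.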