`spark_pipeline_framework.utilities.file_downloader`¶

Module Contents¶

Classes¶

`ThrowOnErrorOpener`	Derived class with handlers for errors we can handle (perhaps).
`FileDownloader`	Class to download files from URLs.

class spark_pipeline_framework.utilities.file_downloader.ThrowOnErrorOpener(*args, **kwargs)¶

Bases: urllib.request.FancyURLopener

Derived class with handlers for errors we can handle (perhaps).

http_error_default(self, url: str, fp: Any, errcode: int, errmsg: str, headers: Any) → Any¶

Default error handling – don’t raise an exception.

class spark_pipeline_framework.utilities.file_downloader.FileDownloader(url: str, download_path: str, extract_archives: Optional[bool] = False)¶

Class to download files from URLs. Inspired by the python wget module: https://github.com/steveeJ/python-wget

static check_if_path_exists(path: str) → bool¶

static get_filename_from_url(url: str) → str¶
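The reference gives only the signature of `get_filename_from_url`, so the following is a minimal sketch of one plausible approach, not the library's implementation. It assumes the filename is the last segment of the URL path; the function name `filename_from_url_sketch` and the example URL are hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def filename_from_url_sketch(url: str) -> str:
    """Hypothetical helper: return the last path segment of a URL.

    Assumption: the query string and fragment are not part of the filename.
    """
    # urlparse separates the path from the query string and fragment.
    url_path = urlparse(url).path
    # URL paths always use forward slashes, so posixpath is used
    # rather than the platform-dependent os.path.
    return posixpath.basename(url_path)


name = filename_from_url_sketch("https://example.com/data/archive.zip?token=abc")
```

A URL whose path ends in a slash yields an empty string in this sketch; how the real method handles that case is not documented here.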

static rename_filename_if_exists(filename: str) → str¶

Expands the name portion of filename with a numeric ‘ (x)’ suffix to return a filename that doesn’t already exist.
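The ‘ (x)’ suffix behaviour described for `rename_filename_if_exists` can be illustrated with a short sketch. This is an assumption about how such a helper might work, not the library's code: the name `next_free_filename` is hypothetical, and returning the input unchanged when it is free is a choice made for this example.

```python
import os


def next_free_filename(filename: str) -> str:
    """Hypothetical helper: return a path that does not exist yet.

    If `filename` is free it is returned unchanged (an assumption of this
    sketch); otherwise ' (1)', ' (2)', ... is inserted before the extension
    until an unused name is found.
    """
    if not os.path.exists(filename):
        return filename
    # Split "report.csv" into ("report", ".csv") so the suffix goes
    # on the name portion, not after the extension.
    stem, ext = os.path.splitext(filename)
    index = 1
    while True:
        candidate = f"{stem} ({index}){ext}"
        if not os.path.exists(candidate):
            return candidate
        index += 1
```

Checking for existence and creating the file later are separate steps, so two processes running this at the same time could pick the same name.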

extract_zip_files(self, filename: str, path: str, out_path: Optional[str] = None) → Optional[str]¶

Extracts the archive file and returns the corresponding file path. Returns None if the specified path does not exist.

:param filename: str: Name of the archive file
:param path: str: File path of the archive file
:param out_path: Optional[str]: Destination file path where the extracted files should be stored (default is the same as path)
:returns filename: str: Output file path
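The documented contract of `extract_zip_files` (extract, default the destination to `path`, return None when the path is missing) can be sketched with the standard `zipfile` module. This is an illustration under stated assumptions, not the library's implementation: the name `extract_zip_sketch` is hypothetical, `path` is treated as the directory containing the archive, and the function returns the destination directory.

```python
import os
import zipfile
from typing import Optional


def extract_zip_sketch(
    filename: str, path: str, out_path: Optional[str] = None
) -> Optional[str]:
    """Hypothetical helper mirroring the documented parameters.

    Assumptions: `path` is the directory that holds `filename`, and the
    return value is the directory the files were extracted into.
    """
    archive = os.path.join(path, filename)
    if not os.path.exists(archive):
        # Mirrors the documented behaviour of returning None
        # when the specified path does not exist.
        return None
    # Default the destination to the archive's own directory.
    destination = out_path if out_path is not None else path
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(destination)
    return destination
```

`ZipFile.extractall` creates the destination directory if needed, so the sketch does not call `os.makedirs` itself.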

download_files_locally(self, filename: str) → str¶

download_files_to_s3(self, filename: str, bucket: str, download_path: str) → str¶

download_files_from_url(self) → Optional[str]¶

Function to download files from a URL.
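A usage sketch based only on the signatures listed above. The wrapper name `download_example` is hypothetical, and any URL or directory passed to it is the caller's choice. The meaning of the returned string is not documented here; it is assumed to be the path of the downloaded file, or None on failure.

```python
from typing import Optional


def download_example(url: str, download_path: str) -> Optional[str]:
    """Hypothetical wrapper around the documented FileDownloader API."""
    # Imported inside the function so the sketch can be loaded without
    # spark_pipeline_framework installed; calling it does require the package.
    from spark_pipeline_framework.utilities.file_downloader import FileDownloader

    downloader = FileDownloader(
        url=url,
        download_path=download_path,
        extract_archives=True,  # documented default is False
    )
    # Return type is Optional[str] per the signature; assumed here to be
    # the local path of the downloaded file.
    return downloader.download_files_from_url()
```

Callers should check the result for None before using it, since the return type is `Optional[str]`.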
