spark_pipeline_framework.utilities.file_downloader

Module Contents

Classes

ThrowOnErrorOpener

Derived class with handlers for errors we can handle (perhaps).

FileDownloader

Module to download files from URLs

class spark_pipeline_framework.utilities.file_downloader.ThrowOnErrorOpener(*args, **kwargs)

Bases: urllib.request.FancyURLopener

Derived class with handlers for errors we can handle (perhaps).

http_error_default(self, url: str, fp: Any, errcode: int, errmsg: str, headers: Any) → Any

Default error handling – don’t raise an exception.

class spark_pipeline_framework.utilities.file_downloader.FileDownloader(url: str, download_path: str, extract_archives: Optional[bool] = False)

Module to download files from URLs. Inspired by the python wget module: https://github.com/steveeJ/python-wget

static check_if_path_exists(path: str) → bool
static get_filename_from_url(url: str) → str
static rename_filename_if_exists(filename: str) → str

Expands the name portion of filename with a numeric ‘ (x)’ suffix to return a filename that doesn’t already exist.
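The suffix scheme described above can be illustrated with a minimal sketch. This is not the framework's implementation: the helper name next_free_filename is hypothetical, and the sketch assumes the counter starts at 1 and is inserted before the file extension.

```python
import os


def next_free_filename(filename: str) -> str:
    """Hypothetical helper: return a path that does not exist yet.

    If `filename` is unused it is returned unchanged; otherwise a
    numeric ' (n)' suffix is added to the name portion, before the
    extension, until an unused path is found.
    """
    if not os.path.exists(filename):
        return filename
    root, ext = os.path.splitext(filename)
    counter = 1  # assumption: numbering starts at 1
    while os.path.exists(f"{root} ({counter}){ext}"):
        counter += 1
    return f"{root} ({counter}){ext}"
```

Under these assumptions, calling the helper for data.csv when data.csv and data (1).csv both exist yields data (2).csv.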

extract_zip_files(self, filename: str, path: str, out_path: Optional[str] = None) → Optional[str]

Extracts the archive file and returns the corresponding output file path. Returns None if the specified path does not exist.

:param filename: str: Name of the archive file
:param path: str: File path of the archive file
:param out_path: optional(str): Destination file path where the extracted files should be stored (default is the same as path)
:returns filename: str: Output file path
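The documented behavior (extract an archive, fall back to the source path when no destination is given, and return nothing when the archive is missing) might look roughly like the sketch below. It uses the standard-library zipfile module and is not the framework's code. The function name extract_zip is hypothetical, and the sketch assumes that path is the directory containing the archive and that the returned value is the destination directory.

```python
import os
import zipfile
from typing import Optional


def extract_zip(
    filename: str, path: str, out_path: Optional[str] = None
) -> Optional[str]:
    """Hypothetical sketch of extracting a zip archive.

    Assumes `path` is the directory that holds `filename`.
    """
    archive = os.path.join(path, filename)
    if not os.path.exists(archive):
        # Mirrors the documented contract: no archive, no result.
        return None
    # Default destination is the directory of the archive itself.
    destination = out_path if out_path is not None else path
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(destination)
    return destination
```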

download_files_locally(self, filename: str) → str
download_files_to_s3(self, filename: str, bucket: str, download_path: str) → str
download_files_from_url(self) → Optional[str]

Function to download files from a URL
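For readers who want a feel for the overall flow (derive a file name from the URL, then write the response to a local directory), here is a minimal standard-library sketch. It does not use FileDownloader or reproduce its internals. The names filename_from_url and download are hypothetical, and taking the last URL path segment as the file name is an assumption.

```python
import os
import shutil
import urllib.request
from urllib.parse import unquote, urlparse


def filename_from_url(url: str) -> str:
    """Hypothetical helper: use the last path segment as the file name."""
    return os.path.basename(unquote(urlparse(url).path))


def download(url: str, download_path: str) -> str:
    """Hypothetical helper: stream `url` into `download_path`.

    Returns the path of the written file. urlopen raises
    urllib.error.HTTPError for HTTP error statuses, so failures
    are not silently written to disk.
    """
    os.makedirs(download_path, exist_ok=True)
    target = os.path.join(download_path, filename_from_url(url))
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Because urlopen also accepts file:// URLs, the sketch can be tried without network access by pointing it at a local file.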