ensembl.utils package#

Ensembl Python general-purpose utils library.

ensembl.utils.StrPath = str | os.PathLike[str]#

Type alias for a string or os.PathLike path.

Subpackages#

Submodules#

ensembl.utils.archive module#

Utils for common IO operations over archive files, e.g. tar or gzip.

ensembl.utils.archive.extract_file(src_file, dst_dir)[source]#

Extracts the src_file into dst_dir.

If the file is not an archive, it will be copied to dst_dir. dst_dir will be created if it does not exist.

Parameters:
  • src_file (str | PathLike[str]) – Path to the file to unpack.

  • dst_dir (str | PathLike[str]) – Path to the folder where the file will be extracted.

Return type:

None
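
The extract-or-copy behaviour described above can be illustrated with the standard library alone. The following is a minimal sketch, not the implementation used by ensembl.utils.archive: the helper name extract_or_copy is hypothetical, and it assumes that "not an archive" means shutil does not recognise the file extension.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def extract_or_copy(src_file, dst_dir):
    """Unpacks src_file into dst_dir, or copies it there if it is not a known archive."""
    dst_path = Path(dst_dir)
    # Create the destination folder (and any missing parents) if needed
    dst_path.mkdir(parents=True, exist_ok=True)
    try:
        shutil.unpack_archive(str(src_file), str(dst_path))
    except shutil.ReadError:
        # Assumption: an unrecognised format is treated as a plain file and copied
        shutil.copy(src_file, dst_path)


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    text_file = tmp_dir / "data.txt"
    text_file.write_text("ACGT\n")
    tar_file = tmp_dir / "data.tar.gz"
    with tarfile.open(tar_file, "w:gz") as tar:
        tar.add(text_file, arcname="data.txt")

    extract_or_copy(tar_file, tmp_dir / "unpacked")  # archive: extracted
    extract_or_copy(text_file, tmp_dir / "copied")  # plain file: copied
    unpacked_text = (tmp_dir / "unpacked" / "data.txt").read_text()
    copied_text = (tmp_dir / "copied" / "data.txt").read_text()
```

In both cases the destination folder ends up containing data.txt, whether the source was the archive or the plain file.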

ensembl.utils.archive.open_gz_file(file_path, mode='rt', encoding='utf-8')[source]#

Yields an open file object, even if the file is compressed with gzip.

The file is expected to contain text. The function can be used in a “with” statement like the built-in open().

Parameters:
  • file_path (str | PathLike[str]) – A (single) file path to open.

  • mode (str, default: 'rt') – The mode in which the file is opened.

  • encoding (str, default: 'utf-8') – The name of the encoding used to decode or encode the file.

Return type:

Generator[GzipFile | IO, None, None]
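
To show how a single context manager can hide the difference between plain and gzip-compressed text files, here is a minimal sketch built on gzip and contextlib. The name open_text_maybe_gz is hypothetical, and detecting compression from the ".gz" suffix is an assumption made for this example rather than a statement about how open_gz_file() decides.

```python
import gzip
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def open_text_maybe_gz(file_path, mode="rt", encoding="utf-8"):
    """Yields an open text file object, decompressing on the fly for ".gz" files."""
    path = Path(file_path)
    # Assumption: compression is detected from the file suffix
    if path.suffix == ".gz":
        with gzip.open(path, mode, encoding=encoding) as handle:
            yield handle
    else:
        with path.open(mode, encoding=encoding) as handle:
            yield handle


with tempfile.TemporaryDirectory() as tmp:
    plain = Path(tmp) / "seq.txt"
    packed = Path(tmp) / "seq.txt.gz"
    plain.write_text("ACGT\n", encoding="utf-8")
    with gzip.open(packed, "wt", encoding="utf-8") as out:
        out.write("ACGT\n")

    contents = []
    for path in (plain, packed):
        # The calling code is identical for both files
        with open_text_maybe_gz(path) as handle:
            contents.append(handle.read())
```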

ensembl.utils.argparse module#

Provide an extended version of argparse.ArgumentParser with additional functionality.

Examples

>>> from pathlib import Path
>>> from ensembl.utils.argparse import ArgumentParser
>>> parser = ArgumentParser(description="Tool description")
>>> parser.add_argument_src_path("--src_file", required=True, help="Path to source file")
>>> parser.add_server_arguments(help="Server to connect to")
>>> args = parser.parse_args()
>>> args
Namespace(host='localhost', port=3826, src_file=PosixPath('/path/to/src_file.txt'),
url=URL('mysql://username@localhost:3826'), user='username')
exception ensembl.utils.argparse.ArgumentError[source]#

Bases: Exception

An error from creating an argument (optional or positional).

class ensembl.utils.argparse.ArgumentParser(*args, **kwargs)[source]#

Bases: ArgumentParser

Extends argparse.ArgumentParser with additional methods and functionality.

By default, the help text displays the default value of every non-required argument, i.e. optional arguments with required=False.

add_argument(*args, **kwargs)[source]#

Extends the parent function by excluding the default value in the help text when not provided.

Only applied to required arguments without a default value, i.e. positional arguments or optional arguments with required=True.

Return type:

None

add_argument_dst_path(*args, exists_ok=True, **kwargs)[source]#

Adds a pathlib.Path argument, checking that it is writable at parsing time.

If “metavar” is not defined, it is added with “PATH” as value to improve help text readability.

Parameters:

exists_ok (bool, default: True) – Do not raise an error if the destination path already exists.

Return type:

None
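
A writability check at parsing time could look like the sketch below, which uses a plain argparse type callable. The function writable_path is hypothetical, and the rule it applies (test the path itself if it exists, otherwise its closest existing ancestor) is an assumption for illustration, not a description of what add_argument_dst_path() does internally.

```python
import argparse
import os
import tempfile
from pathlib import Path


def writable_path(value, exists_ok=True):
    """Converts value to a Path, raising an argparse error if it cannot be written."""
    path = Path(value)
    if path.exists():
        if not exists_ok:
            raise argparse.ArgumentTypeError(f"'{value}' already exists")
        target = path
    else:
        # Check the closest existing ancestor, since that is where creation would start
        target = next(parent for parent in path.absolute().parents if parent.exists())
    if not os.access(target, os.W_OK):
        raise argparse.ArgumentTypeError(f"'{value}' is not writable")
    return path


with tempfile.TemporaryDirectory() as tmp:
    # A path that does not exist yet but lives under a writable folder is accepted
    new_path = writable_path(Path(tmp) / "results" / "out.txt")
    try:
        writable_path(tmp, exists_ok=False)
        rejected_existing = False
    except argparse.ArgumentTypeError:
        rejected_existing = True
```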

add_argument_src_path(*args, **kwargs)[source]#

Adds a pathlib.Path argument, checking that it exists and is readable at parsing time.

If “metavar” is not defined, it is added with “PATH” as value to improve help text readability.

Return type:

None
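
Validating a source path while parsing means the program fails early, with a standard argparse error message, rather than later when the file is first opened. The sketch below illustrates the idea with standard argparse only; readable_path is a hypothetical helper and is not part of this package.

```python
import argparse
import os
import tempfile
from pathlib import Path


def readable_path(value):
    """Converts value to a Path, raising an argparse error if it is missing or unreadable."""
    path = Path(value)
    if not path.exists():
        raise argparse.ArgumentTypeError(f"'{value}' does not exist")
    if not os.access(path, os.R_OK):
        raise argparse.ArgumentTypeError(f"'{value}' is not readable")
    return path


parser = argparse.ArgumentParser()
parser.add_argument("--src_file", type=readable_path, metavar="PATH", required=True)

with tempfile.NamedTemporaryFile() as tmp_file:
    args = parser.parse_args(["--src_file", tmp_file.name])
    parsed_path = args.src_file
    existed_at_parse_time = parsed_path.exists()
```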

add_argument_url(*args, **kwargs)[source]#

Adds a sqlalchemy.engine.URL argument.

If “metavar” is not defined, it is added with “URI” as value to improve help text readability.

Return type:

None

add_log_arguments(add_log_file=False)[source]#

Adds the usual set of arguments required to set and initialise a logging system.

The current set includes a mutually exclusive group for the default logging level: --verbose, --debug, --quiet or --log LEVEL.

Parameters:

add_log_file (bool, default: False) – Add arguments to allow storing messages into a file, i.e. --log_file and --log_file_level.

Return type:

None
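
A mutually exclusive group of logging-level flags can be built with standard argparse, as in the following sketch. The mapping of each flag to a level name (for instance --quiet to ERROR) and the shared destination log_level are assumptions chosen for this example; the actual values set by add_log_arguments() may differ.

```python
import argparse

parser = argparse.ArgumentParser()
group = parser.add_mutually_exclusive_group()
# Assumption: every flag writes a level name into the same destination
group.add_argument("--verbose", action="store_const", const="INFO", dest="log_level")
group.add_argument("--debug", action="store_const", const="DEBUG", dest="log_level")
group.add_argument("--quiet", action="store_const", const="ERROR", dest="log_level")
group.add_argument("--log", metavar="LEVEL", dest="log_level")
parser.set_defaults(log_level="WARNING")

default_level = parser.parse_args([]).log_level
debug_level = parser.parse_args(["--debug"]).log_level
custom_level = parser.parse_args(["--log", "CRITICAL"]).log_level
```

Because the flags belong to one exclusive group, argparse reports an error if two of them are combined on the same command line.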

add_numeric_argument(*args, type=<class 'float'>, min_value=None, max_value=None, **kwargs)[source]#

Adds a numeric argument with constraints on its type and its minimum or maximum value.

Note that the default value (if defined) is not checked unless the argument is an optional argument and no value is provided in the command line.

Parameters:
  • type (Callable[[str], int | float], default: <class 'float'>) – Type to convert the argument value to when parsing.

  • min_value (Union[int, float, None], default: None) – Minimum value constraint. If None, no minimum value constraint is applied.

  • max_value (Union[int, float, None], default: None) – Maximum value constraint. If None, no maximum value constraint is applied.

Return type:

None
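
One way to enforce type and range constraints at parsing time is a converter factory passed as argparse's type, sketched below. The helper bounded_number and the --threads option are hypothetical and only illustrate the technique; they do not reproduce add_numeric_argument() itself.

```python
import argparse


def bounded_number(num_type=float, min_value=None, max_value=None):
    """Returns a converter for argparse's ``type`` that enforces the given bounds."""

    def convert(value):
        try:
            number = num_type(value)
        except ValueError:
            raise argparse.ArgumentTypeError(f"invalid {num_type.__name__} value: {value!r}") from None
        if min_value is not None and number < min_value:
            raise argparse.ArgumentTypeError(f"{value} is lower than the minimum ({min_value})")
        if max_value is not None and number > max_value:
            raise argparse.ArgumentTypeError(f"{value} is greater than the maximum ({max_value})")
        return number

    return convert


parser = argparse.ArgumentParser()
parser.add_argument("--threads", type=bounded_number(int, min_value=1, max_value=8), default=4)

threads = parser.parse_args(["--threads", "2"]).threads
# The non-string default is returned as-is, without passing through the converter
default_threads = parser.parse_args([]).threads

try:
    bounded_number(int, min_value=1)("0")
    out_of_range_rejected = False
except argparse.ArgumentTypeError:
    out_of_range_rejected = True
```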

add_server_arguments(prefix='', include_database=False, help=None)[source]#

Adds the usual set of arguments needed to connect to a server, i.e. --host, --port, --user and --password (optional).

Note that the parser will assume this is a MySQL server.

Warning

Avoid passing --password directly on the command line as it will be visible in the process list and shell history. Use an environment variable or an interactive prompt via getpass instead.

Parameters:
  • prefix (str, default: '') – Prefix to add to each argument, e.g. if prefix is src_, the arguments will be --src_host, etc.

  • include_database (bool, default: False) – Include --database argument.

  • help (Optional[str], default: None) – Description message to include for this set of arguments.

Return type:

None
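
The warning above suggests reading the password from an environment variable or an interactive prompt. A minimal sketch of that pattern follows; the helper resolve_password and the variable name EXAMPLE_DB_PASSWORD are hypothetical and are not defined by this package.

```python
import getpass
import os


def resolve_password(env_var="EXAMPLE_DB_PASSWORD"):
    """Returns the password from the environment, prompting only if it is not set."""
    password = os.environ.get(env_var)
    if password is None:
        # getpass does not echo the typed characters to the terminal
        password = getpass.getpass("Database password: ")
    return password


# For demonstration only: set the variable so that no prompt is shown
os.environ["EXAMPLE_DB_PASSWORD"] = "example"
password = resolve_password("EXAMPLE_DB_PASSWORD")
```

The resolved value can then be supplied to the connection code without ever appearing in the process list or shell history.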

parse_args(*args, **kwargs)[source]#

Extends the parent function by adding a new URL argument for every server group added.

The type of this new argument will be sqlalchemy.engine.URL. It also logs all the parsed arguments for debugging purposes when logging arguments have been added.

Return type:

Namespace

ensembl.utils.checksums module#

Utils for common hash operations (often referred to as checksums) over files, e.g. MD5 or SHA-256.

ensembl.utils.checksums.get_file_hash(file_path, algorithm='md5')[source]#

Returns the hash value for a given file and hash algorithm.

Parameters:
  • file_path (str | PathLike[str]) – File path to get the hash for.

  • algorithm (str, default: 'md5') – Secure hash or message digest algorithm name.

Return type:

str

ensembl.utils.checksums.validate_file_hash(file_path, hash_value, algorithm='md5')[source]#

Returns true if the file’s hash value is the same as the one provided for that hash algorithm, false otherwise.

Parameters:
  • file_path (str | PathLike[str]) – Path to the file to validate.

  • hash_value (str) – Expected hash value.

  • algorithm (str, default: 'md5') – Secure hash or message digest algorithm name.

Return type:

bool
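
The two functions above can be approximated with hashlib, as in the sketch below. This is an illustration under assumptions rather than the module's code: the names file_hash and file_hash_matches are hypothetical, and reading in 64 KiB chunks is a choice made here to keep memory use flat for large files.

```python
import hashlib
import tempfile
from pathlib import Path


def file_hash(file_path, algorithm="md5"):
    """Returns the hexadecimal digest of the file for the given algorithm name."""
    hasher = hashlib.new(algorithm)
    with Path(file_path).open("rb") as handle:
        # Read fixed-size chunks until an empty bytes object signals end of file
        for chunk in iter(lambda: handle.read(64 * 1024), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def file_hash_matches(file_path, hash_value, algorithm="md5"):
    """Returns True if the file digest equals hash_value, False otherwise."""
    return file_hash(file_path, algorithm) == hash_value


with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "data.txt"
    data_file.write_bytes(b"ACGT\n")
    md5_digest = file_hash(data_file)
    sha256_digest = file_hash(data_file, "sha256")
    is_valid = file_hash_matches(data_file, md5_digest)
    is_invalid = file_hash_matches(data_file, "0" * 32)
```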

ensembl.utils.logging module#

Easy initialisation functionality to set an event logging system.

Examples

>>> import logging, pathlib
>>> from ensembl.utils.logging import init_logging
>>> logfile = pathlib.Path("test.log")
>>> init_logging("INFO", logfile, "DEBUG")
>>> logging.info("This message is written in both stderr and the log file")
>>> logging.debug("This message is only written in the log file")
ensembl.utils.logging.init_logging(log_level='WARNING', log_file=None, log_file_level='DEBUG', msg_format='%(asctime)s [%(process)s] %(levelname)-9s %(name)-13s: %(message)s')[source]#

Initialises the logging system.

By default, all log messages of level log_level (and above) will be printed to standard error. If log_file is provided, all messages of level log_file_level (and above) will also be written to that file.

Parameters:
  • log_level (Union[int, str], default: 'WARNING') – Minimum logging level for the standard error.

  • log_file (Union[str, PathLike[str], None], default: None) – File to write log messages to, in addition to standard error.

  • log_file_level (Union[int, str], default: 'DEBUG') – Minimum logging level for the logging file.

  • msg_format (str, default: '%(asctime)s [%(process)s] %(levelname)-9s %(name)-13s: %(message)s') – A format string for the logged output as a whole. More information: https://docs.python.org/3/library/logging.html#logrecord-attributes

Return type:

None
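
The split between standard error and a log file, each with its own minimum level, can be reproduced with the standard logging module. The sketch below is an assumption about how such a setup might be wired, not the body of init_logging(); the function setup_logging is hypothetical and, unlike init_logging(), returns the handlers so the example can detach them afterwards.

```python
import logging
import sys
import tempfile
from pathlib import Path

DEFAULT_FORMAT = "%(asctime)s [%(process)s] %(levelname)-9s %(name)-13s: %(message)s"


def setup_logging(log_level="WARNING", log_file=None, log_file_level="DEBUG", msg_format=DEFAULT_FORMAT):
    """Attaches a stderr handler, and optionally a file handler, to the root logger."""
    root = logging.getLogger()
    # The root logger lets everything through; each handler filters by its own level
    root.setLevel(logging.DEBUG)
    formatter = logging.Formatter(msg_format)
    console = logging.StreamHandler(sys.stderr)
    console.setLevel(log_level)
    handlers = [console]
    if log_file is not None:
        file_handler = logging.FileHandler(log_file)
        file_handler.setLevel(log_file_level)
        handlers.append(file_handler)
    for handler in handlers:
        handler.setFormatter(formatter)
        root.addHandler(handler)
    return handlers


with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "test.log"
    handlers = setup_logging("INFO", log_path, "DEBUG")
    logging.info("written to stderr and the log file")
    logging.debug("written to the log file only")
    # Detach and close the handlers so the temporary file can be removed cleanly
    for handler in handlers:
        logging.getLogger().removeHandler(handler)
        handler.close()
    logged_lines = log_path.read_text().splitlines()
```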

ensembl.utils.logging.init_logging_with_args(args)[source]#

Processes the Namespace object provided to call init_logging() with the correct arguments.

Parameters:

args (Namespace) – Namespace populated by an argument parser.

Return type:

None

ensembl.utils.plugin module#

Ensembl’s pytest plugin with useful unit testing hooks and fixtures.

ensembl.utils.plugin.fixture_assert_files()[source]#

Returns a function that asserts if two text files are equal, or prints their differences.

Return type:

Callable[[str | PathLike[str], str | PathLike[str]], None]
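
A comparison function of this kind can be sketched with difflib, so that a failing assertion carries a readable diff. The helper assert_text_files_equal below is hypothetical and written as a plain function rather than a pytest fixture; the diff labels are arbitrary choices for this example.

```python
import difflib
import tempfile
from pathlib import Path


def assert_text_files_equal(result_path, expected_path):
    """Raises AssertionError with a unified diff if the two text files differ."""
    result = Path(result_path).read_text().splitlines(keepends=True)
    expected = Path(expected_path).read_text().splitlines(keepends=True)
    diff = list(difflib.unified_diff(result, expected, fromfile="result", tofile="expected"))
    assert not diff, "Files differ:\n" + "".join(diff)


with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "first.txt"
    second = Path(tmp) / "second.txt"
    third = Path(tmp) / "third.txt"
    first.write_text("chr1\t100\n")
    second.write_text("chr1\t100\n")
    third.write_text("chr1\t200\n")

    assert_text_files_equal(first, second)  # identical files: no error
    try:
        assert_text_files_equal(first, third)
        diff_report = ""
    except AssertionError as err:
        diff_report = str(err)
```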

ensembl.utils.plugin.fixture_db_factory(request, data_dir)[source]#

Yields a unit test database factory.

Parameters:
  • request (FixtureRequest) – Fixture that provides information of the requesting test function.

  • data_dir (Path) – Fixture that provides the path to the test data folder matching the test’s name.

Return type:

Generator[Callable[[str | PathLike[str] | None, str | None, MetaData | None], UnitTestDB], None, None]

ensembl.utils.plugin.local_data_dir(request)[source]#

Returns the path to the test data folder matching the test’s name.

Parameters:

request (FixtureRequest) – Fixture that provides information of the requesting test function.

Return type:

Path

ensembl.utils.plugin.pytest_addoption(parser)[source]#

Registers argparse-style options for Ensembl’s unit testing.

Pytest initialisation hook.

Parameters:

parser (Parser) – Parser for command line arguments and ini-file values.

Return type:

None

ensembl.utils.plugin.pytest_configure(config)[source]#

Allows plugins and conftest files to perform initial configuration.

More information: https://docs.pytest.org/en/latest/reference/reference.html#std-hook-pytest_configure

Parameters:

config (Config) – The pytest config object.

Return type:

None

ensembl.utils.plugin.pytest_report_header(config)[source]#

Presents extra information in the report header.

Parameters:

config (Config) – Access to configuration values, pluginmanager and plugin hooks.

Return type:

str

ensembl.utils.plugin.test_dbs(request, db_factory)[source]#

Returns a dictionary of unit test databases with the database name as key.

Requires a list of dictionaries, each with keys src, name and metadata, passed via request.param. At minimum either src or name needs to be provided. See db_factory() for details about each key’s value.

This fixture is a wrapper of db_factory() intended to be used via indirect parametrization, for example:

import pytest
from ensembl.core.models import Base

@pytest.mark.parametrize(
    "test_dbs",
    [
        [
            {"src": "core_db"},
            {"src": "core_db", "name": "human"},
            {"src": "core_db", "name": "cat", "metadata": Base.metadata},
        ]
    ],
    indirect=True,
)
def test_method(test_dbs: dict[str, UnitTestDB]) -> None:
    ...  # other fixtures or parameters may be requested alongside test_dbs

Parameters:
  • request (FixtureRequest) – Fixture that provides information of the requesting test function.

  • db_factory (Callable) – Fixture that provides a unit test database factory.

Return type:

dict[str, UnitTestDB]

ensembl.utils.rloader module#

Allows the content of a remote file to be loaded and read seamlessly, as if it were stored locally.

class ensembl.utils.rloader.RemoteFileLoader(parser=None)[source]#

Bases: object

Loads remote files, allowing specific format parsing options.

Parameters:

parser (Optional[str], default: None) – Parser to use for this object. Default: None (no format-specific parsing done).

Variables:
  • available_formats – File formats with ad-hoc parser available.

  • parser – Parser selected for this object.

available_formats: set[str] = {'env', 'ini', 'json', 'yaml'}#
parser: str | None = None#
r_open(url)[source]#

Returns the parsed remote file from the given URL.

Parameters:

url (str) – URL of the remote file to fetch.

Raises:
  • requests.exceptions.HTTPError – If loading or requesting the given URL returned an error.

  • requests.exceptions.Timeout – If a timeout was raised whilst requesting the given URL.

Return type:

Any
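
The format-specific parsing step can be illustrated without any network access. The sketch below dispatches already-fetched text to a parser chosen by name, using only the standard library; parse_content is a hypothetical helper, it covers just the 'json' and 'ini' formats (the 'yaml' and 'env' formats would need third-party packages), and it is not the logic used by RemoteFileLoader.

```python
import configparser
import json


def parse_content(content, parser=None):
    """Parses content with the named parser, or returns it unchanged if parser is None."""
    if parser is None:
        return content
    if parser == "json":
        return json.loads(content)
    if parser == "ini":
        config = configparser.ConfigParser()
        config.read_string(content)
        return config
    raise ValueError(f"No parser available for format '{parser}'")


raw_text = parse_content("plain text")
json_data = parse_content('{"species": "human"}', parser="json")
ini_data = parse_content("[server]\nhost = localhost\n", parser="ini")
ini_host = ini_data["server"]["host"]
```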