
db_map

ensembl.io.genomio.external_db.db_map

Get a mapping for external db names.

DEFAULT_EXTERNAL_DB_MAP = default_map_path (module attribute)

default_map_res = files('ensembl.io.genomio.data.external_db_map').joinpath('default.txt') (module attribute)

MapFormatError

Bases: ValueError

Error when parsing the db map file.

Source code in src/python/ensembl/io/genomio/external_db/db_map.py
32
33
class MapFormatError(ValueError):
    """Error when parsing the db map file."""
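Because MapFormatError subclasses ValueError, any caller that already handles ValueError will also catch it. The following is a minimal sketch of that behaviour. To keep it runnable without the package installed, it defines a local stand-in class with the same base. check_row is a hypothetical helper that applies the same two-column check as the listing below.

```python
class MapFormatError(ValueError):
    """Local stand-in mirroring the class above (same base class)."""


def check_row(line: str) -> list[str]:
    """Hypothetical helper: split a tab-separated row, requiring at least 2 columns."""
    parts = line.split("\t")
    if len(parts) < 2:
        raise MapFormatError(f"External db file is not formatted correctly for: {line}")
    return parts


try:
    check_row("only_one_column")
except ValueError as err:  # a plain ValueError handler also catches the subclass
    caught = err

print(type(caught).__name__)  # MapFormatError
```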

get_external_db_map(map_file)

Get an external_db map from a tab-separated file without a header.

Empty lines, lines starting with a space, and comments (lines starting with #) are ignored. The first 2 tab-separated columns are expected to be the main name and the alternative name; any further columns are ignored. A line with fewer than 2 columns raises MapFormatError.

Parameters:

Name      Type  Description                               Default
map_file  Path  Path to a file with external DB mapping.  required

Returns:

Type            Description
dict[str, str]  Dict with alternative names as keys and standard names as values.
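The following worked example shows the file format and the direction of the resulting mapping. It is a minimal in-memory sketch, and parse_db_map_lines is a hypothetical name. It applies the same rules as the listing below, but reads from any iterable of lines instead of a Path, and raises a plain ValueError where the real function raises MapFormatError. The database names are illustrative only and are not taken from the bundled default.txt.

```python
import io


def parse_db_map_lines(lines) -> dict[str, str]:
    """Hypothetical in-memory variant of the parsing rules described above."""
    db_map: dict[str, str] = {}
    for line in lines:
        line = line.rstrip()
        # Skip comments, lines starting with a space, and empty lines
        if line.startswith("#") or line.startswith(" ") or line == "":
            continue
        parts = line.split("\t")
        if len(parts) < 2:
            # The real function raises MapFormatError here
            raise ValueError(f"Malformed line: {line}")
        main_name, alt_name = parts[0], parts[1]  # extra columns are ignored
        db_map[alt_name] = main_name  # keyed by the alternative name
    return db_map


example = io.StringIO(
    "# main_name\talt_name\n"
    "\n"
    "UniProtKB/Swiss-Prot\tSwissProt\tignored column\n"
    "RefSeq_peptide\tRefSeq\n"
)
print(parse_db_map_lines(example))
# {'SwissProt': 'UniProtKB/Swiss-Prot', 'RefSeq': 'RefSeq_peptide'}
```

Each alternative name is a plain dict key, so if the same alternative name appears on two lines, the later line silently overwrites the earlier one.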

Source code in src/python/ensembl/io/genomio/external_db/db_map.py
def get_external_db_map(map_file: Path) -> dict[str, str]:
    """Get an external_db map from a tab file without header.

    Empty lines and comments (lines starting with #) are ignored.
    The first 2 columns are expected to be the main name, and the alternative name. Any other columns
    after that are ignored.

    Args:
        map_file: Path to a file with external DB mapping.

    Returns:
        Dict with keys as alternate names, and values as standard name.

    """
    db_map: dict[str, str] = {}
    with map_file.open("r") as map_fh:
        for line in map_fh:
            line = line.rstrip()
            if line.startswith("#") or line.startswith(" ") or line == "":
                continue
            parts = line.split("\t")
            if len(parts) < 2:
                raise MapFormatError(f"External db file is not formatted correctly for: {line}")
            (main_name, alt_name) = parts[0:2]
            db_map[alt_name] = main_name
    return db_map
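A typical consumer looks up a name in the returned dict to obtain the standard name. The following is a minimal sketch of one such pattern. normalize_db_name is a hypothetical helper, and its choice to return unknown names unchanged is an assumption rather than documented behaviour of this module. The dict literal stands in for the result of get_external_db_map, so the example runs without the package.

```python
def normalize_db_name(name: str, db_map: dict[str, str]) -> str:
    """Hypothetical helper: map an alternative name to its standard name.

    Names not present in the map (including standard names that are not
    also listed as alternatives) are returned unchanged.
    """
    return db_map.get(name, name)


# Illustrative dict shaped like the return value of get_external_db_map
db_map = {"SwissProt": "UniProtKB/Swiss-Prot", "RefSeq": "RefSeq_peptide"}

print(normalize_db_name("SwissProt", db_map))  # UniProtKB/Swiss-Prot
print(normalize_db_name("UniProtKB/Swiss-Prot", db_map))  # UniProtKB/Swiss-Prot
```

The fallback matters because the map is keyed only by alternative names. A name that is already standard will not be found unless the map file also lists it in the second column.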