prepare
ensembl.io.genomio.genome_metadata.prepare
Expands the genome metadata file by adding information about the provider, the taxonomy, and the assembly and gene build versions.
PROVIDER_DATA = {'GenBank': {'assembly': {'provider_name': 'GenBank', 'provider_url': 'https://www.ncbi.nlm.nih.gov/datasets/genome'}, 'annotation': {'provider_name': 'GenBank', 'provider_url': 'https://www.ncbi.nlm.nih.gov/datasets/genome'}}, 'RefSeq': {'assembly': {'provider_name': 'RefSeq', 'provider_url': 'https://www.ncbi.nlm.nih.gov/datasets/genome'}, 'annotation': {'provider_name': 'RefSeq', 'provider_url': 'https://www.ncbi.nlm.nih.gov/datasets/genome'}}}
module-attribute
MetadataError
Bases: Exception
When a metadata value is not expected.
Source code in src/python/ensembl/io/genomio/genome_metadata/prepare.py
MissingNodeError
Bases: Exception
When a taxon XML node cannot be found.
Source code in src/python/ensembl/io/genomio/genome_metadata/prepare.py
add_assembly_version(genome_data)
Adds version number to the genome's assembly information if one is not present already.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
genome_data | Dict | Genome information of assembly, accession and annotation. | required |
Source code in src/python/ensembl/io/genomio/genome_metadata/prepare.py
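The behaviour described for add_assembly_version can be pictured with a minimal sketch. It assumes the version is derived from the numeric suffix of the accession (the `2` in `GCA_000001405.2`) and stored under `genome_data["assembly"]["version"]`. Both the derivation rule and the key names are assumptions made for illustration, and `add_assembly_version_sketch` is a hypothetical stand-in, not the module's implementation.

```python
from typing import Dict


def add_assembly_version_sketch(genome_data: Dict) -> None:
    """Hypothetical stand-in: sets assembly["version"] from the accession suffix if missing."""
    assembly = genome_data["assembly"]
    if "version" in assembly:
        return  # An existing version is left untouched
    # Assumption: the accession looks like "GCA_000001405.2", with the version after the dot
    suffix = assembly.get("accession", "").partition(".")[2]
    if suffix.isdigit():
        assembly["version"] = int(suffix)


genome = {"assembly": {"accession": "GCA_000001405.2"}}
add_assembly_version_sketch(genome)
print(genome["assembly"])
```

The function mutates the dictionary in place and returns nothing, which matches the single-argument signature documented above.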
add_genebuild_metadata(genome_data)
Adds genebuild metadata to genome information if not present already.
The default convention is to use the current date as "version" and "start_date".
Parameters:
Name | Type | Description | Default |
---|---|---|---|
genome_data | Dict | Genome information of assembly, accession and annotation. | required |
Source code in src/python/ensembl/io/genomio/genome_metadata/prepare.py
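As a rough illustration of the date convention described for add_genebuild_metadata, the sketch below fills in "version" and "start_date" only when they are absent. It assumes the values live under a top-level "genebuild" key and that the date is written in ISO format; neither detail is stated in this page, and `add_genebuild_metadata_sketch` is a hypothetical name.

```python
import datetime
from typing import Dict


def add_genebuild_metadata_sketch(genome_data: Dict) -> None:
    """Hypothetical stand-in: fills genebuild "version" and "start_date" with today's date."""
    # Assumption: genebuild details live under a top-level "genebuild" key
    genebuild = genome_data.setdefault("genebuild", {})
    # Assumption: the date is stored as an ISO string (YYYY-MM-DD)
    today = datetime.date.today().isoformat()
    # setdefault only writes a key that is not already present
    genebuild.setdefault("version", today)
    genebuild.setdefault("start_date", today)


genome = {"genebuild": {"version": "v1"}}
add_genebuild_metadata_sketch(genome)
print(genome["genebuild"])
```

Here the existing "version" value is kept, and only the missing "start_date" is filled in.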
add_provider(genome_metadata, ncbi_data)
Updates the genome metadata by adding provider information for the assembly and gene models.
Assembly provider metadata is only added if it is missing, i.e. neither "provider_name" nor "provider_url" is present. The gene model metadata is only added if gff3_file is provided.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
genome_metadata | Dict | Genome information of assembly, accession and annotation. | required |
ncbi_data | Dict | Report data from NCBI datasets. | required |
Raises:
Type | Description |
---|---|
MetadataError | If the accession format in the genome metadata does not match a known provider. |
Source code in src/python/ensembl/io/genomio/genome_metadata/prepare.py
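The assembly half of add_provider can be sketched as follows. The sketch assumes the provider is chosen from the accession prefix ("GCA" for GenBank, "GCF" for RefSeq, following NCBI's accession convention) and that the keys sit under `genome_metadata["assembly"]`. The names `ASSEMBLY_PROVIDERS`, `pick_provider` and `add_assembly_provider_sketch` are hypothetical, and `MetadataError` is redefined locally so the example is self-contained. The gene model half is left out.

```python
from typing import Dict

NCBI_URL = "https://www.ncbi.nlm.nih.gov/datasets/genome"
# Trimmed copy of the "assembly" entries of PROVIDER_DATA documented above
ASSEMBLY_PROVIDERS = {
    "GenBank": {"provider_name": "GenBank", "provider_url": NCBI_URL},
    "RefSeq": {"provider_name": "RefSeq", "provider_url": NCBI_URL},
}


class MetadataError(Exception):
    """Local stand-in for the exception documented on this page."""


def pick_provider(accession: str) -> Dict[str, str]:
    """Returns provider details for an accession (prefix rule is an assumption)."""
    if accession.startswith("GCA"):
        return ASSEMBLY_PROVIDERS["GenBank"]
    if accession.startswith("GCF"):
        return ASSEMBLY_PROVIDERS["RefSeq"]
    raise MetadataError(f"Unrecognised accession format: {accession}")


def add_assembly_provider_sketch(genome_metadata: Dict) -> None:
    """Adds provider keys to the assembly only if neither is present."""
    assembly = genome_metadata["assembly"]
    provider = pick_provider(assembly["accession"])
    if "provider_name" not in assembly and "provider_url" not in assembly:
        assembly.update(provider)


metadata = {"assembly": {"accession": "GCF_000001405.2"}}
add_assembly_provider_sketch(metadata)
print(metadata["assembly"]["provider_name"])
```

Validating the accession before checking for existing keys means an unrecognised accession raises even when provider details are already set; whether the real function orders these checks the same way is not stated here.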
add_species_metadata(genome_metadata, ncbi_data)
Adds taxonomy ID, scientific name and strain (if present) from the NCBI dataset report.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
genome_metadata | Dict | Genome information of assembly, accession and annotation. | required |
ncbi_data | Dict | Report data from NCBI datasets. | required |
Source code in src/python/ensembl/io/genomio/genome_metadata/prepare.py
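A hedged sketch of the copying described for add_species_metadata is shown below. It assumes the NCBI report keeps these values under an "organism" object ("tax_id", "organism_name" and "infraspecific_names" with a "strain" entry) and that they are written to a "species" section using the keys "taxonomy_id", "scientific_name" and "strain". Both layouts, the choice not to overwrite existing values, and the function name are assumptions made for illustration.

```python
from typing import Dict


def add_species_metadata_sketch(genome_metadata: Dict, ncbi_data: Dict) -> None:
    """Hypothetical stand-in: copies taxonomy details from the report into the metadata."""
    # Assumption: species details are stored under a top-level "species" key
    species = genome_metadata.setdefault("species", {})
    # Assumption: the report nests taxonomy details under "organism"
    organism = ncbi_data.get("organism", {})
    if "tax_id" in organism:
        species.setdefault("taxonomy_id", organism["tax_id"])
    if "organism_name" in organism:
        species.setdefault("scientific_name", organism["organism_name"])
    # The strain is optional, so it is only copied when the report provides one
    strain = organism.get("infraspecific_names", {}).get("strain")
    if strain:
        species.setdefault("strain", strain)


# Illustrative report fragment, not real output from NCBI datasets
report = {
    "organism": {
        "tax_id": 559292,
        "organism_name": "Saccharomyces cerevisiae S288C",
        "infraspecific_names": {"strain": "S288C"},
    }
}
metadata: Dict = {}
add_species_metadata_sketch(metadata, report)
print(metadata["species"])
```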
main()
Module's entry point.
Source code in src/python/ensembl/io/genomio/genome_metadata/prepare.py
prepare_genome_metadata(input_file, output_file, ncbi_meta)
Updates the genome metadata JSON file with additional information.
In particular, more information is added about the provider, the assembly and its gene build version, and the taxonomy.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_file | PathLike | Path to JSON file with genome metadata. | required |
output_file | PathLike | Output directory where to generate the final | required |
ncbi_meta | PathLike | JSON file from NCBI datasets. | required |
Source code in src/python/ensembl/io/genomio/genome_metadata/prepare.py
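The overall flow of prepare_genome_metadata (read two JSON files, enrich the metadata, write the result) might look like the sketch below. `prepare_sketch` and its `steps` argument are hypothetical: the real function takes only the three paths listed above, and the order in which it calls the add_* helpers is not stated on this page. `tag_source` is a toy step used only to make the example runnable.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, Sequence

Step = Callable[[Dict, Dict], None]


def prepare_sketch(input_file: Path, output_file: Path, ncbi_meta: Path, steps: Sequence[Step]) -> None:
    """Hypothetical stand-in: loads both JSON files, applies each step, saves the result."""
    genome_metadata = json.loads(Path(input_file).read_text())
    ncbi_data = json.loads(Path(ncbi_meta).read_text())
    for step in steps:
        step(genome_metadata, ncbi_data)  # Each step updates genome_metadata in place
    Path(output_file).write_text(json.dumps(genome_metadata, indent=2, sort_keys=True))


def tag_source(genome_metadata: Dict, ncbi_data: Dict) -> None:
    """Toy step for this demo: copies one value from the report."""
    genome_metadata["demo_source"] = ncbi_data["source"]


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    (tmp_dir / "in.json").write_text('{"assembly": {"accession": "GCA_000001405.2"}}')
    (tmp_dir / "ncbi.json").write_text('{"source": "demo"}')
    prepare_sketch(tmp_dir / "in.json", tmp_dir / "out.json", tmp_dir / "ncbi.json", [tag_source])
    result = json.loads((tmp_dir / "out.json").read_text())
print(result)
```

Passing the steps as a list keeps the file handling separate from the enrichment logic, which makes each step easy to test on plain dictionaries.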