prepare
ensembl.io.genomio.genome_metadata.prepare
    Expands the genome metadata file by adding information about the provider, the taxonomy, and the assembly and gene build versions.
PROVIDER_DATA = {'GenBank': {'assembly': {'provider_name': 'GenBank', 'provider_url': 'https://www.ncbi.nlm.nih.gov/datasets/genome'}, 'annotation': {'provider_name': 'GenBank', 'provider_url': 'https://www.ncbi.nlm.nih.gov/datasets/genome'}}, 'RefSeq': {'assembly': {'provider_name': 'RefSeq', 'provider_url': 'https://www.ncbi.nlm.nih.gov/datasets/genome'}, 'annotation': {'provider_name': 'RefSeq', 'provider_url': 'https://www.ncbi.nlm.nih.gov/datasets/genome'}}}
    module-attribute
MetadataError
              Bases: Exception
Raised when a metadata value is not the expected one.
Source code in src/python/ensembl/io/genomio/genome_metadata/prepare.py
MissingNodeError
              Bases: Exception
Raised when a taxon XML node cannot be found.
Source code in src/python/ensembl/io/genomio/genome_metadata/prepare.py
add_assembly_version(genome_data)
    Adds version number to the genome's assembly information if one is not present already.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| genome_data | Dict | Genome information of assembly, accession and annotation. | required | 
Source code in src/python/ensembl/io/genomio/genome_metadata/prepare.py
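The "only if not present" behaviour described for add_assembly_version can be illustrated with a minimal sketch. This is not the module's code: `sketch_add_assembly_version` is a hypothetical stand-in, and deriving the version from the numeric suffix of the accession (the part after the dot) is an assumption made for illustration.

```python
from typing import Dict


def sketch_add_assembly_version(genome_data: Dict) -> None:
    """Hypothetical stand-in: fills assembly["version"] only when it is absent."""
    assembly = genome_data["assembly"]
    if "version" in assembly:
        return  # An existing version is never overwritten.
    # Assumption: the version is the numeric suffix after the dot in the accession.
    suffix = assembly["accession"].partition(".")[2]
    if suffix.isdigit():
        assembly["version"] = int(suffix)


genome = {"assembly": {"accession": "GCA_000001405.29"}}
sketch_add_assembly_version(genome)
print(genome["assembly"])
```

The function mutates the dictionary in place and returns nothing, which matches the "Adds ..." wording of the description; whether the real function behaves the same way when the accession has no numeric suffix is not stated here.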
add_genebuild_metadata(genome_data)
    Adds genebuild metadata to genome information if not present already.
The default convention is to use the current date as "version" and "start_date".
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| genome_data | Dict | Genome information of assembly, accession and annotation. | required | 
Source code in src/python/ensembl/io/genomio/genome_metadata/prepare.py
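As an illustration of the date convention for add_genebuild_metadata, here is a hedged sketch. The `"genebuild"` key, the ISO date format and the function name `sketch_add_genebuild_metadata` are all assumptions; only the use of the current date for "version" and "start_date" comes from the description.

```python
import datetime
from typing import Dict


def sketch_add_genebuild_metadata(genome_data: Dict) -> None:
    """Hypothetical stand-in: fills missing genebuild fields with today's date."""
    # Assumption: genebuild information lives under a "genebuild" key.
    genebuild = genome_data.setdefault("genebuild", {})
    # Assumption: dates are stored as ISO strings (YYYY-MM-DD).
    today = datetime.date.today().isoformat()
    # setdefault leaves any value that is already present untouched.
    genebuild.setdefault("version", today)
    genebuild.setdefault("start_date", today)


fresh = {}
sketch_add_genebuild_metadata(fresh)

existing = {"genebuild": {"version": "custom-1"}}
sketch_add_genebuild_metadata(existing)
print(fresh, existing)
```

In the second call only "start_date" is filled in, because "version" was already set.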
add_provider(genome_metadata, ncbi_data)
    Updates the genome metadata adding provider information for assembly and gene models.
Assembly provider metadata will only be added if it is missing, i.e. neither "provider_name" nor
"provider_url" is present. The gene model metadata will only be added if gff3_file is provided.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| genome_metadata | Dict | Genome information of assembly, accession and annotation. | required | 
| ncbi_data | Dict | Report data from NCBI datasets. | required | 
Raises:
| Type | Description | 
|---|---|
| MetadataError | If the accession format in the genome metadata does not match a known provider. | 
Source code in src/python/ensembl/io/genomio/genome_metadata/prepare.py
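The assembly half of add_provider might look roughly like the sketch below. It rests on several assumptions: the provider is chosen from the accession prefix (NCBI uses `GCA_` for GenBank and `GCF_` for RefSeq assemblies), the accession is appended to the base URL, and the names `SKETCH_PROVIDERS`, `SketchMetadataError` and `sketch_add_assembly_provider` are hypothetical. The gene model (annotation) half is omitted.

```python
from typing import Dict

NCBI_GENOME_URL = "https://www.ncbi.nlm.nih.gov/datasets/genome"

# Trimmed, hypothetical view of PROVIDER_DATA keyed by accession prefix.
SKETCH_PROVIDERS = {
    "GCA": {"provider_name": "GenBank", "provider_url": NCBI_GENOME_URL},
    "GCF": {"provider_name": "RefSeq", "provider_url": NCBI_GENOME_URL},
}


class SketchMetadataError(Exception):
    """Hypothetical stand-in for MetadataError."""


def sketch_add_assembly_provider(genome_metadata: Dict) -> None:
    """Adds assembly provider fields only when both are missing."""
    assembly = genome_metadata["assembly"]
    accession = assembly["accession"]
    prefix = accession[:3]
    if prefix not in SKETCH_PROVIDERS:
        raise SketchMetadataError(f"Unrecognised accession format: {accession}")
    if "provider_name" not in assembly and "provider_url" not in assembly:
        provider = SKETCH_PROVIDERS[prefix]
        assembly["provider_name"] = provider["provider_name"]
        # Assumption: the accession is appended to the provider's base URL.
        assembly["provider_url"] = f'{provider["provider_url"]}/{accession}'


meta = {"assembly": {"accession": "GCF_000001405.40"}}
sketch_add_assembly_provider(meta)
print(meta["assembly"])
```

An accession with any other prefix raises the stand-in exception, mirroring the MetadataError case listed under "Raises".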
add_species_metadata(genome_metadata, ncbi_data)
    Adds taxonomy ID, scientific name and strain (if present) from the NCBI dataset report.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| genome_metadata | Dict | Genome information of assembly, accession and annotation. | required | 
| ncbi_data | Dict | Report data from NCBI datasets. | required | 
Source code in src/python/ensembl/io/genomio/genome_metadata/prepare.py
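A rough sketch of how the three species fields could be copied from the report for add_species_metadata follows. It assumes the report exposes an `organism` object with `tax_id`, `organism_name` and an optional `infraspecific_names.strain`, and that the values are stored under a `"species"` key without overwriting existing entries. These key names, the function name `sketch_add_species_metadata` and the placeholder values are illustrative assumptions only.

```python
from typing import Dict


def sketch_add_species_metadata(genome_metadata: Dict, ncbi_data: Dict) -> None:
    """Hypothetical stand-in: copies taxonomy ID, name and strain when available."""
    species = genome_metadata.setdefault("species", {})
    organism = ncbi_data.get("organism", {})
    if "tax_id" in organism:
        species.setdefault("taxonomy_id", organism["tax_id"])
    if "organism_name" in organism:
        species.setdefault("scientific_name", organism["organism_name"])
    # The strain is optional, so it is added only when the report provides one.
    strain = organism.get("infraspecific_names", {}).get("strain")
    if strain:
        species.setdefault("strain", strain)


# Placeholder report values, not real taxonomy data.
report = {
    "organism": {
        "tax_id": 12345,
        "organism_name": "Examplus speciesus",
        "infraspecific_names": {"strain": "X1"},
    }
}
metadata = {}
sketch_add_species_metadata(metadata, report)
print(metadata)
```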
main()
    Module's entry-point.
Source code in src/python/ensembl/io/genomio/genome_metadata/prepare.py
prepare_genome_metadata(input_file, output_file, ncbi_meta)
    Updates the genome metadata JSON file with additional information.
In particular, more information is added about the provider, the assembly and its gene build version, and the taxonomy.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| input_file | PathLike | Path to JSON file with genome metadata. | required | 
| output_file | PathLike | Output directory in which to generate the final genome metadata file. | required | 
| ncbi_meta | PathLike | JSON file from NCBI datasets. | required | 
Source code in src/python/ensembl/io/genomio/genome_metadata/prepare.py
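The overall load, update and write flow of prepare_genome_metadata can be outlined as below. This is a sketch under assumptions, not the module's implementation: `sketch_prepare`, the `steps` argument and the toy `copy_tax_id` step are hypothetical, `output_file` is treated as a file path, and the NCBI JSON is assumed to be a single flat report object.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterable

Step = Callable[[Dict, Dict], None]


def sketch_prepare(input_file: Path, output_file: Path, ncbi_meta: Path, steps: Iterable[Step]) -> None:
    """Hypothetical outline: load both JSON files, apply each step, write the result."""
    genome_metadata = json.loads(Path(input_file).read_text())
    ncbi_data = json.loads(Path(ncbi_meta).read_text())
    for step in steps:
        step(genome_metadata, ncbi_data)  # Each step edits genome_metadata in place.
    Path(output_file).write_text(json.dumps(genome_metadata, indent=2, sort_keys=True))


def copy_tax_id(genome_metadata: Dict, ncbi_data: Dict) -> None:
    """Toy step used only for this demonstration."""
    genome_metadata.setdefault("species", {})["taxonomy_id"] = ncbi_data["organism"]["tax_id"]


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    (tmp_dir / "input.json").write_text(json.dumps({"assembly": {"accession": "GCA_000001405.29"}}))
    # Placeholder taxonomy ID, not real data.
    (tmp_dir / "ncbi.json").write_text(json.dumps({"organism": {"tax_id": 12345}}))
    sketch_prepare(tmp_dir / "input.json", tmp_dir / "output.json", tmp_dir / "ncbi.json", [copy_tax_id])
    result = json.loads((tmp_dir / "output.json").read_text())

print(result)
```

Passing the update steps as a list keeps the file handling separate from the metadata logic, so each step can be tested on plain dictionaries.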