id_allocator
ensembl.io.genomio.gff3.id_allocator
Check and allocate IDs for gene features in a GFF3 file.
InvalidStableID
Bases: ValueError
Raised when there is a problem with a stable ID.
Source code in src/python/ensembl/io/genomio/gff3/id_allocator.py
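Because `InvalidStableID` subclasses `ValueError`, code that already catches `ValueError` also catches it. The sketch below uses a local stand-in class with the same base, plus a hypothetical `require_non_empty` helper for illustration. Neither is taken from the library.

```python
class InvalidStableID(ValueError):
    """Stand-in mirroring the documented exception: raised for a problematic stable ID."""


def require_non_empty(stable_id: str) -> str:
    """Hypothetical helper that rejects empty IDs with InvalidStableID."""
    if not stable_id:
        raise InvalidStableID("Empty stable ID")
    return stable_id


try:
    require_non_empty("")
except ValueError as err:  # also catches InvalidStableID, which subclasses ValueError
    print(f"{type(err).__name__}: {err}")  # InvalidStableID: Empty stable ID
```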
StableIDAllocator
dataclass
Set of tools to check and allocate stable IDs.
Source code in src/python/ensembl/io/genomio/gff3/id_allocator.py
current_id_number = 0 (class-attribute, instance-attribute)
make_missing_stable_ids = True (class-attribute, instance-attribute)
min_id_length = 7 (class-attribute, instance-attribute)
prefix = 'TMP_' (class-attribute, instance-attribute)
skip_gene_id_validation = False (class-attribute, instance-attribute)
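The attributes above are dataclass fields with defaults. Because `StableIDAllocator` is a dataclass, they can presumably be overridden as keyword arguments at construction time. The sketch below is a hypothetical stand-in (`AllocatorSettings`) that reproduces only the documented field names and defaults. The types are inferred from the default values, and the real class also defines the methods listed below.

```python
from dataclasses import dataclass


@dataclass
class AllocatorSettings:
    """Hypothetical stand-in holding the documented StableIDAllocator defaults."""

    current_id_number: int = 0
    make_missing_stable_ids: bool = True
    min_id_length: int = 7
    prefix: str = "TMP_"
    skip_gene_id_validation: bool = False


defaults = AllocatorSettings()
# Any default can be overridden per instance via keyword arguments, as with any dataclass.
strict = AllocatorSettings(min_id_length=10, make_missing_stable_ids=False)
print(defaults.prefix, defaults.min_id_length)  # TMP_ 7
```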
generate_gene_id()
Returns a new unique gene stable ID with a prefix.
The ID is made up of the prefix followed by a number, which is auto-incremented.
Source code in src/python/ensembl/io/genomio/gff3/id_allocator.py
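A minimal sketch of "prefix + auto-incremented number" generation. It assumes the counter is incremented before use, so the first ID ends in 1, and that the number is appended with no separator or zero-padding. `GeneIdCounter` is a hypothetical name, not the library class.

```python
from dataclasses import dataclass


@dataclass
class GeneIdCounter:
    """Hypothetical sketch of prefix + counter stable ID generation."""

    prefix: str = "TMP_"
    current_id_number: int = 0

    def generate_gene_id(self) -> str:
        # Assumption: increment first, then format, so IDs start at 1.
        self.current_id_number += 1
        return f"{self.prefix}{self.current_id_number}"


counter = GeneIdCounter()
print(counter.generate_gene_id())  # TMP_1
print(counter.generate_gene_id())  # TMP_2
```

Keeping the counter on the instance means every ID issued by one allocator is unique, but two separate allocators with the same prefix could collide.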
generate_transcript_id(gene_id, number)
staticmethod
Returns a formatted transcript ID generated from a gene ID and number.
Args:
gene_id: Gene stable ID.
number: Positive number.
Raises:
ValueError: If the number provided is not greater than zero.
Source code in src/python/ensembl/io/genomio/gff3/id_allocator.py
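A minimal sketch of the documented contract: build a transcript ID from a gene ID and a number, and reject numbers that are not greater than zero. The exact output format is not documented here. The `_t` separator below is an assumption for illustration only.

```python
def generate_transcript_id(gene_id: str, number: int) -> str:
    """Sketch: builds a transcript ID from a gene ID and a positive number.

    The "_t" separator is an assumed format, not taken from the library.
    """
    if number < 1:
        raise ValueError("number must be greater than zero")
    return f"{gene_id}_t{number}"


print(generate_transcript_id("GENE0001", 1))  # GENE0001_t1
```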
is_valid(stable_id)
Checks that the format of a stable ID is valid.
Args:
stable_id: Stable ID to validate.
Source code in src/python/ensembl/io/genomio/gff3/id_allocator.py
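The validation rules themselves are not listed on this page. The sketch below assumes only two rules suggested by the documented attributes. When `skip_gene_id_validation` is set, every ID is accepted. Otherwise, IDs shorter than `min_id_length` are rejected. The real method may apply further checks. `IdValidator` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class IdValidator:
    """Hypothetical sketch of a stable ID format check driven by two settings."""

    min_id_length: int = 7
    skip_gene_id_validation: bool = False

    def is_valid(self, stable_id: str) -> bool:
        # Assumption: skipping validation accepts every ID.
        if self.skip_gene_id_validation:
            return True
        # Assumption: IDs shorter than min_id_length are rejected.
        return len(stable_id) >= self.min_id_length


validator = IdValidator()
print(validator.is_valid("GENE0001"))  # True (8 characters)
print(validator.is_valid("g1"))        # False (too short)
```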
normalize_cds_id(cds_id)
Returns a normalized version of the provided CDS ID.
Normalization removes any unnecessary prefixes from the CDS ID. If the resulting CDS ID is still not a proper ID, an empty string is returned.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cds_id | str | CDS ID to normalize. | required |
Source code in src/python/ensembl/io/genomio/gff3/id_allocator.py
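A minimal sketch of "strip a prefix, then return an empty string if the result is still not proper". The prefix list (`cds-`, `cds:`) and the length-based validity rule are hypothetical, chosen only for illustration. The library's actual prefixes and checks are not documented on this page.

```python
from typing import List

CDS_PREFIXES: List[str] = ["cds-", "cds:"]  # hypothetical prefix list


def remove_prefix(stable_id: str, prefixes: List[str]) -> str:
    """Removes the first listed prefix that stable_id starts with, if any."""
    for prefix in prefixes:
        if stable_id.startswith(prefix):
            return stable_id[len(prefix):]
    return stable_id


def normalize_cds_id(cds_id: str, min_id_length: int = 7) -> str:
    """Strips a known prefix; returns "" if the result still looks improper."""
    normalized = remove_prefix(cds_id, CDS_PREFIXES)
    # Assumption: a "proper" ID is at least min_id_length characters long.
    if len(normalized) < min_id_length:
        return ""
    return normalized


print(normalize_cds_id("cds-XP_012345.1"))   # XP_012345.1
print(repr(normalize_cds_id("cds:12")))      # '' (too short after stripping)
```

Returning an empty string rather than raising lets the caller decide how to regenerate a replacement ID.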
normalize_gene_id(gene, refseq=False)
Returns a normalized gene stable ID.
Removes any unnecessary prefixes, but will generate a new stable ID if the normalized one is not recognized as valid.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
gene | GFFSeqFeature | Gene feature to normalize. | required |
Source code in src/python/ensembl/io/genomio/gff3/id_allocator.py
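A minimal sketch of the documented flow: remove a known prefix, keep the result if it is valid, and otherwise allocate a new ID. To stay self-contained, it works on a plain string rather than a `GFFSeqFeature`. It ignores the `refseq` flag, whose behavior is not documented here. The `gene-`/`gene:` prefixes and the length-only validity rule are assumptions. `GeneIdNormalizer` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import List

GENE_PREFIXES: List[str] = ["gene-", "gene:"]  # hypothetical prefixes


@dataclass
class GeneIdNormalizer:
    """Hypothetical sketch: normalize a gene ID or fall back to a generated one."""

    prefix: str = "TMP_"
    current_id_number: int = 0
    min_id_length: int = 7

    def is_valid(self, stable_id: str) -> bool:
        # Assumption: only a minimum-length rule is checked in this sketch.
        return len(stable_id) >= self.min_id_length

    def generate_gene_id(self) -> str:
        self.current_id_number += 1
        return f"{self.prefix}{self.current_id_number}"

    def normalize_gene_id(self, gene_id: str) -> str:
        # Strip the first matching prefix, if any.
        for known in GENE_PREFIXES:
            if gene_id.startswith(known):
                gene_id = gene_id[len(known):]
                break
        # Keep the stripped ID if valid; otherwise allocate a new one.
        if self.is_valid(gene_id):
            return gene_id
        return self.generate_gene_id()


normalizer = GeneIdNormalizer()
print(normalizer.normalize_gene_id("gene-AGAP004707"))  # AGAP004707
print(normalizer.normalize_gene_id("gene-abc"))         # TMP_1
```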
normalize_pseudogene_cds_id(pseudogene)
Normalizes every CDS ID of the provided pseudogene.
Ensures each CDS of the pseudogene has a proper ID, one that is:
- different from the gene ID, and
- derived from the gene if the original CDS ID is not proper.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pseudogene | GFFSeqFeature | Pseudogene feature. | required |
Source code in src/python/ensembl/io/genomio/gff3/id_allocator.py
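A minimal sketch of the two documented rules, using a hypothetical `Feature` stand-in instead of `GFFSeqFeature`. It assumes a pseudogene → transcript → CDS hierarchy. A CDS ID is treated as improper when it is too short or reuses the gene ID. The replacement is derived from the gene through its transcript ID plus a hypothetical `_cds` suffix. Deriving per transcript keeps all CDS segments of one transcript under one shared ID, which fits the GFF3 convention that a multi-line CDS shares a single ID.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Feature:
    """Hypothetical minimal stand-in for a GFF feature (not GFFSeqFeature)."""

    id: str
    type: str
    sub_features: List["Feature"] = field(default_factory=list)


def normalize_pseudogene_cds_id(pseudogene: Feature, min_id_length: int = 7) -> None:
    """Gives every CDS under the pseudogene an ID distinct from the gene ID."""
    for transcript in pseudogene.sub_features:
        for feat in transcript.sub_features:
            if feat.type != "CDS":
                continue
            # Assumption: a CDS ID is improper if too short or equal to the gene ID.
            if len(feat.id) < min_id_length or feat.id == pseudogene.id:
                # Derived from the gene via its transcript; "_cds" is a hypothetical suffix.
                feat.id = f"{transcript.id}_cds"


cds = Feature(id="PSG0001", type="CDS")
mrna = Feature(id="PSG0001_t1", type="mRNA", sub_features=[cds])
gene = Feature(id="PSG0001", type="pseudogene", sub_features=[mrna])
normalize_pseudogene_cds_id(gene)
print(cds.id)  # PSG0001_t1_cds
```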
remove_prefix(stable_id, prefixes)
staticmethod
Returns the stable ID after removing its prefix (if any).
If the stable ID starts with more than one of the given prefixes, only the first one found is removed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
stable_id | str | Stable ID to process. | required |
prefixes | List[str] | List of prefixes to search for. | required |
Source code in src/python/ensembl/io/genomio/gff3/id_allocator.py
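A minimal sketch of the documented behavior, assuming "first" means first in list order. The prefixes are checked in order, and only the first match is stripped, so an ID with two stacked prefixes keeps the second one. The example prefixes are hypothetical. For a single known prefix, Python 3.9+ also offers the built-in `str.removeprefix`.

```python
from typing import List


def remove_prefix(stable_id: str, prefixes: List[str]) -> str:
    """Returns stable_id without the first listed prefix it starts with."""
    for prefix in prefixes:
        if stable_id.startswith(prefix):
            return stable_id[len(prefix):]  # strip only this one prefix
    return stable_id  # no prefix matched: unchanged


print(remove_prefix("gene:gene-ABC123", ["gene-", "gene:"]))  # gene-ABC123
print(remove_prefix("ABC123", ["gene-", "gene:"]))             # ABC123
```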
set_prefix(genome)
Sets the ID prefix using the organism abbreviation, if it exists in the genome metadata.
Source code in src/python/ensembl/io/genomio/gff3/id_allocator.py
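A minimal sketch of choosing a prefix from genome metadata. The metadata layout (`genome["BRC4"]["organism_abbrev"]`), the `TMP_<abbrev>_` format, and the fallback to the default `TMP_` prefix are all assumptions for illustration. `choose_prefix` is a hypothetical function, not the library method.

```python
from typing import Any, Dict


def choose_prefix(genome: Dict[str, Any], default: str = "TMP_") -> str:
    """Sketch: builds an ID prefix from an organism abbreviation if present.

    The metadata layout and the "TMP_<abbrev>_" format are assumptions.
    """
    try:
        abbrev = genome["BRC4"]["organism_abbrev"]
    except KeyError:
        return default  # no abbreviation found: keep the default prefix
    return f"TMP_{abbrev}_"


print(choose_prefix({"BRC4": {"organism_abbrev": "agamPEST"}}))  # TMP_agamPEST_
print(choose_prefix({}))                                          # TMP_
```

Embedding the organism abbreviation keeps temporary IDs from different genomes distinguishable when they are later merged.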