BUSCO Core Metakeys
Overview
The BUSCO_CORE_METAKEYS process patches BUSCO metadata into an Ensembl core database by parsing BUSCO summary files and inserting metakeys directly into the database.
Process Details
- Label: `python`
- Tag: Uses genome assembly accession (`meta.gca`)
- Publish Directory: `${params.outdir}/${meta.gca}`
- Conditional Execution: Only runs when `params.apply_busco_metakeys` is true
Inputs
| Name | Type | Description |
|---|---|---|
| meta | val | Metadata map containing genome assembly info and database details |
| summary_file | path | BUSCO summary file to parse for metakeys |
Required Metadata Fields
- `gca`: Genome assembly accession
- `dbname`: Database name
- `species_id`: Species identifier in the database
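A pre-flight check on the metadata map can catch a missing field before the process touches the database. The sketch below is illustrative only: `validate_meta` and the example values are hypothetical, and the real pipeline passes `meta` as a Nextflow map rather than a Python dict.

```python
# Fields the process expects in the metadata map (taken from the list above).
REQUIRED_META_FIELDS = ("gca", "dbname", "species_id")


def validate_meta(meta: dict) -> list[str]:
    """Return the required fields that are missing or empty in `meta`."""
    return [
        field
        for field in REQUIRED_META_FIELDS
        if meta.get(field) in (None, "")
    ]


# Example values are made up for illustration.
meta = {"gca": "GCA_000000000.1", "dbname": "example_core_db"}
missing = validate_meta(meta)
if missing:
    print("Missing metadata fields:", ", ".join(missing))
```

Returning the list of missing names, rather than raising on the first one, lets a caller report every problem in a single message.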
Outputs
| Channel | Type | Description |
|---|---|---|
| versions_file | path | Optional versions.yml file tracking Python version |
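For orientation, a versions file of this kind could be produced roughly as follows. This is a sketch that assumes the common nf-core layout (process name as the top-level key, one tool per indented line); the helper name `versions_yaml` is hypothetical, and the actual process may write the file differently.

```python
import platform


def versions_yaml(process_name: str) -> str:
    """Format a versions.yml entry recording the running Python version."""
    return f'"{process_name}":\n    python: {platform.python_version()}\n'


# Print the entry; a real process would redirect this into versions.yml.
print(versions_yaml("BUSCO_CORE_METAKEYS"), end="")
```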
Parameters
Required
- `params.apply_busco_metakeys`: Boolean flag to enable/disable process execution
- `params.outdir`: Output directory for results
- `params.host`: Database host
- `params.port`: Database port
- `params.user`: Database username
- `params.password`: Database password
Optional
- `params.files_latency`: Delay after script execution (default handled by `afterScript`)
Script Details
The process:
1. Executes busco_metakeys_patch.py with database connection parameters
2. Parses the BUSCO summary file
3. Inserts metakeys into the specified Ensembl core database
4. Runs the query directly against the database (-run_query true)
5. Generates a versions file tracking the Python version used
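The parsing and insertion steps above can be sketched as follows. This is not the code in `busco_metakeys_patch.py`: it assumes the standard one-line BUSCO score notation (`C:..%[S:..%,D:..%],F:..%,M:..%,n:..`, which may carry extra fields in some BUSCO versions), the `busco.example_*` meta keys are hypothetical placeholders, and the statements target the `species_id`, `meta_key` and `meta_value` columns of the Ensembl core `meta` table.

```python
import re

# One-line BUSCO score notation, e.g. C:98.5%[S:97.0%,D:1.5%],F:0.5%,M:1.0%,n:255
BUSCO_LINE = re.compile(
    r"C:(?P<complete>[\d.]+)%\[S:(?P<single>[\d.]+)%,D:(?P<duplicated>[\d.]+)%\],"
    r"F:(?P<fragmented>[\d.]+)%,M:(?P<missing>[\d.]+)%,n:(?P<total>\d+)"
)


def parse_busco_summary(text: str) -> dict:
    """Extract the BUSCO scores (as strings) from the summary text."""
    match = BUSCO_LINE.search(text)
    if match is None:
        raise ValueError("no BUSCO score line found")
    return match.groupdict()


def build_inserts(scores: dict, species_id: int) -> list[tuple[str, tuple]]:
    """Build parameterised INSERTs; the 'busco.example_*' keys are hypothetical."""
    sql = "INSERT INTO meta (species_id, meta_key, meta_value) VALUES (%s, %s, %s)"
    return [
        (sql, (species_id, f"busco.example_{name}", value))
        for name, value in scores.items()
    ]


summary = "C:98.5%[S:97.0%,D:1.5%],F:0.5%,M:1.0%,n:255"
for statement, values in build_inserts(parse_busco_summary(summary), species_id=1):
    print(statement, values)
```

Keeping the values separate from the SQL text means they can be handed to a database driver as bound parameters instead of being interpolated into the statement.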
Dependencies
- Python 3
- `busco_metakeys_patch.py` script (from Ensembl genes repository)
- Database access credentials
Notes
- The process includes a configurable sleep delay after completion to handle file system latency
- Results are published to a genome-specific subdirectory
- Direct database modification requires appropriate write permissions
- The versions file is marked as optional