BUSCO Core Metakeys

Overview

The BUSCO_CORE_METAKEYS process patches BUSCO metadata into an Ensembl core database by parsing BUSCO summary files and inserting metakeys directly into the database.

Process Details

  • Label: python
  • Tag: Uses genome assembly accession (meta.gca)
  • Publish Directory: ${params.outdir}/${meta.gca}
  • Conditional Execution: Only runs when params.apply_busco_metakeys is true

Inputs

Name         | Type | Description
meta         | val  | Metadata map containing genome assembly info and database details
summary_file | path | BUSCO summary file to parse for metakeys

Required Metadata Fields

  • gca: Genome assembly accession
  • dbname: Database name
  • species_id: Species identifier in the database
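
A check for the required metadata fields can be sketched as follows. This is a minimal illustration, not part of the pipeline: the function name missing_meta_fields is hypothetical, and the metadata map is assumed to arrive as a plain dictionary.

```python
# Hypothetical helper: the field names come from the list above,
# everything else is an assumption made for illustration.
REQUIRED_META_FIELDS = ("gca", "dbname", "species_id")


def missing_meta_fields(meta: dict) -> list:
    """Return the required fields that are absent or empty in the metadata map."""
    return [field for field in REQUIRED_META_FIELDS if meta.get(field) in (None, "")]
```

Calling this before the process starts would let a run fail early with a clear message instead of failing inside the database step.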

Outputs

Channel       | Type | Description
versions_file | path | Optional versions.yml file tracking Python version
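
A versions file of this kind could be produced along these lines. The layout (quoted process name followed by an indented tool entry) is the convention commonly used in Nextflow pipelines and is an assumption here; the actual file written by this process may differ. The function name versions_yaml is hypothetical.

```python
import platform


def versions_yaml(process_name: str) -> str:
    """Build versions.yml content recording the running Python version.

    The two-level layout is an assumed convention, not confirmed by this page.
    """
    return f'"{process_name}":\n    python: {platform.python_version()}\n'
```

Writing the returned string to versions.yml in the task directory would make the file available to the optional output channel.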

Parameters

Required

  • params.apply_busco_metakeys: Boolean flag to enable/disable process execution
  • params.outdir: Output directory for results
  • params.host: Database host
  • params.port: Database port
  • params.user: Database username
  • params.password: Database password

Optional

  • params.files_latency: Sleep delay applied after script execution to allow for file system latency (the default is handled by afterScript)
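
The way these parameters gate the process and feed the database connection can be sketched as below. This is an illustration under assumptions: the pipeline parameters are modelled as a plain dictionary, and the helper names should_run and connection_settings are hypothetical.

```python
# Parameter names are taken from the lists above; the helpers are hypothetical.
CONNECTION_PARAMS = ("host", "port", "user", "password")


def should_run(params: dict) -> bool:
    """Mirror the conditional execution: run only when the flag is true."""
    return bool(params.get("apply_busco_metakeys", False))


def connection_settings(params: dict) -> dict:
    """Collect the database connection parameters, failing if any is missing."""
    missing = [name for name in CONNECTION_PARAMS if name not in params]
    if missing:
        raise KeyError(f"missing required parameters: {', '.join(missing)}")
    settings = {name: params[name] for name in CONNECTION_PARAMS}
    settings["port"] = int(settings["port"])  # ports often arrive as strings
    return settings
```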

Script Details

The process:

  1. Executes busco_metakeys_patch.py with database connection parameters
  2. Parses the BUSCO summary file
  3. Inserts metakeys into the specified Ensembl core database
  4. Runs the query directly against the database (-run_query true)
  5. Generates a versions file tracking the Python version used
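
The parsing and insertion steps can be illustrated with a short sketch. It is not the code of busco_metakeys_patch.py. It assumes the summary file contains the usual one-line BUSCO score string (for example C:98.9%[S:98.1%,D:0.8%],F:0.4%,M:0.7%,n:3640) and that metakeys are stored in the core database's meta table with species_id, meta_key and meta_value columns. The key prefix busco_example and both function names are hypothetical placeholders; the real metakey names are defined by the script.

```python
import re

# Matches the one-line BUSCO score summary, e.g.
# C:98.9%[S:98.1%,D:0.8%],F:0.4%,M:0.7%,n:3640
SUMMARY_RE = re.compile(
    r"C:(?P<complete>[\d.]+)%\[S:(?P<single>[\d.]+)%,D:(?P<duplicated>[\d.]+)%\],"
    r"F:(?P<fragmented>[\d.]+)%,M:(?P<missing>[\d.]+)%,n:(?P<total>\d+)"
)

# Parameterised statement for a DB-API driver that uses %s placeholders.
# Table and column names are an assumption about the core schema.
INSERT_SQL = "INSERT INTO meta (species_id, meta_key, meta_value) VALUES (%s, %s, %s)"


def parse_busco_summary(text: str) -> dict:
    """Extract the BUSCO percentages and the total gene count from summary text."""
    match = SUMMARY_RE.search(text)
    if match is None:
        raise ValueError("no BUSCO score line found in summary text")
    scores = {name: float(value) for name, value in match.groupdict().items()}
    scores["total"] = int(match["total"])  # the gene count is an integer
    return scores


def build_meta_rows(scores: dict, species_id: int, prefix: str = "busco_example") -> list:
    """Turn parsed scores into (species_id, meta_key, meta_value) rows.

    The prefix is a hypothetical placeholder, not the script's real key names.
    """
    return [(species_id, f"{prefix}.{name}", str(value)) for name, value in scores.items()]
```

The rows returned by build_meta_rows could be passed together with INSERT_SQL to a cursor's executemany call. Keeping row construction separate from execution makes it possible to inspect the statements first, which is roughly what running the script without -run_query true would allow.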

Dependencies

  • Python 3
  • busco_metakeys_patch.py script (from Ensembl genes repository)
  • Database access credentials

Notes

  • The process includes a configurable sleep delay after completion to handle file system latency
  • Results are published to a genome-specific subdirectory
  • Direct database modification requires appropriate write permissions
  • The versions file is marked as optional