Run Ensembl Meta

Overview

The RUN_ENSEMBL_META process generates SQL files containing Ensembl core database metadata, including schema information, species details, and production metadata required for Ensembl release databases.

Process Details

Label: python
Tag: Uses genome assembly accession (meta.gca)
Publish Directory: ${params.outdir}/${meta.gca}

Inputs

Name	Type	Description
meta	val	Metadata map containing genome assembly and database connection details

Required Metadata Fields

gca: Genome assembly accession
dbname: Database name
production_name: Species production name
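
As a rough illustration, the required fields can be checked before the meta map is passed to the process. This is a minimal sketch, not part of the pipeline: the function name `missing_meta_fields` and the example values are hypothetical.

```python
from __future__ import annotations

# Hypothetical pre-flight check; the pipeline itself may validate differently.
REQUIRED_META_FIELDS = ("gca", "dbname", "production_name")


def missing_meta_fields(meta: dict) -> list[str]:
    """Return the required keys that are absent or empty in the meta map."""
    return [key for key in REQUIRED_META_FIELDS if not meta.get(key)]


# Placeholder values, not real pipeline data.
example_meta = {"gca": "GCA_000000000.1", "dbname": "example_core_db"}
print(missing_meta_fields(example_meta))  # prints ['production_name']
```

Returning the list of missing keys, rather than raising on the first one, lets a caller report every problem in a single message.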

Outputs

Channel	Type	Description
ensembl_meta_output	tuple	Metadata and generated SQL files (`*.sql`)
versions_file	path	versions.yml file tracking Python version
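
The versions file could be produced along the following lines. This is a sketch only: the YAML layout (process name as the top-level key, tool versions nested beneath it) is an assumption, and `write_versions_file` is a hypothetical helper, not the process's actual code.

```python
import platform
import tempfile
from pathlib import Path


def write_versions_file(process_name: str, out_dir: Path) -> Path:
    """Write a minimal versions.yml recording the running Python version."""
    versions_path = out_dir / "versions.yml"
    # Assumed layout: quoted process name, then an indented "python: X.Y.Z" entry.
    content = f'"{process_name}":\n    python: {platform.python_version()}\n'
    versions_path.write_text(content)
    return versions_path


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    path = write_versions_file("RUN_ENSEMBL_META", Path(tmp))
    print(path.read_text())
```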

Parameters

Required

params.outdir: Output directory for results
params.enscode: Path to Ensembl code repository
params.host: Database host
params.port: Database port
params.team: Team identifier for metadata attribution

Optional

params.files_latency: Sleep delay after script execution to handle file system latency (default handled by afterScript)
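
To make the split between required and optional parameters concrete, the sketch below models them as a small Python dataclass and derives the publish directory from `outdir` and the assembly accession. The class `EnsemblMetaParams`, the field types, and all example values are hypothetical. No default is stated for `files_latency`, so it is left as `None` here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EnsemblMetaParams:
    """Illustrative container mirroring the parameters listed above."""

    outdir: str
    enscode: str
    host: str
    port: int
    team: str
    files_latency: Optional[float] = None  # optional; no default is documented here

    def publish_dir(self, gca: str) -> str:
        """Mirror the ${params.outdir}/${meta.gca} publish directory pattern."""
        return f"{self.outdir}/{gca}"


# Placeholder values for illustration only.
params = EnsemblMetaParams(
    outdir="results",
    enscode="/path/to/enscode",
    host="db.example.org",
    port=3306,
    team="example_team",
)
print(params.publish_dir("GCA_000000000.1"))  # prints results/GCA_000000000.1
```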

Script Details

The process:

1. Executes core_meta_data.py Python script from the Ensembl genes repository
2. Connects to the specified core database
3. Generates SQL files with metadata insertions/updates for:
   - Schema version information
   - Species metadata (taxonomy, assembly, etc.)
   - Production database references
   - Team attribution
4. Outputs SQL files to core_statistics/ subdirectory
5. Creates symbolic links to SQL files in the publish directory
6. Captures Python version information
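
Steps 4 and 5 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the process's own code: the helper `link_sql_files` is hypothetical, and in the real pipeline the linking may be performed by Nextflow's publishing mechanism instead of a script.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def link_sql_files(work_dir: Path, publish_dir: Path) -> list[Path]:
    """Symlink every core_statistics/*.sql file from work_dir into publish_dir."""
    publish_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for sql_file in sorted((work_dir / "core_statistics").glob("*.sql")):
        link = publish_dir / sql_file.name
        if link.is_symlink() or link.exists():
            link.unlink()  # replace a stale entry left by an earlier run
        link.symlink_to(sql_file.resolve())
        links.append(link)
    return links


# Demo in a throwaway directory with a placeholder SQL file.
with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp) / "work"
    (work / "core_statistics").mkdir(parents=True)
    (work / "core_statistics" / "example.sql").write_text("-- placeholder\n")
    for link in link_sql_files(work, Path(tmp) / "publish"):
        print(link.name, "->", link.resolve())
```

Linking to the resolved absolute path keeps the links valid even if the publish directory is later accessed from a different working directory.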

Dependencies

Python 3
core_meta_data.py script (from ensembl-genes/src/python/ensembl/genes/metadata/)
Ensembl Python libraries
Database read access

Generated SQL Content

The SQL files typically include INSERT/UPDATE statements for the meta table with keys such as:

- schema_version
- schema_type
- assembly.default
- species.production_name
- species.taxonomy_id
- species.scientific_name
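
For orientation, a statement of this kind might be assembled as sketched below. The sketch assumes the Ensembl core meta table layout with species_id, meta_key, and meta_value columns. The helpers `sql_quote` and `meta_insert` are hypothetical and do not reproduce what core_meta_data.py actually emits, and the example value is a placeholder.

```python
from typing import Optional


def sql_quote(value: str) -> str:
    """Quote a string literal by doubling single quotes (simplified: no backslash handling)."""
    return "'" + value.replace("'", "''") + "'"


def meta_insert(meta_key: str, meta_value: str, species_id: Optional[int] = 1) -> str:
    """Build one INSERT statement for the meta table.

    Pass species_id=None for keys that are not tied to a species,
    which is assumed here to be stored as NULL.
    """
    species = "NULL" if species_id is None else str(species_id)
    return (
        "INSERT INTO meta (species_id, meta_key, meta_value) "
        f"VALUES ({species}, {sql_quote(meta_key)}, {sql_quote(meta_value)});"
    )


print(meta_insert("species.production_name", "example_species"))
print(meta_insert("schema_type", "core", species_id=None))
```

String concatenation is used only to show the shape of the output file. When statements are executed directly against a database, parameterized queries are the safer choice.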

Notes

Results are published to a genome-specific subdirectory
The process includes a configurable sleep delay after completion to handle file system latency
Generated SQL files can be executed using the POPULATE_DB process
The core_statistics/ directory is created as an output subdirectory
Symbolic links ensure SQL files are accessible in the publish directory