BUSCO_GENOME_LINEAGE Module¶

Overview¶

The BUSCO_GENOME_LINEAGE module assesses genome assembly completeness by running BUSCO (Benchmarking Universal Single-Copy Orthologs) analysis in genome mode. It quantifies genome quality by searching for expected single-copy orthologs from a specific lineage dataset, providing metrics on complete, fragmented, duplicated, and missing BUSCOs.

Module Location: pipelines/statistics/modules/busco_genome_lineage.nf

Functionality¶

This module performs comprehensive genome quality assessment:

Genome Completeness Analysis: Searches genomic sequences for universal single-copy orthologs
Gene Prediction: Uses Metaeuk by default for eukaryotic lineages, Prodigal for prokaryotic lineages, and optionally Augustus
Ortholog Mapping: Compares predictions against lineage-specific BUSCO datasets
Quality Metrics: Generates statistics on complete (C), complete single-copy (S), complete duplicated (D), fragmented (F), and missing (M) BUSCOs
Offline Mode: Uses pre-downloaded BUSCO datasets from params.download_path

The module provides standardized, quantitative quality metrics used to assess genome assembly and annotation completeness.

Inputs¶

Channel Input¶

tuple val(meta), path(genome_file)

Metadata Map:

[
    gca: String,                     // Genome assembly accession
    busco_dataset: String,           // REQUIRED: BUSCO lineage (e.g., "primates_odb10")
    dbname: String,                  // Database name
    species_id: Integer,             // Species ID
    production_species: String       // Production name
]

Genome File:

- Format: FASTA (.fa, .fasta, .fna)
- Content: Genomic DNA sequences (scaffolds, contigs, or chromosomes)
- Compression: May be gzipped (.gz)
- Size: Typically 10 MB - 5 GB
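A quick pre-flight check can catch an unreadable or non-FASTA input before a long BUSCO run. The sketch below is not part of the module: the function name `looks_like_fasta` is hypothetical, gzip compression is detected from the `.gz` extension only, and the check is limited to confirming that the first non-empty line starts with `>`.

```bash
# Hypothetical helper (not part of the module): succeed if the first
# non-empty line of a plain or gzipped file starts with ">".
looks_like_fasta() {
    file="$1"
    case "$file" in
        *.gz) first=$(gzip -dc "$file" 2>/dev/null | awk 'NF { print; exit }') ;;
        *)    first=$(awk 'NF { print; exit }' "$file") ;;
    esac
    case "$first" in
        ">"*) return 0 ;;
        *)    return 1 ;;
    esac
}
```

For example, `looks_like_fasta genome.fna.gz || echo "not FASTA"` could be run before submitting the pipeline. It does not validate sequence content.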

Parameters¶

Parameter	Type	Default	Description
`params.download_path`	String	Required	Path to pre-downloaded BUSCO lineage datasets
`params.outdir`	String	Required	Output directory for results
`params.files_latency`	Integer	`5`	File system latency wait time (seconds)

Outputs¶

Published Outputs¶

Directory: ${params.outdir}/${meta.gca}/

1. Summary Files¶

File	Description	Format
`short_summary.specific.*.txt`	Main BUSCO summary with key metrics	Plain text
`busco_genome_batch_summary.txt`	Full batch summary (verbose)	Plain text

2. Versions File¶

File	Description
`versions_busco_genome.yml`	Software versions used in analysis

Channel Outputs¶

Channel	Type	Description
`busco_genome_short_summary_output`	`tuple val(meta), path("short_summary.*.txt")`	Key summary file
`busco_genome_batch_summary_output`	`tuple val(meta), path("busco_genome_batch_summary.txt")`	Verbose batch summary
`versions_file`	`path("versions_busco_genome.yml")`	Versions tracking file

Output File Contents¶

short_summary.specific.{lineage}.{assembly}.txt¶

Example:

# BUSCO version is: 5.4.3 
# The lineage dataset is: primates_odb10 (Creation date: 2024-01-16, number of genomes: 40, number of BUSCOs: 13780)
# Summarized benchmarking in BUSCO notation for file GCA_000001405.29_GRCh38.p14_genomic.fna.gz
# BUSCO was run in mode: genome

    ***** Results: *****

    C:98.8%[S:96.5%,D:2.3%],F:0.6%,M:0.6%,n:13780      

    13608   Complete BUSCOs (C)            
    13295   Complete and single-copy BUSCOs (S)    
    313 Complete and duplicated BUSCOs (D)     
    83  Fragmented BUSCOs (F)              
    89  Missing BUSCOs (M)             
    13780   Total BUSCO groups searched        

# Dependencies and versions:
#   hmmer: 3.3.2
#   metaeuk: 6.a5d39d9

Key Metrics Explained:

- C (Complete): BUSCOs whose match score and aligned length both fall within the expected range for that BUSCO group
- S (Single-copy): Complete BUSCOs found exactly once
- D (Duplicated): Complete BUSCOs found more than once
- F (Fragmented): BUSCOs found, but with an aligned length shorter than expected
- M (Missing): BUSCOs not found at all
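The one-line score string is the easiest part of the summary to parse. The following is a minimal sketch, assuming the score line has exactly the layout shown in the example above (other BUSCO versions may format it differently); the function name `busco_scores` is hypothetical.

```bash
# Hypothetical helper: read a short summary on stdin and print
# "C S D F M n" taken from the line C:..%[S:..%,D:..%],F:..%,M:..%,n:..
busco_scores() {
    sed -n 's/^[[:space:]]*C:\([0-9.]*\)%\[S:\([0-9.]*\)%,D:\([0-9.]*\)%\],F:\([0-9.]*\)%,M:\([0-9.]*\)%,n:\([0-9]*\).*/\1 \2 \3 \4 \5 \6/p'
}
```

Usage: `busco_scores < short_summary.specific.primates_odb10.GCA_000001405.29_busco_genome.txt`. In the example above the values are internally consistent: S + D = C (96.5 + 2.3 = 98.8) and C + F + M = 100 (98.8 + 0.6 + 0.6).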

busco_genome_batch_summary.txt¶

Example:

# Summarized BUSCO benchmarking for GCA_000001405.29_GRCh38.p14_genomic.fna.gz
Input_file  Dataset Complete    Single  Duplicated  Fragmented  Missing n_markers   Scores-cutoff
GCA_000001405.29_GRCh38.p14_genomic.fna.gz  primates_odb10  98.8    96.5    2.3 0.6 0.6 13780   default

Columns:

1. Input_file: Genome file analyzed
2. Dataset: BUSCO lineage used
3. Complete: % complete BUSCOs (C)
4. Single: % single-copy BUSCOs (S)
5. Duplicated: % duplicated BUSCOs (D)
6. Fragmented: % fragmented BUSCOs (F)
7. Missing: % missing BUSCOs (M)
8. n_markers: Total BUSCO groups in dataset
9. Scores-cutoff: Detection threshold used
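Because the batch summary is a table, individual columns can be pulled out with `awk`. This is a sketch under the assumption that the file is tab-separated with the nine columns listed above; the function name `batch_summary_rows` is hypothetical.

```bash
# Hypothetical helper: for each data row of a tab-separated batch summary
# on stdin, print "dataset complete missing" (columns 2, 3 and 7).
# Comment lines (starting with "#") and the header row are skipped.
batch_summary_rows() {
    awk -F '\t' '!/^#/ && $1 != "Input_file" && NF >= 8 { print $2, $3, $7 }'
}
```

Usage: `batch_summary_rows < busco_genome_batch_summary.txt`.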

versions_busco_genome.yml¶

Format: YAML

Content Example:

"BUSCO_GENOME_LINEAGE":
  busco: 5.4.3
  python: 3.11.0
  hmmer: 3.3.2
  metaeuk: 6.a5d39d9
  sepp: 4.5.1
  prodigal: 2.6.3

Process Configuration¶

Directives¶

label 'busco'                                   // Use BUSCO resource allocation
tag "${meta.gca}"                               // Tag with GCA accession
publishDir "${params.outdir}/${meta.gca}"       // Publish to per-assembly (GCA) directory
afterScript "sleep ${params.files_latency}"     // Wait for file system sync
maxForks 10                                     // Limit concurrent executions

Resource Allocation¶

From nextflow.config (busco label):

CPUs: 8 cores
Memory: 32 GB
Time: 24 hours
Queue: Standard

Container¶

ezlabgva/busco:v5.4.3_cv1

Installed Tools:

- BUSCO 5.4.3
- HMMER 3.3.2
- Metaeuk 6.a5d39d9
- Augustus 3.5.0
- Prodigal 2.6.3
- SEPP 4.5.1
- Python 3.11

Implementation Details¶

BUSCO Command¶

The core BUSCO execution:

busco \
    -f \
    --offline \
    --in ${genome_file} \
    --out ${meta.gca}_busco_genome \
    --mode genome \
    --lineage_dataset ${meta.busco_dataset} \
    --download_path ${params.download_path} \
    --cpu ${task.cpus}

Parameter Breakdown:

Flag	Purpose
`-f`	Force overwrite of existing results
`--offline`	Use local datasets (no downloads)
`--in`	Input genome file
`--out`	Output directory name
`--mode genome`	Genome analysis mode (vs. proteins/transcriptome)
`--lineage_dataset`	BUSCO lineage to use (e.g., `primates_odb10`)
`--download_path`	Path to pre-downloaded lineages
`--cpu`	Number of CPU cores to use

Offline Mode¶

The module uses --offline to avoid downloading datasets during execution:

Requirements:

1. Datasets must be pre-downloaded to params.download_path
2. Directory structure must match BUSCO expectations:

${params.download_path}/
├── lineages/
│   ├── primates_odb10/
│   │   ├── ancestral
│   │   ├── dataset.cfg
│   │   ├── hmms/
│   │   ├── prfl/
│   │   └── scores_cutoff
│   ├── vertebrata_odb10/
│   └── bacteria_odb10/
└── placement_files/
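Before launching the pipeline in offline mode, the layout above can be checked with a few file tests. The sketch below is illustrative only: `check_lineage` is a hypothetical helper, and it checks just three of the entries shown in the tree (`dataset.cfg`, `scores_cutoff`, `hmms/`).

```bash
# Hypothetical helper: verify that <download_path>/lineages/<lineage>
# contains a few of the entries shown in the layout above.
# Prints each missing entry and returns non-zero if any is absent.
check_lineage() {
    dir="$1/lineages/$2"
    status=0
    for entry in dataset.cfg scores_cutoff hmms; do
        if [ ! -e "$dir/$entry" ]; then
            echo "missing: $dir/$entry"
            status=1
        fi
    done
    return "$status"
}
```

Usage: `check_lineage /data/busco_datasets primates_odb10`.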

Pre-downloading Datasets:

# Download a specific lineage (the dataset name is passed directly to --download)
busco --download_path /path/to/busco_data --download primates_odb10

# List the available datasets, then download all of them
busco --list-datasets
busco --download_path /path/to/busco_data --download all

Gene Prediction¶

BUSCO performs gene prediction using:

- Metaeuk (default for eukaryotes):
    - Profile-based gene prediction
    - Better sensitivity for divergent sequences
    - Used for eukaryotic lineages
- Prodigal (for prokaryotes):
    - Fast ab initio gene prediction
    - Used for bacterial/archaeal lineages
- Augustus (optional):
    - Can be specified with --augustus flag
    - Species-specific training available

Output Organization¶

BUSCO creates a nested output structure:

${meta.gca}_busco_genome/
├── short_summary.specific.primates_odb10.${meta.gca}_busco_genome.txt  ← Published
├── logs/  ← BUSCO and tool logs (not published)
└── run_primates_odb10/  ← Per-lineage results (not published)
    ├── full_table.tsv  ← Detailed per-BUSCO results
    ├── missing_busco_list.tsv  ← List of missing BUSCOs
    └── busco_sequences/
        ├── single_copy_busco_sequences/  ← Individual gene sequences
        ├── multi_copy_busco_sequences/
        └── fragmented_busco_sequences/

Published: Only summary files are published to save space

Post-processing¶

The module collects and renames summary files:

# Move short summary
find ./${meta.gca}_busco_genome/ -name "short_summary.*.txt" \
    -exec mv {} . \;

# Copy batch summary
cp ./${meta.gca}_busco_genome/batch_summary.txt \
    busco_genome_batch_summary.txt

Usage Example¶

In a Workflow¶

include { BUSCO_DATASET } from '../modules/busco_dataset.nf'
include { FETCH_GENOME } from '../modules/fetch_genome.nf'
include { BUSCO_GENOME_LINEAGE } from '../modules/busco_genome_lineage.nf'

workflow {
    // Get metadata and select dataset
    def meta_ch = channel.of([
        gca: 'GCA_000001405.29',
        dbname: 'homo_sapiens_core_110_38',
        species_id: 1,
        production_species: 'homo_sapiens',
        taxon_id: '9606'
    ])

    // Select appropriate BUSCO dataset
    def dataset_ch = BUSCO_DATASET(meta_ch).busco_dataset_output
        .map { meta, stdout ->
            meta + [busco_dataset: stdout.text.trim()]
        }

    // Fetch genome file
    def genome_ch = FETCH_GENOME(dataset_ch).genome_file_output

    // Run BUSCO analysis
    BUSCO_GENOME_LINEAGE(genome_ch)

    // View results
    BUSCO_GENOME_LINEAGE.out.busco_genome_short_summary_output
        .view { meta, summary ->
            "BUSCO genome analysis for ${meta.gca}: ${summary}"
        }
}

Configuration¶

nextflow.config:

params {
    // BUSCO configuration
    download_path = '/data/busco_datasets'
    outdir = '/output/busco_results'
    files_latency = 5

    // Resource allocation
    max_cpus = 8
    max_memory = '32.GB'
    max_time = '24.h'
}

process {
    withLabel: busco {
        cpus = 8
        memory = 32.GB
        time = 24.h
    }
}

Expected Output Files¶

Published to: ${params.outdir}/GCA_000001405.29/

short_summary.specific.primates_odb10.GCA_000001405.29_busco_genome.txt
busco_genome_batch_summary.txt
versions_busco_genome.yml

Quality Interpretation¶

High-Quality Genome Assembly¶

Metrics:

- Complete (C): ≥ 95%
- Single-copy (S): ≥ 90%
- Duplicated (D): < 5%
- Fragmented (F): < 3%
- Missing (M): < 2%

Example (Human GRCh38):

C:98.8%[S:96.5%,D:2.3%],F:0.6%,M:0.6%,n:13780

Interpretation:

- ✅ Excellent completeness (98.8%)
- ✅ High single-copy rate (96.5%)
- ✅ Low duplication (2.3%, expected for mammals)
- ✅ Minimal fragmentation (0.6%)
- ✅ Very few missing BUSCOs (0.6%)

Assessment: High-quality assembly suitable for annotation and analysis

Good-Quality Genome Assembly¶

Metrics:

- Complete (C): 85-95%
- Single-copy (S): 80-90%
- Duplicated (D): 5-15%
- Fragmented (F): 3-10%
- Missing (M): 2-10%

Example:

C:90.5%[S:84.2%,D:6.3%],F:5.2%,M:4.3%,n:10000

Interpretation:

- ✅ Good completeness (90.5%)
- ⚠️ Moderate single-copy rate (84.2%)
- ⚠️ Some duplication (6.3%)
- ⚠️ Moderate fragmentation (5.2%)
- ⚠️ Some missing BUSCOs (4.3%)

Assessment: Usable assembly, may need polishing or gap filling

Poor-Quality Genome Assembly¶

Metrics:

- Complete (C): < 85%
- Single-copy (S): < 80%
- Fragmented (F): > 10%
- Missing (M): > 10%

Example:

C:72.8%[S:68.1%,D:4.7%],F:12.5%,M:14.7%,n:10000

Interpretation:

- ❌ Low completeness (72.8%)
- ❌ Poor single-copy rate (68.1%)
- ❌ High fragmentation (12.5%)
- ❌ Many missing BUSCOs (14.7%)

Assessment: Poor assembly quality - significant contamination, fragmentation, or incomplete sequencing

Recommendations:

1. Re-sequence with higher coverage
2. Use better assembly algorithms
3. Perform contamination screening
4. Increase sequencing read length
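The completeness thresholds of the three tiers above can be applied mechanically. The sketch below is a simplification: the hypothetical `completeness_tier` function looks only at the Complete (C) percentage and ignores S, D, F and M, which the tiers above also take into account.

```bash
# Hypothetical helper: map a Complete (C) percentage to the tiers above.
# Only C is considered; S, D, F and M are ignored in this sketch.
completeness_tier() {
    awk -v c="$1" 'BEGIN {
        if (c >= 95)      print "high"
        else if (c >= 85) print "good"
        else              print "poor"
    }'
}
```

With the three example score lines from this section, `completeness_tier 98.8`, `completeness_tier 90.5` and `completeness_tier 72.8` print `high`, `good` and `poor` respectively.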

Duplication Patterns¶

Normal Duplication (D: 2-5%):

- Reflects lineage-specific gene duplications or ancient whole-genome duplications
- Common in vertebrates, plants
- Expected biological feature

High Duplication (D: > 10%):

- May indicate:
    - Recent whole-genome duplication (polyploidy)
    - Haplotypic duplication (phased assembly issue)
    - Contamination
    - Misassembly

Very High Duplication (D: > 20%):

- Likely assembly issues:
    - Failed haplotype separation
    - Multiple individuals sequenced
    - Severe contamination
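The duplication bands above can be expressed as a small classifier. The text leaves values between 5% and 10% and below 2% unlabeled, so the sketch makes two assumptions of its own: anything up to 5% is reported as `normal`, and the 5-10% gap is reported as `elevated`. The function name `duplication_band` is hypothetical.

```bash
# Hypothetical helper: map a Duplicated (D) percentage to a band.
# "elevated" (5-10%) and treating values below 2% as "normal" are
# assumptions of this sketch; the text above does not label those ranges.
duplication_band() {
    awk -v d="$1" 'BEGIN {
        if (d > 20)      print "very-high"
        else if (d > 10) print "high"
        else if (d > 5)  print "elevated"
        else             print "normal"
    }'
}
```

A band is only a prompt for follow-up: a high value in a known polyploid is expected biology, while the same value in a diploid assembly points to haplotype or contamination problems.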

Error Handling¶

Common Errors¶

1. Missing Lineage Dataset¶

Error Message:

ERROR: Unable to find lineage 'primates_odb10' in /path/to/busco_data

Cause: Lineage not downloaded or incorrect path

Solution:

# Download missing lineage (the dataset name is passed directly to --download)
busco --download_path ${params.download_path} --download primates_odb10

# Or verify path
ls ${params.download_path}/lineages/primates_odb10/

2. Insufficient Memory¶

Error Message:

ERROR: Process 'BUSCO_GENOME_LINEAGE' terminated with an error exit status (137) -- Execution is retried (1)

Cause: Out of memory (exit code 137)

Solution (in nextflow.config):

process {
    withLabel: busco {
        memory = 64.GB  // Increase from 32.GB
    }
}

3. Corrupt Genome File¶

Error Message:

ERROR: FASTA parser error: Unexpected character in header

Solution:

- Validate FASTA format:

# Check for non-standard characters
grep ">" genome.fa | head

# Validate with SeqKit
seqkit stats genome.fa

- Fix headers if needed:

sed 's/[^a-zA-Z0-9_>.-]/_/g' genome.fa > genome_clean.fa

4. Empty Output Files¶

Error Message:

ERROR: Process requirement not met - No summary files produced

Cause: BUSCO failed silently

Solution:

- Check BUSCO logs:

cat ${meta.gca}_busco_genome/logs/busco.log

- Verify input file:

head genome.fa
wc -l genome.fa

- Test BUSCO manually:

busco -i genome.fa -o test_run -m genome -l primates_odb10 --offline

5. CPU Resource Contention¶

Error Message:

WARNING: BUSCO analysis slower than expected

Cause: Too many concurrent BUSCO processes

Solution (adjust maxForks):

process {
    withName: BUSCO_GENOME_LINEAGE {
        maxForks = 5  // Reduce from 10
    }
}

6. File System Latency Issues¶

Symptom: Missing output files immediately after process completion

Solution:

- Increase params.files_latency:

params.files_latency = 10  // Increase from 5

- Use stageOutMode:

process {
    stageOutMode = 'copy'  // Force immediate copy
}

Version Tracking¶

The module captures comprehensive version information:

"BUSCO_GENOME_LINEAGE":
  busco: 5.4.3
  python: 3.11.0
  hmmer: 3.3.2
  metaeuk: 6.a5d39d9
  sepp: 4.5.1
  prodigal: 2.6.3

Version Extraction:

# BUSCO version
busco --version | sed 's/BUSCO //'

# Python version
python --version | awk '{print $2}'

# HMMER version
hmmsearch -h | grep "# HMMER" | sed 's/# HMMER //' | awk '{print $1}'

Integration with Other Modules¶

Upstream Modules¶

BUSCO_DATASET: Provides busco_dataset for lineage selection
FETCH_GENOME: Provides genome file for analysis
DB_METADATA: Provides species metadata

Downstream Modules¶

BUSCO_CORE_METAKEYS: Stores BUSCO metrics in database
GENERATE_STATS: Aggregates quality metrics

Data Flow Diagram¶

graph LR
    A[DB_METADATA] --> B[BUSCO_DATASET]
    B --> C[FETCH_GENOME]
    C --> D[BUSCO_GENOME_LINEAGE]
    D --> E[BUSCO_CORE_METAKEYS]
    D --> F[GENERATE_STATS]

    style D fill:#FFD700

Performance Considerations¶

Execution Time¶

Typical execution times:

Genome Size	CPU Cores	Time
100 MB (bacterial)	8	10-30 min
500 MB (fungal)	8	30 min - 2 hours
1 GB (insect)	8	1-3 hours
3 GB (human)	8	4-12 hours
10 GB (plant)	8	12-24 hours

Factors affecting performance:

- Genome size and complexity
- Number of BUSCOs in lineage dataset
- CPU cores allocated
- Gene prediction method (Metaeuk vs. Augustus)
- Disk I/O speed

Resource Optimization¶

For Small Genomes (< 500 MB)¶

process {
    withName: BUSCO_GENOME_LINEAGE {
        cpus = 4
        memory = 16.GB
        time = 4.h
    }
}

For Large Genomes (> 5 GB)¶

process {
    withName: BUSCO_GENOME_LINEAGE {
        cpus = 16
        memory = 64.GB
        time = 48.h
    }
}

Parallelization Strategy¶

The maxForks 10 directive caps the number of BUSCO tasks that run at the same time, so a large batch cannot exhaust cluster resources.

Example: analyzing 100 genomes

- Without maxForks: all 100 tasks may be submitted at once, which can overload the cluster
- With maxForks 10: at most 10 run concurrently; the remaining tasks wait until a slot frees up

Tuning maxForks:

// Conservative (resource-limited clusters)
maxForks 5

// Moderate (balanced)
maxForks 10

// Aggressive (large clusters)
maxForks 20
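When choosing a maxForks value, it helps to work out the peak demand it implies. The sketch below multiplies the concurrency limit by the per-task allocation; `peak_demand` is a hypothetical helper, and the figures used in the example are the busco label defaults listed under Resource Allocation (8 CPUs, 32 GB).

```bash
# Hypothetical helper: peak CPU and memory demand when <forks> tasks,
# each with <cpus> cores and <mem_gb> GB, run at the same time.
peak_demand() {
    forks="$1"; cpus="$2"; mem_gb="$3"
    echo "$(( forks * cpus )) CPUs, $(( forks * mem_gb )) GB"
}
```

`peak_demand 10 8 32` prints `80 CPUs, 320 GB`, so the default setting needs that much headroom on the cluster; `peak_demand 20 8 32` doubles both figures.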

Disk Space Requirements¶

Per-genome space usage:

- Input genome: 0.1-5 GB
- BUSCO output: 0.5-10 GB (depending on genome size)
- Published summaries: < 1 MB

Recommendation: Ensure workDir has 50-100 GB per concurrent BUSCO process

Advanced Features¶

Custom BUSCO Parameters¶

Add custom BUSCO flags:

process BUSCO_GENOME_LINEAGE {
    script:
    """
    busco \
        -f \
        --offline \
        --in ${genome_file} \
        --out ${meta.gca}_busco_genome \
        --mode genome \
        --lineage_dataset ${meta.busco_dataset} \
        --download_path ${params.download_path} \
        --cpu ${task.cpus} \
        --augustus \
        --augustus_species ${meta.augustus_species} \
        --long \
        --limit 3
    """
}

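Additional Flags:

- --augustus: Use Augustus instead of Metaeuk for gene prediction
- --augustus_species: Augustus species parameter set to start from (here read from a meta.augustus_species field, which is not part of the metadata map documented above)
- --long: Turn on Augustus optimization mode for self-training, which substantially increases run time
- --limit: Number of candidate regions (contigs or transcripts) to consider for each BUSCO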

Multiple Lineage Analysis¶

Run BUSCO with multiple lineages:

workflow {
    def meta_ch = channel.of([
        gca: 'GCA_000001405.29',
        busco_datasets: ['primates_odb10', 'mammalia_odb10', 'vertebrata_odb10']
    ])

    def genome_ch = FETCH_GENOME(meta_ch).genome_file_output

    // Explode to multiple lineages
    genome_ch.flatMap { meta, genome ->
        meta.busco_datasets.collect { dataset ->
            [meta + [busco_dataset: dataset], genome]
        }
    }
    | BUSCO_GENOME_LINEAGE
}

Comparative Analysis¶

Compare results across lineages:

BUSCO_GENOME_LINEAGE.out.busco_genome_batch_summary_output
    .map { meta, summary -> summary }  // pass only the file to collectFile, not the (meta, file) tuple
    .collectFile(name: 'combined_busco_results.txt',
                 keepHeader: true,
                 storeDir: "${params.outdir}/")

Result: Single file with all BUSCO results for comparison

Testing¶

Unit Test¶

Test BUSCO analysis on small genome:

# Test with E. coli genome
nextflow run pipelines/statistics/main.nf \
    --run_busco_core \
    --csvFile test_data/test_busco_genome.csv \
    --download_path /data/busco_datasets \
    --outdir test_results \
    --mysqlUrl "mysql://ensembldb.ensembl.org:3306/" \
    --mysqluser "anonymous" \
    -entry BUSCO \
    --max_cpus 8 \
    -process.executor 'local'

test_busco_genome.csv:

dbname,species_id,busco_dataset,genome_file,protein_file
escherichia_coli_core_110_1,1,bacteria_odb10,test_data/ecoli.fa,

Expected Test Output¶

Console Log:

[BUSCO_GENOME_LINEAGE] Running BUSCO for GCA_000005845.2
[BUSCO] 2024-02-06 12:00:00 INFO: Starting BUSCO in genome mode
[BUSCO] 2024-02-06 12:15:23 INFO: Results written to GCA_000005845.2_busco_genome
[BUSCO] 2024-02-06 12:15:24 INFO: BUSCO analysis complete

Output Files:

test_results/GCA_000005845.2/
├── short_summary.specific.bacteria_odb10.GCA_000005845.2_busco_genome.txt
├── busco_genome_batch_summary.txt
└── versions_busco_genome.yml

Expected Metrics (E. coli):

C:98.5%[S:98.3%,D:0.2%],F:0.8%,M:0.7%,n:124

Validation¶

Compare results with expected ranges:

# Extract completeness percentage
COMPLETE=$(grep "C:" short_summary.*.txt | sed 's/.*C:\([0-9.]*\)%.*/\1/')

# Validate
if (( $(echo "$COMPLETE >= 95" | bc -l) )); then
    echo "✅ PASS: Completeness ${COMPLETE}% (expected >= 95%)"
else
    echo "❌ FAIL: Completeness ${COMPLETE}% (expected >= 95%)"
fi

Best Practices¶

1. Use Appropriate Lineages¶

Best Practice: Always use the most specific lineage available

// GOOD - specific lineage
meta.busco_dataset = "primates_odb10"  // For human

// AVOID - too general
meta.busco_dataset = "eukaryota_odb10"  // For human (too broad)

2. Pre-download All Datasets¶

Setup:

# Download all required lineages before running pipeline
busco --list-datasets
busco --download all --download_path /data/busco_datasets

3. Monitor Resource Usage¶

Track memory and CPU:

process {
    withLabel: busco {
        cpus = 8
        memory = { 32.GB * task.attempt }  // Scales linearly per attempt: 32, 64, 96 GB
        time = { 24.h * task.attempt }
        maxRetries = 2
        errorStrategy = 'retry'
    }
}

4. Handle Failed Analyses¶

process {
    withName: BUSCO_GENOME_LINEAGE {
        errorStrategy = { task.exitStatus in [137, 140] ? 'retry' : 'ignore' }
        maxRetries = 2
    }
}

Exit codes:

- 137: Out of memory (process killed with SIGKILL)
- 140: Timeout (run limit exceeded)
- Other: Skip and continue
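Both retry codes follow the shell convention that an exit status above 128 means the process was terminated by signal number (status - 128). The sketch below only does that arithmetic; `signal_from_exit` is a hypothetical helper, and which signal a scheduler sends on a time limit varies between schedulers, so the meaning of 140 should be confirmed for the cluster in use.

```bash
# Exit statuses above 128 conventionally mean "terminated by signal
# (status - 128)". Hypothetical helper: print that signal number,
# or 0 when the status does not indicate a signal.
signal_from_exit() {
    if [ "$1" -gt 128 ]; then
        echo $(( $1 - 128 ))
    else
        echo 0
    fi
}
```

`signal_from_exit 137` prints `9` (SIGKILL, the signal used by out-of-memory kills) and `signal_from_exit 140` prints `12` (SIGUSR2 on Linux).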

5. Aggregate Results¶

Collect all BUSCO results for downstream analysis:

workflow {
    BUSCO_GENOME_LINEAGE(genome_ch)

    // Collect all summaries
    BUSCO_GENOME_LINEAGE.out.busco_genome_short_summary_output
        .map { meta, summary -> summary }  // pass only the file to collectFile
        .collectFile(name: 'all_busco_genomes.txt',
                     storeDir: "${params.outdir}/summary/")
}

Troubleshooting¶

Debug Mode¶

Enable verbose BUSCO logging:

export BUSCO_CONFIG_FILE=/path/to/config.ini

# config.ini
[busco]
log_level = DEBUG

Check BUSCO Logs¶

Inspect detailed logs:

# View main log
cat ${meta.gca}_busco_genome/logs/busco.log

# Check HMMER logs
cat ${meta.gca}_busco_genome/logs/hmmsearch_*.log

# Check Metaeuk logs
cat ${meta.gca}_busco_genome/logs/metaeuk_*.log

Manual Execution¶

Test BUSCO manually:

# Run BUSCO directly
busco \
    -i genome.fa \
    -o test_busco \
    -m genome \
    -l primates_odb10 \
    --offline \
    --download_path /data/busco_datasets \
    --cpu 8 \
    -f

# Check results
cat test_busco/short_summary.*.txt

Compare with nf-core/busco¶

Validate results against nf-core implementation:

nextflow run nf-core/busco \
    --input genome.fa \
    --mode genome \
    --lineage primates_odb10 \
    --busco_lineages_path /data/busco_datasets \
    --outdir compare_results

Related Documentation¶

BUSCO_DATASET Module - Dataset selection
BUSCO_PROTEIN_LINEAGE Module - Protein-based analysis
FETCH_GENOME Module - Genome file retrieval
BUSCO Workflow - Complete workflow

References¶

Last Updated: 2026-02-06 23:58:12
Module Version: 1.0.0
Maintained By: Ensembl Genes Team
