cmsearch Module Documentation

Infernal and its cmsearch tool are used to detect small non-coding RNAs (sncRNAs) in sequence databases.

sncRNAs are a diverse group of RNA molecules that play critical roles in cellular processes, including gene regulation, RNA interference, and post-transcriptional modification. They include microRNAs (miRNAs), small interfering RNAs (siRNAs), and small nucleolar RNAs (snoRNAs). Despite their small size, many sncRNAs share structural features or sequence motifs that are conserved across species, and these conserved features are what make it possible to identify and study them. Covariance models (CMs) represent conserved RNA secondary structure as well as conserved sequence patterns, which makes them well suited to detecting sncRNAs in sequence databases.

Nawrocki, E. P., Kolbe, D. L., & Eddy, S. R. (2009). Infernal 1.0: inference of RNA alignments. Bioinformatics, 25(10), 1335-1337.
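As a toy illustration of why modelling structure alongside sequence helps, the sketch below checks whether stem columns of a small alignment can still base-pair even when the nucleotides themselves differ. This is not how Infernal is implemented; the alignment, the column pairs, and the helper `paired_fraction` are made up for illustration.

```python
# Pairs that can form in an RNA helix: Watson-Crick plus the G-U wobble.
CANONICAL_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def paired_fraction(aligned_seqs, paired_columns):
    """Return, per column pair, the fraction of sequences that can pair there."""
    fractions = {}
    for i, j in paired_columns:
        hits = sum((seq[i], seq[j]) in CANONICAL_PAIRS for seq in aligned_seqs)
        fractions[(i, j)] = hits / len(aligned_seqs)
    return fractions


# Hypothetical hairpins: 3-nt stem, 5-nt loop, 3-nt stem.
# The stem nucleotides differ between sequences, but each stem pair still forms.
toy_alignment = [
    "GGCAUUCGGCC",
    "AGUAUUCGACU",
    "CCGAUUCGCGG",
]
stem_pairs = [(0, 10), (1, 9), (2, 8)]

fractions = paired_fraction(toy_alignment, stem_pairs)
```

Here column 0 holds three different nucleotides, so a purely sequence-based profile sees little conservation there, while the pairing between columns 0 and 10 is preserved in every sequence. That pairwise signal is the kind of information a covariance model is designed to capture.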

ensembl.tools.anno.snc_rna_annotation.cmsearch.run_cmsearch(genome_file: PathLike, output_dir: Path, rfam_accession_file: Path, rfam_cm_db: Path = PosixPath('/hps/nobackup/flicek/ensembl/genebuild/blastdb/ncrna/Rfam_14.0/Rfam.cm'), rfam_seeds_file: Path = PosixPath('/hps/nobackup/flicek/ensembl/genebuild/blastdb/ncrna/Rfam_14.0/Rfam.seed'), cmsearch_bin: Path = PosixPath('cmsearch'), rnafold_bin: Path = PosixPath('RNAfold'), num_threads: int = 1) -> None
Search the genome sequence with Rfam covariance models (CMs).
param genome_file:

Genome file path.

type genome_file:

PathLike

param output_dir:

Working directory path.

type output_dir:

Path

param rfam_accession_file:

Path to the file listing Rfam accessions.

type rfam_accession_file:

Path

param rfam_cm_db:

Rfam covariance model database (Rfam.cm).

type rfam_cm_db:

Path

param rfam_seeds_file:

Rfam seeds file.

type rfam_seeds_file:

Path

param cmsearch_bin:

Path to the cmsearch executable.

type cmsearch_bin:

Path

param rnafold_bin:

Path to the RNAfold executable.

type rnafold_bin:

Path

param num_threads:

Number of threads.

type num_threads:

int

return:

None

rtype:

None
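A call might look like the following sketch. Only the keyword names come from the signature above; the file names (`genome.fa`, `rfam_accessions.txt`), the `work` directory, and the helpers `cmsearch_kwargs` and `main` are hypothetical. The sketch also assumes that the default `rfam_cm_db` and `rfam_seeds_file` paths are readable on your system (otherwise pass them explicitly) and that creating `output_dir` beforehand is harmless.

```python
from pathlib import Path


def cmsearch_kwargs(genome_file, output_dir, rfam_accession_file, num_threads=1):
    """Collect keyword arguments matching the documented run_cmsearch signature."""
    return {
        "genome_file": Path(genome_file),
        "output_dir": Path(output_dir),
        "rfam_accession_file": Path(rfam_accession_file),
        "num_threads": num_threads,
    }


def main():
    # Imported lazily so the sketch can be read without the package installed.
    from ensembl.tools.anno.snc_rna_annotation.cmsearch import run_cmsearch

    # Hypothetical inputs; replace with real paths.
    kwargs = cmsearch_kwargs(
        "genome.fa", "work", "rfam_accessions.txt", num_threads=4
    )
    # Assumption: the working directory may need to exist before the call.
    kwargs["output_dir"].mkdir(parents=True, exist_ok=True)

    # The function returns None, so results are expected under output_dir.
    run_cmsearch(**kwargs)
```

Calling `main()` runs the search. Leaving `cmsearch_bin` and `rnafold_bin` at their defaults relies on both executables being discoverable on `PATH`.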