cmsearch Module Documentation
Infernal and its “cmsearch” tool are used to detect small non-coding RNAs (sncRNAs) in sequence databases. sncRNAs constitute a diverse group of RNA molecules that play critical roles in various cellular processes, including gene regulation, RNA interference, and post-transcriptional modification. They include microRNAs (miRNAs), small interfering RNAs (siRNAs), and small nucleolar RNAs (snoRNAs). Despite their small size, many sncRNAs exhibit structural features or sequence motifs that are conserved across species, and these conserved features are what make it possible to identify and study them. Covariance models (CMs) represent both conserved RNA secondary structure and conserved sequence patterns, which makes them well suited to detecting sncRNAs in sequence databases.
Nawrocki, E. P., Kolbe, D. L., & Eddy, S. R. (2009). Infernal 1.0: inference of RNA alignments. Bioinformatics, 25(10), 1335-1337.
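To make the search step concrete, here is a minimal sketch of how a cmsearch command line could be assembled from a CM file and a genome FASTA file. The helper `build_cmsearch_command` and the file names are hypothetical and are not part of this module; the sketch uses only the standard Infernal options `--cpu` and `--tblout`, and it does not claim to reproduce the exact command that `run_cmsearch` executes.

```python
from pathlib import Path
from typing import List


def build_cmsearch_command(
    cm_file: Path,
    genome_file: Path,
    tblout_file: Path,
    cmsearch_bin: Path = Path("cmsearch"),
    num_threads: int = 1,
) -> List[str]:
    """Hypothetical helper: assemble a cmsearch invocation as an argument list.

    Infernal expects the options first, then the CM file, then the
    sequence file to search.
    """
    return [
        str(cmsearch_bin),
        "--cpu", str(num_threads),      # number of worker threads
        "--tblout", str(tblout_file),   # tabular summary of the hits
        str(cm_file),                   # covariance model(s), e.g. from Rfam
        str(genome_file),               # sequences to search
    ]


# Placeholder file names, used for illustration only.
cmd = build_cmsearch_command(
    cm_file=Path("Rfam.cm"),
    genome_file=Path("genome.fa"),
    tblout_file=Path("cmsearch.tblout"),
    num_threads=4,
)
print(" ".join(cmd))
```

The argument list can be passed to `subprocess.run(cmd, check=True)` on a machine where Infernal is installed.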
- ensembl.tools.anno.snc_rna_annotation.cmsearch.run_cmsearch(genome_file: PathLike, output_dir: Path, rfam_accession_file: Path, rfam_cm_db: Path = PosixPath('/hps/nobackup/flicek/ensembl/genebuild/blastdb/ncrna/Rfam_14.0/Rfam.cm'), rfam_seeds_file: Path = PosixPath('/hps/nobackup/flicek/ensembl/genebuild/blastdb/ncrna/Rfam_14.0/Rfam.seed'), cmsearch_bin: Path = PosixPath('cmsearch'), rnafold_bin: Path = PosixPath('RNAfold'), num_threads: int = 1) -> None
- Search Rfam covariance model(s) (CMs) against the genome sequence with cmsearch.
- param genome_file:
Genome file path.
- type genome_file:
PathLike
- param output_dir:
Working directory path.
- type output_dir:
Path
- param rfam_accession_file:
Path to the file listing Rfam accessions.
- type rfam_accession_file:
Path
- param rfam_cm_db:
Rfam database of covariance models (CMs).
- type rfam_cm_db:
Path
- param rfam_seeds_file:
Rfam seeds file.
- type rfam_seeds_file:
Path
- param cmsearch_bin:
cmsearch software path.
- type cmsearch_bin:
Path
- param rnafold_bin:
RNAfold software path.
- type rnafold_bin:
Path
- param num_threads:
Number of threads.
- type num_threads:
int
- return:
None
- rtype:
None
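The parameters above can be put together as in the following sketch. It is a usage illustration that assumes the `ensembl.tools` package is installed; every file name and directory is a placeholder, and the helper `example_arguments` is hypothetical. The import sits inside `main()` so that the argument-building part can be read and run without the package.

```python
from pathlib import Path
from typing import Any, Dict


def example_arguments(work_dir: Path) -> Dict[str, Any]:
    """Hypothetical helper: collect keyword arguments for run_cmsearch.

    The keys follow the documented signature; the paths are placeholders.
    """
    return {
        "genome_file": work_dir / "genome.fa",
        "output_dir": work_dir / "cmsearch_output",
        "rfam_accession_file": work_dir / "rfam_accessions.txt",
        "rfam_cm_db": work_dir / "Rfam.cm",
        "rfam_seeds_file": work_dir / "Rfam.seed",
        "cmsearch_bin": Path("cmsearch"),
        "rnafold_bin": Path("RNAfold"),
        "num_threads": 4,
    }


def main() -> None:
    # Imported here because the package may not be installed everywhere.
    from ensembl.tools.anno.snc_rna_annotation.cmsearch import run_cmsearch

    # Placeholder working directory; replace with a real location.
    run_cmsearch(**example_arguments(Path("/path/to/work_dir")))
```

Call `main()` once the package, Infernal, and RNAfold are available. Passing `rfam_cm_db` and `rfam_seeds_file` explicitly avoids relying on the site-specific default paths shown in the signature.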