STAR Module Documentation
The STAR (Spliced Transcripts Alignment to a Reference) alignment tool is widely used in genomics research for aligning RNA-seq data to a reference genome.
Dobin A, Davis CA, Schlesinger F, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29(1):15-21. doi:10.1093/bioinformatics/bts635
- ensembl.tools.anno.transcriptomic_annotation.star.run_star(genome_file: Path, output_dir: Path, short_read_fastq_dir: Path, delete_pre_trim_fastq: bool = False, trim_fastq: bool = False, max_reads_per_sample: int = 0, max_intron_length: int = 100000, subsample_read_limit: int = 100000000, subsample_percentage: float = 0.25, sampling_via_read_limit: bool = False, sampling_via_percentage: bool = False, sampling_via_read_limit_percentage: bool = False, num_threads: int = 1, star_bin: Path = PosixPath('star'), samtools_bin: Path = PosixPath('samtools'), trim_galore_bin: Path = PosixPath('trim_galore')) -> None
Run STAR alignment on a list of short-read files.
- param genome_file:
Genome file path.
- type genome_file:
Path
- param output_dir:
Working directory path.
- type output_dir:
Path
- param short_read_fastq_dir:
Short read directory path.
- type short_read_fastq_dir:
Path
- param delete_pre_trim_fastq:
Delete the original fastq files after trimming. Defaults to False.
- type delete_pre_trim_fastq:
boolean, default False
- param trim_fastq:
Trim short read files using TrimGalore. Defaults to False.
- type trim_fastq:
boolean, default False
- param max_reads_per_sample:
Max number of reads per sample. Defaults to 0 (unlimited).
- type max_reads_per_sample:
int, default 0
- param max_intron_length:
The maximum intron size for alignments. Defaults to 100000.
- type max_intron_length:
int, default 100000
- param subsample_read_limit:
Maximum number of reads to subsample. Defaults to 100000000.
- type subsample_read_limit:
int, default 100000000
- param subsample_percentage:
Maximum percentage of reads to subsample. Defaults to 0.25.
- type subsample_percentage:
float, default 0.25
- param sampling_via_read_limit:
Subsample fastq files using subsample_read_limit. Defaults to False.
- type sampling_via_read_limit:
boolean, default False
- param sampling_via_percentage:
Subsample fastq files using subsample_percentage. Defaults to False.
- type sampling_via_percentage:
boolean, default False
- param sampling_via_read_limit_percentage:
Subsample fastq files using both subsample_read_limit and subsample_percentage. Defaults to False.
- type sampling_via_read_limit_percentage:
boolean, default False
- param num_threads:
Number of available threads. Defaults to 1.
- type num_threads:
int, default 1
- param star_bin:
Path to the STAR executable. Defaults to star.
- type star_bin:
Path, default star
- param samtools_bin:
Path to the samtools executable. Defaults to samtools.
- type samtools_bin:
Path, default samtools
- param trim_galore_bin:
Path to the TrimGalore executable. Defaults to trim_galore.
- type trim_galore_bin:
Path, default trim_galore
- return:
None
- rtype:
None
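A call to run_star might look like the minimal sketch below. The keyword names and defaults come from the signature above; the helper functions build_run_star_kwargs and align, and every file and directory name under work_dir, are hypothetical and exist only for illustration. The import is done lazily so the snippet can be loaded on a machine where the package is not installed.

```python
from pathlib import Path
from typing import Any, Dict


def build_run_star_kwargs(work_dir: Path, threads: int = 4) -> Dict[str, Any]:
    """Assemble keyword arguments matching the documented run_star signature.

    The layout under work_dir is an assumption made for this example.
    """
    return {
        "genome_file": work_dir / "genome.fa",        # hypothetical file name
        "output_dir": work_dir / "star_output",       # hypothetical directory
        "short_read_fastq_dir": work_dir / "fastq",   # hypothetical directory
        "trim_fastq": True,               # trim reads with TrimGalore first
        "delete_pre_trim_fastq": False,   # keep the untrimmed originals
        "max_intron_length": 100000,      # documented default, shown explicitly
        "num_threads": threads,
    }


def align(work_dir: Path) -> None:
    """Run the alignment; needs the package and the external tools it wraps."""
    # Lazy import: only required when the alignment is actually executed.
    from ensembl.tools.anno.transcriptomic_annotation.star import run_star

    run_star(**build_run_star_kwargs(work_dir))
```

Keeping the arguments in a plain dictionary makes it easy to log or inspect the configuration before starting a long alignment job.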
- ensembl.tools.anno.transcriptomic_annotation.star.run_trimming(output_dir: Path, short_read_fastq_dir: Path, delete_pre_trim_fastq: bool = False, num_threads: int = 1, trim_galore_bin='trim_galore') -> None
Trim a list of short-read fastq files.
- param output_dir:
Working directory path.
- param short_read_fastq_dir:
Short read directory path.
- param delete_pre_trim_fastq:
Delete the original fastq files after trimming. Defaults to False.
- param num_threads:
Number of threads. Defaults to 1.
- param trim_galore_bin:
Path to the TrimGalore executable. Defaults to trim_galore.
- ensembl.tools.anno.transcriptomic_annotation.star.subsample_transcriptomic_data(fastq_file_list: List[Path], subsample_read_limit: int = 100000000, subsample_percentage: float = 0.25, sampling_via_read_limit: bool = False, sampling_via_percentage: bool = False, sampling_via_read_limit_percentage: bool = True, num_threads: int = 2) -> List[Path]
Subsample a list of paired fastq files.
- param fastq_file_list:
List of paired fastq files to subsample.
- param subsample_read_limit:
Maximum number of reads to subsample. Defaults to 100000000.
- param subsample_percentage:
Maximum percentage of reads to subsample. Defaults to 0.25.
- param sampling_via_read_limit:
If True, subsample the input fastq files using the subsample_read_limit value. Defaults to False.
- param sampling_via_percentage:
If True, subsample the input fastq files using the subsample_percentage value. Defaults to False.
- param sampling_via_read_limit_percentage:
If True, subsample the input fastq files using both the subsample_read_limit and subsample_percentage values; the lower number of reads is taken. Defaults to True.
- param num_threads:
Number of threads. Defaults to 2.
- Returns:
List of subsampled paired transcriptomic files
- Return type:
List[Path]
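The way the three sampling flags interact can be illustrated with a small standalone sketch. It is not the module's implementation: the function reads_to_keep is hypothetical, the order in which the flags are checked is an assumption, and so is the treatment of subsample_percentage as a fraction between 0 and 1 (suggested by the 0.25 default). Only the rule that the combined mode takes the lower of the two counts comes from the parameter description above.

```python
def reads_to_keep(
    total_reads: int,
    subsample_read_limit: int = 100_000_000,
    subsample_percentage: float = 0.25,
    sampling_via_read_limit: bool = False,
    sampling_via_percentage: bool = False,
    sampling_via_read_limit_percentage: bool = True,
) -> int:
    """Return how many reads a sample would retain under each sampling mode.

    Hypothetical helper: flag precedence and fraction semantics are assumptions.
    """
    # Cap by absolute limit; never more than the reads actually available.
    by_limit = min(total_reads, subsample_read_limit)
    # Cap by proportion of the input, treating the value as a fraction.
    by_percentage = int(total_reads * subsample_percentage)

    if sampling_via_read_limit_percentage:
        # Combined mode: the lower of the two counts is taken.
        return min(by_limit, by_percentage)
    if sampling_via_read_limit:
        return by_limit
    if sampling_via_percentage:
        return by_percentage
    # No sampling mode selected: keep everything.
    return total_reads
```

With the defaults, a sample of 1,000,000 reads would be reduced to 250,000 by the percentage, whereas a sample of 1,000,000,000 reads would be capped at 100,000,000 by the read limit.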