STAR Module Documentation

The STAR (Spliced Transcripts Alignment to a Reference) alignment tool is widely used in genomics research for aligning RNA-seq data to a reference genome.

Dobin A, Davis CA, Schlesinger F, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29(1):15-21. doi:10.1093/bioinformatics/bts635

ensembl.tools.anno.transcriptomic_annotation.star.run_star(genome_file: Path, output_dir: Path, short_read_fastq_dir: Path, delete_pre_trim_fastq: bool = False, trim_fastq: bool = False, max_reads_per_sample: int = 0, max_intron_length: int = 100000, subsample_read_limit: int = 100000000, subsample_percentage: float = 0.25, sampling_via_read_limit: bool = False, sampling_via_percentage: bool = False, sampling_via_read_limit_percentage: bool = False, num_threads: int = 1, star_bin: Path = PosixPath('star'), samtools_bin: Path = PosixPath('samtools'), trim_galore_bin: Path = PosixPath('trim_galore')) -> None

Run STAR alignment on a list of short-read data files.

param genome_file:

Genome file path.

type genome_file:

Path

param output_dir:

Working directory path.

type output_dir:

Path

param short_read_fastq_dir:

Short read directory path.

type short_read_fastq_dir:

Path

param delete_pre_trim_fastq:

Delete the original fastq files after trimming. Defaults to False.

type delete_pre_trim_fastq:

boolean, default False

param trim_fastq:

Trim short read files using TrimGalore. Defaults to False.

type trim_fastq:

boolean, default False

param max_reads_per_sample:

Max number of reads per sample. Defaults to 0 (unlimited).

type max_reads_per_sample:

int, default 0

param max_intron_length:

The maximum intron size for alignments. Defaults to 100000.

type max_intron_length:

int, default 100000

param subsample_read_limit:

Maximum number of reads to subsample. Defaults to 100000000.

type subsample_read_limit:

int, default 100000000

param subsample_percentage:

Maximum percentage of reads to subsample. Defaults to 0.25.

type subsample_percentage:

float, default 0.25

param sampling_via_read_limit:

Subsample fastq files using subsample_read_limit. Defaults to False.

type sampling_via_read_limit:

boolean, default False

param sampling_via_percentage:

Subsample fastq files using subsample_percentage. Defaults to False.

type sampling_via_percentage:

boolean, default False

param sampling_via_read_limit_percentage:

Subsample fastq files using both subsample_read_limit and subsample_percentage. Defaults to False.

type sampling_via_read_limit_percentage:

boolean, default False

param num_threads:

Number of available threads. Defaults to 1.

type num_threads:

int, default 1

param star_bin:

Path to the STAR executable. Defaults to star.

type star_bin:

Path, default star

param samtools_bin:

Path to the samtools executable. Defaults to samtools.

type samtools_bin:

Path, default samtools

param trim_galore_bin:

Path to the TrimGalore executable. Defaults to trim_galore.

type trim_galore_bin:

Path, default trim_galore

return:

None

rtype:

None
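As an illustration of how the parameters above fit together, the following is a minimal sketch of a call to run_star. The paths, the thread count, and the align_reads wrapper are hypothetical and not part of the module; only the keyword names come from the signature documented here. The import is deferred so that the snippet can be loaded even where the ensembl package is not installed.

```python
from pathlib import Path

# Hypothetical input locations; replace with real paths before running.
star_kwargs = {
    "genome_file": Path("genome/genome.fa"),
    "output_dir": Path("work"),
    "short_read_fastq_dir": Path("fastq"),
    "trim_fastq": True,           # trim reads with TrimGalore before aligning
    "max_intron_length": 100000,  # same as the documented default
    "num_threads": 4,
}


def align_reads(kwargs):
    """Hypothetical wrapper: call run_star if the package is importable.

    Returns True if run_star was called, False if the import failed.
    """
    try:
        from ensembl.tools.anno.transcriptomic_annotation.star import run_star
    except ImportError:
        return False
    run_star(**kwargs)
    return True
```

Calling align_reads(star_kwargs) would then run the alignment, assuming the STAR, samtools and TrimGalore executables are available under their default names.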

ensembl.tools.anno.transcriptomic_annotation.star.run_trimming(output_dir: Path, short_read_fastq_dir: Path, delete_pre_trim_fastq: bool = False, num_threads: int = 1, trim_galore_bin='trim_galore') -> None

Trim a list of short-read fastq files.

param output_dir:

Working directory path.

param short_read_fastq_dir:

Short read directory path.

param delete_pre_trim_fastq:

Delete the original fastq files after trimming. Defaults to False.

param num_threads:

Number of threads. Defaults to 1.

param trim_galore_bin:

Path to the TrimGalore executable. Defaults to trim_galore.

ensembl.tools.anno.transcriptomic_annotation.star.subsample_transcriptomic_data(fastq_file_list: List[Path], subsample_read_limit: int = 100000000, subsample_percentage: float = 0.25, sampling_via_read_limit: bool = False, sampling_via_percentage: bool = False, sampling_via_read_limit_percentage: bool = True, num_threads: int = 2) -> List[Path]

Subsample a list of paired fastq files.

param fastq_file_list:

List of paired fastq files to subsample.

param subsample_read_limit:

Maximum number of reads to subsample. Defaults to 100000000.

param subsample_percentage:

Maximum percentage of reads to subsample. Defaults to 0.25.

param sampling_via_read_limit:

If True, subsample the input fastq files using the subsample_read_limit value. Defaults to False.

param sampling_via_percentage:

If True, subsample the input fastq files using the subsample_percentage value. Defaults to False.

param sampling_via_read_limit_percentage:

If True, subsample the input fastq files using both the subsample_read_limit and subsample_percentage values; the lower of the two resulting read counts is taken. Defaults to True.

param num_threads:

Number of threads. Defaults to 2.

Returns:

List of subsampled paired transcriptomic files

Return type:

List[Path]
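The three sampling modes described for subsample_transcriptomic_data can be sketched as a small calculation of how many reads to keep. The helper reads_to_sample below is hypothetical and is not the module's implementation. The order in which the flags are checked, the truncation of the percentage count to an integer, and the fallback of keeping all reads when no flag is set are assumptions made for illustration.

```python
def reads_to_sample(
    total_reads,
    subsample_read_limit=100_000_000,
    subsample_percentage=0.25,
    sampling_via_read_limit=False,
    sampling_via_percentage=False,
    sampling_via_read_limit_percentage=True,
):
    """Hypothetical helper: number of reads to keep from one fastq file."""
    # Count allowed by the absolute limit (never more than the file holds).
    by_limit = min(total_reads, subsample_read_limit)
    # Count allowed by the percentage, truncated to a whole number of reads.
    by_percentage = int(total_reads * subsample_percentage)

    # Assumed precedence: explicit single-criterion flags are checked first.
    if sampling_via_read_limit:
        return by_limit
    if sampling_via_percentage:
        return by_percentage
    if sampling_via_read_limit_percentage:
        # Both criteria apply; the lower read count is taken.
        return min(by_limit, by_percentage)
    # No sampling mode selected: keep everything (assumption).
    return total_reads
```

With the default settings, a file of 1,000 reads is reduced by the percentage criterion, while a file of one billion reads is capped by the read limit, because in each case the lower of the two counts is the one used.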