downsample

downsample(
    sample_id: str,
    r1: str,
    r2: str,
    out_path: str,
    reference: str,
    *,
    threads: int = 40,
    remove_all_dups: bool = False,
    remove_seq_dups: bool = False,
    use_gatk_md: bool = False,
    strategy: str = "HighAccuracy",
    keep: float = 0.5,
) -> None

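Pipeline to map paired-end sequencing reads to a reference genome, deduplicate them, and downsample them. Results are written to out_path; the function returns None.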
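The page does not say which tools run at each stage or exactly how the keyword arguments are routed. As a rough mental model only, the sketch below encodes the documented order of operations (map, then deduplicate, then downsample) and attaches each keyword argument to the stage it most plausibly affects. plan_downsample and Stage are hypothetical names, not part of this API. Attaching threads only to mapping is an assumption, and the duplicate marker used when use_gatk_md is False is not documented, so it appears as a placeholder label.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Stage:
    """One pipeline step (hypothetical representation, not part of the API)."""
    name: str
    options: Dict[str, Any] = field(default_factory=dict)


def plan_downsample(
    *,
    threads: int = 40,
    remove_all_dups: bool = False,
    remove_seq_dups: bool = False,
    use_gatk_md: bool = False,
    strategy: str = "HighAccuracy",
    keep: float = 0.5,
) -> List[Stage]:
    """Return the ordered stages map -> deduplicate -> downsample for these options."""
    if threads < 1:
        raise ValueError("threads must be a positive integer")
    if not 0.0 <= keep <= 1.0:
        raise ValueError("keep must be between 0 and 1")
    # Placeholder label: the non-GATK duplicate marker is not documented.
    dedup_tool = "GATK MarkDuplicatesSpark" if use_gatk_md else "default duplicate marker"
    return [
        # Assumption: threads shown on mapping only; it may also apply to other stages.
        Stage("map", {"threads": threads}),
        Stage("deduplicate", {
            "tool": dedup_tool,
            "remove_all_dups": remove_all_dups,
            "remove_seq_dups": remove_seq_dups,
        }),
        Stage("downsample", {"strategy": strategy, "keep": keep}),
    ]


if __name__ == "__main__":
    for stage in plan_downsample(use_gatk_md=True, keep=0.25):
        print(stage.name, stage.options)
```

Keeping the options keyword-only mirrors the `*` in the real signature, so a caller cannot accidentally pass keep or strategy positionally.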

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| sample_id | str | Sample ID. | required |
| r1 | str | Path to the R1 FASTQ file. | required |
| r2 | str | Path to the R2 FASTQ file. | required |
| out_path | str | Output path for the results. | required |
| reference | str | Path to the reference genome. | required |
| threads | int | Number of threads to use. | 40 |
| remove_all_dups | bool | Whether to remove all duplicates. | False |
| remove_seq_dups | bool | Whether to remove sequencing duplicates. | False |
| use_gatk_md | bool | Whether to use GATK MarkDuplicatesSpark. | False |
| strategy | str | Downsampling strategy. | 'HighAccuracy' |
| keep | float | Fraction of reads to keep, from 0 to 1. | 0.5 |
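The page does not describe how keep is applied. As a hedged illustration of the general idea only, the sketch below downsamples paired-end data by a fraction: it hashes each read name and keeps the pair when the hash, mapped to [0, 1), falls below keep. Because both mates share a read name, they are kept or dropped together, and the same seed always selects the same subset. This hash-based approach is a swapped-in technique and not necessarily what downsample does internally. The default strategy value 'HighAccuracy' matches one of the strategy names in Picard's DownsampleSam, which may select a different method. keep_pair, downsample_names and seed are hypothetical names.

```python
import hashlib
from typing import Iterable, Iterator


def keep_pair(read_name: str, keep: float, seed: int = 0) -> bool:
    """Hypothetical helper: decide deterministically whether to keep a read pair.

    Both mates of a pair share the same read name, so hashing the name
    keeps or drops them together.
    """
    if not 0.0 <= keep <= 1.0:
        raise ValueError(f"keep must be in [0, 1], got {keep}")
    digest = hashlib.sha256(f"{seed}:{read_name}".encode("utf-8")).digest()
    # Take the top 53 bits so the result is an exactly representable float in [0, 1).
    u = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return u < keep


def downsample_names(names: Iterable[str], keep: float, seed: int = 0) -> Iterator[str]:
    """Yield the read names whose pairs survive (approximately keep * N of them)."""
    for name in names:
        if keep_pair(name, keep, seed):
            yield name


if __name__ == "__main__":
    names = [f"read{i}" for i in range(10_000)]
    kept = list(downsample_names(names, keep=0.5))
    print(f"kept {len(kept)} of {len(names)} pairs")
```

The kept count is only approximately keep times the input size. A side effect of the design is that, for a fixed seed, the subset kept at a smaller keep is always contained in the subset kept at a larger one.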