downsample
downsample(
sample_id: str,
r1: str,
r2: str,
out_path: str,
reference: str,
*,
threads: int = 40,
remove_all_dups: bool = False,
remove_seq_dups: bool = False,
use_gatk_md: bool = False,
strategy: str = "HighAccuracy",
keep: float = 0.5,
) -> None
Pipeline to map, deduplicate, and downsample sequencing reads.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `sample_id` | `str` | Sample ID. | required |
| `r1` | `str` | Path to the R1 FASTQ file. | required |
| `r2` | `str` | Path to the R2 FASTQ file. | required |
| `out_path` | `str` | Output path for the results. | required |
| `reference` | `str` | Path to the reference genome. | required |
| `threads` | `int` | Number of threads to use. | `40` |
| `remove_all_dups` | `bool` | Whether to remove all duplicates. | `False` |
| `remove_seq_dups` | `bool` | Whether to remove sequencing duplicates. | `False` |
| `use_gatk_md` | `bool` | Whether to use GATK MarkDuplicatesSpark. | `False` |
| `strategy` | `str` | Downsampling strategy. | `'HighAccuracy'` |
| `keep` | `float` | Fraction of the reads to keep, in the range [0, 1]. | `0.5` |
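
The parameters above can be assembled and sanity-checked before the pipeline is started. The following is a minimal sketch, not part of the documented API: `build_downsample_call` is a hypothetical helper, the file paths are placeholders, and the import path for `downsample` is not given in this reference, so the actual call is left as a comment. The defaults are copied from the signature above; the check that `threads` is at least 1 is an assumption.

```python
from typing import Any, Dict


def build_downsample_call(
    sample_id: str,
    r1: str,
    r2: str,
    out_path: str,
    reference: str,
    **options: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: collect and check arguments for downsample()."""
    # Keyword-only options and their defaults, copied from the signature.
    defaults: Dict[str, Any] = {
        "threads": 40,
        "remove_all_dups": False,
        "remove_seq_dups": False,
        "use_gatk_md": False,
        "strategy": "HighAccuracy",
        "keep": 0.5,
    }

    # Reject option names that the documented signature does not list.
    unknown = set(options) - set(defaults)
    if unknown:
        raise TypeError(f"unexpected keyword arguments: {sorted(unknown)}")

    merged = {**defaults, **options}

    # The parameter table documents keep as a ratio in [0, 1].
    if not 0.0 <= merged["keep"] <= 1.0:
        raise ValueError("keep must be between 0 and 1")

    # Assumption: at least one thread is needed for the pipeline to run.
    if merged["threads"] < 1:
        raise ValueError("threads must be at least 1")

    return {
        "args": (sample_id, r1, r2, out_path, reference),
        "kwargs": merged,
    }


# Placeholder paths for illustration only.
call = build_downsample_call(
    "sample01",
    "sample01_R1.fastq.gz",
    "sample01_R2.fastq.gz",
    "results/sample01",
    "reference/genome.fa",
    threads=8,
    keep=0.25,
)

# With the real function imported from its package:
# downsample(*call["args"], **call["kwargs"])
```

Validating `keep` up front means an out-of-range value is caught before any mapping or deduplication work begins, rather than partway through a long run.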