create_target_indices
create_target_indices(sample_sheet_file, output_json_file)
Reads the sample sheet CSV and creates a dictionary mapping Sample_ID to a list of four target index strings. Writes the dictionary to a JSON file.
The sample sheet is assumed to have the following columns: Sample_ID, Sample_Name, Sample_Plate, Sample_Well, Index_Plate_Well, I7_Index_ID, index, I5_Index_ID, index2, Sample_Project, Description
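For each sample, it creates two variant lists:
- i7_variants: [original I7 index, 'N' + I7[1:]]
- i5_variants: [original I5 index, 'N' + I5[1:]]

Each i7 variant is then paired with each i5 variant as i7_variant + '+' + i5_variant, which yields the four target index strings for that sample.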
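The variant construction described here can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the helper name `build_target_indices` is hypothetical, the sketch assumes the `index` and `index2` columns hold the I7 and I5 sequences, and it assumes the CSV header row is the first line of the input (no preceding sample sheet sections).

```python
import csv
import io
import json
from itertools import product


def build_target_indices(csv_text):
    """Map each Sample_ID to its four 'i7+i5' target index strings.

    Hypothetical helper: assumes 'index' is the I7 sequence and
    'index2' is the I5 sequence, and that the header row comes first.
    """
    targets = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        i7, i5 = row["index"], row["index2"]
        # Original index plus a variant whose first base is replaced by 'N'.
        i7_variants = [i7, "N" + i7[1:]]
        i5_variants = [i5, "N" + i5[1:]]
        # 2 x 2 combinations give four strings per sample.
        targets[row["Sample_ID"]] = [
            f"{a}+{b}" for a, b in product(i7_variants, i5_variants)
        ]
    return targets


# Made-up example row, for illustration only.
sheet = "Sample_ID,index,index2\nS1,ACGT,TTGA\n"
result = build_target_indices(sheet)
as_json = json.dumps(result, indent=2)  # what would be written to the JSON file
```

The `N` variants presumably allow for reads whose first index base was not called by the sequencer, which is a common reason for a read to land in the undetermined files.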
filter_reads
filter_reads(input_file, output_file, target_indices)
Filters reads from a FASTQ file (gzip-compressed) based on whether any of the target index strings appear in the read header.
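A header-based filter of this kind might look like the sketch below. It is an assumption-laden illustration rather than the documented function: the name `filter_fastq_by_header` and its return value (the number of reads kept) are hypothetical, and the sketch assumes strict four-line FASTQ records with the index pair in the header line, as in standard Illumina headers.

```python
import gzip
import os
import tempfile


def filter_fastq_by_header(input_path, output_path, target_indices):
    """Copy reads whose header contains any target index string.

    Hypothetical helper: assumes four lines per FASTQ record.
    Returns the number of reads written.
    """
    kept = 0
    with gzip.open(input_path, "rt") as src, gzip.open(output_path, "wt") as dst:
        while True:
            record = [src.readline() for _ in range(4)]
            if not record[0]:  # empty string means end of file
                break
            if any(target in record[0] for target in target_indices):
                dst.writelines(record)
                kept += 1
    return kept


# Small self-contained demo with two made-up reads.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "undetermined.fastq.gz")
dst_path = os.path.join(workdir, "filtered.fastq.gz")
with gzip.open(src_path, "wt") as handle:
    handle.write("@read1 1:N:0:ACGT+TTGA\nAAAA\n+\nIIII\n")
    handle.write("@read2 1:N:0:GGGG+CCCC\nCCCC\n+\nIIII\n")

kept = filter_fastq_by_header(src_path, dst_path, ["ACGT+TTGA", "NCGT+TTGA"])
with gzip.open(dst_path, "rt") as handle:
    filtered_text = handle.read()
```

Streaming one record at a time keeps memory use flat, which matters because undetermined FASTQ files can be very large.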
process_sample
process_sample(
sample_id,
target_indices,
input_r1,
input_r2,
output_dir,
)
For a given sample, filter reads from the undetermined R1 and R2 FASTQ files using its target indices. Writes output files into output_dir.
undetermined_demultiplexer
undetermined_demultiplexer(
sample_sheet: str,
input_r1: str,
input_r2: str,
*,
output_dir: str,
json_output: str,
threads: int = 4,
) -> None
Filter undetermined FASTQ files for multiple samples using index information.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sample_sheet | str | Path to the sample sheet CSV file. | required |
| input_r1 | str | Path to the undetermined R1 FASTQ.gz file. | required |
| input_r2 | str | Path to the undetermined R2 FASTQ.gz file. | required |
| output_dir | str | Directory to store the filtered FASTQ files. | required |
| json_output | str | Path to output JSON file for sample target indices. | required |
| threads | int | Number of threads to use. | 4 |
Returns:
| Type | Description |
|---|---|
| None | |
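How the `threads` parameter is used internally is not documented here. One plausible arrangement, shown purely as a sketch, is to submit one task per sample to a thread pool. The names `run_per_sample` and `count_targets` are hypothetical, and `count_targets` is only a stand-in for the real per-sample work.

```python
from concurrent.futures import ThreadPoolExecutor


def run_per_sample(sample_targets, worker, threads=4):
    """Call worker(sample_id, target_indices) for every sample on a thread pool.

    Hypothetical helper: an assumption about how per-sample work
    could be parallelized, not the documented implementation.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = {
            sample_id: pool.submit(worker, sample_id, targets)
            for sample_id, targets in sample_targets.items()
        }
        # result() re-raises any exception raised inside the worker.
        return {sample_id: future.result() for sample_id, future in futures.items()}


def count_targets(sample_id, targets):
    """Stand-in worker that just counts the target index strings."""
    return len(targets)


demo = run_per_sample({"S1": ["A+T", "N+T"], "S2": ["C+G"]}, count_targets, threads=2)
```

In a real run the worker would filter both FASTQ files for its sample, so each task would rescan the full undetermined input.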