catcheR_10Xcatch - iPS2-10X-seq Perturbation Deconvolution
shRNA perturbations can be assigned to single cells using catcheR_10Xcatch. This tool identifies NGS reads containing UCI-BCs, matches them to transcriptomes via shared cell barcodes, and applies several noise-reduction filters to select cells with strong evidence of a single shRNA integration.
The filtering involves:
Removing UCI-BCs supported by few UMIs (likely artifacts),
Filtering out UCI-BCs with a low UMI fraction relative to the others in the same cell,
Removing ambiguous cases where multiple UCI-BCs are close to the threshold.
Only cells with a single robust shRNA integration are retained.
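The three filters can be illustrated for a single cell with a minimal R sketch. This is not the catcheR implementation. The function name `select_uci`, the input layout (a named vector of UMI counts per UCI-BC) and the fixed `min_umi` threshold are assumptions made for illustration, whereas the pipeline itself picks its thresholds automatically. The `percentage = 15` and `ratio = 5` values mirror the defaults documented for catcheR_10Xcatch.

```r
# Illustrative sketch only, not the catcheR source code.
# umi_counts: named numeric vector, UMIs per UCI-BC detected in ONE cell.
# min_umi is a hypothetical fixed threshold; percentage and ratio mirror
# the documented defaults of catcheR_10Xcatch.
select_uci <- function(umi_counts, min_umi = 5, percentage = 15, ratio = 5) {
  # Fractions are computed over all UCI-BCs seen in the cell (an assumption).
  pct <- 100 * umi_counts / sum(umi_counts)

  # Filters 1 and 2: enough UMIs in absolute terms and as a share of the cell.
  keep <- umi_counts[umi_counts >= min_umi & pct >= percentage]
  if (length(keep) == 0) return(NA_character_)
  if (length(keep) == 1) return(names(keep))

  # Filter 3: several candidates remain; accept the top one only if it
  # clearly dominates the runner-up, otherwise treat the cell as ambiguous.
  keep <- sort(keep, decreasing = TRUE)
  if (keep[1] / keep[2] >= ratio) names(keep)[1] else NA_character_
}

clean     <- c(ACAGTG = 180, TTGACA = 3, GGATCC = 2)  # one dominant UCI-BC
ambiguous <- c(ACAGTG = 60, TTGACA = 50)              # two competing UCI-BCs

select_uci(clean)      # returns the dominant UCI-BC name
select_uci(ambiguous)  # returns NA: the cell would be discarded
```

A cell is kept only when the function returns a single UCI-BC name; `NA` stands for a cell that would be filtered out.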
This pipeline applies to experiments using 10X-based platforms (iPS2-10X-seq, iPS2-CITE-seq, or iPS2-multi-seq), which share a similar count matrix structure. The analysis proceeds with:
catcheR_10Xcatch: complete pipeline with automatic thresholding
catcheR_10XcatchQC: optional refinement of thresholds
catcheR_filtercatch: re-filtering using refined thresholds
catcheR_10Xnocatch: add unperturbed cells as negative controls
Note
Cell Ranger may be required to obtain count matrices. You can install it manually or use Docker containers from:
Step-by-step
In a new working folder:
Copy demultiplexed Read 1 and Read 2 files of the UCI-BC library
Copy the gene expression count matrix CSV
Create a CSV file named rc_barcodes_genes.csv with:
CAAGAGCC,SMAD2.1 ...
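Before launching the pipeline it can help to sanity-check this file. The sketch below is a hypothetical helper (`check_barcode_table` is not part of catcheR). It assumes the file has no header row and two comma-separated columns, barcode and shRNA name, as the example line above suggests. The second row of the demo file is made up for illustration.

```r
# Hypothetical helper, not part of catcheR: basic sanity checks on a
# barcode-to-shRNA table. Assumes no header row and two comma-separated
# columns (barcode, shRNA name).
check_barcode_table <- function(path) {
  tab <- read.csv(path, header = FALSE, stringsAsFactors = FALSE,
                  col.names = c("barcode", "shRNA"),
                  colClasses = "character")
  problems <- character(0)
  if (any(!grepl("^[ACGT]+$", tab$barcode)))
    problems <- c(problems, "barcodes must contain only A, C, G, T")
  if (anyDuplicated(tab$barcode) > 0)
    problems <- c(problems, "duplicated barcodes")
  if (length(unique(nchar(tab$barcode))) > 1)
    problems <- c(problems, "barcodes differ in length")
  if (any(is.na(tab$shRNA) | tab$shRNA == ""))
    problems <- c(problems, "missing shRNA names")
  if (length(problems) > 0) stop(paste(problems, collapse = "; "))
  invisible(tab)  # return the parsed table when every check passes
}

# Demonstration on a temporary file; the second row is a made-up example.
demo <- tempfile(fileext = ".csv")
writeLines(c("CAAGAGCC,SMAD2.1", "CGTGATGC,NKX2.5"), demo)
tab <- check_barcode_table(demo)
```

The function stops with a combined message when a check fails, and otherwise returns the parsed table invisibly.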
Run catcheR_10Xcatch:
catcheR_10Xcatch(
  group = c("docker", "sudo"),
  folder = "path/to/folder",
  fastq.read1 = "R1.fastq.gz",
  fastq.read2 = "R2.fastq.gz",
  expression.matrix = "filename.csv",
  reference = "GGCGCGTTCATCTGGGGGAGCCG",
  UCI.length = 6,
  threads = 2,
  percentage = 15,
  mode = "bimodal",
  ratio = 5,
  samples = 1,
  x = 100,
  y = 400
)
Arguments:
group: "docker" or "sudo" depending on permissions (see the Docker install guide)
folder: working directory
fastq.read1: filename for Read 1 (UCI-BC library)
fastq.read2: filename for Read 2
expression.matrix: cell-by-gene count matrix (CSV format)
reference: (optional) reverse complement of the constant region before the UCI; default is "GGCGCGTTCATCTGGGGGAGCCG"
UCI.length: (optional) UCI length; default is 6
threads: (optional) number of threads for parallel processing
percentage: (optional) minimum % of UMIs for a UCI to be valid; default is 15
mode: (optional) thresholding strategy, "bimodal" (default) or "noise"
ratio: (optional) minimum UMI ratio between the top 2 UCIs; default is 5
samples: (optional) number of multiplexed samples; default is 1
x, y: (optional) plot axis limits for the UMIxUCI distribution; defaults are 100 (x) and 400 (y)
Example:
catcheR_10Xcatch(
  group = "docker",
  folder = "path/to/folder",
  fastq.read1 = "R1.fastq.gz",
  fastq.read2 = "R2.fastq.gz",
  expression.matrix = "matrix.csv",
  threads = 12
)
Output Files (in Result/ folder):
log.txt: number of reads processed
log2.txt: number of cells, UCIs, UMIs, threshold values (bimodal & noise)
Barplots of UMI counts per shRNA and per gene
Histogram: UMI counts per UCI (UMIxUCI)
Histogram: UCI UMI percentage in cell (UMIpercentagexUCI)
2D dot plots: UMI vs UMI% per UCI, colored by valid integration count or status
log_part3.txt: number of single-integration vs filtered cells
silencing_matrix.csv: annotated expression matrix with shRNA assignment (also in RDS format)
Annotated cell names follow this format:
TTCTAACCACAGTCGC_180_CGTGATGC_NKX2.5_ACAGTG
Where:
TTCTAACCACAGTCGC = original 10X barcode (cellID)
180 = number of UMIs supporting the shRNA
CGTGATGC = shRNA barcode (BC)
NKX2.5 = target gene
ACAGTG = UCI
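For downstream work it is often convenient to split these names back into their fields. The following is a minimal sketch, assuming the five underscore-separated fields described above. `parse_cell_names` is a hypothetical helper, not a catcheR function.

```r
# Hypothetical helper, not part of catcheR: split annotated cell names of the
# form cellID_UMIs_BC_gene_UCI into a data frame. Fields are read from both
# ends so that a target name containing "_" would still be kept whole.
parse_cell_names <- function(cell_names) {
  parts <- strsplit(cell_names, "_", fixed = TRUE)
  rows <- lapply(parts, function(p) {
    n <- length(p)
    if (n < 5) stop("unexpected cell name: ", paste(p, collapse = "_"))
    data.frame(
      cellID = p[1],
      UMIs   = as.integer(p[2]),
      BC     = p[3],
      gene   = paste(p[4:(n - 1)], collapse = "_"),
      UCI    = p[n],
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

annot <- parse_cell_names("TTCTAACCACAGTCGC_180_CGTGATGC_NKX2.5_ACAGTG")
annot$gene         # "NKX2.5"
table(annot$gene)  # number of cells per target
```

With a real silencing matrix, the same function could be applied to whichever dimension holds the cell names (`colnames()` or `rownames()`); the matrix orientation is not stated here, so check it first.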
You can use this matrix directly for downstream analysis functions.