catcheR_10Xcatch - iPS2-10X-seq Perturbation Deconvolution¶

shRNA perturbations can be assigned to single cells using catcheR_10Xcatch. This tool identifies NGS reads containing UCI-BCs, matches them to transcriptomes via shared cell barcodes, and applies several noise-reduction filters to select cells with strong evidence of a single shRNA integration.

The filtering involves:

Removing UCI-BCs supported by few UMIs (likely artifacts),
Filtering UCI-BCs with a low UMI fraction compared to others in the same cell,
Removing ambiguous cases where multiple UCI-BCs are close to threshold.

Only cells with a single robust shRNA integration are retained.

This pipeline applies to experiments using 10X-based platforms (iPS2-10X-seq, iPS2-CITE-seq, or iPS2-multi-seq), which share similar count matrix structure. The analysis proceeds with:

catcheR_10Xcatch: complete pipeline with automatic thresholding
catcheR_10XcatchQC: optional refinement of thresholds
catcheR_filtercatch: re-filtering using refined thresholds
catcheR_10Xnocatch: add unperturbed cells as negative controls

Note

Cell Ranger may be required to obtain count matrices. You can install it manually or use Docker containers from:

Step-by-step¶

In a new working folder:
1. Copy demultiplexed Read 1 and Read 2 files of the UCI-BC library
2. Copy the gene expression count matrix CSV
3. Create a CSV file named rc_barcodes_genes.csv with:
```
CAAGAGCC,SMAD2.1
...
```
Run catcheR_10Xcatch:

catcheR_10Xcatch(
    group = c("docker", "sudo"),
    folder = "path/to/folder",
    fastq.read1 = "R1.fastq.gz",
    fastq.read2 = "R2.fastq.gz",
    expression.matrix = "filename.csv",
    reference = "GGCGCGTTCATCTGGGGGAGCCG",
    UCI.length = 6,
    threads = 2,
    percentage = 15,
    mode = "bimodal",
    ratio = 5,
    samples = 1,
    x = 100,
    y = 400
)

Arguments:

group: “docker” or “sudo” depending on permissions (Docker install guide)
folder: working directory
fastq.read1: filename for Read 1 (UCI-BC library)
fastq.read2: filename for Read 2
expression.matrix: cell-by-gene count matrix (CSV format)
reference: (optional) reverse complement of constant region before UCI; default is “GGCGCGTTCATCTGGGGGAGCCG”
UCI.length: (optional) UCI length, default is 6
threads: (optional) number of threads for parallel processing
percentage: (optional) minimum % of UMIs for a UCI to be valid; default 15
mode: (optional) “bimodal” (default) or “noise” for thresholding strategy
ratio: (optional) minimum UMI ratio between top 2 UCIs; default 5
samples: (optional) number of multiplexed samples; default 1
x, y: (optional) plot axis limits for UMIxUCI distribution; defaults: 100 (x), 400 (y)

Example:

catcheR_10Xcatch(
    group = "docker",
    folder = "path/to/folder",
    fastq.read1 = "R1.fastq.gz",
    fastq.read2 = "R2.fastq.gz",
    expression.matrix = "matrix.csv",
    threads = 12
)

Output Files (in Result/ folder):

log.txt: number of reads processed
log2.txt: number of cells, UCIs, UMIs, threshold values (bimodal & noise)
Barplots of UMI counts per shRNA and per gene
Histogram: UMI counts per UCI (UMIxUCI)
Histogram: UCI UMI percentage in cell (UMIpercentagexUCI)
2D dot plots: - UMI vs UMI% per UCI, colored by valid integration count or status
log_part3.txt: number of single-integration vs filtered cells
silencing_matrix.csv: annotated expression matrix with shRNA assignment (also in RDS format)

Annotated cell names follow this format:

TTCTAACCACAGTCGC_180_CGTGATGC_NKX2.5_ACAGTG

Where:

TTCTAACCACAGTCGC = original 10X barcode (cellID)
180 = number of UMIs supporting the shRNA
CGTGATGC = shRNA barcode (BC)
NKX2.5 = target gene
ACAGTG = UCI

You can use this matrix directly for downstream analysis functions.

catcheR_10Xcatch - iPS2-10X-seq Perturbation Deconvolution¶

Step-by-step¶

catcheR

Navigation

Related Topics