catcheR_10Xcatch - iPS2-10X-seq Perturbation Deconvolution

shRNA perturbations can be assigned to single cells using catcheR_10Xcatch. This tool identifies NGS reads containing UCI-BCs, matches them to transcriptomes via shared cell barcodes, and applies several noise-reduction filters to select cells with strong evidence of a single shRNA integration.

The filtering involves:

  1. Removing UCI-BCs supported by few UMIs (likely artifacts),

  2. Filtering UCI-BCs with a low UMI fraction compared to others in the same cell,

  3. Removing ambiguous cases where multiple UCI-BCs are close to threshold.

Only cells with a single robust shRNA integration are retained.

This pipeline applies to experiments using 10X-based platforms (iPS2-10X-seq, iPS2-CITE-seq, or iPS2-multi-seq), which share similar count matrix structure. The analysis proceeds with:

  • catcheR_10Xcatch: complete pipeline with automatic thresholding

  • catcheR_10XcatchQC: optional refinement of thresholds

  • catcheR_filtercatch: re-filtering using refined thresholds

  • catcheR_10Xnocatch: add unperturbed cells as negative controls

Note

Cell Ranger may be required to obtain count matrices. You can install it manually or use Docker containers from:

Step-by-step

  1. In a new working folder:

    1. Copy demultiplexed Read 1 and Read 2 files of the UCI-BC library

    2. Copy the gene expression count matrix CSV

    3. Create a CSV file named rc_barcodes_genes.csv with:

    CAAGAGCC,SMAD2.1
    ...
    
  2. Run catcheR_10Xcatch:

catcheR_10Xcatch(
    group = c("docker", "sudo"),
    folder = "path/to/folder",
    fastq.read1 = "R1.fastq.gz",
    fastq.read2 = "R2.fastq.gz",
    expression.matrix = "filename.csv",
    reference = "GGCGCGTTCATCTGGGGGAGCCG",
    UCI.length = 6,
    threads = 2,
    percentage = 15,
    mode = "bimodal",
    ratio = 5,
    samples = 1,
    x = 100,
    y = 400
)

Arguments:

  • group: “docker” or “sudo” depending on permissions (Docker install guide)

  • folder: working directory

  • fastq.read1: filename for Read 1 (UCI-BC library)

  • fastq.read2: filename for Read 2

  • expression.matrix: cell-by-gene count matrix (CSV format)

  • reference: (optional) reverse complement of constant region before UCI; default is “GGCGCGTTCATCTGGGGGAGCCG”

  • UCI.length: (optional) UCI length, default is 6

  • threads: (optional) number of threads for parallel processing

  • percentage: (optional) minimum % of UMIs for a UCI to be valid; default 15

  • mode: (optional) “bimodal” (default) or “noise” for thresholding strategy

  • ratio: (optional) minimum UMI ratio between top 2 UCIs; default 5

  • samples: (optional) number of multiplexed samples; default 1

  • x, y: (optional) plot axis limits for UMIxUCI distribution; defaults: 100 (x), 400 (y)

Example:

catcheR_10Xcatch(
    group = "docker",
    folder = "path/to/folder",
    fastq.read1 = "R1.fastq.gz",
    fastq.read2 = "R2.fastq.gz",
    expression.matrix = "matrix.csv",
    threads = 12
)

Output Files (in Result/ folder):

  • log.txt: number of reads processed

  • log2.txt: number of cells, UCIs, UMIs, threshold values (bimodal & noise)

  • Barplots of UMI counts per shRNA and per gene

    _images/barcode_distribution.pdf _images/gene_distribution.pdf
  • Histogram: UMI counts per UCI (UMIxUCI)

    _images/UMIxUCI.pdf _images/UMIxUCI_400_400.pdf
  • Histogram: UCI UMI percentage in cell (UMIpercentagexUCI)

    _images/percentage_of_UMIxUCI_dist.pdf
  • 2D dot plots: - UMI vs UMI% per UCI, colored by valid integration count or status

    _images/2D_percentage_of_UMIxUCI_UMI_count_trueorfalse.pdf _images/2D_percentage_of_UMIxUCI_UMI_count_ValidCells.pdf
  • log_part3.txt: number of single-integration vs filtered cells

  • silencing_matrix.csv: annotated expression matrix with shRNA assignment (also in RDS format)

    Annotated cell names follow this format:

TTCTAACCACAGTCGC_180_CGTGATGC_NKX2.5_ACAGTG

Where:

  • TTCTAACCACAGTCGC = original 10X barcode (cellID)

  • 180 = number of UMIs supporting the shRNA

  • CGTGATGC = shRNA barcode (BC)

  • NKX2.5 = target gene

  • ACAGTG = UCI

You can use this matrix directly for downstream analysis functions.