Getting started¶
catcheR is a comprehensive bioinformatic package for designing and analyzing iPS2-seq experiments.
Find the tool at https://github.com/alessandro-bertero/catcheR/tree/dev and the preprint at http://dx.doi.org/10.2139/ssrn.4854180
It comprises the following functions:
`catcheR_design`
Designs oligonucleotides for Supplemental Protocol 1 – Design shRNA oligonucleotides, facilitating shRNA library cloning.
`catcheR_step1QC`
Analyzes the results of Supplemental Protocol 1 – Intermediate plasmid pool QC, assessing pooled cloning step 1 plasmids for barcode swaps.
`catcheR_step2QC`
Analyzes the results of Supplemental Protocol 1 – Final plasmid pool QC or hiPSC pool QC, assessing pooled cloning step 2 or genome-edited hiPSC pools for shRNA representation.
`catcheR_scicount`
Analyzes 2-level indexing sci-RNA-seq data, facilitating the generation of the gene expression matrix for iPS2-sci-seq experiments.
`catcheR_scicatch`
Assigns shRNA perturbations to single nuclei transcriptomes obtained by Supplemental Protocol 2, enabling the primary analysis of iPS2-sci-seq.
`catcheR_10Xcatch`
Assigns shRNA perturbations to single cell transcriptomes obtained by Supplemental Protocol 3, enabling the primary analysis of iPS2-10X-seq.
`catcheR_scicatchQC` and `catcheR_10XcatchQC`
Use the outputs of catcheR_scicatch and catcheR_10Xcatch, respectively, to fine-tune shRNA assignment thresholds.
`catcheR_filtercatch`
Leverages the output of catcheR_scicatchQC and catcheR_10XcatchQC to filter single nuclei/cell transcriptomes expressing a single shRNA.
`catcheR_sortcatch`
Quality-controls the cell-by-gene matrix based on the results of catcheR_step1QC, reassigning hPSC clones with barcode swaps to the correct shRNA.
`catcheR_scinocatch` and `catcheR_10Xnocatch`
Identify cells expressing no shRNA in iPS2-sci-seq and iPS2-10X-seq experiments, respectively, adding them to the cell-by-gene matrix to be used as additional controls.
`catcheR_load`
Loads gene expression matrices annotated with shRNA perturbations into a Monocle object, preparing the dataset for downstream analysis.
`catcheR_pseudotime`
Analyzes the effects of shRNA perturbations on pseudotime dynamics, highlighting shifts along differentiation trajectories.
`catcheR_modules`
Assesses perturbation-induced changes in gene module expression, such as coordinated activation or repression of functional programs.
`catcheR_enrichment`
Quantifies differences in perturbation representation across experimental samples or cell clusters.
For a minimal catcheR analysis of a 10X experiment, run catcheR_10Xcatch to annotate the gene expression matrix with perturbations, then run catcheR_load to start the exploratory analysis.
Availability and Installation¶
catcheR is available at: https://github.com/alessandro-bertero/catcheR
The GitHub repository folder scripts contains all the bash and R scripts that can be run independently. However, for reproducible analyses, it is strongly recommended to install the catcheR package from GitHub, since its functions run inside Docker containers.
Installation steps:
Install the Docker engine, following the instructions at: https://docs.docker.com/engine/install/
Install catcheR from GitHub:
devtools::install_github("alessandro-bertero/catcheR")
Install rrundocker from GitHub:
devtools::install_github("Reproducible-Bioinformatics/rrundocker")
Load catcheR and rrundocker in your R environment:
library(catcheR)
library(rrundocker)
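Because the catcheR functions run inside Docker containers, it can help to confirm that Docker is usable before calling them. The following is a minimal sketch, not part of catcheR: the helper name `check_docker` is hypothetical, and it relies only on the standard `docker info` command.

```shell
# Hypothetical helper (not part of catcheR): report whether Docker is usable.
check_docker() {
  # Is the docker client installed and on PATH?
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker: not installed or not on PATH"
    return 1
  fi
  # Can the current user reach the Docker daemon?
  if ! docker info >/dev/null 2>&1; then
    echo "docker: installed, but the daemon is not reachable by this user"
    return 1
  fi
  echo "docker: ok"
}
```

Running `check_docker` in a terminal prints a single status line; a non-zero exit status means Docker needs attention before the containers can start.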
Notes on obtaining iPS2-10X-seq, iPS2-CITE-seq, or iPS2-multi-seq Count Matrices¶
If starting from raw data, gene expression matrices need to be generated first.
Download and install:
cellranger (for iPS2-10X-seq and iPS2-CITE-seq)
cellranger-arc (for iPS2-multi-seq)
Alternatively, Docker containers are available:
For iPS2-10X-seq and iPS2-CITE-seq, demultiplex Illumina BCL files using `cellranger mkfastq`, following the official 10X Genomics guide. In the sample sheet CSV, include the index sequences used in Supplemental Protocol 3 for:
GEX libraries
UCI-BC libraries
(optional) CMO and/or ADT libraries
For iPS2-multi-seq, use `cellranger-arc mkfastq` to demultiplex GEX + ATAC dual-index libraries. Ensure the sample sheet is properly formatted for dual-modality runs and includes index sequences for both GEX and ATAC libraries.
Run `FastQC` to assess the quality of each FASTQ file per library type.
Generate cell-by-gene count matrices:
Use `cellranger count` for single-sample experiments
Use `cellranger multi` for multiplexed experiments (e.g., iPS2-CITE-seq)
For iPS2-multi-seq, use `cellranger-arc count` to obtain both GEX and ATAC matrices
In multiplexed experiments (e.g., using CMO or ADT barcodes in iPS2-CITE-seq), individual sample matrices can be aggregated using `cellranger aggr`. This produces a unified dataset for joint analysis with `catcheR_10Xcatch`, specifying the number of samples via the `samples` argument.
Use `cellranger mat2csv` to convert sparse matrix outputs into dense CSV files for downstream compatibility. For iPS2-multi-seq, use `cellranger-arc mat2csv` separately for the GEX and ATAC outputs if needed.
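As an illustration of the single-sample path (FastQC, `cellranger count`, `cellranger mat2csv`), the steps can be sketched as a small dry-run script. This is a sketch under stated assumptions, not the authors' pipeline: the helper names `run` and `ips2_10x_counts`, the `DRY_RUN` variable, and the output folder `fastqc_out` are hypothetical, and all paths and sample names are placeholders. By default it only prints the commands so they can be reviewed before anything is executed.

```shell
# Dry-run sketch of the single-sample iPS2-10X-seq path, starting from FASTQs.
# Every path, ID, and sample name passed in is a hypothetical placeholder.

run() {
  # Print the command; execute it only when DRY_RUN=0 (default: print only).
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

ips2_10x_counts() {
  local fastq_dir="$1" sample="$2" transcriptome="$3"

  # Quality-check every FASTQ file in the directory.
  run mkdir -p fastqc_out || return 1
  run fastqc --outdir fastqc_out "$fastq_dir"/*.fastq.gz || return 1

  # Build the cell-by-gene matrix for one sample.
  # Cell Ranger 8 and later also require --create-bam=true or --create-bam=false.
  run cellranger count \
    --id="$sample" \
    --transcriptome="$transcriptome" \
    --fastqs="$fastq_dir" \
    --sample="$sample" || return 1

  # Convert the filtered sparse matrix into a dense CSV file.
  run cellranger mat2csv \
    "$sample/outs/filtered_feature_bc_matrix.h5" "$sample.csv" || return 1
}
```

For example, `ips2_10x_counts /data/fastq sample1 /refs/GRCh38` prints the four commands, and prefixing the call with `DRY_RUN=0` executes them. The sketch covers only the GEX library; multiplexed and multi-seq experiments need `cellranger multi` or `cellranger-arc count` instead.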
Notes on gene annotation¶
After running catcheR and before the exploratory analysis, the gene expression matrix should be annotated with gene symbols using the scannobyGtf function from the R package rCASC.
As part of quality control, we recommend evaluating the fraction of ribosomal and mitochondrial reads (for example, using the mitoRiboUmi function from the same package) and considering the exclusion of cells with abnormally high proportions, which may indicate poor quality or stress.
Note
After this step, the row names of the matrix (the genes) will have the following format:
EnsemblID:GeneSymbol
Example:
ENSG00000000003:TSPAN6
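To illustrate what a mitochondrial and ribosomal fraction check computes, here is a rough sketch that works directly on an annotated dense CSV matrix. It is not the mitoRiboUmi implementation. It assumes row names in the order shown in the example above (Ensembl ID first, then gene symbol), genes as rows and cells as columns, and human gene symbols; the function name `mito_ribo_fraction` is hypothetical.

```shell
# Per-cell mitochondrial and ribosomal count fractions from a dense CSV matrix.
# Assumptions (not taken from the catcheR or rCASC documentation):
#   - rows are genes named EnsemblID:GeneSymbol, columns are cells;
#   - human symbols: mitochondrial genes start with "MT-"; ribosomal protein
#     genes start with "RPS" or "RPL" followed by a digit (a rough heuristic).
mito_ribo_fraction() {
  awk -F',' '
    NR == 1 {                         # header row: remember the cell names
      ncol = NF
      for (i = 2; i <= NF; i++) cell[i] = $i
      next
    }
    {
      name = $1
      gsub(/"/, "", name)             # drop quotes around the row name, if any
      n = split(name, id, ":")
      symbol = (n > 1) ? id[2] : id[1]
      is_mt = (symbol ~ /^MT-/)
      is_rb = (symbol ~ /^RP[SL][0-9]/)
      for (i = 2; i <= NF; i++) {     # accumulate counts per cell (column)
        total[i] += $i
        if (is_mt) mito[i] += $i
        if (is_rb) ribo[i] += $i
      }
    }
    END {
      print "cell,mito_fraction,ribo_fraction"
      for (i = 2; i <= ncol; i++) {
        t = total[i]
        printf "%s,%.4f,%.4f\n", cell[i], (t > 0 ? mito[i] / t : 0), (t > 0 ? ribo[i] / t : 0)
      }
    }' "$1"
}
```

Calling `mito_ribo_fraction annotated_matrix.csv` (a placeholder file name) prints one line per cell. Which fractions count as abnormally high depends on the cell type and protocol, so any cutoff should be chosen from the observed distribution.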