Getting started
===============

**catcheR** is a comprehensive bioinformatic package for designing and analyzing iPS2-seq experiments.

Find the tool at https://github.com/alessandro-bertero/catcheR/tree/dev and the preprint at http://dx.doi.org/10.2139/ssrn.4854180.

It comprises the following functions:

1. **`catcheR_design`**
   Designs oligonucleotides for *Supplemental Protocol 1 – Design shRNA oligonucleotides*, facilitating shRNA library cloning.

2. **`catcheR_step1QC`**
   Analyzes the results of *Supplemental Protocol 1 – Intermediate plasmid pool QC*, assessing pooled cloning step 1 plasmids for barcode swaps.

3. **`catcheR_step2QC`**
   Analyzes the results of *Supplemental Protocol 1 – Final plasmid pool QC or hiPSC pool QC*, assessing pooled cloning step 2 or genome-edited hiPSC pools for shRNA representation.

4. **`catcheR_scicount`**
   Analyzes 2-level indexing sci-RNA-seq data, facilitating the generation of the gene expression matrix for iPS2-sci-seq experiments.

5. **`catcheR_scicatch`**
   Assigns shRNA perturbations to single nuclei transcriptomes obtained by *Supplemental Protocol 2*, enabling the primary analysis of iPS2-sci-seq.

6. **`catcheR_10Xcatch`**
   Assigns shRNA perturbations to single cell transcriptomes obtained by *Supplemental Protocol 3*, enabling the primary analysis of iPS2-10X-seq.

7. **`catcheR_scicatchQC`** and **`catcheR_10XcatchQC`**
   Use the outputs of `catcheR_scicatch` and `catcheR_10Xcatch`, respectively, to fine-tune shRNA assignment thresholds.

8. **`catcheR_filtercatch`**
   Leverages the output of `catcheR_scicatchQC` and `catcheR_10XcatchQC` to filter single nuclei/cell transcriptomes expressing a single shRNA.

9. **`catcheR_sortcatch`**
   Quality-controls the cell-by-gene matrix based on the results of `catcheR_step1QC`, reassigning hPSC clones with barcode swaps to the correct shRNA.

10. **`catcheR_scinocatch`** and **`catcheR_10Xnocatch`**
    Identify cells expressing no shRNA in iPS2-sci-seq and iPS2-10X-seq experiments, respectively, adding them to the cell-by-gene matrix to be used as additional controls.

11. **`catcheR_load`**
    Loads gene expression matrices annotated with shRNA perturbations into a *Monocle* object, preparing the dataset for downstream analysis.

12. **`catcheR_pseudotime`**
    Analyzes the effects of shRNA perturbations on pseudotime dynamics, highlighting shifts along differentiation trajectories.

13. **`catcheR_modules`**
    Assesses perturbation-induced changes in gene module expression, such as coordinated activation or repression of functional programs.

14. **`catcheR_enrichment`**
    Quantifies differences in perturbation representation across experimental samples or cell clusters.

For the minimal catcheR analysis of a 10X experiment, run ``catcheR_10Xcatch`` to annotate the gene expression matrix with perturbations and then ``catcheR_load`` to start the exploratory analysis.

Availability and Installation
-----------------------------

**catcheR** is available at: https://github.com/alessandro-bertero/catcheR

The **scripts** folder of the GitHub repository contains all the bash and R scripts, which can be run independently. However, for reproducible analyses, it is strongly recommended to install the catcheR package from GitHub, since its functions run inside Docker containers.

**Installation steps:**

1. Install the Docker engine

   Follow the instructions at: https://docs.docker.com/engine/install/

2. Install ``catcheR``

   b. Install **catcheR** from GitHub:

      .. code-block:: R

         devtools::install_github("alessandro-bertero/catcheR")

   c. Install **rrundocker** from GitHub:

      .. code-block:: R

         devtools::install_github("Reproducible-Bioinformatics/rrundocker")

   d. Load **catcheR** and **rrundocker** in your R environment:

      .. code-block:: R

         library(catcheR)
         library(rrundocker)

Notes on obtaining iPS2-10X-seq, iPS2-CITE-seq, or iPS2-multi-seq Count Matrices
--------------------------------------------------------------------------------

If starting from raw data, a gene expression matrix needs to be generated.

1. Download and install:

   - ``cellranger`` *(for iPS2-10X-seq and iPS2-CITE-seq)*
   - ``cellranger-arc`` *(for iPS2-multi-seq)*

   Alternatively, Docker containers are available:

   - cellranger v7 (Docker)
   - cellranger v9 (Docker, recommended for iPS2-CITE-seq)
   - cellranger-arc (Docker)

2. For **iPS2-10X-seq** and **iPS2-CITE-seq**, demultiplex Illumina BCL files using ``cellranger mkfastq``, following the official 10X Genomics guide. In the sample sheet CSV, include the index sequences used in :ref:`SupplementalProtocolThree` for:

   - GEX libraries
   - UCI-BC libraries
   - (optional) CMO and/or ADT libraries

3. For **iPS2-multi-seq**, use ``cellranger-arc mkfastq`` to demultiplex GEX + ATAC dual-index libraries. Ensure the sample sheet is properly formatted for dual-modality runs and includes index sequences for both GEX and ATAC libraries.

4. Run ``FastQC`` to assess the quality of each FASTQ file per library type.

5. Generate cell-by-gene count matrices:

   - Use ``cellranger count`` for single-sample experiments
   - Use ``cellranger multi`` for multiplexed experiments (e.g., iPS2-CITE-seq)
   - For iPS2-multi-seq, use ``cellranger-arc count`` to obtain both GEX and ATAC matrices

   In multiplexed experiments (e.g., using CMO or ADT barcodes in iPS2-CITE-seq), individual sample matrices can be aggregated using ``cellranger aggr``. This produces a unified dataset for joint analysis with ``catcheR_10Xcatch``; specify the number of samples via the ``samples`` argument.

6. Use ``cellranger mat2csv`` to convert sparse matrix outputs into dense CSV files for downstream compatibility. For iPS2-multi-seq, use ``cellranger-arc mat2csv`` separately for the GEX and ATAC outputs if needed.
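
As a quick sanity check on a dense CSV before passing it downstream, the sketch below loads a toy file into R and inspects it. It assumes the layout has genes in rows and cell barcodes in columns; the file, gene IDs, barcodes, and counts are invented for illustration, and a real matrix is far larger.

.. code-block:: R

   # Toy stand-in for a dense CSV such as the one written by `cellranger mat2csv`
   # (assumed layout: genes in rows, cell barcodes in columns).
   toy_csv <- tempfile(fileext = ".csv")
   writeLines(c(
     ",AAACCCAAGAAACACT-1,AAACCCAAGAAACCAT-1",
     "ENSG00000000003,3,0",
     "ENSG00000000005,0,1"
   ), toy_csv)

   # check.names = FALSE keeps the "-1" barcode suffix intact;
   # row.names = 1 uses the first column (gene IDs) as row names.
   counts <- as.matrix(read.csv(toy_csv, row.names = 1, check.names = FALSE))

   dim(counts)      # number of genes x number of cells
   colSums(counts)  # total counts per cell barcode

Keeping ``check.names = FALSE`` matters here because R would otherwise rewrite barcodes such as ``AAACCCAAGAAACACT-1`` into syntactically valid names, and they would no longer match the barcodes used elsewhere in the analysis.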
Notes on gene annotation
------------------------

After running catcheR and before the exploratory analysis, the gene expression matrix should be annotated with gene symbols using the ``scannobyGtf`` function from the R package rCASC. As part of quality control, we recommend evaluating the fraction of ribosomal and mitochondrial reads (for example, using the ``mitoRiboUmi`` function from the same package) and considering the exclusion of cells with abnormally high proportions, which may indicate poor quality or stress.

.. note::

   After this step, the row names of the matrix (the genes) will have the following format:

   .. code-block:: text

      EnsemblID:GeneSymbol

   **Example:**

   .. code-block:: text

      ENSG00000000003:TSPAN6
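
To illustrate how annotated row names can be used, here is a small base-R sketch that splits them at the colon and estimates a per-cell mitochondrial fraction. It follows the order shown in the ``ENSG00000000003:TSPAN6`` example, uses a toy matrix with invented counts, and assumes that mitochondrial genes are those whose symbol starts with ``MT-`` (a common convention for human annotations). It is only an illustration and does not reproduce what ``mitoRiboUmi`` does internally.

.. code-block:: R

   # Toy annotated matrix; row names follow the "ENSG...:SYMBOL" example.
   counts <- matrix(
     c(5, 0,
       2, 3,
       3, 7),
     nrow = 3, byrow = TRUE,
     dimnames = list(
       c("ENSG00000000003:TSPAN6",
         "ENSG00000198888:MT-ND1",
         "ENSG00000075624:ACTB"),
       c("cell1", "cell2")
     )
   )

   # Split each row name at the first colon into Ensembl ID and gene symbol.
   ensembl_id <- sub(":.*$", "", rownames(counts))
   symbol     <- sub("^[^:]*:", "", rownames(counts))

   # Assumption: human mitochondrial genes have symbols starting with "MT-".
   is_mito <- grepl("^MT-", symbol)

   # Fraction of each cell's counts that come from mitochondrial genes.
   mito_fraction <- colSums(counts[is_mito, , drop = FALSE]) / colSums(counts)
   mito_fraction

Cells whose fraction is far above the rest of the dataset are candidates for exclusion; the cutoff depends on the experiment and is best chosen after looking at the distribution.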