catcheR_load - Data loading

This step loads the single-cell data produced by catcheR_10Xcatch or catcheR_scicatch after GTF-based annotation. The data are imported into a Monocle 3 cell data set (CDS), annotated with the experimental design information, and then normalized and clustered.

Preparation

Before running catcheR_load, prepare the following in a new working folder:

  1. Copy the count matrix annotated with gene names from the previous step (e.g., filtered_annotated_silencing_matrix_complete_all_samples.csv)

  2. Copy the file rc_barcodes_genes.csv

  3. Create a newline-separated plain text file listing the control genes (e.g., SCR, B2M)

  4. Create a newline-separated plain text file listing the control samples, if any (e.g., 1, 3). These sample identifiers must match those used by aggr (see the input CSV file passed to aggr).

  5. Create a newline-separated plain text file listing the replicate (batch) label of each sample (e.g., batch1, batch1, batch2, batch2). The labels must follow exactly the same order as the samples in the input matrix. This file is required for downstream batch-aware analyses, and batch correction is recommended if your dataset combines multiple experiments or processing batches.

  6. (Required) Create a CSV file listing each sample together with its annotation name. These names are used in plots in place of the sample numbers. An example file is available on GitHub.

  7. (Optional) Create a newline-separated plain text file listing genes of interest whose expression will be visualized on the UMAP.
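The plain-text and CSV inputs above can also be written from R. The sketch below is a hypothetical helper, not part of catcheR: the file names match those in the catcheR_load example call, the default values are the placeholders from the list, and the two-column layout used for samples.csv (sample, name) is an assumption; use the example file on GitHub as the reference format. It also checks that there is one replicate label per annotated sample.

```r
# Hypothetical helper (not part of catcheR): write the catcheR_load input files.
# Values are placeholders; the samples.csv column layout is an assumption.
write_catcher_inputs <- function(folder,
                                 control_genes = c("SCR", "B2M"),
                                 control_samples = c(1, 3),
                                 replicates = c("batch1", "batch1", "batch2", "batch2"),
                                 sample_annotation = NULL,
                                 genes = NULL) {
  # Expect one replicate label per sample listed in the annotation table
  if (!is.null(sample_annotation) && nrow(sample_annotation) != length(replicates)) {
    stop("expected one replicate label per sample in the annotation table")
  }
  dir.create(folder, showWarnings = FALSE, recursive = TRUE)
  # Plain-text inputs: one entry per line
  writeLines(control_genes, file.path(folder, "controls.txt"))
  writeLines(as.character(control_samples), file.path(folder, "noTET.txt"))
  writeLines(replicates, file.path(folder, "replicates.txt"))
  if (!is.null(sample_annotation)) {
    write.csv(sample_annotation, file.path(folder, "samples.csv"), row.names = FALSE)
  }
  if (!is.null(genes)) {
    writeLines(genes, file.path(folder, "genelist.txt"))
  }
  # Return the folder contents invisibly for a quick check
  invisible(list.files(folder))
}

# Usage (hypothetical annotation names):
# write_catcher_inputs("/path/to/working/folder/",
#   sample_annotation = data.frame(sample = 1:4, name = c("ctrl_a", "ctrl_b", "kd_a", "kd_b")))
```

Writing the files from a script keeps the replicate order and the sample annotation in one place, which makes ordering mistakes easier to spot.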

Running catcheR_load

catcheR_load(
  group = "docker",
  folder,
  expression.matrix,
  control_genes,
  control_samples = NULL,
  replicates = NULL,
  sample_names,
  resolution = 8e-4,
  genes = NULL
)

Example usage:

catcheR_load(
  group = "docker",
  folder = "/path/to/working/folder/",
  expression.matrix = "annotated_silencing_matrix_complete_all_samples.csv",
  control_genes = "controls.txt",
  control_samples = "noTET.txt",
  replicates = "replicates.txt",
  sample_names = "samples.csv",
  resolution = 8e-4,
  genes = "genelist.txt"
)

The resolution argument is passed to the resolution parameter of Monocle 3's cluster_cells function; larger values generally produce more, smaller clusters.
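To choose a resolution, it can help to re-cluster the processed CDS at a few values and compare the number of clusters. The sketch below is a hypothetical helper, not part of catcheR; it assumes monocle3 is installed, that processed_cds.RData contains an object named `cds`, and that UMAP reduction has already been run on it. The candidate resolutions are placeholders.

```r
# Hypothetical helper: re-cluster a processed CDS at several resolutions and
# report how many clusters each value yields. Requires monocle3 and a CDS on
# which UMAP dimensionality reduction has already been computed.
compare_resolutions <- function(cds, resolutions = c(1e-4, 8e-4, 1e-3)) {
  n_clusters <- vapply(resolutions, function(res) {
    clustered <- monocle3::cluster_cells(cds, resolution = res)
    # clusters() returns a factor of cluster assignments, one per cell
    length(levels(monocle3::clusters(clustered)))
  }, integer(1))
  setNames(n_clusters, format(resolutions))
}

# Usage (assumes processed_cds.RData provides an object named `cds`):
# load("processed_cds.RData")
# compare_resolutions(cds)
```

The chosen value can then be passed as the resolution argument of catcheR_load.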

Outputs

Running catcheR_load produces the following outputs:

  1. expression_data.csv and cell_metadata.csv: the expression matrix and the per-cell metadata, which can be used to build a Monocle cell data set (CDS). Both are also bundled in starting_cds.RData, a ready-to-load R object.

  2. UMAP.pdf: UMAP dimensionality-reduction plot.

    Figure: _images/UMAP.pdf

  3. UMAP_gene_expression.pdf: expression of the genes listed in the file passed to the genes argument, overlaid on the UMAP.

    Figure: _images/UMAP_gene_expression.pdf

  4. UMAP_clustering.pdf: clustering result at the specified resolution, visualized on the UMAP.

    Figure: _images/UMAP_clustering.pdf

  5. processed_cds.RData: the Monocle CDS after normalization, dimensionality reduction, clustering, and trajectory inference.
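If you prefer to rebuild a CDS from the two CSV exports rather than loading starting_cds.RData, the sketch below reads them and aligns the metadata with the matrix before calling monocle3::new_cell_data_set. The helper name is hypothetical, and the file layout is an assumption to check against your own files: expression_data.csv with genes as rows, cells as columns and gene names in the first column; cell_metadata.csv with one row per cell and cell barcodes in the first column.

```r
# Hypothetical helper: read the CSV exports and line them up for
# monocle3::new_cell_data_set(). The assumed layout is genes x cells for the
# expression matrix and one row per cell (barcode first) for the metadata.
read_catcher_exports <- function(expr_file = "expression_data.csv",
                                 meta_file = "cell_metadata.csv") {
  expr <- as.matrix(read.csv(expr_file, row.names = 1, check.names = FALSE))
  meta <- read.csv(meta_file, row.names = 1, check.names = FALSE)
  missing <- setdiff(colnames(expr), rownames(meta))
  if (length(missing) > 0) {
    stop(length(missing), " cells in the matrix have no metadata row")
  }
  # new_cell_data_set() expects metadata rows in the same order as matrix columns
  meta <- meta[colnames(expr), , drop = FALSE]
  # Monocle 3 looks for gene names in a column called gene_short_name
  genes <- data.frame(gene_short_name = rownames(expr), row.names = rownames(expr))
  list(expression = expr, cell_metadata = meta, gene_metadata = genes)
}

# Usage:
# x <- read_catcher_exports()
# cds <- monocle3::new_cell_data_set(x$expression,
#   cell_metadata = x$cell_metadata, gene_metadata = x$gene_metadata)
```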

Compatibility

At the end of this step, your data are stored as a Monocle cell data set (CDS), ready for use with the monocle3 package.

The CDS can also be converted for use with other frameworks, such as Seurat or Scanpy:

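library(monocle3)
library(Seurat)
library(SeuratWrappers)
load("processed_cds.RData")  # assumes the saved CDS object is named `cds`
seurat <- as.Seurat(cds, assay = NULL)  # Monocle 3 CDS -> Seurat object
sce <- as.SingleCellExperiment(seurat)  # Seurat -> SingleCellExperiment
zellkonverter::writeH5AD(sce, "catcher_data.h5ad")  # .h5ad readable by Scanpy (hypothetical file name)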

Note

For standard iPS2-seq perturbation analysis, always continue using the CDS object generated by catcheR.

Next steps

The following analyses can be performed after this step:

  • catcheR_pseudotime

  • catcheR_modules

  • catcheR_enrichment