catcheR_load - Data loading
This step loads the single-cell data generated by either catcheR_10Xcatch or catcheR_scicatch after GTF-based annotation. The data are imported into a Monocle cell data set (CDS), experimental design information is added, and the data are then normalized and clustered.
Preparation
Before running catcheR_load, prepare the following in a new working folder:
Copy the count matrix annotated with gene names from the previous step (e.g., filtered_annotated_silencing_matrix_complete_all_samples.csv).
Copy the file rc_barcodes_genes.csv.
Create a newline-separated plain text file listing the control genes (e.g., SCR, B2M).
Create a newline-separated plain text file listing the control samples, if any (e.g., 1, 3). These sample names should match those used by aggr (see the input CSV file used for aggr).
Create a newline-separated plain text file listing the sample replicate labels (e.g., batch1, batch1, batch2, batch2). The order must exactly match the sample order in the input matrix. This file is required for downstream batch-aware analyses. If your dataset includes multiple experiments or processing batches, batch correction is recommended.
Create a CSV file listing each sample along with its annotation name (required). This name is used in plots instead of the sample number. An example file is available on GitHub.
(Optional) Create a newline-separated plain text file listing genes of interest whose expression will be visualized on the UMAP.
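The preparation files can also be written from R. This is a minimal base R sketch: write_catcher_inputs is a hypothetical helper, the file names follow the example call to catcheR_load, and all values (genes, samples, replicate labels) as well as the samples.csv column names (sample, name) are placeholders. Check the example file on GitHub for the CSV layout catcheR_load actually expects.

```r
# Hypothetical helper: write the catcheR_load input files with base R.
# All values and the samples.csv column names are placeholders (assumptions).
write_catcher_inputs <- function(folder) {
  dir.create(folder, showWarnings = FALSE, recursive = TRUE)

  # Control genes, one per line
  writeLines(c("SCR", "B2M"), file.path(folder, "controls.txt"))

  # Control samples, one per line; names must match those used by aggr
  writeLines(c("1", "3"), file.path(folder, "noTET.txt"))

  # Replicate labels, one per sample, in the same order as the samples in the matrix
  replicates <- c("batch1", "batch1", "batch2", "batch2")
  writeLines(replicates, file.path(folder, "replicates.txt"))

  # Sample annotation names shown in plots (column names are assumptions)
  samples <- data.frame(sample = c("1", "2", "3", "4"),
                        name = c("ctrl_A", "KD_A", "ctrl_B", "KD_B"))
  write.csv(samples, file.path(folder, "samples.csv"), row.names = FALSE)

  # Optional genes to overlay on the UMAP (placeholder gene symbols)
  writeLines(c("SOX2", "POU5F1"), file.path(folder, "genelist.txt"))

  # Sanity check: one replicate label per annotated sample
  stopifnot(length(replicates) == nrow(samples))
  invisible(list.files(folder))
}

write_catcher_inputs(file.path(tempdir(), "catcheR_run"))
```

Writing the replicate labels from the same vector that orders your samples is a simple way to keep them aligned with the sample order in the input matrix.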
Running catcheR_load
catcheR_load(
group = "docker",
folder,
expression.matrix,
control_genes,
control_samples = NULL,
replicates = NULL,
sample_names,
resolution = 8e-4,
genes = NULL
)
Example usage:
catcheR_load(
group = "docker",
folder = "/path/to/working/folder/",
expression.matrix = "annotated_silencing_matrix_complete_all_samples.csv",
control_genes = "controls.txt",
control_samples = "noTET.txt",
replicates = "replicates.txt",
sample_names = "samples.csv",
resolution = 8e-4,
genes = "genelist.txt"
)
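Before calling catcheR_load, a quick check that every input file is present in the working folder can save a failed run. This is a minimal base R sketch; check_catcher_inputs is a hypothetical helper, and it assumes, as in the example call, that the file arguments are names relative to folder.

```r
# Hypothetical helper: report which catcheR_load input files are missing.
# Assumes file arguments are given relative to the working folder.
check_catcher_inputs <- function(folder, files) {
  # Drop optional arguments left as NULL (e.g. control_samples, genes)
  files <- files[!vapply(files, is.null, logical(1))]
  paths <- file.path(folder, unlist(files))
  missing <- names(files)[!file.exists(paths)]
  if (length(missing) > 0) {
    warning("Missing input files for: ", paste(missing, collapse = ", "))
  }
  # TRUE only when every listed file was found
  length(missing) == 0
}
```

Pass the same folder and a named list of the file names you give to catcheR_load, for example files = list(control_genes = "controls.txt", replicates = "replicates.txt"); the helper warns with the argument names whose files are missing.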
The resolution argument sets the resolution parameter used by Monocle’s cluster_cells function.
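Because resolution is passed to cluster_cells, it can be useful to see how the number of clusters changes across a few values before settling on one. The sketch below assumes the monocle3 package is installed and that cds is a CDS on which dimensionality reduction has already been run (for example, the object stored in processed_cds.RData); sweep_resolution and the default resolution values are hypothetical, not part of catcheR.

```r
# Hypothetical helper: count clusters obtained at several resolution values.
# Requires monocle3 and a CDS on which reduce_dimension() has already been run.
sweep_resolution <- function(cds, resolutions = c(1e-4, 8e-4, 1e-3, 1e-2)) {
  n_clusters <- vapply(resolutions, function(res) {
    clustered <- monocle3::cluster_cells(cds, resolution = res)
    length(unique(monocle3::clusters(clustered)))
  }, integer(1))
  names(n_clusters) <- format(resolutions)
  n_clusters
}
```

Larger resolution values generally yield more, smaller clusters; the chosen value can then be passed to catcheR_load through its resolution argument.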
Outputs
Running catcheR_load produces the following outputs:
expression_data.csv and cell_metadata.csv: can be used to create a Monocle cell data set (CDS); both are also bundled in starting_cds.RData, the ready-to-load R object.
UMAP.pdf: dimensionality reduction (UMAP) plot.
UMAP_gene_expression.pdf: expression of the genes listed in the genes argument, overlaid on the UMAP.
UMAP_clustering.pdf: clustering result at the specified resolution, visualized on the UMAP.
processed_cds.RData: the Monocle CDS after normalization, dimensionality reduction, clustering, and trajectory inference.
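If you prefer to rebuild the CDS from the two CSV files rather than loading starting_cds.RData, the sketch below reads and aligns them with base R. The file layout is an assumption, not documented here: expression_data.csv is taken to hold genes in rows and cells in columns with gene names in the first column, and cell_metadata.csv to hold one row per cell with cell IDs in the first column. read_catcher_csvs is a hypothetical helper.

```r
# Hypothetical helper: read catcheR_load CSV outputs and align cells.
# Assumed layout: expression_data.csv = genes x cells (gene names in column 1);
# cell_metadata.csv = one row per cell (cell IDs in column 1).
read_catcher_csvs <- function(folder) {
  # check.names = FALSE keeps barcodes such as "AAAC-1" intact
  expr <- read.csv(file.path(folder, "expression_data.csv"),
                   row.names = 1, check.names = FALSE)
  meta <- read.csv(file.path(folder, "cell_metadata.csv"),
                   row.names = 1, check.names = FALSE)

  # Keep only cells present in both files, in the same order
  shared <- intersect(colnames(expr), rownames(meta))
  if (length(shared) == 0) stop("No cell IDs shared between the two files")
  expr <- as.matrix(expr[, shared, drop = FALSE])
  meta <- meta[shared, , drop = FALSE]

  # Gene table with the gene_short_name column monocle3 uses for plotting
  genes <- data.frame(gene_short_name = rownames(expr),
                      row.names = rownames(expr))
  list(expression = expr, cell_metadata = meta, gene_metadata = genes)
}
```

The three returned pieces map onto monocle3::new_cell_data_set(expression_data, cell_metadata, gene_metadata). For large datasets, converting the matrix to a sparse one first (e.g. with Matrix::Matrix(x, sparse = TRUE)) reduces memory use.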
Compatibility
At the end of this step, your data will be structured for use with the monocle3 package.
You can also convert the CDS for use with other frameworks, such as Seurat (via SeuratWrappers) or Scanpy:
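library(SeuratWrappers)
library(Seurat)
# Load the CDS saved by catcheR_load (assumes the file holds a single object)
cds <- get(load("processed_cds.RData")[1])
# Convert the monocle3 CDS to a Seurat object
seurat <- as.Seurat(cds, assay = NULL)
# Convert to SingleCellExperiment; for Scanpy, export it to .h5ad (e.g. with zellkonverter::writeH5AD)
sce <- as.SingleCellExperiment(seurat)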
Note
For standard iPS2-seq perturbation analysis, always continue using the CDS object generated by catcheR.
Next steps
The following analyses can be performed after this step:
catcheR_pseudotime
catcheR_modules
catcheR_enrichment