catcheR_load - Data loading
===========================

This step loads the single-cell data generated by either ``catcheR_10Xcatch`` or ``catcheR_scicatch``, after GTF-based annotation. The data is imported into a **Monocle** object, experimental design information is added, and the object is then normalized and clustered.

Preparation
-----------

Before running ``catcheR_load``, prepare the following in a new working folder:

#. Copy the count matrix annotated with gene names from the previous step (e.g., ``filtered_annotated_silencing_matrix_complete_all_samples.csv``).
#. Copy the file ``rc_barcodes_genes.csv``.
#. Create a newline-separated plain text file listing the **control genes** (e.g., SCR, B2M).
#. Create a newline-separated plain text file listing the **control samples**, if any (e.g., 1, 3). These sample names should match those used by ``aggr`` (see the input CSV file used for ``aggr``).
#. Create a newline-separated plain text file listing the **sample replicate labels** (e.g., batch1, batch1, batch2, batch2). The order must match the sample order in the input matrix exactly. This file is required for downstream **batch-aware analyses**. If your dataset includes multiple experiments or processing batches, batch correction is recommended.
#. Create a **CSV file** listing each sample along with its **annotation name** (required). The annotation name is used in plots instead of the sample number. An example file is available on GitHub.
#. *(Optional)* Create a newline-separated plain text file listing **genes of interest** whose expression will be visualized on the UMAP.

Running ``catcheR_load``
------------------------

.. code-block:: r

   catcheR_load(
     group = "docker",
     folder,
     expression.matrix,
     control_genes,
     control_samples = NULL,
     replicates = NULL,
     sample_names,
     resolution = 8e-4,
     genes = NULL
   )

**Example usage:**

.. code-block:: r

   catcheR_load(
     group = "docker",
     folder = "/path/to/working/folder/",
     expression.matrix = "annotated_silencing_matrix_complete_all_samples.csv",
     control_genes = "controls.txt",
     control_samples = "noTET.txt",
     replicates = "replicates.txt",
     sample_names = "samples.csv",
     resolution = 8e-4,
     genes = "genelist.txt"
   )

The ``resolution`` argument sets the resolution parameter used by Monocle’s ``cluster_cells`` function.

Outputs
-------

Running ``catcheR_load`` produces the following outputs:

#. ``expression_data.csv`` and ``cell_metadata.csv``

   These can be used to create a Monocle Cell Data Set (CDS) and are also bundled in ``starting_cds.RData``, the ready-to-load R object.

#. ``UMAP.pdf``

   Dimensionality reduction UMAP plot.

   .. image:: UMAP.pdf

#. ``UMAP_gene_expression.pdf``

   Gene expression overlay on UMAP using genes from the ``genes`` argument.

   .. image:: UMAP_gene_expression.pdf

#. ``UMAP_clustering.pdf``

   Clustering result visualized on UMAP at the specified resolution.

   .. image:: UMAP_clustering.pdf

#. ``processed_cds.RData``

   The Monocle CDS after normalization, dimensionality reduction, clustering, and trajectory inference.

Compatibility
-------------

At the end of this step, your data is structured for use with the ``monocle3`` package. It can also be converted for use with other frameworks such as **Seurat** or **Scanpy**:

.. code-block:: r

   library(SeuratWrappers)
   library(Seurat)

   # Convert the Monocle CDS to a Seurat object
   seurat <- as.Seurat(cds, assay = NULL)

   # Convert the Seurat object to a SingleCellExperiment object
   scanpy_sce <- as.SingleCellExperiment(seurat)

.. note::

   For standard **iPS2-seq** perturbation analysis, always continue using the ``CDS`` object generated by ``catcheR``.

Next steps
----------

The following analyses can be performed after this step:

- ``catcheR_pseudotime``
- ``catcheR_modules``
- ``catcheR_enrichment``
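
The newline-separated input files described under *Preparation* can also be written from R. The following is a minimal sketch using base R only and is not part of ``catcheR``. The temporary folder and the gene names in ``genelist.txt`` are hypothetical placeholders, and the file names simply mirror those in the example call. The sample annotation CSV is omitted because its column layout is not described here.

.. code-block:: r

   # Hypothetical working folder; replace with your own path.
   folder <- file.path(tempdir(), "catcheR_load_inputs")
   dir.create(folder, showWarnings = FALSE, recursive = TRUE)

   # Control genes, one per line (values from the example in the text).
   writeLines(c("SCR", "B2M"), file.path(folder, "controls.txt"))

   # Control samples, one per line, named as in the aggr input CSV.
   writeLines(c("1", "3"), file.path(folder, "noTET.txt"))

   # Replicate labels, one per sample, in the same order as the samples
   # appear in the input matrix.
   replicates <- c("batch1", "batch1", "batch2", "batch2")
   writeLines(replicates, file.path(folder, "replicates.txt"))

   # Optional genes of interest (placeholder gene names).
   writeLines(c("POU5F1", "NANOG"), file.path(folder, "genelist.txt"))

   # Read the replicate file back to confirm there is one label per line.
   written <- readLines(file.path(folder, "replicates.txt"))
   stopifnot(identical(written, replicates))

Reading a file back with ``readLines`` is a quick way to confirm that it contains one entry per line and that the replicate labels are in the intended order before passing the file names to ``catcheR_load``.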