catcheR_scicount - Obtain iPS2-sci-seq Count Matrices

  1. catcheR_scicount is a wrapper for the bbi-sci pipeline developed by the Brotman Baty Institute for Precision Medicine. This pipeline was dockerized and integrated into catcheR to enable use across operating systems. It performs demultiplexing and alignment of sci-RNA-seq data, generating a gene expression matrix from FASTQ files.

  2. Demultiplex Illumina base calls to FASTQ files:

    1. Create a SampleSheet.csv file with one row per PCR well (see SPTwoFive of Supplemental Protocol Two).

      • Use well IDs as Sample_ID (a row letter A-H followed by a two-digit column number 01-12, e.g., A01 or H12)

      • Use appropriate index and index2 fields for i7 and i5 barcodes

    2. Run Illumina bcl2fastq according to Illumina documentation.

    3. Run FastQC to confirm FASTQ quality.
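The Sample_ID values for the sample sheet can be generated rather than typed by hand. The following is a minimal R sketch, assuming a full 96-well PCR plate; the `sheet` data frame and its `NA` barcode placeholders are illustrative only and must be filled with the real i7/i5 sequences. It does not produce the complete Illumina sample sheet layout, only the per-well rows.

```r
# Build the 96 PCR well IDs (A01 ... H12) used as Sample_ID.
rows <- LETTERS[1:8]                  # plate rows A-H
cols <- sprintf("%02d", 1:12)         # plate columns 01-12

# outer() gives an 8 x 12 matrix of IDs; transpose so that flattening
# walks the plate row by row: A01, A02, ..., A12, B01, ...
well_ids <- as.vector(t(outer(rows, cols, paste0)))

# Placeholder barcodes (assumption): replace NA with the real sequences.
sheet <- data.frame(
  Sample_ID = well_ids,
  index     = NA_character_,   # i7 barcode, to be filled in
  index2    = NA_character_,   # i5 barcode, to be filled in
  stringsAsFactors = FALSE
)

head(sheet, 3)
```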

  3. In a new working folder:

    1. Create a subfolder called fastq/ and copy all demultiplexed .fastq.gz files into it. Each file name must start with the well coordinate (e.g., A01_).

    2. Create a tab-separated file called sci-RNA-seq-8.RT.oligos that maps RT well IDs to barcode sequences:

    A01       TTCTCGCATG
    ...
    
    3. Create a subfolder called GENOMES/ and copy the annotated genome files into it (e.g., GRCh38). Ensure sufficient disk space (~60 GB).
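Before launching the pipeline, it can help to confirm that the working folder matches the layout described above. The sketch below uses a hypothetical helper, `check_sci_folder()`, which is not part of catcheR. It assumes that well coordinates in file names are followed by an underscore, and that the oligos file has no header line and contains only A/C/G/T barcodes. The demonstration runs on a throwaway folder of empty placeholder files with illustrative names.

```r
# Hypothetical helper (not part of catcheR): checks the folder layout.
check_sci_folder <- function(folder) {
  fastq <- list.files(file.path(folder, "fastq"), pattern = "\\.fastq\\.gz$")
  # Names must start with a well coordinate plus "_" (assumption), e.g. A01_
  bad_names <- fastq[!grepl("^[A-H](0[1-9]|1[0-2])_", fastq)]

  oligos_path <- file.path(folder, "sci-RNA-seq-8.RT.oligos")
  oligos_ok <- FALSE
  if (file.exists(oligos_path)) {
    # Two tab-separated columns, no header (assumption): well ID, barcode
    oligos <- read.delim(oligos_path, header = FALSE,
                         col.names = c("well", "barcode"),
                         colClasses = "character")
    oligos_ok <- all(grepl("^[ACGT]+$", oligos$barcode))
  }

  list(
    has_fastq   = length(fastq) > 0,
    bad_names   = bad_names,
    oligos_ok   = oligos_ok,
    has_genomes = dir.exists(file.path(folder, "GENOMES"))
  )
}

# Demonstration on a temporary folder with empty placeholder files.
demo <- file.path(tempdir(), "sci_demo")
dir.create(file.path(demo, "fastq"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(demo, "GENOMES"), showWarnings = FALSE)
invisible(file.create(file.path(
  demo, "fastq", c("A01_S1_R1_001.fastq.gz", "sample_R1.fastq.gz")
)))
writeLines("A01\tTTCTCGCATG", file.path(demo, "sci-RNA-seq-8.RT.oligos"))

res <- check_sci_folder(demo)
res$bad_names   # files whose names do not start with a well coordinate
```

Returning a list of findings rather than stopping at the first problem lets all issues be fixed in one pass.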

  4. Run catcheR_scicount:

    catcheR_scicount(
        group = c("docker", "sudo"),
        folder,
        sample.name,
        UMI.cutoff
    )
    

    `catcheR_scicount` arguments:

    1. group: string, either “docker” or “sudo”, depending on user permissions (see: https://docs.docker.com/engine/install/linux-postinstall/)

    2. folder: string with the full path to the working folder

    3. sample.name: string, name of the experiment

    4. UMI.cutoff: integer, minimum number of UMIs per nucleus to consider the transcriptome valid
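To illustrate what UMI.cutoff controls, here is a toy R sketch with invented UMI totals. It assumes a nucleus is retained when its UMI total is at least the cutoff; whether the pipeline uses "at least" or "strictly greater than" is not stated here, so treat the comparison as an assumption.

```r
# Toy UMI totals per nucleus (invented values, for illustration only).
umi_per_nucleus <- c(n1 = 120, n2 = 500, n3 = 2300, n4 = 499, n5 = 810)
UMI.cutoff <- 500

# Assumption: keep nuclei whose UMI total is at least the cutoff.
valid <- umi_per_nucleus >= UMI.cutoff
names(umi_per_nucleus)[valid]
```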

    Example usage:

    catcheR_scicount(
        group = "docker",
        folder = "/full/path/to/working/folder",
        sample.name = "experiment",
        UMI.cutoff = 500
    )
    
  5. `catcheR_scicount` outputs (saved in the final-output/ folder):

    • UMI per cell knee plot

    • Summary statistics

    • Sparse cell-by-gene matrix

    • Dense expression matrices:

      • exp_mat.csv

      • exp_mat_no0.csv (genes with zero counts removed)

    • Corresponding .Rdata files
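The relationship between the two dense matrices can be shown on a toy example. This sketch assumes genes are in rows and nuclei in columns, which should be checked against the actual exp_mat.csv. The values are invented.

```r
# Toy dense matrix: genes in rows, nuclei in columns (orientation assumed).
exp_mat <- matrix(
  c(3, 0, 1,
    0, 0, 0,
    5, 2, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("n1", "n2", "n3"))
)

# Drop genes whose counts are zero in every nucleus;
# drop = FALSE keeps the result a matrix even if one gene remains.
exp_mat_no0 <- exp_mat[rowSums(exp_mat) > 0, , drop = FALSE]
rownames(exp_mat_no0)
```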