catcheR_scicount - Obtain iPS2-sci-seq Count Matrices
`catcheR_scicount` is a wrapper for the bbi-sci pipeline developed by the Brotman Baty Institute for Precision Medicine. This pipeline was dockerized and integrated into `catcheR` to enable use across operating systems. It performs demultiplexing and alignment of sci-RNA-seq data, generating a gene expression matrix from FASTQ files.
Demultiplex Illumina base calls to FASTQ files:
Create a `SampleSheet.csv` file with one row per PCR well (see SPTwoFive of SupplementalProtocolTwo).
Use well IDs as `Sample_ID` (format: `[A-H][01-12]`).
Use the appropriate `index` and `index2` fields for the i7 and i5 barcodes.
Run Illumina `bcl2fastq` according to the Illumina documentation.
Run `FastQC` to confirm FASTQ quality.
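The well IDs described above can be generated rather than typed by hand. The following is a minimal sketch, assuming a full 96-well PCR plate (rows A-H, columns 01-12). The `index` and `index2` columns are left empty on purpose, because the barcode sequences depend on the primers actually used. The object names `well_ids` and `sheet` are illustrative only, and the remaining sections of the sample sheet should follow the Illumina documentation.

```r
# Build the 96 well IDs A01..H12 (row letter + zero-padded column number).
rows <- LETTERS[1:8]                 # "A" .. "H"
cols <- sprintf("%02d", 1:12)        # "01" .. "12"
well_ids <- as.vector(t(outer(rows, cols, paste0)))

# Every ID must be a row letter A-H followed by a column number 01-12.
stopifnot(all(grepl("^[A-H](0[1-9]|1[0-2])$", well_ids)))

# One row per PCR well; fill index (i7) and index2 (i5) with the real
# barcode sequences before running bcl2fastq.
sheet <- data.frame(
  Sample_ID = well_ids,
  index     = NA_character_,
  index2    = NA_character_,
  stringsAsFactors = FALSE
)

head(sheet, 3)
```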
In a new working folder:
Create a subfolder called `fastq/` and copy all demultiplexed `.fastq.gz` files into it. File names must start with the well coordinate (e.g. `A01_`).
Create a tab-separated file called `sci-RNA-seq-8.RT.oligos` that maps RT well IDs to barcode sequences:
A01 TTCTCGCATG ...
Create a subfolder called `GENOMES/` and copy the annotated genome files into it (e.g., GRCh38). Ensure sufficient disk space (~60 GB).
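Before starting a long run, it can help to confirm that the working folder matches the layout described above. The sketch below is one possible check, not part of `catcheR`: `check_sci_folder` is a hypothetical helper, and it only assumes what is stated here (a `fastq/` subfolder with well-prefixed `.fastq.gz` files, a tab-separated `sci-RNA-seq-8.RT.oligos` file with at least two fields per line, and a non-empty `GENOMES/` subfolder).

```r
# Hypothetical helper: returns a character vector of problems found in the
# working folder (empty vector means the layout looks as described).
check_sci_folder <- function(folder) {
  problems <- character(0)

  # fastq/ must exist and hold .fastq.gz files named after their well.
  fastq_dir <- file.path(folder, "fastq")
  if (!dir.exists(fastq_dir)) {
    problems <- c(problems, "missing fastq/ subfolder")
  } else {
    fq <- list.files(fastq_dir, pattern = "\\.fastq\\.gz$")
    if (length(fq) == 0) {
      problems <- c(problems, "no .fastq.gz files in fastq/")
    }
    bad <- fq[!grepl("^[A-H](0[1-9]|1[0-2])_", fq)]
    if (length(bad) > 0) {
      problems <- c(problems,
                    paste("file name does not start with a well coordinate:", bad))
    }
  }

  # The RT oligo file must exist and be tab-separated (well ID, barcode).
  oligos <- file.path(folder, "sci-RNA-seq-8.RT.oligos")
  if (!file.exists(oligos)) {
    problems <- c(problems, "missing sci-RNA-seq-8.RT.oligos")
  } else {
    n_fields <- count.fields(oligos, sep = "\t")
    if (any(is.na(n_fields) | n_fields < 2)) {
      problems <- c(problems, "sci-RNA-seq-8.RT.oligos has lines with fewer than 2 tab-separated fields")
    }
  }

  # GENOMES/ must exist and contain the genome files.
  genomes_dir <- file.path(folder, "GENOMES")
  if (!dir.exists(genomes_dir) || length(list.files(genomes_dir)) == 0) {
    problems <- c(problems, "GENOMES/ subfolder is missing or empty")
  }

  problems
}

# Usage (path is a placeholder):
# check_sci_folder("/full/path/to/working_folder")
```

The function reports every problem it finds instead of stopping at the first one, so a single call shows everything that still needs fixing.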
Run `catcheR_scicount`:
catcheR_scicount( group = c("docker", "sudo"), folder, sample.name, UMI.cutoff )
`catcheR_scicount` arguments:
`group`: string, either `"docker"` or `"sudo"`, depending on user permissions (see https://docs.docker.com/engine/install/linux-postinstall/)
`folder`: string, full path to the working folder
`sample.name`: string, name of the experiment
`UMI.cutoff`: integer, minimum number of UMIs per nucleus to consider the transcriptome valid
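Because a run can take a long time, it may be worth checking the arguments before calling the function. The sketch below is an illustration only: `validate_scicount_args` is a hypothetical helper that is not provided by `catcheR`, and the checks simply restate the argument descriptions above (the exact validation done inside `catcheR_scicount` is not documented here).

```r
# Hypothetical pre-flight check that mirrors the documented arguments.
validate_scicount_args <- function(group = c("docker", "sudo"),
                                   folder, sample.name, UMI.cutoff) {
  # Only "docker" or "sudo" are accepted; anything else raises an error.
  group <- match.arg(group)

  stopifnot(
    is.character(folder), length(folder) == 1, dir.exists(folder),
    is.character(sample.name), length(sample.name) == 1, nzchar(sample.name),
    is.numeric(UMI.cutoff), length(UMI.cutoff) == 1,
    UMI.cutoff >= 0, UMI.cutoff == round(UMI.cutoff)
  )

  list(
    group       = group,
    folder      = normalizePath(folder),   # expand to a full path
    sample.name = sample.name,
    UMI.cutoff  = as.integer(UMI.cutoff)
  )
}

# Usage (requires catcheR and Docker, so it is left commented out):
# args <- validate_scicount_args("docker", "/full/path/to/working_folder",
#                                "experiment", 500)
# do.call(catcheR_scicount, args)
```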
Example usage:
catcheR_scicount( group = "docker", folder = "/full/path/to/working_folder", sample.name = "experiment", UMI.cutoff = 500 )
`catcheR_scicount` outputs (saved in the `final-output/` folder):
UMI per cell knee plot
Summary statistics
Sparse cell-by-gene matrix
Dense expression matrices:
`exp_mat.csv`
`exp_mat_no0.csv` (genes with zero counts removed)
Corresponding `.Rdata` files
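To make the roles of `UMI.cutoff` and of the zero-count filtering concrete, the toy example below applies both to a small made-up matrix. It is a sketch of the idea, not the pipeline's code: the orientation (genes in rows, nuclei in columns), the use of `>=` for the cutoff, and all values are assumptions chosen for illustration.

```r
# Toy count matrix: genes in rows, nuclei in columns (assumed orientation).
counts <- matrix(
  c(300, 0, 10,
    250, 0,  5,
      0, 0,  0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("cell1", "cell2", "cell3"))
)

# Total UMIs per nucleus: the quantity shown in the knee plot.
umi_per_cell <- colSums(counts)

# Keep nuclei that reach the cutoff (>= is an assumption of this sketch).
UMI.cutoff <- 500
kept <- counts[, umi_per_cell >= UMI.cutoff, drop = FALSE]

# Drop genes with zero counts across the kept nuclei, as described for
# exp_mat_no0.csv.
kept_no0 <- kept[rowSums(kept) > 0, , drop = FALSE]

kept_no0
```

Here only `cell1` (550 UMIs) passes the cutoff of 500, and `geneC` is dropped because it has no counts in the remaining nuclei.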