Scalable differential expression analysis of single cell transcriptomics datasets with complex study designs
The dreamlet package enables differential expression analysis on multi-sample single cell datasets using linear (mixed) models with precision weights.
Major functionality of the dreamlet package, using the Bioconductor SingleCellExperiment interface:
- aggregateToPseudoBulk(): Fast evaluation of pseudobulk from raw or estimated counts
- processAssays(): Normalize aggregated counts, compute precision weights
- fitVarPart(): Variance partitioning analysis
- dreamlet(): Differential expression analysis across samples
- dreamletCompareClusters(): Differential expression analysis across cell clusters
- zenith_gsa(): Gene set analysis with full spectrum of test statistics
- compositePosteriorTest(): Test cell type specificity of effects with Bayesian meta-analysis
- meta_analysis(): Frequentist meta-analysis across cohorts
- outlierByAssay(): Outlier detection based on gene expression
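The first steps of this list can be chained into a short workflow. The following is a minimal sketch rather than a definitive recipe: it assumes the example_sce dataset shipped with the muscat package (not mentioned above) and its column names cluster_id, sample_id and group_id, and the coefficient name group_idstim follows from that dataset's factor levels.

```r
library(dreamlet)
library(muscat) # assumption: used here only as the source of example_sce

data(example_sce)

# Sum raw counts within each cell cluster and sample
pb <- aggregateToPseudoBulk(example_sce,
  assay = "counts",
  cluster_id = "cluster_id",
  sample_id = "sample_id",
  verbose = FALSE
)

# Normalize aggregated counts and compute precision weights per cell cluster
res.proc <- processAssays(pb, ~ group_id)

# Fit a regression model for each gene and cell cluster
res.dl <- dreamlet(res.proc, ~ group_id)

# limma-style results table for the stimulation effect
topTable(res.dl, coef = "group_idstim", number = 5)
```

With a different dataset, the assay name, column names, formula and coefficient name all need to be adapted.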
Resources
- Preprint on biorxiv
- Full reproducible analysis code for 4 large-scale datasets
Motivation
Recent advances in single cell/nucleus transcriptomic technology have enabled collection of population-level datasets to study cell type specific gene expression differences associated with disease state, stimulus, and genetic regulation. The scale of these data, complex study designs, and low read count per cell mean that characterizing cell type specific molecular mechanisms requires a user-friendly, purpose-built analytical framework. We have developed the dreamlet package, which applies a pseudobulk approach and fits a regression model for each gene and cell cluster to test differential expression across individuals associated with a trait of interest. Use of precision-weighted linear mixed models enables accounting for repeated measures study designs, high dimensional batch effects, and varying sequencing depth or observed cells per biosample.
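To make the pseudobulk idea concrete, here is a toy illustration in base R only. It is not dreamlet's implementation: the data are simulated, the design (8 samples, 5 genes, one cell cluster) is hypothetical, and the number of cells per sample stands in as a crude precision weight in place of the weights dreamlet actually estimates.

```r
# Toy sketch in base R; simulated data, NOT dreamlet's implementation
set.seed(1)
n_genes <- 5
samples <- paste0("s", 1:8)

# Hypothetical design: each sample contributes a different number of cells
cells_per_sample <- c(10, 20, 30, 40, 15, 25, 35, 25)
sample_id <- factor(rep(samples, times = cells_per_sample), levels = samples)

# Simulated cell-level counts (genes x cells)
counts <- matrix(rpois(n_genes * length(sample_id), lambda = 2),
  nrow = n_genes,
  dimnames = list(paste0("gene", 1:n_genes), NULL)
)

# Pseudobulk: sum counts over the cells of each sample (genes x samples)
pb <- t(rowsum(t(counts), group = sample_id))

# Normalize to log2 counts per million with a small offset
lib_size <- colSums(pb)
log_cpm <- log2(t((t(pb) + 0.5) / (lib_size + 1)) * 1e6)

# Sample-level trait of interest
group <- factor(rep(c("ctrl", "case"), each = 4), levels = c("ctrl", "case"))

# One weighted regression per gene; the weight is the cell count per sample
fits <- apply(log_cpm, 1, function(y) {
  coef(summary(lm(y ~ group, weights = cells_per_sample)))["groupcase", ]
})

# Rows are genes; columns are estimate, standard error, t-statistic, p-value
t(fits)
```

Samples built from few cells give noisier pseudobulk values, so down-weighting them is the intuition behind precision weights. A repeated measures design would additionally need a random effect per individual, which is where a linear mixed model replaces lm().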
Technical intro
Dreamlet further enables analysis of massive-scale single cell/nucleus transcriptome datasets by addressing both CPU and memory usage limitations. Dreamlet performs preprocessing and statistical analysis in parallel on multicore machines, and can distribute work across multiple nodes on a compute cluster. Dreamlet also uses the H5AD format for on-disk data storage, so that data can be processed in smaller chunks with much lower memory usage.
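As a rough sketch of how on-disk storage and parallel processing combine, the helper below reads an H5AD file with zellkonverter and aggregates it with several workers from BiocParallel. The helper name pseudobulk_from_h5ad, its arguments, and the assumption that the assay named "X" holds raw counts are illustrative choices, not part of dreamlet.

```r
library(dreamlet)
library(zellkonverter)
library(BiocParallel)

# Hypothetical helper: file path, column names and worker count are placeholders
pseudobulk_from_h5ad <- function(h5ad_file, cluster_col, sample_col,
                                 n_workers = 4) {
  # use_hdf5 = TRUE keeps the expression matrix on disk as a DelayedArray
  # instead of loading it into memory
  sce <- readH5AD(h5ad_file, use_hdf5 = TRUE)

  # readH5AD() stores the main AnnData matrix in the assay named "X";
  # this sketch assumes that matrix contains raw counts
  aggregateToPseudoBulk(sce,
    assay = "X",
    cluster_id = cluster_col,
    sample_id = sample_col,
    BPPARAM = SnowParam(n_workers)
  )
}
```

A call might look like pseudobulk_from_h5ad("path/to/data.h5ad", "cell_type", "donor"), where the path and column names depend on the dataset. processAssays() and dreamlet() accept a BPPARAM argument in the same way.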
The dreamlet workflow integrates easily into the Bioconductor ecosystem and uses the SingleCellExperiment class to facilitate compatibility with other analyses. Beyond differential expression testing, dreamlet integrates downstream analyses, including quantifying sources of expression variation, gene set analysis using the full spectrum of gene-level t-statistics, testing differences in cell type composition, and visualizing results.
Dreamlet builds on previous work on variance partitioning and differential expression using precision-weighted linear mixed models in the variancePartition package. The dreamlet package is designed to be easily adopted by users of variancePartition and limma.
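For users coming from variancePartition, the formula interface should look familiar. The sketch below runs a variance partitioning analysis on pseudobulk data. It assumes the example_sce dataset from the muscat package and its column names, and the Donor variable in the final comment is hypothetical.

```r
library(dreamlet)
library(muscat) # assumption: used here only as the source of example_sce

data(example_sce)

# Pseudobulk and normalization, as in a standard dreamlet analysis
pb <- aggregateToPseudoBulk(example_sce,
  assay = "counts",
  cluster_id = "cluster_id",
  sample_id = "sample_id",
  verbose = FALSE
)
res.proc <- processAssays(pb, ~ group_id)

# Fraction of expression variance explained by each variable,
# per gene and cell cluster
vp.lst <- fitVarPart(res.proc, ~ group_id)

# Violin plots of variance fractions for each cell cluster
plotVarPart(vp.lst)

# A repeated measures design would add a random effect, for example:
#   ~ group_id + (1 | Donor)
# 'Donor' is a hypothetical column; example_sce does not contain it
```

The same formula syntax, including random effects written as (1 | variable), is used by dreamlet() for differential expression.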
Install
dreamlet >= 1.0.0 is compatible with BioC v3.18 for R v4.3.
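# 1) Make sure Bioconductor is installed
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
# 2) Install dreamlet and dependencies (requires the devtools package):
devtools::install_github("DiseaseNeurogenomics/dreamlet")
# 3) Install zellkonverter >= v1.10.1
BiocManager::install("zellkonverter")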
dreamlet is also compatible with earlier versions of R and Bioconductor after installing these dependencies:
Dependencies
In case these aren’t installed automatically:
devtools::install_github("DiseaseNeurogenomics/variancePartition")
devtools::install_github("DiseaseNeurogenomics/zenith")