Count ratio uncertainty modeling based linear regression

Count ratio uncertainty modeling based linear regression (crumblr) returns CLR-transformed counts and observation-level inverse-variance weights for use in weighted linear models.

Usage

crumblr(
  counts,
  pseudocount = 0.5,
  method = c("clr", "clr_2class"),
  tau = 1,
  max.ratio = 5,
  quant = 0.05
)

# S4 method for class 'matrix'
crumblr(
  counts,
  pseudocount = 0.5,
  method = c("clr", "clr_2class"),
  tau = 1,
  max.ratio = 5,
  quant = 0.05
)

# S4 method for class 'data.frame'
crumblr(
  counts,
  pseudocount = 0.5,
  method = c("clr", "clr_2class"),
  tau = 1,
  max.ratio = 5,
  quant = 0.05
)

Arguments

counts: count data with samples as rows and variables are columns
pseudocount: added to counts to avoid issues with zeros
method: "clr" computes standard centered log ratio and precision weights based on the delta approximation. "clr_2class" computes the clr() transform for category i using 2 classes: 1) counts in category i, and 2) counts _not_ in category i.
tau: overdispersion parameter for Dirichlet multinomial. If NULL, estimate from observed counts.
max.ratio: regularize estimates of the weights to have a maximum ratio of max.ratio between the maximum and quant quantile value
quant: quantile value used for max.ratio

Value

An EList object with the following components:

E:: numeric matrix of CLR transformed counts
weights:: numeric matrix of observation-level inverse-variance weights

Details

Evaluate the centered log ratio (CLR) transform of the count matrix, and the asymptotic theoretical variances of each transformed observation. The asymptotic normal approximation is increasingly accurate for small overdispersion \(\tau\), large total counts \(C\), and large proportions \(p\), but shows good agreement with the empirical results in most situations. In practice, it is often reasonable to assume a sufficient number of counts before a variable is included in an analysis anyway. But the feasibility of this assumption is up to the user to determine.

Given the array p storing proportions for one sample across all categories, the delta approximation uses the term 1/p. This can be unstable for small values of p, and the estimated variances can be sensitive to small changes in the proportions. To address this, the "clr_2class" method computes the clr() transform for category i using 2 classes: 1) counts in category i, and 2) counts _not_ in category i. Since class (2) now sums counts across all other categories, the small proportions are avoided and the variance estimates are more stable.

For real data, the asymptotic variance formula can give weights that vary substantially across samples and give very high weights for a subset of samples. In order to address this, we regularize the weights to reduce the variation in the weights to have a maximum ratio of max.ratio between the maximum and quant quantile value.

Examples

# set probability of each category
prob <- c(0.1, 0.2, 0.3, 0.5)

# number of total counts
countsTotal <- 300

# number of samples
n_samples <- 100

# simulate info for each sample
info <- data.frame(Age = rgamma(n_samples, 50, 1))
rownames(info) <- paste0("sample_", 1:n_samples)

# simulate counts from multinomial
counts <- t(rmultinom(n_samples, size = countsTotal, prob = prob))
colnames(counts) <- paste0("cat_", 1:length(prob))
rownames(counts) <- paste0("sample_", 1:n_samples)

# run crumblr on counts
cobj <- crumblr(counts)

# run standard variancePartition analysis on crumblr results
library(variancePartition)
#> Loading required package: limma
#> Loading required package: BiocParallel
#> 
#> Attaching package: ‘variancePartition’
#> The following object is masked from ‘package:limma’:
#> 
#>     topTable

fit <- dream(cobj, ~Age, info)
fit <- eBayes(fit)

topTable(fit, coef = "Age", sort.by = "none")
#>               logFC    AveExpr          t   P.Value adj.P.Val         B
#> cat_1  0.0030984935 -0.8318606  1.4171050 0.1593695 0.6374780 -7.824921
#> cat_2 -0.0008101691 -0.1620569 -0.4812885 0.6312984 0.6970844 -8.605226
#> cat_3 -0.0009568513  0.2445150 -0.6331966 0.5279620 0.6970844 -8.488427
#> cat_4 -0.0005653714  0.7494026 -0.3903129 0.6970844 0.6970844 -8.615702