Count ratio uncertainty modeling based linear regression (crumblr) returns CLR-transformed counts and observation-level inverse-variance weights for use in weighted linear models.
Usage
crumblr(
counts,
pseudocount = 0.5,
method = c("clr", "clr_2class"),
tau = 1,
max.ratio = 5,
quant = 0.05
)
# S4 method for matrix
crumblr(
counts,
pseudocount = 0.5,
method = c("clr", "clr_2class"),
tau = 1,
max.ratio = 5,
quant = 0.05
)
# S4 method for data.frame
crumblr(
counts,
pseudocount = 0.5,
method = c("clr", "clr_2class"),
tau = 1,
max.ratio = 5,
quant = 0.05
)
Arguments
- counts
count data with samples as rows and variables are columns
- pseudocount
added to counts to avoid issues with zeros
- method
"clr"
computes standard centered log ratio and precision weights based on the delta approximation."clr_2class"
computes theclr()
transform for categoryi
using 2 classes: 1) counts in category i, and 2) counts _not_ in category i.- tau
overdispersion parameter for Dirichlet multinomial. If
NULL
, estimate from observed counts.- max.ratio
regularize estimates of the weights to have a maximum ratio of
max.ratio
between the maximum andquant
quantile value- quant
quantile value used for
max.ratio
Value
An EList
object with the following components:
- E:
numeric matrix of CLR transformed counts
- weights:
numeric matrix of observation-level inverse-variance weights
Details
Evalute the centered log ratio (CLR) transform of the count matrix, and the asymptotic theoretical variances of each transformed observation. The asymptotic normal approximation is increasingly accurate for small overdispersion \(\tau\), large total counts \(C\), and large proportions \(p\), but shows good agreement with the empirical results in most situtations. In practice, it is often reasonable to assume a sufficient number of counts before a variable is included in an analysis anyway. But the feasability of this assumption is up to the user to determine.
Given the array p
storing proportions for one sample across all categories, the delta approximation uses the term 1/p
. This can be unstable for small values of p
, and the estimated variances can be sensitive to small changes in the proprtions. To address this, the "clr_2class"
method computes the clr()
transform for category i
using 2 classes: 1) counts in category i, and 2) counts _not_ in category i. Since class (2) now sums counts across all other categories, the small proportions are avoided and the variance estimates are more stable.
For real data, the asymptotic variance formula can give weights that vary substantially across samples and give very high weights for a subset of samples. In order to address this, we regularize the weights to reduce the variation in the weights to have a maximum ratio of max.ratio
between the maximum and quant
quantile value.
Examples
# set probability of each category
prob <- c(0.1, 0.2, 0.3, 0.5)
# number of total counts
countsTotal <- 300
# number of samples
n_samples <- 100
# simulate info for each sample
info <- data.frame(Age = rgamma(n_samples, 50, 1))
rownames(info) <- paste0("sample_", 1:n_samples)
# simulate counts from multinomial
counts <- t(rmultinom(n_samples, size = countsTotal, prob = prob))
colnames(counts) <- paste0("cat_", 1:length(prob))
rownames(counts) <- paste0("sample_", 1:n_samples)
# run crumblr on counts
cobj <- crumblr(counts)
# run standard variancePartition analysis on crumblr results
library(variancePartition)
#> Loading required package: limma
#> Loading required package: BiocParallel
#>
#> Attaching package: ‘variancePartition’
#> The following object is masked from ‘package:limma’:
#>
#> topTable
fit <- dream(cobj, ~Age, info)
fit <- eBayes(fit)
topTable(fit, coef = "Age", sort.by = "none")
#> logFC AveExpr t P.Value adj.P.Val B
#> cat_1 -0.0005477589 -0.8566924 -0.2637090 0.7925229 0.8449837 -8.860497
#> cat_2 0.0010873618 -0.1446856 0.6497158 0.5172998 0.8449837 -8.625395
#> cat_3 -0.0005053861 0.2350211 -0.3532166 0.7246368 0.8449837 -8.747617
#> cat_4 0.0002294497 0.7663569 0.1960085 0.8449837 0.8449837 -8.770189