Differential abundance
Hooke can identify differentially abundant cell types for a given contrast. The underlying PLN model mimics the standard (G)LM-like interface of R::stats
. This allows users to fit a multivariate Poisson lognormal model after correcting for effects of offsets and covariates. This framework allows users to perform multivariate statistical regression to describe how perturbations alter the relative abundances of each cell state. For example, you could model the effects of genotype (WT vs MT) while controlling for experimental batch.
Analysis of differentially abundant cell types in silicosis with Hooke
This data was published in Hasegawa, Franks, et al. bioRxiv. The authors performed a longitudinal transcriptomic analysis using scRNA-sequencing of the lungs from pre- and post-silica challenged mice.
The data includes 12 whole lung samples across pre- and post- intratracheal silica. 35 unique cell states were identified using highly and specifically expressed marker genes. For simplicity, we are splitting samples into 2 levels: exposed and not exposed. The goal is to identify which cell types are differentially abundant post-silica exposure.
cds = readRDS("silicosis_cds.rds")
# for simplicity, we are lumping together pre and post i.t. silica
colData(cds)$exposed = ifelse(colData(cds)$Timepoint == 0, "not exposed", "exposed")
colData(cds)$Rep = as.factor(colData(cds)$Rep)
plot_cells(cds, color_cells_by = "fine_annotation", label_groups_by_cluster = F)
Making a cell count set object
The first step of Hooke is to create a cell_count_set
object. This requires three inputs:
cds
- a monocle3cell_data_set
objectsample group
- A column incolData(cds)
that specifies how cells are grouped into samplescell group
- A column incolData(cds)
that specifies how cells are grouped into types or states (e.g. cluster)
ccs = new_cell_count_set(cds,
sample_group = "ID",
cell_group = "fine_annotation")
Fitting a cell count model
The next step is to fit a cell_count_model
given a model formula. This function works with the standard R formula
notations.
A cell count model requires the following inputs:
ccs
- a Hookecell_count_set
objectmain_model_formula_str
- A character string specifying the model of cell abundances across samples, where terms refer to columns incolData(ccs)
. Put main effects here.nuisance_model_formula_str
- A character string specifying the model of cell abundances across samples. Put nuisance effects here.
In this case we are interested in comparing silica exposed mouse lungs to unexposed lungs.
ccm = new_cell_count_model(ccs,
main_model_formula_str = "~ exposed")
Estimating abundances
Next we can predict fold change values for a given contrast of interest using estimate_abundances()
.
ccm
- a Hookecell_count_model
objectnewdata
- A tibble of variables used for the prediction.min_log_abund
- numeric Minimum log abundance value.
cond_exp = estimate_abundances(ccm, tibble::tibble(exposed = "exposed"))
cond_not_exp = estimate_abundances(ccm, tibble::tibble(exposed = "not exposed"))
cond_not_exp %>% head()
exposed | cell_group | log_abund | log_abund_se | log_abund_sd |
---|---|---|---|---|
not exposed | Endothelial Arterial | 2.87 | 0.196 | 0.559 |
not exposed | Endothelial Capillary | 5.31 | 0.0591 | 0.559 |
not exposed | Endothelial Venous | 3.09 | 0.179 | 0.559 |
not exposed | Lymphatic Endothelial | 2.63 | 0.213 | 0.559 |
not exposed | Aerocytes | 4.06 | 0.110 | 0.559 |
not exposed | Fibroblasts | 5.26 | 0.0607 | 0.559 |
Comparing abundances
And then compare two estimates of cell abundances from a Hooke model to identify significant changes with compare_abundances()
. In this case, exposed vs not exposed.
ccm
- a Hookecell_count_model
objectcond_x
- A cell type abundance estimate from estimate_abundances().cond_y
- A cell type abundance estimate from estimate_abundances().
cond_ne_v_e_tbl = compare_abundances(ccm, cond_not_exp, cond_exp)
cond_ne_v_e_tbl %>% select(cell_group, exposed_x, exposed_y,
delta_log_abund, delta_log_abund_se, delta_q_value)
cell_group | delta_log_abund | delta_log_abund_se | delta_q_value |
---|---|---|---|
Endothelial Arterial | -0.584 | 0.196 | 0.0774 |
Endothelial Capillary | -0.400 | 0.0769 | 0.0131 |
Endothelial Venous | -0.353 | 0.145 | 0.130 |
Lymphatic Endothelial | 0.0529 | 0.330 | 0.989 |
Aerocytes | -0.354 | 0.130 | 0.102 |
Fibroblasts | -0.477 | 0.102 | 0.0131 |
Plotting differential abundance changes on a UMAP
Finally, we can color our UMAP by the fold changes we estimate in compare_abundances()
using the plot_contrast()
function. We can select fold changes that are signifiant given a q_value_thresh
.
ccm
- A Hookecell_count_model
object.cond_b_vs_a_tbl
- A data frame fromcompare_abundances()
.log_abundance_thresh
- numeric Select cell groups by log abundance.q_value_thresh
- Remove contrasts whose change in q-value exceedsq_value_thresh
.
plot_contrast(ccm, cond_ne_v_e_tbl, q_value_thresh = 0.05)
Controlling for batch terms
If your data contains multiple experimental batch, Hooke supports batch correction with the use of a nuisance model string term.
ccm_rep = new_cell_count_model(ccs,
main_model_formula_str = "~ exposed",
nuisance_model_formula_str = "~ Rep")
cond_exp_rep = estimate_abundances(ccm_rep, tibble::tibble(exposed = "exposed", Rep = "1"))
cond_not_exp_rep = estimate_abundances(ccm_rep, tibble::tibble(exposed = "not exposed", Rep = "1"))
cond_ne_v_e_tbl_rep = compare_abundances(ccm_rep, cond_not_exp_rep, cond_exp_rep)
plot_contrast(ccm_rep, cond_ne_v_e_tbl_rep, q_value_thresh = 0.05)
If we control for replicate, we see some difference in significant fold changes.