Below, you can see snippets of code that highlight the main steps of Monocle 3. Click on the section headers to jump to the detailed sections describing each one and allowing you to try the steps on example data.
Monocle 3 holds single-cell expression data in objects of the cell_data_set class. The class is derived from the Bioconductor SingleCellExperiment class, which provides a common interface familiar to those who have analyzed other single-cell experiments with Bioconductor. The class requires three inputs:
expression_matrix, a numeric matrix of expression values, where rows are genes and columns are cells.
cell_metadata, a data frame, where rows are cells and columns are cell attributes (such as cell type, culture condition, day captured, etc.).
gene_metadata, a data frame, where rows are features (e.g. genes) and columns are gene attributes (such as biotype, GC content, etc.).
The expression matrix must have the same number of columns as cell_metadata has rows, and the same number of rows as gene_metadata has rows.
The row names of the cell_metadata object should match the column names of the expression matrix.
The row names of the gene_metadata object should match the row names of the expression matrix.
One of the columns of gene_metadata should be named "gene_short_name", which represents the gene symbol or simple name (generally used for plotting) for each gene.
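The requirements above can be checked before building a CDS. The following is a minimal sketch in base R, not part of Monocle 3: check_cds_inputs is a hypothetical helper, the toy data is made up, and "match" is assumed to mean the same names in the same order.

```r
# Toy inputs (made up for illustration): 3 genes x 2 cells.
# matrix() fills column by column, so each row of numbers below is one cell.
expression_matrix <- matrix(
  c(5, 0, 2,   # cell1
    1, 3, 0),  # cell2
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("cell1", "cell2"))
)
cell_metadata <- data.frame(
  culture_condition = c("control", "treated"),
  row.names = c("cell1", "cell2")
)
gene_metadata <- data.frame(
  gene_short_name = c("A", "B", "C"),
  row.names = c("geneA", "geneB", "geneC")
)

# Hypothetical helper (not a Monocle 3 function): stops with an error if a
# requirement is violated and returns TRUE invisibly otherwise.
check_cds_inputs <- function(expression_matrix, cell_metadata, gene_metadata) {
  stopifnot(
    ncol(expression_matrix) == nrow(cell_metadata),
    nrow(expression_matrix) == nrow(gene_metadata),
    # Assumption: names must agree in the same order.
    identical(rownames(cell_metadata), colnames(expression_matrix)),
    identical(rownames(gene_metadata), rownames(expression_matrix)),
    "gene_short_name" %in% colnames(gene_metadata)
  )
  invisible(TRUE)
}

check_cds_inputs(expression_matrix, cell_metadata, gene_metadata)
```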
You can create a new cell_data_set (CDS) object as follows:
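A minimal sketch of the construction, using small made-up inputs rather than real data. The object names are illustrative, and the call to new_cell_data_set is guarded so the snippet also runs where Monocle 3 is not installed.

```r
# Toy inputs (made up for illustration): 3 genes x 3 cells.
gene_ids <- c("geneA", "geneB", "geneC")
cell_ids <- c("cell1", "cell2", "cell3")

expression_matrix <- matrix(
  c(5, 0, 2,   # cell1
    1, 3, 0,   # cell2
    0, 4, 1),  # cell3
  nrow = 3,
  dimnames = list(gene_ids, cell_ids)
)
cell_metadata <- data.frame(
  culture_condition = c("control", "treated", "treated"),
  row.names = cell_ids
)
gene_metadata <- data.frame(
  gene_short_name = c("A", "B", "C"),
  row.names = gene_ids
)

# Build the CDS only if Monocle 3 is available.
if (requireNamespace("monocle3", quietly = TRUE)) {
  cds <- monocle3::new_cell_data_set(
    expression_matrix,
    cell_metadata = cell_metadata,
    gene_metadata = gene_metadata
  )
  print(dim(cds))  # genes x cells
}
```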
To input data from 10X Genomics Cell Ranger, you can use the load_cellranger_data function:
Note: load_cellranger_data takes an argument umi_cutoff that determines how many UMIs a cell must have to be included. By default, this is set to 100. If you would like to include all cells, set umi_cutoff to 0.
For load_cellranger_data to find the correct files, you must provide a path to the folder containing the unmodified Cell Ranger 'outs' folder. Your file structure should look like 10x_data/outs/filtered_feature_bc_matrix/, where filtered_feature_bc_matrix contains the files features.tsv.gz, barcodes.tsv.gz and matrix.mtx.gz. (load_cellranger_data can also handle Cell Ranger V2 data, where the file names use "gene" in place of "features" and the files are not gzipped.)
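As a sketch of how this might look in practice, the snippet below first checks that the expected folder layout is present and then loads the data. The path "10x_data" is a placeholder and has_cellranger_layout is a hypothetical helper, not a Monocle 3 function; it only covers the gzipped layout described above, not Cell Ranger V2 output.

```r
# Hypothetical helper (not part of Monocle 3): TRUE if `path` contains
# outs/filtered_feature_bc_matrix/ with the three gzipped files.
has_cellranger_layout <- function(path) {
  matrix_dir <- file.path(path, "outs", "filtered_feature_bc_matrix")
  expected <- c("features.tsv.gz", "barcodes.tsv.gz", "matrix.mtx.gz")
  all(file.exists(file.path(matrix_dir, expected)))
}

# Placeholder path: the folder that contains the unmodified 'outs' folder.
data_dir <- "10x_data"

if (has_cellranger_layout(data_dir) &&
    requireNamespace("monocle3", quietly = TRUE)) {
  # Default behavior: cells below the umi_cutoff of 100 are dropped.
  cds <- monocle3::load_cellranger_data(data_dir)

  # Keep every cell by disabling the cutoff.
  cds_all <- monocle3::load_cellranger_data(data_dir, umi_cutoff = 0)
}
```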
Alternatively, you can use load_mm_data to load any data in MatrixMarket format by providing the matrix file and two metadata files (features information and cell information). For more details, run ?load_mm_data.
Some single-cell RNA-Seq experiments report measurements from tens of thousands of cells or more. As instrumentation improves and costs drop, experiments will become ever larger and more complex, with many conditions, controls, and replicates. A matrix of expression data with 50,000 cells and a measurement for each of the 25,000+ genes in the human genome can take up a lot of memory (about 10 GB if stored as a dense matrix of double-precision numbers). However, because current protocols typically don't capture all or even most of the mRNA molecules in each cell, many of the entries of expression matrices are zero. Using sparse matrices can help you work with huge datasets on a typical computer. We generally recommend sparse matrices for most users, as they speed up many computations even for more modestly sized datasets.
To work with your data in a sparse format, simply provide it to Monocle 3 as a sparse matrix from the Matrix package:
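As a minimal sketch, the snippet below converts a small made-up count matrix to a sparse matrix with the Matrix package and passes it to new_cell_data_set. The toy data and object names are illustrative only. A dense matrix is used as the starting point purely to keep the example short; real data that is already sparse should not be densified first.

```r
library(Matrix)

# Toy count matrix (made up): 3 genes x 4 cells, mostly zeros.
# matrix() fills column by column, so each row of numbers below is one cell.
dense <- matrix(
  c(0, 0, 3,   # cell1
    0, 5, 0,   # cell2
    2, 0, 0,   # cell3
    0, 1, 0),  # cell4
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)

# Sparse representation: only the non-zero entries are stored.
sparse <- Matrix(dense, sparse = TRUE)

cell_metadata <- data.frame(
  culture_condition = rep("control", ncol(sparse)),
  row.names = colnames(sparse)
)
gene_metadata <- data.frame(
  gene_short_name = rownames(sparse),
  row.names = rownames(sparse)
)

# Build the CDS only if Monocle 3 is available.
if (requireNamespace("monocle3", quietly = TRUE)) {
  cds <- monocle3::new_cell_data_set(
    sparse,
    cell_metadata = cell_metadata,
    gene_metadata = gene_metadata
  )
}
```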
If your data is already in a sparse format, pass it directly to new_cell_data_set without first converting it to a dense matrix (via as.matrix()), because that may exceed your available memory.
Monocle 3's sparse matrix support is provided by the Matrix package. Other sparse matrix packages, such as slam or SparseM, are not supported.
If you have multiple CDS objects that you would like to analyze together, use our combine_cds function. combine_cds takes a list of CDS objects and combines them into a single CDS object.
keep_all_genes: When TRUE (default), all genes are kept even if they don't match between the different CDSs. Cells that do not have a given gene in their CDS will be marked as having zero expression. When FALSE, only the genes in common among all CDSs will be kept.
cell_names_unique: When FALSE (default), the cell names in the CDSs are not assumed to be unique, and so a CDS specifier is appended to each cell name. When TRUE, no specifier is added.
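The sketch below illustrates the two options on made-up data. make_toy_cds is a hypothetical helper and the gene and cell names are invented. The union and intersection lines show which genes the option descriptions above imply for each setting; they are not output captured from combine_cds.

```r
# Made-up gene sets for two CDS objects that overlap only partly.
genes_1 <- c("geneA", "geneB", "geneC")
genes_2 <- c("geneB", "geneC", "geneD")

# Genes implied by the option descriptions above:
genes_if_keep_all <- union(genes_1, genes_2)      # keep_all_genes = TRUE
genes_if_common   <- intersect(genes_1, genes_2)  # keep_all_genes = FALSE

# Hypothetical helper (not part of Monocle 3): builds a toy CDS in which
# every gene has a count of 1 in every cell.
make_toy_cds <- function(genes, cells) {
  counts <- matrix(1, nrow = length(genes), ncol = length(cells),
                   dimnames = list(genes, cells))
  monocle3::new_cell_data_set(
    counts,
    cell_metadata = data.frame(condition = rep("toy", length(cells)),
                               row.names = cells),
    gene_metadata = data.frame(gene_short_name = genes, row.names = genes)
  )
}

if (requireNamespace("monocle3", quietly = TRUE)) {
  # Both objects reuse the same cell names on purpose, so the default
  # cell_names_unique = FALSE is the appropriate setting here.
  cds_1 <- make_toy_cds(genes_1, c("cell1", "cell2"))
  cds_2 <- make_toy_cds(genes_2, c("cell1", "cell2"))

  combined <- monocle3::combine_cds(
    list(cds_1, cds_2),
    keep_all_genes = TRUE,
    cell_names_unique = FALSE
  )
  common_only <- monocle3::combine_cds(
    list(cds_1, cds_2),
    keep_all_genes = FALSE,
    cell_names_unique = FALSE
  )
}
```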