**Disk-based storage is in the develop branch only at this time.**
You need to install the Monocle3 develop branch to use disk-based count matrix storage. Please see Installing Monocle 3.
Monocle3 can store the count matrix on disk rather than in memory in a way that is essentially transparent to the user. This substantially reduces the memory required to process a data set while you continue to use the familiar Monocle3 functions. This feature depends on Ben Parks' excellent BPCells R package.
Make a cell_data_set with a disk-based counts matrix using the matrix_control parameter with Monocle3 functions that make a cell_data_set; for example, load_mm_data(). In this example, the counts matrix is in a MatrixMarket file called counts.mtx, the gene names are in features.txt, and the cell names are in cells.txt.
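A call of the kind described above can be sketched as follows. The file names come from the text. The matrix_class = "BPCells" element of matrix_control is an assumption based on the Monocle3 function help, so check the set_matrix_control() help in the develop branch you have installed.

```r
library(monocle3)

# Load a MatrixMarket counts matrix and keep it on disk through BPCells.
# Assumption: matrix_class = "BPCells" is the matrix_control element that
# requests disk-based storage; confirm with ?set_matrix_control.
cds <- load_mm_data(
  mat_path = "counts.mtx",
  feature_anno_path = "features.txt",
  cell_anno_path = "cells.txt",
  matrix_control = list(matrix_class = "BPCells")
)

# Inspect the class of the counts matrix to see how it is stored.
class(counts(cds))
```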
Functions that have the matrix_control parameter include:
load_mm_data()
load_mtx_data()
load_worm_embryo()
load_worm_l2()
load_a549()
combine_cds()
You can convert a dgCMatrix sparse matrix in the cell_data_set to a disk-based matrix using the function convert_counts_matrix(), for example, when counts() returns a dgCMatrix. convert_counts_matrix() makes and stores both the BPCells column-order matrix and the row-order matrix, in an effort to keep the two consistent.
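A conversion might look like the sketch below. It assumes that load_a549() returns a cell_data_set with an in-memory dgCMatrix when matrix_control is not given, and that matrix_class = "BPCells" selects the disk-based class; both assumptions should be checked against the function help.

```r
library(monocle3)

# Start from a small example data set held in memory
# (assumption: the default counts matrix is a dgCMatrix).
cds <- load_a549()
class(counts(cds))

# Convert the in-memory counts matrix to a disk-based BPCells matrix.
cds <- convert_counts_matrix(
  cds,
  matrix_control = list(matrix_class = "BPCells")
)
class(counts(cds))
```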
The new_cell_data_set() function has no matrix_control parameter, so it does not convert the input matrix to a disk-based matrix. However, if the input matrix is a BPCells matrix, it makes and stores the row-order copy of the input matrix.
After making the cell_data_set with a disk-based matrix, you process it using the same functions as for a cell_data_set with a dgCMatrix counts matrix. For example,
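The following is a sketch of such a workflow using standard Monocle3 calls with their default arguments. The example data set and the matrix_class = "BPCells" setting are assumptions chosen for illustration.

```r
library(monocle3)

# Make a cell_data_set with a disk-based counts matrix
# (assumption: matrix_class = "BPCells" requests disk-based storage).
cds <- load_a549(matrix_control = list(matrix_class = "BPCells"))

# The downstream calls are the same as for a dgCMatrix-backed cell_data_set.
cds <- preprocess_cds(cds)
cds <- reduce_dimension(cds)
cds <- cluster_cells(cds)
plot_cells(cds)
```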
You must use the save_monocle_objects() function to store a cell_data_set that has a disk-based counts matrix, and the load_monocle_objects() function to reload the saved cell_data_set.
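A save and reload round trip can be sketched as follows. The directory name cds_objects is hypothetical, and the loader is written as load_monocle_objects(), which is the name exported by Monocle3.

```r
library(monocle3)

# Assumption: cds is a cell_data_set with a disk-based counts matrix.
cds <- load_a549(matrix_control = list(matrix_class = "BPCells"))

# Write the cell_data_set and its on-disk matrix files to a directory.
# "cds_objects" is a hypothetical output directory name.
save_monocle_objects(cds = cds, directory_path = "cds_objects")

# Later, possibly in a new R session, reload the saved cell_data_set.
cds <- load_monocle_objects(directory_path = "cds_objects")
```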
Monocle3 makes temporary working directories where the disk-based counts matrix files are kept until you quit the R session, at which time Monocle3 tries to remove them. By default, Monocle3 makes these directories in the directory where you are running R. The directories have names like monocle.bpcells.20240412.3106e35c0e4a2.tmp, which include the date on which the directory is made, a unique string, and the .tmp suffix. Do not delete these directories while the Monocle3 R session is running: doing so loses the counts matrix and makes the cell_data_set unusable. If a temporary directory remains after you quit the R session for some reason, you may delete it, provided you are certain that the session completed.
You can tell Monocle3 where you want the disk-based working directories using the matrix_control parameter with the list element matrix_path. For example,
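The sketch below passes matrix_path together with matrix_class. The directory name bpcells_work is hypothetical, and creating it beforehand is a precaution rather than a documented requirement.

```r
library(monocle3)

# Hypothetical directory that will hold the temporary BPCells directories.
work_dir <- "bpcells_work"
dir.create(work_dir, showWarnings = FALSE)

# Assumption: matrix_class = "BPCells" requests disk-based storage, and
# matrix_path names the parent directory for the working directories.
cds <- load_mm_data(
  mat_path = "counts.mtx",
  feature_anno_path = "features.txt",
  cell_anno_path = "cells.txt",
  matrix_control = list(matrix_class = "BPCells", matrix_path = work_dir)
)
```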
preprocess_cds() creates an additional temporary disk-based matrix while it runs.
The set_matrix_control() function help has additional information about BPCells features supported by Monocle3.