
Multi-Omics Clustering using Seurat Multimodal Graph-based Clustering
Source: R/clustering.R
cluster_seurat.Rd
Performs graph-based clustering of cells using Seurat, based on one or two log R ratio matrices (mat_list), including shared nearest neighbors (SNN) graph construction on selected dimensions from PCA (dims_list), to identify clusters of cells for each specified resolution (res_range).
- For two omics: multimodal integration is performed using Seurat::FindMultiModalNeighbors() (weighted shared nearest neighbors graph). Only cells common to both omics are used.
- For a single omic: Seurat::FindNeighbors() (shared nearest neighbors graph) is used.
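The restriction to common cells in the two-omics case can be pictured with a few lines of base R. This is a minimal sketch, not the package's internal code: it assumes cells are identified by the row names of the cells x features matrices, and the omic names (ATAC, RNA), cell names, and feature names are hypothetical.

```r
# Two toy cells x features matrices with partially overlapping cells.
# All names below are hypothetical and only serve the illustration.
set.seed(1)
mat_list <- list(
  ATAC = matrix(rnorm(12), nrow = 4,
                dimnames = list(c("cell1", "cell2", "cell3", "cell4"),
                                c("f1", "f2", "f3"))),
  RNA = matrix(rnorm(6), nrow = 3,
               dimnames = list(c("cell2", "cell3", "cell5"),
                               c("g1", "g2")))
)

# Cells present in every omic (assumes row names identify cells).
common_cells <- Reduce(intersect, lapply(mat_list, rownames))

# Subset each matrix to the common cells, in the same order.
mat_common <- lapply(mat_list, function(m) m[common_cells, , drop = FALSE])
```

Using Reduce() with intersect() keeps the sketch valid for any number of matrices, although the function itself is documented for one or two omics only.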
Arguments
- mat_list
A named list of log R ratio matrices (cells x features), one per omic layer (list).
- res_range
A non-negative numeric vector specifying the resolution values to use for Seurat::FindClusters() (numeric vector). Default is c(0.1, 0.2, 0.3, 0.4, 0.5).
- dims_list
A list of vectors of PC dimensions to use for each omic (list). Must match the length of mat_list (e.g., list(1:8) for 1 omic; list(1:8, 1:8) for 2 omics). Default is the first 8 dimensions for each provided omic.
- algorithm
Integer specifying the algorithm for modularity optimization by Seurat::FindClusters() (1 = original Louvain algorithm; 2 = Louvain algorithm with multilevel refinement; 3 = SLM algorithm; 4 = Leiden algorithm). Default is 1. Recommended: 4 (Leiden algorithm); see the Details section of cluster_seurat().
- knn_seurat
Integer specifying the number of nearest neighbors used for graph construction with the Seurat functions Seurat::FindNeighbors() (k.param) or Seurat::FindMultiModalNeighbors() (k.nn) (integer). Default is 20.
- knn_range_seurat
Integer specifying the approximate number of nearest neighbors to compute for Seurat::FindMultiModalNeighbors() (knn.range) (integer). Default is 200.
- max_dim
Integer specifying the maximum number of principal components to be used for PCA computation with stats::prcomp() (integer). Default is 200.
- quiet
Logical. If TRUE, suppresses informative messages during execution. Default is FALSE.
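How max_dim and dims_list relate can be sketched with stats::prcomp() alone. This is an illustration under assumptions, not the function's actual implementation: it assumes the cap on components is applied through the rank. argument of prcomp() and that each dims_list entry simply selects columns of the resulting scores. The toy matrix and the values of max_dim and dims are made up.

```r
set.seed(42)
# Toy cells x features matrix: 30 cells, 12 features (hypothetical data).
mat <- matrix(rnorm(30 * 12), nrow = 30,
              dimnames = list(paste0("cell", 1:30), paste0("f", 1:12)))

max_dim <- 10   # cap on the number of principal components to compute
dims <- 1:8     # dimensions kept for graph construction (one dims_list entry)

# The number of components cannot exceed what the data supports.
n_pcs <- min(max_dim, nrow(mat) - 1, ncol(mat))
pca <- stats::prcomp(mat, center = TRUE, scale. = FALSE, rank. = n_pcs)

# The requested dimensions must exist among the computed components.
stopifnot(max(dims) <= n_pcs)

# Scores (cells x PCs) restricted to the selected dimensions.
pcs_used <- pca$x[, dims, drop = FALSE]
dim(pcs_used)
```

In this sketch, requesting dimensions beyond the computed components fails early, which is one reason to keep max_dim at least as large as the highest dimension listed in dims_list.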
Value
A list containing:
- params
List of parameters used for clustering (list).
- pcs
List of principal components summaries for each omic (list of summary.prcomp).
- nn
Nearest neighbors object (Neighbor).
- graph
Shared nearest neighbors graph (Graph).
- dist
Distance matrix derived from the graph (matrix).
- umap
UMAP coordinates (matrix).
- clusters
A named list of clustering results (vectors of cluster labels) for each value in res_range (list).
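Because clusters holds one vector of labels per resolution, comparing resolutions usually starts with counting clusters and their sizes. The sketch below uses a mock clusters list rather than real output; the element names ("0.1", "0.3"), cell names, and labels are assumptions made for the illustration and may differ from what cluster_seurat() returns.

```r
# Mock of the `clusters` element: one named label vector per resolution.
# Element names, cell names, and labels are hypothetical.
cells <- paste0("cell", 1:6)
clusters <- list(
  "0.1" = setNames(c(1, 1, 1, 2, 2, 2), cells),
  "0.3" = setNames(c(1, 1, 2, 2, 3, 3), cells)
)

# Number of clusters found at each resolution.
n_clusters <- vapply(clusters, function(x) length(unique(x)), integer(1))

# Cluster sizes at each resolution.
sizes <- lapply(clusters, table)
```

A resolution at which the number of clusters stops changing, or at which very small clusters start to appear, is a common starting point for choosing among the values in res_range.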
Details
The Leiden algorithm (algorithm = 4) is recommended based on published work and best-practice guidelines:
Traag, V.A., Waltman, L. & van Eck, N.J. From Louvain to Leiden: guaranteeing well-connected communities. Sci Rep 9, 5233 (2019). https://doi.org/10.1038/s41598-019-41695-z
Heumos, L., Schaar, A.C., Lance, C. et al. Best practices for single-cell analysis across modalities. Nat Rev Genet (2023). https://doi.org/10.1038/s41576-023-00586-w https://www.sc-best-practices.org/cellular_structure/clustering.html
Examples
if (FALSE) { # \dontrun{
# Load example muscadet object
# data("muscadet_obj")
# Format input
# transpose matrices to: cells x features matrices
mat_list <- lapply(muscadet::matLogRatio(muscadet_obj), t)
# Run integration & clustering
result <- cluster_seurat(mat_list, res_range = c(0.1, 0.3, 0.5))
# View results
lapply(result$clusters, table)
} # }