
Multi Omics Clustering using Seurat Multi Modal Graph-based Clustering
Source: R/clustering.R
cluster_seurat.Rd
Performs graph-based clustering of cells using
Seurat, based on one or two log R ratio matrices
(mat_list), including shared nearest neighbors (SNN) graph construction on
selected dimensions from PCA (dims_list), to identify clusters of cells for
each specified resolution (res_range).
For two omics: multimodal integration is performed using Seurat::FindMultiModalNeighbors() (weighted shared nearest neighbors graph). Only cells common to both omics are used.
For a single omic: Seurat::FindNeighbors() (shared nearest neighbors graph) is used.
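The single-omic path described above can be sketched on toy data. This is only an illustration under stated assumptions, not the internal code of cluster_seurat(): the simulated matrix, the choice of the first 8 PCs, and k.param = 10 are made up for the example, and the block only runs when Seurat is installed.

```r
# Minimal sketch on simulated data; not the internal code of cluster_seurat()
if (requireNamespace("Seurat", quietly = TRUE)) {
  set.seed(1)

  # Toy "log R ratio" matrix (cells x features) with two shifted groups of cells
  mat <- rbind(
    matrix(rnorm(50 * 20, mean = -0.5), nrow = 50),
    matrix(rnorm(50 * 20, mean = 0.5), nrow = 50)
  )
  rownames(mat) <- paste0("cell", seq_len(nrow(mat)))
  colnames(mat) <- paste0("feat", seq_len(ncol(mat)))

  # PCA on the matrix, keeping the first 8 principal components
  emb <- stats::prcomp(mat)$x[, 1:8]

  # Build nearest neighbors and shared nearest neighbors (SNN) graphs;
  # called on a matrix, FindNeighbors() returns a list with `nn` and `snn`
  graphs <- Seurat::FindNeighbors(emb, k.param = 10, verbose = FALSE)

  # Cluster the SNN graph once per resolution (algorithm = 1: original Louvain)
  clusters <- Seurat::FindClusters(
    graphs$snn,
    resolution = c(0.1, 0.3, 0.5),
    algorithm = 1,
    verbose = FALSE
  )

  # Cluster sizes for each resolution
  lapply(clusters, table)
}
```

Each column of the returned data frame holds the cluster labels for one resolution, which mirrors the per-resolution results described under Value.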
Arguments
- mat_list
A named list of log R ratio matrices (cells x features), one per omic layer (list).
- res_range
A numeric non-negative vector specifying the resolution values to use for Seurat::FindClusters() (numeric vector). Default is c(0.1, 0.2, 0.3, 0.4, 0.5).
- dims_list
A list of vectors of PC dimensions to use for each omic (list). Must match the length of mat_list (e.g., list(1:8) for 1 omic; list(1:8, 1:8) for 2 omics). Default is the first 8 dimensions for each provided omic.
- algorithm
Integer specifying the algorithm for modularity optimization by Seurat::FindClusters() (1 = original Louvain algorithm; 2 = Louvain algorithm with multilevel refinement; 3 = SLM algorithm; 4 = Leiden algorithm). Default is 1. RECOMMENDED: 4 for the Leiden algorithm, see the cluster_seurat() Details section.
- knn_seurat
Integer specifying the number of nearest neighbors used for graph construction with Seurat::Seurat() functions Seurat::FindNeighbors() (k.param) or Seurat::FindMultiModalNeighbors() (k.nn) (integer). Default is 20.
- knn_range_seurat
Integer specifying the approximate number of nearest neighbors to compute for Seurat::FindMultiModalNeighbors() (knn.range) (integer). Default is 200.
- max_dim
Integer specifying the maximum number of principal components to be used for PCA computation with stats::prcomp() (integer). Default is 200.
- quiet
Logical. If TRUE, suppresses informative messages during execution. Default is FALSE.
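To illustrate how mat_list, dims_list, and max_dim relate to each other, the following base R sketch prepares two toy omics. The matrices, the omic names (ATAC, RNA), the helper make_mat(), and the way the number of computed PCs is capped are assumptions made for illustration; the preprocessing inside cluster_seurat() may differ.

```r
set.seed(42)

# Hypothetical helper: toy cells x features matrix with named rows and columns
make_mat <- function(cells, n_feat) {
  matrix(
    rnorm(length(cells) * n_feat),
    nrow = length(cells),
    dimnames = list(cells, paste0("feat", seq_len(n_feat)))
  )
}

# Two omics whose cells only partially overlap (cell11 to cell40 are shared)
mat_list <- list(
  ATAC = make_mat(paste0("cell", 1:40), 30),
  RNA  = make_mat(paste0("cell", 11:50), 25)
)
dims_list <- list(1:8, 1:8)
max_dim <- 200

# dims_list needs one vector of PC indices per omic
stopifnot(length(dims_list) == length(mat_list))

# Keep only the cells present in every omic, in the same order
common_cells <- Reduce(intersect, lapply(mat_list, rownames))
mat_common <- lapply(mat_list, function(m) m[common_cells, , drop = FALSE])

# PCA per omic: compute at most max_dim components, then keep the requested ones
pcs <- Map(function(m, dims) {
  n_pcs <- min(max_dim, nrow(m) - 1, ncol(m))
  pca <- stats::prcomp(m, rank. = n_pcs)
  stopifnot(max(dims) <= ncol(pca$x))
  pca$x[, dims, drop = FALSE]
}, mat_common, dims_list)

# Dimensions (cells x selected PCs) of the embedding for each omic
sapply(pcs, dim)
```

The check on max(dims) shows why the requested dimensions cannot exceed the number of components that the data can actually support, regardless of max_dim.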
Value
A list containing:
- params
List of parameters used for clustering (list).
- pcs
List of principal components summaries for each omic (list of summary.prcomp).
- nn
Nearest neighbors object (Neighbor).
- graph
Shared nearest neighbors graph (Graph).
- dist
Distance matrix derived from the graph (matrix).
- umap
UMAP coordinates (matrix).
- clusters
A named list of clustering results (vectors of cluster labels) for each value in res_range (list).
Details
The Leiden algorithm (algorithm = 4) is recommended based on published work and best-practice
guidelines:
Traag, V.A., Waltman, L. & van Eck, N.J. From Louvain to Leiden: guaranteeing well-connected communities. Sci Rep 9, 5233 (2019). https://doi.org/10.1038/s41598-019-41695-z
Heumos, L., Schaar, A.C., Lance, C. et al. Best practices for single-cell analysis across modalities. Nat Rev Genet (2023). https://doi.org/10.1038/s41576-023-00586-w https://www.sc-best-practices.org/cellular_structure/clustering.html
Examples
if (FALSE) { # \dontrun{
# Load example muscadet object
# data("muscadet_obj")
# Format input
# transpose matrices to: cells x features matrices
mat_list <- lapply(muscadet::matLogRatio(muscadet_obj), t)
# Run integration & clustering
result <- cluster_seurat(mat_list, res_range = c(0.1, 0.3, 0.5))
# View results
lapply(result$clusters, table)
} # }