
Multi Omics Integration and Clustering on a muscadet
Object
Source: R/clustering.R
clusterMuscadet.Rd
Performs integration of multi omics and clustering of cells based on log
ratio data contained in a muscadet
object.
Usage
clusterMuscadet(
x,
method = c("seurat", "hclust"),
omics = NULL,
knn_imp = 10,
quiet = FALSE,
...
)
Arguments
- x
A
muscadet
object containing omics data and previously computed log R ratio matrices (muscadet
).- method
The clustering method to apply (
character
string). One of"seurat"
or"hclust"
. For medthod"seurat"
, arguments forcluster_seurat()
should be provided, and for method"hclust"
, arguments forcluster_hclust()
should be provided. Note: The"seurat"
method can only be applied for a maximum of 2 omics. Default is"seurat"
.- omics
Optional character vector specifying omic names to use for clustering. Must match names of available omics in the
x
muscadet object. IfNULL
(default), all available omics are used.- knn_imp
Number of k nearest neighbors to use for imputing cluster assignments of cells missing in one or more omics. Only relevant for more than one omic.
- quiet
Logical. If
TRUE
, suppresses informative messages during execution. Default isFALSE
.- ...
Arguments passed on to
cluster_seurat
,cluster_hclust
res_range
A numeric non-negative vector specifying the resolution values to use for
Seurat::FindClusters()
(numeric
vector). Default isc(0.1, 0.2, 0.3, 0.4, 0.5)
.dims_list
A list of vectors of PC dimensions to use for each omic (
list
). Must match the length ofmat_list
(e.g., list(1:8) for 1 omic ; list(1:8, 1:8) for 2 omics). Default is the first 8 dimensions for each provided omic.algorithm
Integer specifying the algorithm for modularity optimization by Seurat::FindClusters (
1
= original Louvain algorithm;2
= Louvain algorithm with multilevel refinement;3
= SLM algorithm;4
= Leiden algorithm). Leiden requires the leidenalg python. Default is1
.knn_seurat
Integer specifying the number of nearest neighbors used for graph construction with Seurat-package functions
Seurat::FindNeighbors()
(k.param
) orSeurat::FindMultiModalNeighbors()
(k.nn
) (integer
). Default is20
.knn_range_seurat
Integer specifying the approximate number of nearest neighbors to compute for
Seurat::FindMultiModalNeighbors()
(knn.range
) (integer
). Default is200
.k_range
A numeric vector of integers (≥2) specifying the cluster numbers (k) to extract from hierarchical clustering (
numeric
vector). Default is from 2 to 10.dist_method
A string specifying the distance method for
Rfast::Dist()
(e.g.,"euclidean"
,"manhattan"
,"cosine"
) (character
string). Default is"euclidean"
.hclust_method
A string specifying the hierarchical clustering linkage method for
fastcluster::hclust()
(e.g.,"ward.D"
,"average"
) (character
string). Default is"ward.D"
.weights
A numeric vector of non-negative values of length equal to the number of omic (internally normalized to sum to 1) (
numeric
vector). It specifies the relatives weights of each omic for SNF withweightedSNF()
. Omics with a weight of 0 will not contribute to the clustering. IfNULL
(default), weights are uniform.
Value
The input muscadet
object with its clustering
slot updated. This slot contains:
- params
List of parameters used for clustering (
list
).- ...
Output objects depending on the method. See
cluster_seurat()
orcluster_hclust()
.- clusters
A named list of cluster partitions (named vectors of cluster labels) for all cells (imputed clusters assignments for non-common cells), for each value in
k_range
orres_range
(list
).- silhouette
A list of silhouette objects and widths for each cluster partition (
list
).- partition.opt
Name of the optimal cluster partition based on maximum average silhouette width.
Details
Two methods are available for integration and clustering of common cells between omics:
Method
seurat
uses nearest neighbors for integration followed by graph-based clustering.Method
hclust
uses Similarity Network Fusion (SNF) for integration followed by hierarchical clustering.
Then, clusters are imputed for cells missing data in at least one omic, by similarity using nearest neighbor cells.
Finally, silhouette widths are computed on the integrated distance matrix to help identify the optimal clustering partition.
See also
Methodology and functionality:
cluster_seurat()
for graph-based clustering using Seurat-package.cluster_hclust()
for hierarchical clustering of SNF-fused distances.weightedSNF()
for weighted Similarity Network Fusion (SNF).imputeClusters()
for imputing cluster labels across omics.
Visualization:
heatmapMuscadet()
to plot clustering result as heatmap.plotSil()
to plot silhouette widths.plotIndexes()
to plot several normalized cluster validation indexes.
Select clusters to continue with CNA calling:
assignClusters()
to assign final cluster assignments in themuscadet
object after cluster partition validation.
Examples
# Load example muscadet object
data(muscadet_obj)
# Perform clustering with "seurat" method
muscadet_obj <- clusterMuscadet(
x = muscadet_obj,
method = "seurat",
res_range = c(0.5, 0.8),
dims_list = list(1:8, 1:8),
knn_seurat = 10, # to adapt for low number of cells in example data
knn_range_seurat = 30 # to adapt for low number of cells in example data
)
#> Clustering method: 'seurat'
#> Performing PCA...
#> Finding neighbors and constructing graph...
#> Computing UMAP...
#> Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
#> To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
#> This message will be shown once per session
#> Finding clusters...
#> Imputing clusters...
#> Computing Silhouette scores...
#> Done.
# Perform clustering with "hclust" method
muscadet_obj <- clusterMuscadet(
x = muscadet_obj,
k_range = 2:4,
method = "hclust",
dist_method = "euclidean",
hclust_method = "ward.D",
weights = c(1, 1)
)
#> Clustering method: 'hclust'
#> Computing distance matrices...
#> Computing affinity matrices...
#> Performing SNF integration...
#> Computing UMAP...
#> Performing hierarchical clustering...
#> Imputing clusters...
#> Computing Silhouette scores...
#> Done.
# Retrieve cluster assignments
clusters <- muscadet_obj$clustering$clusters
lapply(clusters, table)
#> $`2`
#>
#> 1 2
#> 56 91
#>
#> $`3`
#>
#> 1 2 3
#> 60 13 74
#>
#> $`4`
#>
#> 1 2 3 4
#> 25 27 13 82
#>