Skip to contents

Performs integration of multi omics and clustering of cells based on log ratio data contained in a muscadet object.

Usage

clusterMuscadet(
  x,
  method = c("seurat", "hclust"),
  omics = NULL,
  knn_imp = 10,
  quiet = FALSE,
  ...
)

Arguments

x

A muscadet object containing omics data and previously computed log R ratio matrices (muscadet).

method

The clustering method to apply (character string). One of "seurat" or "hclust". For medthod "seurat", arguments for cluster_seurat() should be provided, and for method "hclust", arguments for cluster_hclust() should be provided. Note: The "seurat" method can only be applied for a maximum of 2 omics. Default is "seurat".

omics

Optional character vector specifying omic names to use for clustering. Must match names of available omics in the x muscadet object. If NULL (default), all available omics are used.

knn_imp

Number of k nearest neighbors to use for imputing cluster assignments of cells missing in one or more omics. Only relevant for more than one omic.

quiet

Logical. If TRUE, suppresses informative messages during execution. Default is FALSE.

...

Arguments passed on to cluster_seurat, cluster_hclust

res_range

A numeric non-negative vector specifying the resolution values to use for Seurat::FindClusters() (numeric vector). Default is c(0.1, 0.2, 0.3, 0.4, 0.5).

dims_list

A list of vectors of PC dimensions to use for each omic (list). Must match the length of mat_list (e.g., list(1:8) for 1 omic ; list(1:8, 1:8) for 2 omics). Default is the first 8 dimensions for each provided omic.

algorithm

Integer specifying the algorithm for modularity optimization by Seurat::FindClusters (1 = original Louvain algorithm; 2 = Louvain algorithm with multilevel refinement; 3 = SLM algorithm; 4 = Leiden algorithm). Leiden requires the leidenalg python. Default is 1.

knn_seurat

Integer specifying the number of nearest neighbors used for graph construction with Seurat-package functions Seurat::FindNeighbors() (k.param) or Seurat::FindMultiModalNeighbors() (k.nn) (integer). Default is 20.

knn_range_seurat

Integer specifying the approximate number of nearest neighbors to compute for Seurat::FindMultiModalNeighbors() (knn.range) (integer). Default is 200.

k_range

A numeric vector of integers (≥2) specifying the cluster numbers (k) to extract from hierarchical clustering (numeric vector). Default is from 2 to 10.

dist_method

A string specifying the distance method for Rfast::Dist() (e.g., "euclidean", "manhattan", "cosine") (character string). Default is "euclidean".

hclust_method

A string specifying the hierarchical clustering linkage method for fastcluster::hclust() (e.g., "ward.D", "average") (character string). Default is "ward.D".

weights

A numeric vector of non-negative values of length equal to the number of omic (internally normalized to sum to 1) (numeric vector). It specifies the relatives weights of each omic for SNF with weightedSNF(). Omics with a weight of 0 will not contribute to the clustering. If NULL (default), weights are uniform.

Value

The input muscadet object with its clustering slot updated. This slot contains:

params

List of parameters used for clustering (list).

...

Output objects depending on the method. See cluster_seurat() or cluster_hclust().

clusters

A named list of cluster partitions (named vectors of cluster labels) for all cells (imputed clusters assignments for non-common cells), for each value in k_range or res_range (list).

silhouette

A list of silhouette objects and widths for each cluster partition (list).

partition.opt

Name of the optimal cluster partition based on maximum average silhouette width.

Details

Two methods are available for integration and clustering of common cells between omics:

  • Method seurat uses nearest neighbors for integration followed by graph-based clustering.

  • Method hclust uses Similarity Network Fusion (SNF) for integration followed by hierarchical clustering.

Then, clusters are imputed for cells missing data in at least one omic, by similarity using nearest neighbor cells.

Finally, silhouette widths are computed on the integrated distance matrix to help identify the optimal clustering partition.

See also

Methodology and functionality:

Visualization:

Select clusters to continue with CNA calling:

  • assignClusters() to assign final cluster assignments in the muscadet object after cluster partition validation.

Examples

# Load example muscadet object
data(muscadet_obj)

# Perform clustering with "seurat" method
muscadet_obj <- clusterMuscadet(
  x = muscadet_obj,
  method = "seurat",
  res_range = c(0.5, 0.8),
  dims_list = list(1:8, 1:8),
  knn_seurat = 10, # to adapt for low number of cells in example data
  knn_range_seurat = 30 # to adapt for low number of cells in example data
)
#> Clustering method: 'seurat'
#> Performing PCA...
#> Finding neighbors and constructing graph...
#> Computing UMAP...
#> Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
#> To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
#> This message will be shown once per session
#> Finding clusters...
#> Imputing clusters...
#> Computing Silhouette scores...
#> Done.

# Perform clustering with "hclust" method
muscadet_obj <- clusterMuscadet(
  x = muscadet_obj,
  k_range = 2:4,
  method = "hclust",
  dist_method = "euclidean",
  hclust_method = "ward.D",
  weights = c(1, 1)
)
#> Clustering method: 'hclust'
#> Computing distance matrices...
#> Computing affinity matrices...
#> Performing SNF integration...
#> Computing UMAP...
#> Performing hierarchical clustering...
#> Imputing clusters...
#> Computing Silhouette scores...
#> Done.

# Retrieve cluster assignments
clusters <- muscadet_obj$clustering$clusters
lapply(clusters, table)
#> $`2`
#> 
#>  1  2 
#> 56 91 
#> 
#> $`3`
#> 
#>  1  2  3 
#> 60 13 74 
#> 
#> $`4`
#> 
#>  1  2  3  4 
#> 25 27 13 82 
#>