This function imputes cluster assignments for cells missing from some omics, using their nearest neighbor cells in the omic matrices where they are present.
Arguments
- mat_list
A named list of log-ratio cells x features matrices, one per omic dataset (list). Rows are cells and columns are features.
- clusters
A named vector of cluster assignments for cells (numeric or character vector). The vector names must correspond to the names of the cells common to all omics (matching row names in mat_list). Cluster labels can be integer, numeric, or character values.
- knn_imp
Number of nearest neighbor cells to use for imputation (integer). Must be a positive integer. Default is 10.
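The constraints on these arguments can be checked before calling the function. The following is a minimal sketch, assuming base R only; `check_inputs` is a hypothetical helper written for illustration and is not part of the package. It reports which cells are common to all matrices, which are missing from at least one, and whether the names of `clusters` match the common cells.

```r
# Hypothetical pre-flight check (illustration only, not a package function).
check_inputs <- function(mat_list, clusters, knn_imp = 10) {
  stopifnot(
    is.list(mat_list), !is.null(names(mat_list)),
    # every matrix needs row names, since cells are matched by name
    all(vapply(mat_list, function(m) !is.null(rownames(m)), logical(1))),
    !is.null(names(clusters)),
    length(knn_imp) == 1, knn_imp >= 1, knn_imp == round(knn_imp)
  )
  # Cells present in every matrix
  common <- Reduce(intersect, lapply(mat_list, rownames))
  # Cells present in at least one matrix but not in all of them
  missing <- setdiff(Reduce(union, lapply(mat_list, rownames)), common)
  list(
    common = common,
    missing = missing,
    names_match = setequal(names(clusters), common)
  )
}

# Toy usage with two small matrices
m1 <- matrix(0, 3, 2, dimnames = list(c("a", "b", "c"), NULL))
m2 <- matrix(0, 3, 2, dimnames = list(c("b", "c", "d"), NULL))
res <- check_inputs(list(X = m1, Y = m2), c(b = 1, c = 2), knn_imp = 3)
```

Here `res$missing` lists the cells whose cluster would need to be imputed, and `res$names_match` is `FALSE` when `clusters` is named after anything other than the common cells.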
Value
A named vector combining the original cluster assignments for
cells common to all omics (given by the clusters
argument) and the
imputed cluster assignments for cells missing from at least one omic matrix.
Details
The function operates in the following steps:
Identifies cells missing in specific matrices.
Finds the k-nearest neighbors for missing cells in matrices where they are present.
Imputes cluster assignments for missing cells based on the clusters assigned to their neighbors.
Resolves ties (two clusters equally frequent among the neighbors) by selecting the cluster of the nearest neighbor.
The imputation is performed separately for each omic dataset, and the results are aggregated to provide final cluster assignments.
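The neighbor vote and tie-break described above can be sketched for a single cell in a single matrix as follows. This is an illustration under stated assumptions, not the package's implementation: `impute_one` is a hypothetical helper, Euclidean distance is assumed (the distance actually used is not stated here), and a tie is resolved by taking the closest neighbor that belongs to one of the tied clusters.

```r
# Hypothetical helper (illustration only): impute the cluster of one cell
# from its k nearest labelled neighbours in a single matrix.
impute_one <- function(mat, clusters, cell, k = 3) {
  # Candidate neighbours: cells that have a cluster label and are in `mat`
  labelled <- intersect(names(clusters), rownames(mat))
  # Euclidean distance (assumed) from the query cell to each labelled cell
  diffs <- sweep(mat[labelled, , drop = FALSE], 2, mat[cell, ])
  d <- sqrt(rowSums(diffs^2))
  # Names of the k closest labelled cells, nearest first
  nn <- names(sort(d))[seq_len(min(k, length(d)))]
  votes <- table(clusters[nn])
  top <- names(votes)[votes == max(votes)]
  # Tie: keep the cluster of the closest neighbour among the tied clusters
  if (length(top) > 1) {
    nn_labels <- as.character(clusters[nn])
    top <- nn_labels[nn_labels %in% top][1]
  }
  top
}

# Toy usage: "q" is unlabelled; "a" is its closest neighbour, then "b", then "c"
mat <- rbind(a = c(0, 0), b = c(0, 1), c = c(5, 5), q = c(0, 0.4))
clusters <- c(a = "1", b = "2", c = "2")
vote_k3 <- impute_one(mat, clusters, "q", k = 3)  # majority vote: "2"
vote_k2 <- impute_one(mat, clusters, "q", k = 2)  # tie, nearest wins: "1"
```

With k = 3 the majority cluster wins even though the closest neighbor disagrees; with k = 2 the two neighbors disagree and the closest one decides.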
Examples
# Create matrices with some cells missing in one or the other
set.seed(42)
mat1 <- matrix(runif(100), nrow = 20)
mat2 <- matrix(runif(100), nrow = 20)
rownames(mat1) <- paste0("Cell", 1:20)
rownames(mat2) <- paste0("Cell", c(1:5, 11:25))
mat_list <- list(ATAC = mat1, RNA = mat2)
# Create cluster assignments for common cells
common_cells <- intersect(rownames(mat1), rownames(mat2))
clusters <- setNames(sample(1:4, length(common_cells), replace = TRUE), common_cells)
# Check the inputs
print(common_cells)
#> [1] "Cell1" "Cell2" "Cell3" "Cell4" "Cell5" "Cell11" "Cell12" "Cell13"
#> [9] "Cell14" "Cell15" "Cell16" "Cell17" "Cell18" "Cell19" "Cell20"
print(rownames(mat_list$ATAC))
#> [1] "Cell1" "Cell2" "Cell3" "Cell4" "Cell5" "Cell6" "Cell7" "Cell8"
#> [9] "Cell9" "Cell10" "Cell11" "Cell12" "Cell13" "Cell14" "Cell15" "Cell16"
#> [17] "Cell17" "Cell18" "Cell19" "Cell20"
print(rownames(mat_list$RNA))
#> [1] "Cell1" "Cell2" "Cell3" "Cell4" "Cell5" "Cell11" "Cell12" "Cell13"
#> [9] "Cell14" "Cell15" "Cell16" "Cell17" "Cell18" "Cell19" "Cell20" "Cell21"
#> [17] "Cell22" "Cell23" "Cell24" "Cell25"
# Impute cluster assignments for missing cells
imputed_clusters <- imputeClusters(mat_list, clusters, knn_imp = 3)
# View the imputed cluster assignments
print(imputed_clusters)
#>  Cell1  Cell2  Cell3  Cell4  Cell5 Cell11 Cell12 Cell13 Cell14 Cell15 Cell16
#>      4      2      1      4      3      1      2      2      2      4      2
#> Cell17 Cell18 Cell19 Cell20 Cell10  Cell6  Cell7  Cell8  Cell9 Cell21 Cell22
#>      3      4      3      2      2      2      2      1      2      4      1
#> Cell23 Cell24 Cell25
#>      2      2      3