Compute log R ratios from raw count matrices with a method specifically adapted for scRNA-seq data.
Usage
computeLogRatioRNA(
  matTumor,
  matRef,
  genesCoord,
  genome = "hg38",
  genesPerWindow = 101,
  refReads = 100,
  refMeanReads = 0.01,
  thresh_capping = 3,
  all_steps = FALSE,
  quiet = FALSE
)
Arguments
- matTumor
  Raw count matrix features x cells for tumor/sample cells (matrix or dgCMatrix).
- matRef
  Raw count matrix features x cells for reference cells (matrix or dgCMatrix).
- genesCoord
  Data frame of gene coordinates with columns CHROM, start, end, id (data.frame).
- genome
  Reference genome name among: "hg38", "hg19" and "mm10" (character). By default: "hg38".
- genesPerWindow
  Number of genes per moving window (integer value). By default: 101.
- refReads
  Minimum number of reads in reference cells (integer value). By default: 100.
- refMeanReads
  Minimum average number of reads per reference cell (numeric value). By default: 0.01.
- thresh_capping
  Threshold to cap the range of log R ratio values (numeric value). By default: 3.
- all_steps
  TRUE or FALSE (logical). Whether to keep the intermediate results from every step in the final object. By default: FALSE.
- quiet
  TRUE or FALSE (logical). If TRUE, suppresses informative messages during execution. By default: FALSE.
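As an illustration of the input shapes described above, here is a minimal sketch in base R. The gene names, coordinates and counts are made up for the example, and the lowercase object names are hypothetical; only the column names (CHROM, start, end, id) and the features x cells orientation come from the argument descriptions.

```r
# Hypothetical toy inputs: 4 genes (rows) x a few cells (columns).
genes <- paste0("gene", 1:4)

mat_tumor <- matrix(
  c(3, 0, 5, 1,
    2, 1, 4, 0,
    0, 0, 6, 2),
  nrow = 4,
  dimnames = list(genes, c("tumor_cell1", "tumor_cell2", "tumor_cell3"))
)
mat_ref <- matrix(
  c(2, 1, 3, 1,
    3, 0, 2, 2),
  nrow = 4,
  dimnames = list(genes, c("ref_cell1", "ref_cell2"))
)

# Gene coordinates with the four documented columns; `id` matches the row names.
genes_coord <- data.frame(
  CHROM = c("1", "1", "2", "2"),
  start = c(1000, 5000, 2000, 9000),
  end   = c(1500, 5600, 2800, 9400),
  id    = genes
)

# Basic consistency checks before handing the inputs to the function.
stopifnot(
  identical(rownames(mat_tumor), rownames(mat_ref)),
  all(c("CHROM", "start", "end", "id") %in% colnames(genes_coord)),
  all(rownames(mat_tumor) %in% genes_coord$id)
)

# Optional sparse storage, if the Matrix package is available.
if (requireNamespace("Matrix", quietly = TRUE)) {
  mat_tumor_sparse <- Matrix::Matrix(mat_tumor, sparse = TRUE)
}
```

Checking that row names and the id column agree before the call is a cheap way to anticipate the gene-matching step; whether the function itself tolerates mismatches is not stated here.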
Value
If all_steps is set to FALSE, a list containing:
- matTumor
  Matrix of log R ratio values features x cells for the tumor/sample cells (matrix).
- matRef
  Matrix of log R ratio values features x cells for the reference cells (matrix).
- params
  List of parameters set for the genesPerWindow, refReads, refMeanReads and thresh_capping arguments (list).
- coord
  Data frame of coordinates for windows of genes and associated data along the different steps (data.frame). Columns:
  - CHROM, start, end, id: coordinates and name of genes.
  - sumReads.tum/ref: sum of read counts over all tumor or reference cells.
  - meanReads.tum/ref: mean of read counts per cell for tumor or reference cells.
  - sdReads.tum/ref: standard deviation of read counts per cell for tumor or reference cells.
  - keep: logical, TRUE for genes to keep after filtering based on reference coverage (depends on the refReads and refMeanReads arguments).
  - meanReads/sdReads.norm.tum/ref: mean/sd of normalized counts per million for tumor/reference cells.
  - meanLRR/sdLRR.raw.tum/ref: mean/sd of raw log R ratio (LRR) for tumor/reference cells.
  - meanLRR/sdLRR.cap.tum/ref: mean/sd of capped log R ratio (LRR) for tumor/reference cells (depends on the thresh_capping argument).
  - meanLRR/sdLRR.smoo.tum/ref: mean/sd of smoothed log R ratio (LRR) for tumor/reference cells (means of moving windows defined by the genesPerWindow argument).
  - meanLRR/sdLRR.cent.tum/ref: mean/sd of centered log R ratio (LRR) for tumor/reference cells.
  - meanLRR/sdLRR.corr.tum/ref: mean/sd of final log R ratio (LRR) corrected by reference variability for tumor/reference cells.
If all_steps is set to TRUE, the result of each step is stored in its own list element
(step01 to step08, for example step08$matTumor), alongside the params and coord elements.
Details
The raw count matrix is transformed into log R ratios through the following steps:
1. Match genes in count matrix with coordinates
2. Filtering on coverage (refReads and refMeanReads arguments)
3. Normalization for sequencing depth
4. Log transformation and normalization by reference data: log R ratio
5. Capping the range of values (thresh_capping argument)
6. Smoothing on genes windows
7. Centering of cells
8. Correcting by reference variability
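The steps listed above can be sketched on toy data in base R. This is not the package's implementation: the pseudocount of 1, the base-2 logarithm, the mean-based centering, the smoothing that ignores chromosome boundaries, and in particular the step 08 correction (here, subtracting the per-gene reference mean) are all assumptions made for illustration, as are the helper names `cpm`, `cap`, `smooth_window` and `center`.

```r
# Toy sketch of the eight steps. NOT the muscadet implementation:
# pseudocount, log base, centering statistic and step 08 are assumptions.
set.seed(1)
n_genes <- 60
genes <- paste0("g", seq_len(n_genes))
mat_tumor <- matrix(rpois(n_genes * 8, 5), nrow = n_genes,
                    dimnames = list(genes, paste0("t", 1:8)))
mat_ref <- matrix(rpois(n_genes * 6, 5), nrow = n_genes,
                  dimnames = list(genes, paste0("r", 1:6)))
mat_ref[1:5, ] <- 0  # genes without reference coverage, removed in step 02
starts <- seq(1, by = 1000, length.out = n_genes)
genes_coord <- data.frame(CHROM = "1", start = starts, end = starts + 500, id = genes)

# Step 01: keep genes present in both the matrices and the coordinates.
common <- intersect(genes_coord$id, rownames(mat_tumor))
tum <- mat_tumor[common, , drop = FALSE]
ref <- mat_ref[common, , drop = FALSE]

# Step 02: filter on reference coverage (refReads = 2, refMeanReads = 0.01).
keep <- rowSums(ref) >= 2 & rowMeans(ref) >= 0.01
tum <- tum[keep, , drop = FALSE]
ref <- ref[keep, , drop = FALSE]

# Step 03: normalize each cell to counts per million.
cpm <- function(m) sweep(m, 2, colSums(m), "/") * 1e6
tum_cpm <- cpm(tum)
ref_cpm <- cpm(ref)

# Step 04: log ratio against the per-gene reference mean (pseudocount assumed).
ref_mean <- rowMeans(ref_cpm)
lrr_tum <- log2((tum_cpm + 1) / (ref_mean + 1))
lrr_ref <- log2((ref_cpm + 1) / (ref_mean + 1))

# Step 05: cap values to [-3, 3] (thresh_capping = 3).
cap <- function(m, t) pmin(pmax(m, -t), t)
lrr_tum_cap <- cap(lrr_tum, 3)
lrr_ref_cap <- cap(lrr_ref, 3)

# Step 06: moving-window mean over neighbouring genes (5 genes per window here).
smooth_window <- function(m, k) {
  half <- (k - 1) %/% 2
  out <- m
  for (i in seq_len(nrow(m))) {
    idx <- max(1, i - half):min(nrow(m), i + half)
    out[i, ] <- colMeans(m[idx, , drop = FALSE])
  }
  out
}
lrr_tum_smoo <- smooth_window(lrr_tum_cap, 5)
lrr_ref_smoo <- smooth_window(lrr_ref_cap, 5)

# Step 07: center each cell on zero (mean-centering assumed).
center <- function(m) sweep(m, 2, colMeans(m), "-")
lrr_tum_cent <- center(lrr_tum_smoo)
lrr_ref_cent <- center(lrr_ref_smoo)

# Step 08: placeholder correction, subtracting the per-gene reference mean.
ref_gene_mean <- rowMeans(lrr_ref_cent)
lrr_tum_corr <- lrr_tum_cent - ref_gene_mean
lrr_ref_corr <- lrr_ref_cent - ref_gene_mean
```

The number of rows in the final matrices equals the number of genes flagged TRUE in `keep`, which mirrors the relationship between the keep column and the step 08 matrix shown in the Examples.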
See also
Other computeLogRatio:
computeLogRatio(),
computeLogRatioATAC()
Examples
# Create muscomic objects
atac <- CreateMuscomicObject(
  type = "ATAC",
  mat_counts = mat_counts_atac_tumor,
  allele_counts = allele_counts_atac_tumor,
  features = peaks
)
rna <- CreateMuscomicObject(
  type = "RNA",
  mat_counts = mat_counts_rna_tumor,
  allele_counts = allele_counts_rna_tumor,
  features = genes
)
atac_ref <- CreateMuscomicObject(
  type = "ATAC",
  mat_counts = mat_counts_atac_ref,
  allele_counts = allele_counts_atac_ref,
  features = peaks
)
rna_ref <- CreateMuscomicObject(
  type = "RNA",
  mat_counts = mat_counts_rna_ref,
  allele_counts = allele_counts_rna_ref,
  features = genes
)
# Create muscadet objects
muscadet <- CreateMuscadetObject(
  omics = list(atac, rna),
  bulk.lrr = bulk_lrr,
  bulk.label = "WGS",
  genome = "hg38"
)
muscadet_ref <- CreateMuscadetObject(
  omics = list(atac_ref, rna_ref),
  genome = "hg38"
)
# Compute log R ratio for RNA
obj_rna <- computeLogRatioRNA(
  matTumor = matCounts(muscadet)$RNA,
  matRef = matCounts(muscadet_ref)$RNA,
  genesCoord = coordFeatures(muscadet)$RNA,
  genome = slot(muscadet, "genome"),
  refReads = 2 # low value for example subsampled datasets
)
#> Step 01 - Match genes in count matrix with coordinates
#> Step 02 - Filtering genes: Minimum of 2 read(s) in reference cells and minimum of 0.01 read(s) in average per reference cell
#> Step 03 - Normalization for sequencing depth: Normalized counts per million
#> Step 04 - Log transformation and normalization by reference data: log R ratio
#> Step 05 - Capping the range of values: threshold = 3
#> Step 06 - Smoothing values on gene windows: 101 genes per window
#> Step 07 - Centering of cells
#> Step 08 - Correcting by reference variability
table(obj_rna$coord$keep)
#>
#> FALSE  TRUE
#>   151   349
# With results from every step when `all_steps = TRUE`
obj_rna_all <- computeLogRatioRNA(
  matTumor = matCounts(muscadet)$RNA,
  matRef = matCounts(muscadet_ref)$RNA,
  genesCoord = coordFeatures(muscadet)$RNA,
  genome = slot(muscadet, "genome"),
  refReads = 2, # low value for example subsampled datasets
  all_steps = TRUE
)
#> Step 01 - Match genes in count matrix with coordinates
#> Step 02 - Filtering genes: Minimum of 2 read(s) in reference cells and minimum of 0.01 read(s) in average per reference cell
#> Step 03 - Normalization for sequencing depth: Normalized counts per million
#> Step 04 - Log transformation and normalization by reference data: log R ratio
#> Step 05 - Capping the range of values: threshold = 3
#> Step 06 - Smoothing values on gene windows: 101 genes per window
#> Step 07 - Centering of cells
#> Step 08 - Correcting by reference variability
names(obj_rna_all)
#> [1] "step01" "step02" "step03" "step04" "step05" "step06" "step07" "step08"
#> [9] "params" "coord"
table(obj_rna_all$coord$keep)
#>
#> FALSE  TRUE
#>   151   349
nrow(obj_rna_all$step08$matTumor)
#> [1] 349
