Compute log R ratios from raw count matrices with method specifically adapted for scATAC-seq data. The counts per peaks are grouped into counts per windows of peaks, thereby features become windows of peaks instead of peaks.
Usage
computeLogRatioATAC(
matTumor,
matRef,
peaksCoord,
genome = "hg38",
windowSize = 1e+07,
slidingSize = 2e+06,
minReads = 5,
minPeaks = 100,
thresh_capping = 3,
all_steps = FALSE,
quiet = FALSE
)
Arguments
- matTumor
Raw count matrix features x cells for tumor/sample cells (
matrix
ordgCMatrix
).- matRef
Raw count matrix features x cells for reference cells (
matrix
ordgCMatrix
).- peaksCoord
Data frame of peak coordinates with columns
CHROM
,start
,end
,id
(data.frame
).- genome
Reference genome name among: "hg38", "hg19" and "mm10" (
character
string). By default: "hg38".- windowSize
Size of windows in base pairs (
integer
value). By default:10e6
(10 Mbp).- slidingSize
Distance between start positions of sliding windows in base pairs (
integer
value). If set to the same value aswindowSize
, the windows don't overlap. By default:2e6
(2 Mbp).- minReads
Minimum read average per window in reference cells (
integer
value). By default:5
.- minPeaks
Minimum number of peaks per window (
integer
value). By default:100
.- thresh_capping
Threshold to cap the range of log R ratio values (
numeric
value). By default:3
.- all_steps
TRUE
orFALSE
(logical
). Whether to keep intermediate result from every step in the final object. By default:FALSE
.- quiet
Logical. If
TRUE
, suppresses informative messages during execution. Default isFALSE
.
Value
If all_steps
is set to FALSE
, a list containing:
matTumor
Matrix of log R ratio values features x cells for the tumor/sample cells (
matrix
).matRef
Matrix of log R ratio values features x cells for the reference cells (
matrix
).params
List of parameters set for the
windowSize
,slidingSize
,minReads
,minPeaks
andthresh_capping
arguments (list
).coord
Data frame of coordinates for windows of peaks and associated data along the different steps (
data.frame
). Columns:CHROM
,start
,end
,width
,id
: coordinates and unique identifier of windows (depends onwindowSize
andslidingSize
arguments).nPeaks
: number of peaks per window.sumReads.tum/ref
: sum of read counts for all cells in tumor or reference cells.meanReads.tum/ref
: mean of read counts per cells for tumor or reference cells.sdReads.tum/ref
: standard deviation of read counts per cells for tumor or reference cells.keep
: logical,TRUE
for windows to keep after filtering based on coverage (depends onminPeaks
andmeanReads
arguments).meanReads/sdReads.norm.tum/ref
: mean/sd of normalized counts per million for tumor/reference cells.meanLRR/sdReads.raw.tum/ref
: mean/sd of raw log R ratio (LRR) for tumor/reference cells.meanLRR/sdLRR.cap.tum/ref
: mean/sd of capped log R ratio (LRR) for tumor/reference cells (depends onthresh_capping
argument).meanLRR/sdLRR.cent.tum/ref
: mean/sd of centered log R ratio (LRR) for tumor/reference cells.meanLRR/sdLRR.corr.tum/ref
: mean/sd of final log R ratio (LRR) corrected by reference variability for tumor/reference cells.
If all_steps is set to TRUE, the previously described list can be found for
each step in a different list element.
Details
The raw count matrix is transformed into log R ratios through the following steps:
Group peaks in windows (
windowSize
andslidingSize
arguments)Filtering on coverage (
minPeaks
andmeanReads
arguments)Normalization for sequencing depth
Log transformation and normalization by reference data: log R ratio
Capping the range of values (
thresh_capping
argument)Centering of cells
Correcting by reference variability
See also
Other computeLogRatio:
computeLogRatio()
,
computeLogRatioRNA()
Examples
# Create muscomic objects
atac <- CreateMuscomicObject(
type = "ATAC",
mat_counts = mat_counts_atac_tumor,
allele_counts = allele_counts_atac_tumor,
features = peaks
)
rna <- CreateMuscomicObject(
type = "RNA",
mat_counts = mat_counts_rna_tumor,
allele_counts = allele_counts_rna_tumor,
features = genes
)
atac_ref <- CreateMuscomicObject(
type = "ATAC",
mat_counts = mat_counts_atac_ref,
allele_counts = allele_counts_atac_ref,
features = peaks
)
rna_ref <- CreateMuscomicObject(
type = "RNA",
mat_counts = mat_counts_rna_ref,
allele_counts = allele_counts_rna_ref,
features = genes
)
# Create muscadet objects
muscadet <- CreateMuscadetObject(
omics = list(atac, rna),
bulk.lrr = bulk_lrr,
bulk.label = "WGS",
genome = "hg38"
)
muscadet_ref <- CreateMuscadetObject(
omics = list(atac_ref, rna_ref),
genome = "hg38"
)
# Compute log R ratio for ATAC
obj_atac <- computeLogRatioATAC(
matTumor = matCounts(muscadet)$ATAC,
matRef = matCounts(muscadet_ref)$ATAC,
peaksCoord = coordFeatures(muscadet)$ATAC,
genome = slot(muscadet, "genome"),
minReads = 1, # low value for example subsampled datasets
minPeaks = 1 # low value for example subsampled datasets
)
#> Step 01 - Group peaks in windows: window size set at 10 Mb, sliding by 2 Mb
#> Step 02 - Filtering windows: Minimum of 1 peaks per window with a minimum average of 1 read(s)
#> Step 03 - Normalization for sequencing depth: Normalized counts per million
#> Step 04 - Log transformation and normalization by reference data: log R ratio
#> Step 05 - Capping the range of values: threshold = 3
#> Step 06 - [No step 06 for scATAC-seq]
#> Step 07 - Centering of cells
#> Step 08 - Correcting by reference variability
table(obj_atac$coord$keep)
#>
#> FALSE TRUE
#> 1329 133
# With results form every step when `all_steps = TRUE`
obj_atac_all <- computeLogRatioATAC(
matTumor = matCounts(muscadet)$ATAC,
matRef = matCounts(muscadet_ref)$ATAC,
peaksCoord = coordFeatures(muscadet)$ATAC,
genome = slot(muscadet, "genome"),
minReads = 1, # low value for example subsampled datasets
minPeaks = 1, # low value for example subsampled datasets
all_steps = TRUE
)
#> Step 01 - Group peaks in windows: window size set at 10 Mb, sliding by 2 Mb
#> Step 02 - Filtering windows: Minimum of 1 peaks per window with a minimum average of 1 read(s)
#> Step 03 - Normalization for sequencing depth: Normalized counts per million
#> Step 04 - Log transformation and normalization by reference data: log R ratio
#> Step 05 - Capping the range of values: threshold = 3
#> Step 06 - [No step 06 for scATAC-seq]
#> Step 07 - Centering of cells
#> Step 08 - Correcting by reference variability
names(obj_atac_all)
#> [1] "step01" "step02" "step03" "step04" "step05" "step07" "step08" "params"
#> [9] "coord"
table(obj_atac_all$coord$keep)
#>
#> FALSE TRUE
#> 1329 133
nrow(obj_atac_all$step08$matTumor)
#> [1] 133