Function adapted from facets::preProcSample()
to process read count matrix
and generates a segmentation tree. The modifications from the original
function includes:
Incorporation of
cluster
andsignal
columns into the final result.Change in the log ratio (
cnlr
) for allelic data, it is computed as the log ratio (cnlr
) mean between the previous and next coverage positions.
Usage
preProcSample2(
rcmat,
het.thresh = 0.25,
snp.nbhd = 250,
cval = 25,
gbuild = "hg38",
hetscale = TRUE,
ndepth = 5,
ndepthmax = 1000
)
Source
This function is derived from the facets::preProcSample()
function,
with the modifications to fit its use in muscadet.
Seshan VE, Shen R (2021). facets: Cellular Fraction and Copy Numbers from Tumor Sequencing. R package version 0.6.2, https://github.com/mskcc/facets.
Arguments
- rcmat
A data frame with 8 required columns:
Chrom
,Pos
,NOR.DP
,NOR.RD
,TUM.DP
,TUM.RD
,cluster
, andsignal
(data.frame
).- het.thresh
VAF (Variant Allele Frequency) threshold to call variant positions heterozygous (
numeric
). Default: 0.25.- snp.nbhd
Window size for selecting loci to reduce serial correlation (
numeric
). Default: 250.- cval
Critical value for segmentation (
numeric
). Default: 25.- gbuild
Genome build used for alignment. One of
"hg19"
,"hg38"
, or"mm10"
(character
). Default:"hg19"
.- hetscale
Logical value indicating whether log odds ratio (logOR) should be scaled to give more weight in the test statistics for segmentation and clustering (
logical
). Usually only 10% of snps are hets and hetscale gives the logOR contribution to T-square as 0.25/proportion of hets. Default:TRUE
.- ndepth
Minimum depth in normal reference to keep (
numeric
). Default: 5.- ndepthmax
Maximum normal coverage threshold for filtering loci (
numeric
). Default: 1000.
Value
A list containing:
- pmat
Read counts and other elements of all the loci.
- gbuild
Genome build used for the analysis.
- nX
Chromosome number for X (e.g., 23 for human, 20 for mouse).
- clusters
Unique clusters from the processed data.
- seg.tree
Segmentation tree for each chromosome.
- jointseg
Segmented variant positions data.
- hscl
Scaling factor for logOR data.
Details
The function processes variant positions data to generate a segmentation
tree. It uses procSnps
to compute initial values,
adjusts the log ratio for allelic signals, and computes segmentation using
segsnps
.
SNPs (or variants) in a genome are not evenly spaced, and loci are sampled
within the specified window (snp.nbhd
) to reduce serial correlation.
References
- facets-package package
Shen R, Seshan VE. FACETS: allele-specific copy number and clonal heterogeneity analysis tool for high-throughput DNA sequencing. Nucleic Acids Res. 2016 Sep 19;44(16):e131. doi: 10.1093/nar/gkw520. PMID: 27270079; PMCID: PMC5027494.
Examples
library("facets")
# Load example muscadet object
data(muscadet_obj)
counts <- muscadet_obj$cnacalling$combined.counts
counts <- counts[complete.cases(counts),]
counts_clus <- counts[which(counts$cluster == 1),]
result <- preProcSample2(counts_clus)