Copy Number Alteration (CNA) Calling from a muscadet object

Performs copy number alteration (CNA) analysis on a muscadet object: allelic and coverage counts are filtered and segmented per cluster and for all cells combined, then cell fractions and copy numbers are estimated for each segment.

Usage

cnaCalling(
  x,
  omics.coverage = NULL,
  depthmin.a.clusters = 30,
  depthmin.c.clusters = 50,
  depthmin.a.allcells = 30,
  depthmin.c.allcells = 50,
  depthmin.nor = 5,
  depthmax.nor = 1000,
  het.thresh = 0.25,
  snp.nbhd = 250,
  hetscale = TRUE,
  cval1 = 25,
  cval2 = 150,
  min.nhet = 5,
  clonal.thresh = 0.9,
  dist.breakpoints = 1e+06,
  ploidy = "auto",
  quiet = FALSE
)

Source

This function uses several functions from the facets package, including facets::clustersegs(), facets::emcncf(), facets::findDiploidLogR(), facets::fitcncf(), facets::procSample() and facets::procSnps(), as well as the adapted function preProcSample2().

Seshan VE, Shen R (2021). facets: Cellular Fraction and Copy Numbers from Tumor Sequencing. R package version 0.6.2, https://github.com/mskcc/facets.

Arguments

x

A muscadet object. Must contain:

Clustering assignments in the cnacalling$clusters slot (use assignClusters()).
Combined allelic and coverage counts per cluster in the cnacalling$combined.counts slot (use mergeCounts()).

omics.coverage

A vector of omics names to select for coverage log R ratio data. RECOMMENDED: select "ATAC" when both ATAC and RNA omics are available, as the ATAC coverage (DNA) signal is less noisy than the RNA signal. By default, NULL selects all available data.

depthmin.a.clusters

Minimum allelic depth per cluster in tumor cells (default: 30).

depthmin.c.clusters

Minimum coverage depth per cluster in tumor cells (default: 50).

depthmin.a.allcells

Minimum allelic depth for all tumor cells (default: 30).

depthmin.c.allcells

Minimum coverage depth for all tumor cells (default: 50).

depthmin.nor

Minimum coverage depth for the normal sample (default: 5).

depthmax.nor

Maximum coverage depth for the normal sample (default: 1000).
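The depth arguments above act as per-position filters, with separate tumor thresholds for allelic and coverage signals and a depth window for the normal sample. The following is a minimal sketch of such a filter, not muscadet's internal code: it assumes a data frame with the rCountT and rCountN columns described for the positions table, a hypothetical signal column taking the values "allelic" or "coverage", and inclusive (>=, <=) comparisons.

```r
# Hypothetical sketch of depth filtering (not muscadet's implementation).
# Assumes: signal is "allelic" or "coverage"; rCountT/rCountN are depths.
filter_positions <- function(df,
                             depthmin.a = 30,
                             depthmin.c = 50,
                             depthmin.nor = 5,
                             depthmax.nor = 1000) {
  # Tumor depth threshold depends on the type of signal
  min_tumor <- ifelse(df$signal == "allelic", depthmin.a, depthmin.c)
  # Keep positions passing the tumor threshold and the normal depth window
  keep <- df$rCountT >= min_tumor &
    df$rCountN >= depthmin.nor &
    df$rCountN <= depthmax.nor
  df[keep, , drop = FALSE]
}

pos <- data.frame(
  signal  = c("allelic", "allelic", "coverage", "coverage"),
  rCountT = c(40, 20, 60, 45),
  rCountN = c(10, 10, 2000, 10)
)
res <- filter_positions(pos)
```

Only the first position passes: the second fails the allelic threshold, the third exceeds depthmax.nor in the normal sample and the fourth fails the coverage threshold.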

het.thresh

VAF (variant allele frequency) threshold used by preProcSample2() to call variant positions heterozygous (default: 0.25).
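In FACETS-style preprocessing, a position is typically called heterozygous when its minor-allele fraction exceeds the threshold, that is when het.thresh < VAF < 1 - het.thresh. A minimal sketch of that rule on a plain numeric vector; which sample's VAF muscadet applies it to is an assumption left open here.

```r
# Sketch of a symmetric heterozygosity rule (assumed, FACETS-style):
# heterozygous if min(vaf, 1 - vaf) > het.thresh
call_het <- function(vaf, het.thresh = 0.25) {
  pmin(vaf, 1 - vaf) > het.thresh
}

call_het(c(0.10, 0.30, 0.50, 0.80))
#> [1] FALSE  TRUE  TRUE FALSE
```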

snp.nbhd

Window size used by preProcSample2() to select SNP loci, in order to reduce serial correlation between neighboring positions (default: 250).
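Nearby SNPs tend to share reads and are therefore correlated; thinning them to roughly one locus per window reduces that dependence before segmentation. The sketch below is a deliberate simplification, assuming fixed, non-overlapping windows of snp.nbhd base pairs per chromosome and keeping the first position in each; the actual selection in preProcSample2() may differ (for example by preferring heterozygous positions).

```r
# Simplified sketch (assumption): keep the first position in each
# non-overlapping window of snp.nbhd base pairs on each chromosome.
thin_snps <- function(chrom, maploc, snp.nbhd = 250) {
  window_id <- paste(chrom, maploc %/% snp.nbhd)
  !duplicated(window_id)
}

thin_snps(chrom = c(1, 1, 1, 1, 2),
          maploc = c(100, 200, 300, 600, 100))
#> [1]  TRUE FALSE  TRUE  TRUE  TRUE
```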

hetscale

Logical value indicating whether the log odds ratio (logOR) should be scaled to give it more weight in the test statistics for segmentation and clustering in preProcSample2() (default: TRUE).

cval1

Critical value for segmentation used by preProcSample2() (default: 25).

cval2

Critical value for segmentation used by facets::procSample() (default: 150).

min.nhet

Minimum number of heterozygous positions in a segment, used by facets::procSample() and facets::emcncf() (default: 5).

clonal.thresh

Minimum proportion of cells sharing a segment's state required to label the segment as clonal (default: 0.9).

dist.breakpoints

Minimum distance between breakpoints to define distinct segments (default: 1e6).
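Because each cluster is segmented independently, the same biological breakpoint can appear at slightly different positions in different clusters. One way to build consensus segments is to pool breakpoints and merge those closer than dist.breakpoints. The sketch below is an illustration under that assumption (single chromosome, each merged group represented by its median), not muscadet's exact procedure.

```r
# Sketch (assumption): pool breakpoints from all clusters on one chromosome,
# merge those closer than dist.breakpoints, represent each group by its median.
merge_breakpoints <- function(bp, dist.breakpoints = 1e6) {
  bp <- sort(unique(bp))
  # Start a new group whenever the gap to the previous breakpoint
  # is at least dist.breakpoints
  group <- cumsum(c(TRUE, diff(bp) >= dist.breakpoints))
  as.numeric(tapply(bp, group, median))
}

# Three nearby breakpoints (10.0, 10.2 and 10.4 Mb) collapse into one at 10.2 Mb
merge_breakpoints(c(10e6, 10.4e6, 25e6, 10.2e6, 60e6))
```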

ploidy

Specifies ploidy assumption: "auto", "median", or numeric value (default: "auto").

quiet

Logical. If TRUE, suppresses informative messages during execution. Default is FALSE.

Value

A modified muscadet object with added CNA analysis results in the cnacalling slot, including: filtered counts and positions, segmentation data for clusters and all cells, consensus segments across clusters based on breakpoints, diploid log R ratio, purity and ploidy.
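The diploid log R ratio, cell fraction and total copy number are linked through the two-population mixture model used in FACETS-style analyses: if a fraction cf of cells carries tcn copies of a segment and the remaining cells are diploid, the expected segment log R ratio is dipLogR + log2(1 - cf + cf * tcn / 2). A short sketch of that formula, shown for intuition only and not as muscadet's fitting code:

```r
# Expected segment log R ratio for a mixture of diploid cells (fraction 1 - cf)
# and altered cells carrying tcn total copies (fraction cf).
expected_cnlr <- function(cf, tcn, dipLogR = 0) {
  dipLogR + log2(1 - cf + cf * tcn / 2)
}

expected_cnlr(cf = 1, tcn = 4)    # clonal gain to 4 copies: log2(2) = 1
expected_cnlr(cf = 0.5, tcn = 1)  # loss in half of the cells: log2(0.75)
```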

Details of the cnacalling slot:

combined.counts.filtered: Filtered counts per cluster.
combined.counts.allcells: Counts summed for all cells (no cluster distinction).
combined.counts.allcells.filtered: Filtered counts summed for all cells (no cluster distinction).
positions: Data frame of positions from the per cluster analysis. Positions in rows and associated data in columns: chrom, maploc (position), rCountT (read count in tumor), rCountN (read count in normal), vafT (variant allele frequency in tumor), vafN (variant allele frequency in normal), cluster (cluster id), signal (whether the counts come from coverage or allelic data), het (heterozygous status), keep (whether to keep position), gcpct (GC percentage), gcbias (GC bias correction), cnlr (log R ratio), valor (log odds ratio), lorvar (variance of log odds ratio), seg0, seg_ori (segment original id within each cluster), seg (segment id), segclust (cluster of segments id), vafT.allcells (variant allele frequency in all tumor cells), colVAR (integer for allelic position color depending on vafT.allcells).
segments: Data frame of segments from the per cluster analysis. Segments in rows and associated data in columns: chrom, seg (segment id), num.mark (number of positions in segment), nhet (number of heterozygous positions in segment), cnlr.median (segment log R ratio median), mafR (segment square of expected log odds ratio), vafT.median (segment variant allele frequency median), cluster (cluster id), seg_ori (segment original id within each cluster), segclust (cluster of segments id), cnlr.median.clust (segment cluster log R ratio median), mafR.clust (segment cluster square of expected log odds ratio), cf (cell fraction), tcn (total copy number), lcn (lower copy number), start, end, cf.em (cell fraction computed with EM algorithm), tcn.em (total copy number computed with EM algorithm), lcn.em (lower copy number computed with EM algorithm).
positions.allcells: Same as positions but from the all cells analysis.
segments.allcells: Same as segments but from the all cells analysis.
consensus.segs: Data frame of unique consensus segments across clusters, with the cna (logical) and cna_clonal (logical) information.
table: Data frame of consensus segments across clusters with associated information per cluster in columns: chrom, start, end, id, cluster, cf.em (cell fraction computed with EM algorithm), tcn.em (total copy number computed with EM algorithm), lcn.em (lower copy number computed with EM algorithm), ncells (number of cells in cluster), prop.cluster (proportion of cells per cluster), gnl (gain, neutral or loss, encoded as 1, 0 or -1), loh (loss of heterozygosity status), state (state of segments), cna (whether the segment is a CNA), cna_state (state of CNA segments), prop.tot (proportion of cells with the same state per segment), state_clonal (state of the segment if its prop.tot is above clonal.thresh), cna_clonal (whether the segment is a clonal CNA), cna_clonal_state (state of clonal CNA segments).
ncells: Vector of number of cells per cluster.
dipLogR.clusters: Diploid log R ratio from the per cluster analysis.
dipLogR.allcells: Diploid log R ratio from the all cells analysis.
purity.clusters: Purity from the per cluster analysis.
purity.allcells: Purity from the all cells analysis.
ploidy.clusters: Ploidy from the per cluster analysis.
ploidy.allcells: Ploidy from the all cells analysis.
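To illustrate how the per-cluster columns of the table combine, the sketch below derives gnl, loh, prop.cluster, prop.tot and state_clonal for one consensus segment. It is a hypothetical reconstruction, not muscadet's code: it assumes gain/loss is judged relative to 2 copies, LOH means lcn.em == 0, the state label is a simple paste of gnl and loh, and "above clonal.thresh" means a strict inequality. The input values are invented for the example.

```r
# Hypothetical per-cluster calls for one consensus segment (invented values)
seg <- data.frame(
  cluster = c(1, 2, 3),
  tcn.em  = c(3, 3, 2),
  lcn.em  = c(1, 1, 1),
  ncells  = c(450, 500, 50)
)

# Assumed encodings: gain/neutral/loss relative to 2 copies; LOH if lcn.em is 0
seg$gnl   <- sign(seg$tcn.em - 2)
seg$loh   <- seg$lcn.em == 0
seg$state <- paste(seg$gnl, seg$loh)  # hypothetical state label

# Share of all cells belonging to each cluster
seg$prop.cluster <- seg$ncells / sum(seg$ncells)

# Share of all cells carrying the same state as each cluster
seg$prop.tot <- ave(seg$prop.cluster, seg$state, FUN = sum)

# State kept as clonal only if shared by more than clonal.thresh of cells
clonal.thresh <- 0.9
seg$state_clonal <- ifelse(seg$prop.tot > clonal.thresh, seg$state, NA)
seg
```

Here clusters 1 and 2 share a gain covering 95% of cells, so their state is reported as clonal, while cluster 3 (5% of cells, neutral) is not.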

References

Shen R, Seshan VE. FACETS: allele-specific copy number and clonal heterogeneity analysis tool for high-throughput DNA sequencing. Nucleic Acids Res. 2016 Sep 19;44(16):e131. doi: 10.1093/nar/gkw520.

Examples

library("muscadet")
library("facets")
#> Loading required package: pctGCdata

# Load example muscadet object
data(muscadet_obj)

muscadet_obj <- cnaCalling(muscadet_obj,
                           omics.coverage = "ATAC",
                           depthmin.a.clusters = 3, # set low thresholds for example data
                           depthmin.c.clusters = 5,
                           depthmin.a.allcells = 3,
                           depthmin.c.allcells = 5,
                           depthmin.nor = 0)
#> Selecting coverage data from omic(s): ATAC
#> Filtering positions per clusters based on provided filters...
#> Performing segmentation per cluster...
#> Finding diploid log R ratio on clusters...
#> Diploid log R ratio = -0.566520511224907
#> Computing cell fractions and copy numbers on clusters...
#> Filtering positions on all cells based on provided filters...
#> Performing segmentation on all cells...
#> Computing cell fractions and copy numbers on all cells...
#> Finding consensus segments between clusters...