ASCAT – Allele-Specific Copy number Analysis of Tumours

Welcome to the ASCAT webpage!

ASCAT is a method to derive copy number profiles of tumour cells, accounting for normal cell admixture and tumour aneuploidy (Figure 1). ASCAT infers tumour purity (the fraction of tumour cells) and ploidy (the amount of DNA per tumour cell, expressed as multiples of haploid genomes) from SNP array or massively parallel sequencing data, and calculates whole-genome allele-specific copy number profiles (the number of copies of both parental alleles for all SNP loci across the genome).

Figure 1: Copy number analysis of cancer genomes using the ASCAT tool. ASCAT infers tumour purity and ploidy, allowing detailed inference of copy number profiles, LOH and homozygous deletions.
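
The link between purity, ploidy and the measured signals can be illustrated with a small forward model. The sketch below follows the equations of the ASCAT publication cited on this page, as far as they are recalled here; the function name expected_signal and its arguments are illustrative and are not part of the R package.

```r
# Expected LogR and BAF at one SNP locus for a tumour/normal mixture.
# nA, nB: copies of the two parental alleles in tumour cells
# purity: fraction of tumour cells; ploidy: average tumour ploidy
# gamma:  platform-dependent LogR scaling (see the gamma parameter of ascat.runAscat)
# NOTE: illustrative helper, not a function of the ASCAT package.
expected_signal <- function(nA, nB, purity, ploidy, gamma = 1) {
  # Average number of DNA copies per cell in the whole sample
  psi <- 2 * (1 - purity) + purity * ploidy
  # Average number of copies at this locus in the whole sample
  total <- 2 * (1 - purity) + purity * (nA + nB)
  list(LogR = gamma * log2(total / psi),
       BAF  = (1 - purity + purity * nB) / total)
}

# A balanced diploid locus, and a one-copy loss (LOH) at 50% purity
balanced <- expected_signal(nA = 1, nB = 1, purity = 0.7, ploidy = 2)
loh      <- expected_signal(nA = 1, nB = 0, purity = 0.5, ploidy = 2)
```

In this toy case the balanced locus gives LogR 0 and BAF 0.5, while the LOH locus gives a negative LogR and a BAF shifted away from 0.5. Inferring purity and ploidy amounts to inverting this relationship across all loci.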

ASCAT 2

The latest ASCAT version (2.4) is available as an R package here. If you use ASCAT in your work, please cite us: Peter Van Loo, Silje H. Nordgard, Ole Christian Lingjærde, Hege G. Russnes, Inga H. Rye, Wei Sun, Victor J. Weigman, Peter Marynen, Anders Zetterberg, Bjørn Naume, Charles M. Perou, Anne-Lise Børresen-Dale, and Vessela N. Kristensen. Proceedings of the National Academy of Sciences of the USA, 107:16910-16915, 2010.

An example pipeline can be found within the R package, here. Simulated example data can be found here.

In its simplest form (with matched normal data available, without GC wave correction and all samples female), ASCAT can be run as follows:

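  • library(ASCAT)
    # Load tumour and matched germline LogR/BAF matrices
    ascat.bc = ascat.loadData("Tumor_LogR.txt", "Tumor_BAF.txt", "Germline_LogR.txt", "Germline_BAF.txt")
    ascat.plotRawData(ascat.bc)
    # Segment the data with ASPCF, then plot the segments
    ascat.bc = ascat.aspcf(ascat.bc)
    ascat.plotSegmentedData(ascat.bc)
    # Fit purity and ploidy and derive allele-specific copy numbers
    ascat.output = ascat.runAscat(ascat.bc)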

The ascat.loadData function by default assumes all samples are female. An extra optional parameter (gender = …) allows setting the gender of samples (in vector format, using "XX" for females and "XY" for males). 
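
As a minimal sketch, the gender vector can be built from a sample annotation table. The table, its column names and the sample IDs below are hypothetical, and the sketch assumes the vector holds one entry per sample in the same order as the sample columns of the input files.

```r
# Hypothetical sample annotation; IDs and sexes are made up for illustration.
samples <- data.frame(id  = c("S1", "S2", "S3"),
                      sex = c("female", "male", "female"),
                      stringsAsFactors = FALSE)

# Translate to the "XX"/"XY" coding described above
gender <- ifelse(samples$sex == "male", "XY", "XX")

# The vector would then be passed through the optional parameter, e.g.:
# ascat.bc = ascat.loadData("Tumor_LogR.txt", "Tumor_BAF.txt",
#                           "Germline_LogR.txt", "Germline_BAF.txt",
#                           gender = gender)
```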

Running ASCAT without matched normal data

ASCAT can also run without matched normal data, and can infer the necessary germline genotypes from the tumour data:

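  • library(ASCAT)
    # Load tumour data only (no germline files)
    ascat.bc = ascat.loadData("Tumor_LogR.txt", "Tumor_BAF.txt")
    ascat.plotRawData(ascat.bc)
    # PLATFORM is a placeholder for one of the platform names listed below
    ascat.gg = ascat.predictGermlineGenotypes(ascat.bc, PLATFORM)
    # Pass the predicted germline genotypes to the segmentation step
    ascat.bc = ascat.aspcf(ascat.bc, ascat.gg = ascat.gg)
    ascat.plotSegmentedData(ascat.bc)
    ascat.output = ascat.runAscat(ascat.bc)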

The germline genotype prediction function takes the SNP array platform as input. Currently supported platforms are: "Affy10k", "Affy100k", "Affy250k_sty", "Affy250k_nsp", "Affy500k", "AffySNP6", "AffyOncoScan", "AffyCytoScanHD", "Illumina109k", "IlluminaCytoSNP", "Illumina610k", "Illumina660k", "Illumina700k", "Illumina1M", "Illumina2.5M" and "IlluminaOmni5". For the example data above, a "Custom10k" platform is also included. If you would like to run ASCAT without matched normal data on another platform, and are prepared to send over some sample data, please contact us.
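
Because the platform is passed as a plain string, a typo is easy to make. A small guard such as the following can catch it before the prediction step. The helper check_platform is hypothetical, and the list of names is copied from the text above, so it may not match the version of the package you have installed.

```r
# Platform names as listed on this page (plus the example-data platform).
supported_platforms <- c(
  "Affy10k", "Affy100k", "Affy250k_sty", "Affy250k_nsp", "Affy500k",
  "AffySNP6", "AffyOncoScan", "AffyCytoScanHD", "Illumina109k",
  "IlluminaCytoSNP", "Illumina610k", "Illumina660k", "Illumina700k",
  "Illumina1M", "Illumina2.5M", "IlluminaOmni5", "Custom10k"
)

# Hypothetical helper: returns the name unchanged, or stops with a clear message.
check_platform <- function(platform) {
  if (!platform %in% supported_platforms) {
    stop("Unsupported platform: ", platform)
  }
  platform
}

platform <- check_platform("AffySNP6")
# ascat.gg = ascat.predictGermlineGenotypes(ascat.bc, platform)
```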

GC wave correction

Samples profiled through SNP arrays or massively parallel sequencing are often affected by 'wave artefacts' that are in part correlated with GC content of the surrounding region (see e.g. this paper by Diskin et al.). We've implemented GC wave correction in ASCAT, and recommend adding that step to the pipeline if the input data hasn't been through alternative methods for GC correction.

  • library(ASCAT)
    ascat.bc = ascat.loadData("Tumor_LogR.txt", "Tumor_BAF.txt", "Germline_LogR.txt", "Germline_BAF.txt")
    # GC_CONTENT_FILE is a placeholder for the GC content file of the platform used
    ascat.bc = ascat.GCcorrect(ascat.bc, GC_CONTENT_FILE)
    ascat.plotRawData(ascat.bc)
    ascat.bc = ascat.aspcf(ascat.bc)
    ascat.plotSegmentedData(ascat.bc)
    ascat.output = ascat.runAscat(ascat.bc)

Our GC correction method is based on the one initially implemented in the following paper: Jiqiu Cheng, Evelyne Vanneste, Peter Konings, Thierry Voet, Joris R. Vermeesch, and Yves Moreau. Genome Biology 12:R80, 2011.

GC correction requires a GC content file for the platform used. We currently provide GC content files for Affymetrix SNP 6.0 arrays, Affymetrix 250k STY arrays, Illumina 660k arrays and Illumina OmniExpress arrays. If you would like to run ASCAT GC correction on another platform, please contact us.
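
To give an intuition for what GC correction does, the sketch below removes a linear GC trend from LogR values by regression. This is a deliberately simplified illustration and is not the method implemented in ascat.GCcorrect; the function name gc_correct_sketch and the toy values are assumptions.

```r
# Simplified illustration: regress LogR on per-probe GC content and keep
# the residuals, which are uncorrelated with GC by construction.
# NOTE: not the ASCAT implementation.
gc_correct_sketch <- function(logr, gc) {
  fit <- lm(logr ~ gc)
  unname(residuals(fit))
}

# Toy data: a GC-dependent wave plus small fixed deviations
gc    <- seq(0.30, 0.60, length.out = 7)
noise <- c(0.01, -0.01, 0.02, -0.02, 0.00, 0.01, -0.01)
logr  <- 0.8 * (gc - 0.45) + noise

corrected <- gc_correct_sketch(logr, gc)
```

Note that taking residuals also recentres LogR at zero. A real correction would need to preserve genuine copy number signal, which a single global regression does not guarantee.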

Input data formats and supported platforms

ASCAT is platform and species independent, and works for both Illumina and Affymetrix SNP arrays, as well as for massively parallel sequencing data. The input required includes matrices of LogR and B Allele Frequency (BAF) data (rows are probes or SNP loci and columns are samples). ASCAT requires identically formatted LogR and BAF files for both tumour and germline data (with matching samples on matching rows in all four files). For examples of the precise data format, see our simulated example data. 
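
Mismatched rows or columns between the four inputs are a common source of errors, so it can help to check them before loading. The sketch below works on in-memory matrices and assumes that "identically formatted" means the same probes in the same row order and the same samples in the same column order. The helper check_ascat_inputs and the toy values are hypothetical.

```r
# Hypothetical helper: TRUE if all four matrices share dimensions,
# row names (probes) and column names (samples), in the same order.
check_ascat_inputs <- function(tumour_logr, tumour_baf,
                               germline_logr, germline_baf) {
  inputs <- list(tumour_logr, tumour_baf, germline_logr, germline_baf)
  ref <- inputs[[1]]
  same_shape <- vapply(inputs, function(m) {
    identical(dim(m), dim(ref)) &&
      identical(rownames(m), rownames(ref)) &&
      identical(colnames(m), colnames(ref))
  }, logical(1))
  all(same_shape)
}

# Toy matrices: 3 probes x 2 samples
probes  <- c("snp1", "snp2", "snp3")
samples <- c("S1", "S2")
make <- function(values) {
  matrix(values, nrow = 3, dimnames = list(probes, samples))
}
t_logr <- make(c(0.1, -0.4, 0.0, 0.2, -0.1, 0.3))
t_baf  <- make(c(0.5, 0.1, 0.9, 0.5, 0.5, 0.0))
g_logr <- make(c(0.0, 0.0, 0.1, -0.1, 0.0, 0.0))
g_baf  <- make(c(0.5, 0.0, 1.0, 0.5, 0.5, 0.0))

ok  <- check_ascat_inputs(t_logr, t_baf, g_logr, g_baf)
# Reordering the rows of one matrix breaks the correspondence
bad <- check_ascat_inputs(t_logr, t_baf[c(2, 1, 3), ], g_logr, g_baf)
```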

Input data for ASCAT can be obtained directly from Illumina GenomeStudio, or can be derived from Affymetrix CEL files, e.g. through the PennCNV libraries. The pipeline we use (and recommend) for Affymetrix SNP 6.0 arrays can be found within the R package, here. Please note that you need two adapted files for this pipeline: one containing the SNP locations for the AffySNP6 platform, and a genotype cluster file that was compiled from a series of about 5,000 verified normal samples.

ASCAT input files can also be derived from massively parallel sequencing data, through log-transformed normalised read depth (LogR) and allele frequencies (BAF) of selected SNP loci. A complete whole-genome pipeline that uses SNP loci from the Affymetrix SNP 6.0 arrays can be found here. We are working on an integrated and automated whole genome, exome and targeted sequencing version that runs directly from BAM files.
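
As a rough sketch of the idea, LogR and BAF can be computed from allele counts at selected SNP loci. The normalisation used here (tumour/normal depth ratio divided by its mean) is one simple choice made for illustration and is not necessarily what the pipeline referenced above does. The function counts_to_logr_baf and the counts are hypothetical, and loci with zero depth would need to be filtered out first.

```r
# t_ref/t_alt: tumour read counts for the reference and alternative allele
# n_ref/n_alt: matched normal read counts at the same SNP loci
# NOTE: illustrative normalisation, not the ASCAT sequencing pipeline.
counts_to_logr_baf <- function(t_ref, t_alt, n_ref, n_alt) {
  t_depth <- t_ref + t_alt
  n_depth <- n_ref + n_alt
  ratio <- t_depth / n_depth
  list(LogR = log2(ratio / mean(ratio)),  # log-transformed normalised depth
       BAF  = t_alt / t_depth)            # fraction of reads with the B allele
}

# Toy counts for two loci: one balanced, one with loss of the B allele
seq_signal <- counts_to_logr_baf(t_ref = c(50, 100), t_alt = c(50, 0),
                                 n_ref = c(50, 50),  n_alt = c(50, 50))
```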

ASCAT can also be run on data from other species, for example SNP arrays from canine breast cancers or exomes from zebrafish melanomas. As the method leverages SNP loci, it will however not work on haploid or homozygous (inbred) species (e.g. inbred mouse strains).

An important platform- and normalisation-specific parameter is the normalisation parameter (gamma) within the function ascat.runAscat. The parameter represents the drop in LogR for a change from two copies to one copy in 100% of cells. Gamma theoretically should equal 1, but due to array background signal and bespoke array normalisation procedures, in practice it is often significantly lower. Its default setting of 0.55 works for many SNP arrays (e.g. Illumina 109k arrays as processed through BeadStudio/GenomeStudio and Affymetrix SNP 6.0 arrays processed through the PennCNV libraries), but not for all. For other SNP array platforms (and normalisation procedures), we recommend checking the value of gamma through comparison of a male and a female germline sample (evaluating the difference in LogR values of the X chromosome probes between the two, relative to the rest of the genome), or through an X chromosome titration series. For massively parallel sequencing data, gamma should always be set to 1.
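
The male/female comparison can be sketched as follows. A female germline sample carries two copies of the X chromosome and a male sample one, so the difference in X chromosome LogR between them (each taken relative to the autosomes) approximates gamma. The function estimate_gamma and the toy values are hypothetical, and the sketch assumes pseudoautosomal probes have already been excluded.

```r
# logr_female, logr_male: germline LogR values for the same probes
# is_x: logical vector marking X chromosome probes
# NOTE: illustrative helper, not part of the ASCAT package.
estimate_gamma <- function(logr_female, logr_male, is_x) {
  rel_female <- median(logr_female[is_x]) - median(logr_female[!is_x])
  rel_male   <- median(logr_male[is_x])   - median(logr_male[!is_x])
  # Drop in LogR when going from two copies (female X) to one (male X)
  rel_female - rel_male
}

# Toy data: three autosomal probes followed by three X chromosome probes
is_x        <- c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)
logr_female <- c(0.01, 0.00, -0.01,  0.02,  0.00, -0.02)
logr_male   <- c(0.01, 0.00, -0.01, -0.53, -0.55, -0.57)

gamma_est <- estimate_gamma(logr_female, logr_male, is_x)

# The estimate would then be passed to the fitting step, e.g.:
# ascat.output = ascat.runAscat(ascat.bc, gamma = gamma_est)
```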

ASCAT outputs

The output of ASCAT, and how to interpret it, is described in this book chapter.

Legacy versions and data

Historic versions of ASCAT are available as part of our GitHub repository, here. We recommend always using the latest version, but we provide the historic versions for legacy reasons.

Major changes to ASCAT over the original version 1.0 are:

  • Availability as an easy-to-use and coherent R software suite (2.0)
  • Major improvements in computational speed (2.0)
  • Platform-independence (2.0)
  • Update of the core algorithm for better performance and results (2.0 and 2.2)
  • Addition of germline genotype prediction and thereby extension to unmatched tumour samples (2.0)
  • Adaptations to the ASPCF segmentation algorithm to increase sensitivity in samples with low noise and to increase robustness in more noisy samples (2.1)
  • Addition of a gender parameter, allowing correct handling of copy number aberrations on the X chromosome in male samples (2.2)
  • Addition of GC correction code (2.2)
  • Adaptations to allow manual refitting of samples (2.3)
  • Adaptations and additions to output data structures (2.3)
  • Availability as R package (2.4)

Breast carcinoma SNP array data from our original ASCAT publication are also available. The data consist of the LogR and BAF values for both the tumour and germline SNP arrays. We also include tumour LogR data after adjustment for GC bias using the method described in Diskin et al. Nucleic Acids Research, 36:e126, 2008. Due to privacy regulations, the data are password protected. Please contact us to obtain access.

A script used to analyse these Illumina 109k breast carcinoma SNP array data using ASCAT 1.0 is available here.

Subclonal copy number analysis: the Battenberg algorithm

To assay subclonal copy number changes in massively parallel sequencing data, we created the Battenberg algorithm, based on the underlying ASCAT principles and equations and on haplotype phasing of 1000 genomes SNP loci. The Battenberg algorithm was originally described here, and is now available through GitHub. 

Frequently asked questions

Can I use ASCAT for (germline) CNV analysis?

ASCAT is a tool to detect somatic copy number alterations (CNAs) in cancer samples and cannot be applied to detect germline copy number variants (CNVs). The term CNV refers to a germline variant that is polymorphic in a population. To avoid confusion, we recommend always using the term CNA for somatic copy number changes in tumour samples.

Which version of ASCAT should I use?

We recommend always using the latest ASCAT version.

Can ASCAT be applied to cell lines as well?

ASCAT will work well on matched cell line data. However, it will struggle with unmatched cell line data, as the germline genotype prediction tool leverages the signal from admixed normal cells to infer germline genotypes. As most cell lines are in practice unmatched, ASCAT will most likely not be the ideal method for analysis of cell line data.

When should I use ASCAT? When should I use Battenberg?

The Battenberg algorithm is specifically designed for detecting subclonal copy number changes in next-generation sequencing data. The current version of Battenberg can also infer purity and ploidy from the data and would be our method of choice for analysis of whole-genome sequencing data. For analysis of other sequencing data (exome or targeted pulldown), haplotype phasing has more limited added value, and we recommend using ASCAT. ASCAT also supports analysis of data from other species and from a wide range of SNP array platforms.

Peter Van Loo

peter.vanloo@crick.ac.uk
+44 (0)20 379 61719

Qualifications and history

  • 2008 PhD in Medical Sciences, University of Leuven, Belgium
  • 2008 Postdoctoral Fellow, Institute for Cancer Research, University of Oslo, Norway
  • 2009 Postdoctoral Fellow, VIB and University of Leuven, Belgium
  • 2010 Postdoctoral Fellow, Cancer Genome Project, Wellcome Trust Sanger Institute, Cambridge, UK
  • 2014 Established lab at the London Research Institute, Cancer Research UK
  • 2015 Winton Group Leader, the Francis Crick Institute, London, UK