Allele-Specific Copy Number Analysis of Tumours (ASCAT)
ASCAT is a method to derive copy number profiles of tumour cells, accounting for normal cell admixture and tumour aneuploidy. ASCAT infers tumour purity (the fraction of tumour cells) and ploidy (the amount of DNA per tumour cell, expressed as multiples of haploid genomes) from SNP array or massively parallel sequencing data, and calculates whole-genome allele-specific copy number profiles (the number of copies of both parental alleles for all SNP loci across the genome).
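The following minimal Python sketch makes purity, ploidy and allele-specific copy number concrete. It computes the LogR and BAF values expected at a germline-heterozygous SNP for tumour allele copy numbers nA and nB, purity rho and ploidy psi, assuming that normal cells carry two copies. It is written for illustration from the equations in the original ASCAT publication (Van Loo et al., PNAS 2010). The function names are hypothetical and are not part of the ASCAT R package.

```python
import math


def expected_logr(n_a, n_b, purity, ploidy, gamma=1.0):
    """Expected LogR at a SNP locus with tumour allele copy numbers n_a, n_b.

    Normal cells are assumed to carry two copies; `ploidy` is the average
    tumour copy number, which sets the LogR reference level.
    """
    tumour_signal = 2 * (1 - purity) + purity * (n_a + n_b)
    reference = 2 * (1 - purity) + purity * ploidy
    if tumour_signal <= 0:
        # e.g. a homozygous deletion in a 100% pure sample: LogR diverges
        return float("-inf")
    return gamma * math.log2(tumour_signal / reference)


def expected_baf(n_a, n_b, purity):
    """Expected B allele frequency at a germline heterozygous SNP."""
    total = 2 - 2 * purity + purity * (n_a + n_b)
    if total <= 0:
        return float("nan")  # no DNA left at this locus
    return (1 - purity + purity * n_b) / total


# Single-copy loss with LOH (nA=1, nB=0) in a 60% pure, diploid tumour
print(expected_logr(1, 0, purity=0.6, ploidy=2.0, gamma=1.0))
print(expected_baf(1, 0, purity=0.6))
```

In the usage example, the BAF at the LOH locus moves away from 0.5 but does not reach 0, because the 40% normal cells still contribute both alleles. This admixture effect is what lets purity be inferred from the data.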
The latest ASCAT version is available as an R package.
In its simplest form (with matched normal data, without GC wave correction, and with all samples female), ASCAT can be run as follows:
By default, the ascat.loadData function assumes that all samples are female. An optional extra parameter (gender = …) sets the gender of the samples as a vector, using "XX" for females and "XY" for males.
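The gender setting matters because, outside the pseudo-autosomal regions, the normal cells of a male sample carry a single X chromosome. Treating such a sample as female could make its X chromosome look like a one-copy loss. The Python sketch below uses a hypothetical helper that is not an ASCAT function. It shows the germline copy number that has to be assumed per chromosome for each gender, and one way a gender vector in the "XX"/"XY" convention could be built from hypothetical sample metadata.

```python
def germline_copy_number(chrom, gender):
    """Expected total copy number of a locus in normal (germline) cells.

    Simplification: pseudo-autosomal regions are ignored, so every X locus
    and every Y locus is single-copy in an "XY" sample.
    """
    if gender not in ("XX", "XY"):
        raise ValueError('gender must be "XX" or "XY"')
    name = chrom.upper()
    if name.startswith("CHR"):
        name = name[3:]  # accept both "X" and "chrX" style labels
    if name == "X":
        return 2 if gender == "XX" else 1
    if name == "Y":
        return 0 if gender == "XX" else 1
    return 2  # autosomes


# Hypothetical sample metadata -> gender vector in sample (column) order
samples = ["S1", "S2", "S3"]
sex_by_sample = {"S1": "female", "S2": "male", "S3": "female"}
gender = ["XX" if sex_by_sample[s] == "female" else "XY" for s in samples]
```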
Running ASCAT without matched normal data
ASCAT can also be run without matched normal data, and can infer the necessary germline genotypes from the tumour data:
The germline genotype prediction function requires the SNP array platform used as input. Currently supported platforms are: "Affy10k", "Affy100k", "Affy250k_sty", "Affy250k_nsp", "Affy500k", "AffySNP6", "AffyOncoScan", "AffyCytoScanHD", "Illumina109k", "IlluminaCytoSNP", "Illumina610k", "Illumina660k", "Illumina700k", "Illumina1M", "Illumina2.5M" and "IlluminaOmni5". For the example data above, a "Custom10k" platform is also included.
If you would like to run ASCAT without matched normal data on another platform and are prepared to send over some sample data, please contact us.
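The idea behind germline genotype prediction is as follows. At a homozygous SNP, the tumour BAF sits close to 0 or 1 regardless of copy number. Heterozygous SNPs usually show intermediate values. The Python sketch below is a deliberately naive threshold rule that illustrates this idea. It is not ASCAT's platform-specific prediction algorithm, and the 0.1 threshold is an arbitrary assumption.

```python
import math


def predict_heterozygous(baf, hom_threshold=0.1):
    """Naively flag SNPs as germline heterozygous from tumour BAF alone.

    A SNP is called heterozygous when its BAF lies strictly between
    hom_threshold and 1 - hom_threshold; missing values (NaN) are
    treated as not heterozygous.
    """
    calls = []
    for b in baf:
        if math.isnan(b):
            calls.append(False)
        else:
            calls.append(hom_threshold < b < 1 - hom_threshold)
    return calls
```

A known weakness of such a simple rule: in regions of loss of heterozygosity in a nearly pure tumour, heterozygous SNPs reach BAF values close to 0 or 1, and the rule then calls them homozygous.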
GC wave correction
Samples profiled with SNP arrays or massively parallel sequencing are often affected by 'wave artefacts' that are partly correlated with the GC content of the surrounding region (see e.g. Diskin et al., Nucleic Acids Research 36:e126, 2008). We have implemented GC wave correction in ASCAT, and recommend adding that step to the pipeline unless the input data have already been GC-corrected by another method.
Our GC correction method is based on the one initially implemented in the following paper: Jiqiu Cheng, Evelyne Vanneste, Peter Konings, Thierry Voet, Joris R. Vermeesch, and Yves Moreau. Genome Biology 12:R80, 2011.
This requires a GC content file for the platform used. We currently provide GC content files for:
If you would like to run ASCAT GC correction on another platform, please contact us.
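At its core, GC wave correction removes the part of the LogR signal that local GC content can explain. The Python sketch below shows this idea as a single ordinary least-squares fit of LogR against one GC content value per probe. It subtracts the fitted trend and keeps the sample's mean LogR. This is a simplification for illustration only. ASCAT's own method is based on Cheng et al. and uses the platform GC content files, which this sketch does not model.

```python
def gc_correct(logr, gc):
    """Remove the linear dependence of LogR on GC content.

    Fits logr ~ a + b * gc by ordinary least squares and subtracts
    b * (gc - mean(gc)), so the corrected values keep the original mean.
    """
    n = len(logr)
    if n != len(gc) or n < 2:
        raise ValueError("need at least two probes with matching GC values")
    mean_gc = sum(gc) / n
    mean_logr = sum(logr) / n
    sxx = sum((g - mean_gc) ** 2 for g in gc)
    if sxx == 0:
        return list(logr)  # constant GC content: nothing to regress out
    sxy = sum((g - mean_gc) * (r - mean_logr) for g, r in zip(gc, logr))
    slope = sxy / sxx
    return [r - slope * (g - mean_gc) for r, g in zip(logr, gc)]
```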
Input data formats and supported platforms
ASCAT is platform- and species-independent, and works for both Illumina and Affymetrix SNP arrays as well as for massively parallel sequencing data. The required input consists of matrices of LogR and B Allele Frequency (BAF) values, with probes or SNP loci as rows and samples as columns. ASCAT requires identically formatted LogR and BAF files for both tumour and germline data, with matched samples in corresponding columns and the same probes on the same rows in all four files. For examples of the precise data format, see our simulated example data.
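The four input files have to line up row by row and column by column, so a quick consistency check before loading them can save debugging time. The Python sketch below assumes tab-separated files with a header row and the probe or SNP identifier in the first column of every row. The chromosome and position columns in the toy data are also an assumed layout. Compare both against the simulated example data before relying on them. The sketch checks only that all files list the same probes in the same order and have the same number of columns.

```python
import csv
import io


def read_layout(handle):
    """Return (number of columns, probe IDs in row order) for one TSV file.

    Assumes a header row followed by one row per probe, with the probe
    identifier in the first column.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    probe_ids = [row[0] for row in reader if row]
    return len(header), probe_ids


def layouts_match(handles):
    """True if all files have the same column count and probe order."""
    layouts = [read_layout(h) for h in handles]
    return all(layout == layouts[0] for layout in layouts[1:])


# Tiny in-memory stand-ins for a tumour and a germline LogR file
tumour_logr = io.StringIO("\tchrs\tpos\tS1\nSNP1\t1\t100\t0.02\nSNP2\t1\t200\t-0.31\n")
germline_logr = io.StringIO("\tchrs\tpos\tS1\nSNP1\t1\t100\t0.01\nSNP2\t1\t200\t0.00\n")
print(layouts_match([tumour_logr, germline_logr]))
```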
Input data for ASCAT can be obtained directly from Illumina GenomeStudio, or derived from Affymetrix CEL files, e.g. through the PennCNV libraries. The pipeline we use (and recommend) for Affymetrix SNP 6.0 arrays can be found within the R package on GitHub.
ASCAT input files can also be derived from massively parallel sequencing data, through log-transformed normalised read depth (LogR) and allele frequencies (BAF) of selected SNP loci. A complete whole-genome pipeline that uses SNP loci from the Affymetrix SNP 6.0 arrays can be found here. We are working on an integrated and automated whole genome, exome and targeted sequencing version that runs directly from BAM files.
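For sequencing data, the two inputs at each selected SNP locus can be sketched as follows. BAF is the fraction of reads supporting the B allele. LogR is the log2 ratio of tumour to germline read depth after normalising each sample by its total depth, then centred on zero. The Python functions below are hypothetical illustrations under those simplifying assumptions. Which allele counts as "B", normalisation by total depth and mean-centring are all choices made here. A production pipeline would typically add further corrections, for example for GC content.

```python
import math


def baf_from_counts(a_reads, b_reads):
    """B allele frequency: fraction of reads at the SNP carrying allele B."""
    total = a_reads + b_reads
    return b_reads / total if total > 0 else float("nan")


def logr_from_depth(tumour_depth, normal_depth):
    """Mean-centred log2 tumour/normal depth ratio per SNP locus.

    Each sample's depths are first divided by that sample's total depth,
    so differences in overall sequencing yield cancel out.
    """
    if len(tumour_depth) != len(normal_depth):
        raise ValueError("depth vectors must have equal length")
    if any(d <= 0 for d in list(tumour_depth) + list(normal_depth)):
        raise ValueError("depths must be positive")
    t_total = sum(tumour_depth)
    n_total = sum(normal_depth)
    raw = [math.log2((t / t_total) / (n / n_total))
           for t, n in zip(tumour_depth, normal_depth)]
    centre = sum(raw) / len(raw)
    return [r - centre for r in raw]
```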
ASCAT can also be run on data from other species, for example SNP arrays from canine breast cancers or exomes from zebrafish melanomas. Because the method relies on heterozygous SNP loci, however, it will not work on haploid or homozygous (inbred) species, such as inbred mouse strains.
An important platform- and normalisation-specific parameter is the normalisation parameter gamma within the function ascat.runAscat. It represents the drop in LogR for a change from two copies to one copy in 100% of cells. In theory gamma should equal 1, but because of array background signal and bespoke array normalisation procedures it is often considerably lower in practice. Its default value of 0.55 works for many, but not all, SNP arrays, e.g. Illumina 109k arrays processed through BeadStudio/GenomeStudio and Affymetrix SNP 6.0 arrays processed through the PennCNV libraries. For other SNP array platforms (and normalisation procedures), we recommend checking the value of gamma by comparing a male and a female germline sample (evaluating the difference in X chromosome LogR values between the two genders, relative to the rest of the genome), or by using an X chromosome titration series. For massively parallel sequencing data, gamma should always be set to 1.
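The male versus female comparison for checking gamma can be sketched as follows. Ignoring pseudo-autosomal regions, the normal cells of a female sample carry two X copies and those of a male sample carry one. Express each sample's X chromosome LogR relative to its autosomes. The female-minus-male difference then estimates the LogR drop for two copies to one copy in 100% of cells, which is gamma. The Python function is a hypothetical helper. The use of medians and the chromosome labels "X" and "Y" are assumptions made here.

```python
from statistics import median


def estimate_gamma(female_logr, male_logr, chroms):
    """Estimate gamma from one female and one male germline sample.

    All three sequences are aligned per probe; chroms holds labels such
    as "1", "2", ..., "X", "Y". Y-chromosome probes are ignored.
    """
    def x_shift(logr):
        autosomal = [v for v, c in zip(logr, chroms) if c not in ("X", "Y")]
        x_linked = [v for v, c in zip(logr, chroms) if c == "X"]
        # X chromosome level relative to the rest of the genome
        return median(x_linked) - median(autosomal)

    return x_shift(female_logr) - x_shift(male_logr)


chroms = ["1", "1", "2", "2", "X", "X"]
female = [0.00, 0.02, -0.02, 0.00, 0.01, -0.01]
male = [0.00, 0.02, -0.02, 0.00, -0.54, -0.56]
print(estimate_gamma(female, male, chroms))
```

The toy values were chosen so that the estimate comes out at the default of 0.55. With real data, all X chromosome and autosomal probes of a male and a female germline sample would be used.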
The output of ASCAT, and how to interpret it, is described in this book chapter.
Legacy versions and data
Historic versions of ASCAT are available in our GitHub repository. We recommend always using the latest version, but provide the historic versions for legacy reasons.
Major changes to ASCAT over the original version 1.0 are:
- Availability as an easy-to-use and coherent R software suite (2.0)
- Major improvements in computational speed (2.0)
- Platform-independence (2.0)
- Update of the core algorithm for better performance and results (2.0 and 2.2)
- Addition of germline genotype prediction and thereby extension to unmatched tumour samples (2.0)
- Adaptations to the ASPCF segmentation algorithm to increase sensitivity in samples with low noise and to increase robustness in more noisy samples (2.1)
- Addition of a gender parameter, allowing correct handling of copy number aberrations on the X chromosome in male samples (2.2)
- Addition of GC correction code (2.2)
- Adaptations to allow manual refitting of samples (2.3)
- Adaptations and additions to output data structures (2.3)
- Availability as R package (2.4)
Breast carcinoma SNP array data from our original ASCAT publication are also available. The data consist of the LogR and BAF values for both the tumour and germline SNP array data. We also include tumour LogR data after adjustment for GC bias using the method described in Diskin et al. Nucleic Acids Research, 36:e126, 2008. Due to privacy regulations, the data are password protected. Please contact us to obtain access.
A script used to analyse these Illumina 109k breast carcinoma SNP array data using ASCAT 1.0 is available on GitHub.
Subclonal copy number analysis: the Battenberg algorithm
To assay subclonal copy number changes in massively parallel sequencing data, we created the Battenberg algorithm, based on the underlying ASCAT principles and equations and on haplotype phasing of 1000 Genomes SNP loci. The Battenberg algorithm was originally described here, and is now available through GitHub.