The evolutionary history of cancer



  • Reconstruction of the evolutionary history of 2,658 tumours, comprising 38 cancer types

  • Early oncogenesis is characterised by mutations in a constrained set of driver genes, specific copy number gains, chromothripsis and chromoplexy

  • Punctuated evolution is a frequent event

  • Key driver mutations and whole genome duplication may precede diagnosis by many years, highlighting opportunities for early cancer detection.

In 2020, the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium published the most comprehensive and ambitious meta-analysis of cancer genomes thus far attempted. The group, a worldwide interdisciplinary federation of scientists from four continents with 744 affiliations between them, generated and analysed whole-genome sequences from 2,658 tumours across 38 cancer types, alongside matched samples of non-cancerous cells from the same patients.

As part of this consortium, the Crick’s Peter Van Loo and his colleagues Paul Spellman (Oregon Health & Science University) and David Wedge (Oxford) jointly supervised a subgroup whose goal was to reconstruct the evolutionary history of individual tumours in order to determine the boundaries between normal somatic tissue evolution and cancer progression (Gerstung et al, 2020). Their analysis allowed them to delineate the typical evolutionary trajectories of cancer and map them in real time relative to the point of diagnosis.

The genome of a cancer cell is shaped by the somatic aberrations that have accumulated during its evolutionary past, and the relative timing of these events can be partly reconstructed from whole-genome sequencing data. Each point mutation initially arises on a single chromosome copy in a single cell, giving rise to a lineage of cells carrying the same mutation. If that locus is subsequently duplicated, any point mutation that preceded the duplication will be present on both resulting copies, whereas a mutation occurring after the gain will be present on only one. Clonal mutations, which are common to all cancer cells, can therefore be classified as early (preceding a gain) or late (following it), while those that cannot be timed further remain unspecified. These clonal mutations are inherited from the most recent common ancestor (MRCA) of all cancer cells in the tumour sample. Subclonal variants arise later, in descendants of the MRCA, and are carried by only a subset of the cancer cells.
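To make this logic concrete, the following Python sketch estimates each mutation's multiplicity (the number of chromosome copies carrying it) from its variant allele frequency (VAF), the tumour purity and the local allele-specific copy number, and then assigns the timing categories described above. The function names, the fixed subclonal cut-off of 0.6 and the simple rounding rule are illustrative assumptions; published methods fit probabilistic models to read counts rather than rounding point estimates.

```python
def estimate_multiplicity(vaf: float, purity: float, total_cn: int) -> float:
    """Estimate how many chromosome copies in each tumour cell carry a mutation.

    Assumes contaminating normal cells are diploid, so the expected variant
    allele frequency is purity * m / (purity * total_cn + 2 * (1 - purity)).
    """
    return vaf * (purity * total_cn + 2 * (1 - purity)) / purity


def classify_mutation(vaf: float, purity: float, major_cn: int, minor_cn: int,
                      subclonal_cutoff: float = 0.6) -> str:
    """Assign a point mutation to a coarse timing category (illustrative only)."""
    m_hat = estimate_multiplicity(vaf, purity, major_cn + minor_cn)
    if m_hat < subclonal_cutoff:
        # VAF too low for the mutation to be present in every cancer cell.
        return "subclonal"
    # Round to an integer copy number, bounded by the major allele's copies.
    m = min(max(round(m_hat), 1), major_cn)
    if major_cn >= 2 and m >= 2:
        # On both copies of the gained allele: it preceded the gain.
        return "early clonal"
    if major_cn >= 2 and minor_cn == 0:
        # Single copy, and only the gained allele exists: it followed the gain.
        return "late clonal"
    # Single copy where timing is ambiguous, or a region without a gain.
    return "clonal (unspecified)"


# Hypothetical tumour with 80% purity and a copy-neutral LOH (2:0) region.
print(classify_mutation(0.78, 0.8, major_cn=2, minor_cn=0))  # early clonal
print(classify_mutation(0.41, 0.8, major_cn=2, minor_cn=0))  # late clonal
print(classify_mutation(0.15, 0.8, major_cn=2, minor_cn=0))  # subclonal
```

In a trisomic (2:1) region, a single-copy mutation could sit either on the non-gained allele or on one copy of the gained allele after the gain, which is why the sketch leaves it unspecified.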

Timing clonal copy number gains using allele frequencies of point mutations


The ratio of duplicated to non-duplicated mutations within a gained region can be used to estimate when the gain happened during clonal evolution, giving a measure of molecular time. For example, if there are many more two-copy mutations than one-copy mutations, the gain must have happened late, after most clonal mutations had already accumulated. Multiple copy number gains occurring at the same molecular time point to larger events, such as an increase in copy number of all or part of a chromosome or, most drastically, whole genome duplication.
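Under the simplifying assumption that mutations accrue at a constant rate per chromosome copy, with molecular time scaled so that the MRCA sits at 1, this ratio can be turned into a point estimate. For a trisomy (major:minor copy number 2:1), two-copy mutations accrue on the gained allele only before the gain (n2 ∝ t), while one-copy mutations accrue on the other allele before the gain and on all three copies afterwards (n1 ∝ t + 3(1 − t)); solving gives t = 3n2 / (2n2 + n1). The same reasoning gives t = 2n2 / (2n2 + n1) for copy-neutral LOH (2:0) and for a doubled diploid region (2:2). The sketch below, with a hypothetical function name, applies these formulas without the confidence intervals a real analysis would report.

```python
# Copies of the region after the gain divided by the number of alleles that
# were duplicated, for three simple (major, minor) copy number states.
_SCALE = {(2, 1): 3.0, (2, 0): 2.0, (2, 2): 2.0}


def molecular_time_of_gain(n_two_copy: int, n_one_copy: int,
                           major_cn: int, minor_cn: int) -> float:
    """Fraction of clonal mutational time (MRCA = 1) elapsed when the gain occurred."""
    scale = _SCALE[(major_cn, minor_cn)]
    t = scale * n_two_copy / (2 * n_two_copy + n_one_copy)
    # Sampling noise can push the point estimate outside [0, 1].
    return min(max(t, 0.0), 1.0)


# Hypothetical counts of clonal mutations in two gained regions.
print(molecular_time_of_gain(20, 80, major_cn=2, minor_cn=1))  # 0.5
print(molecular_time_of_gain(90, 10, major_cn=2, minor_cn=0))  # ~0.95: a late gain
```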

By aggregating these evolutionary data, the authors were able to examine the timing of gains in individual chromosomes across the 38 cancer types. Most tumour types showed variable timing, indicating relatively broad periods of chromosomal instability, but in lung cancers, melanomas and papillary kidney cancers, gains occurred predominantly late. The timing of gains typically followed similar distributions across all chromosomes, but in two brain cancers particular chromosomes were gained exceptionally early, perhaps even in childhood: chromosome 17q in medulloblastomas, and chromosomes 7, 19 and 20 in 90% of glioblastomas.

During this initial analysis, the authors noticed that gains in the same tumour often appeared to occur at a similar molecular time, pointing towards punctuated bursts of copy number gains. This would be expected in tumours that had undergone whole genome duplication, but punctuated gains were also observed in a substantial number of tumours without it, most likely as a result of chromosome mis-segregation during a single aberrant mitosis.
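One crude way to flag such punctuated bursts, sketched below, is to ask what fraction of a tumour's gains fall inside a narrow window of molecular time. The window width of 0.1 and the 80% threshold are arbitrary, hypothetical choices, and the function names are invented for illustration; this toy heuristic is not the study's method and ignores the uncertainty in each gain's estimated timing.

```python
def largest_synchronous_fraction(gain_times: list[float], window: float = 0.1) -> float:
    """Largest fraction of gains whose molecular times fit inside one window."""
    ts = sorted(gain_times)
    best = 0
    left = 0
    for right, t in enumerate(ts):
        # Shrink the window from the left until it spans at most `window`.
        while t - ts[left] > window:
            left += 1
        best = max(best, right - left + 1)
    return best / len(ts)


def looks_punctuated(gain_times: list[float], window: float = 0.1,
                     min_fraction: float = 0.8) -> bool:
    """Hypothetical heuristic: most of a tumour's gains co-occur in molecular time."""
    if len(gain_times) < 2:
        return False
    return largest_synchronous_fraction(gain_times, window) >= min_fraction


print(looks_punctuated([0.42, 0.45, 0.47, 0.44, 0.90]))  # True
print(looks_punctuated([0.10, 0.30, 0.50, 0.70, 0.90]))  # False
```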

Three processes able to generate punctuated events were studied (The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium, 2020): chromoplexy, in which the repair of co-occurring double-strand DNA breaks, typically on different chromosomes, results in shuffled chains of rearrangements; kataegis, a focal hypermutation process that produces locally clustered nucleotide substitutions biased towards a single DNA strand; and chromothripsis, in which one or sometimes several chromosomes are shattered into tens or hundreds of pieces that are then randomly rejoined.
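Kataegis foci are typically recognised as runs of closely spaced substitutions. The sketch below uses a simplified version of a widely used rule of thumb, treating six or more consecutive substitutions on one chromosome, each within 1 kb of the previous one, as a focus. The thresholds and the function name are assumptions rather than the consortium's exact criteria, and the single-strand bias mentioned above is not checked.

```python
def find_kataegis_foci(positions: list[int], max_gap: int = 1_000,
                       min_mutations: int = 6) -> list[tuple[int, int, int]]:
    """Return (start, end, n_mutations) for runs of closely spaced substitutions.

    `positions` are substitution coordinates on a single chromosome.
    """
    pos = sorted(positions)
    foci = []
    start = 0
    for i in range(1, len(pos) + 1):
        # A run ends at the last mutation or where the next gap exceeds max_gap.
        if i == len(pos) or pos[i] - pos[i - 1] > max_gap:
            if i - start >= min_mutations:
                foci.append((pos[start], pos[i - 1], i - start))
            start = i
    return foci


# Hypothetical substitutions: a tight cluster of seven plus two isolated mutations.
subs = [500_000, 10_000, 10_300, 10_650, 11_000, 11_400, 11_900, 12_200, 2_000_000]
print(find_kataegis_foci(subs))  # [(10000, 12200, 7)]
```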

Kataegis was detected in 61% of cancers but only occasionally led to driver mutations, notably in the MYC gene in B-cell non-Hodgkin lymphoma. Chromoplexy appeared to play a more important role in generating drivers: it was identified in 18% of samples, most prominently in prostate adenocarcinoma, which was already known to harbour chromoplexic oncogenic translocations, but also, unexpectedly, in thyroid adenocarcinoma. Different genomic loci were recurrently rearranged in these tumour types, driven by positive selection for particular fusion genes or enhancer-hijacking events; in thyroid adenocarcinoma, for example, BRAF was recurrently targeted.

“The Crick’s reputation as the UK flagship institute, even very early on in its inception, has allowed us to recruit extremely talented students and postdocs. The Crick PhD programme is particularly worth mentioning here: it attracts many stellar candidates, is run very efficiently and provides excellent support for both students and group leaders.”

Peter Van Loo

Chromothripsis was identified in 22% of samples, most commonly in sarcomas, glioblastomas, melanomas, breast cancers and squamous cell lung cancers, and caused driver mutations in some 10% of tumours through homozygous deletion of tumour suppressors, amplification of known oncogenes, or enhancer hijacking events. The most common amplifications were of CCND1, MDM2 and ERBB2, which each occurred in more than 1% of the tumours in the series.

Timing these events showed that chromothripsis and chromoplexy were consistently early; taken together with their propensity to cause driver mutations, this suggests that they are key early events in tumour evolution. In contrast, kataegis is usually a late, subclonal process.

The ability to assign timings for the occurrence of individual driver mutations in a tumour meant that it was possible to determine the sequence in which they arose. Mutations in TP53 were highly enriched at early times across all cancers, and in total, 50% of all early clonal driver mutations occurred in only nine genes, confirming previous studies suggesting that very early events in cancer evolution occur in a constrained set of common drivers. In addition to confirming and extending the well-established sequence of mutational events that lead to colorectal cancer, detailed sequences of somatic driver events were derived for many cancers for which this was previously unknown, including ovarian adenocarcinoma, pancreatic neuroendocrine tumours and glioblastoma.
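A figure such as half of all early clonal driver mutations falling in only nine genes comes from ranking genes by how many early clonal drivers they carry and counting how many genes are needed to reach half the total. The sketch below performs that tally on a hypothetical list of gene names, one entry per early clonal driver mutation; the function name is invented, and the example does not reproduce the study's data or its ordering of events.

```python
from collections import Counter


def genes_covering_fraction(early_driver_genes: list[str], fraction: float = 0.5) -> list[str]:
    """Smallest set of most frequently hit genes covering `fraction` of early drivers."""
    counts = Counter(early_driver_genes)
    total = sum(counts.values())
    covered = 0
    ranked: list[str] = []
    for gene, n in counts.most_common():  # genes in descending order of count
        ranked.append(gene)
        covered += n
        if covered >= fraction * total:
            break
    return ranked


# Hypothetical early clonal driver calls pooled across tumours (one entry per mutation).
early_drivers = ["TP53"] * 6 + ["KRAS"] * 3 + ["CDKN2A"] * 2 + ["PIK3CA", "SMAD4", "ARID1A"]
print(genes_covering_fraction(early_drivers))  # ['TP53', 'KRAS']
```

Taking genes greedily in order of frequency guarantees the smallest possible number of genes reaching the chosen fraction.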

Different mutational processes leave different signatures in the cancer genome, and these signatures can be used to estimate how much each process, or the mutagen behind it, has contributed to a tumour. The authors added a temporal dimension to this analysis by separating mutations according to the time period in which they arose. Signatures left by exogenous carcinogens, such as UV light in melanoma and tobacco smoking in lung cancer, tend to dominate early and are later superseded by rising levels of endogenous signatures such as APOBEC mutagenesis. Intriguingly, the method also offers a way of detecting when orphan signatures arise: for example, a rare mutational signature of unknown origin found in a handful of melanomas arises very late in tumour evolution.
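Estimating how much each signature contributes is commonly framed as a non-negative least-squares fit of a tumour's mutation counts against a fixed matrix of signature profiles; applying the fit separately to early and late mutations gives the kind of temporal comparison described here. The sketch below uses NumPy with Lee-Seung multiplicative updates as a simple solver. The two four-channel "signatures" and their counts are invented toy values standing in for real 96-channel catalogues, the function name is hypothetical, and this is not presented as the study's fitting procedure.

```python
import numpy as np


def fit_exposures(counts, signatures, n_iter: int = 5000) -> np.ndarray:
    """Non-negative exposures h that approximately minimise ||counts - signatures @ h||^2.

    `signatures` is a (channels x signatures) matrix whose columns are mutational
    profiles; the multiplicative update keeps h non-negative throughout.
    """
    W = np.asarray(signatures, dtype=float)
    v = np.asarray(counts, dtype=float)
    h = np.full(W.shape[1], v.sum() / W.shape[1])  # strictly positive starting point
    WtV = W.T @ v
    WtW = W.T @ W
    for _ in range(n_iter):
        h *= WtV / (WtW @ h + 1e-12)  # Lee-Seung update for the squared-error loss
    return h


# Toy 4-channel profiles (columns): an "exogenous-like" and an "APOBEC-like" signature.
W = np.array([[0.70, 0.05],
              [0.20, 0.05],
              [0.05, 0.20],
              [0.05, 0.70]])
early_counts = W @ np.array([80.0, 20.0])  # hypothetical early clonal mutations
late_counts = W @ np.array([20.0, 80.0])   # hypothetical late or subclonal mutations

for label, counts in [("early", early_counts), ("late", late_counts)]:
    h = fit_exposures(counts, W)
    print(label, np.round(h / h.sum(), 2))
```

With these toy inputs the exogenous-like signature dominates the early mutations and the APOBEC-like one the late mutations, mirroring the qualitative shift described above.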

Evolutionary profile of pancreatic adenocarcinoma


Mutational signatures can also be used to synchronise molecular time with real time. One such signature, caused by the spontaneous deamination of methylated cytosines at CpG dinucleotides, occurs in normal as well as cancerous tissue, and the number of mutations it produces has previously been shown to be proportional to a patient’s age at diagnosis. This mutational clock ticks at considerably different rates in different tissues and cell types, but once this is adjusted for, together with any differences between normal and tumour cells, it is possible to time some key events in tumour evolution, all of which have significant implications for early diagnosis.

Identifying when the most recent common ancestor (MRCA) first appears defines the commitment point of a tumour. For most cancers, the MRCA could be timed to arise some three to six months before diagnosis, but there was a spectrum of latency, culminating in several cancer types where the MRCA was present between two and twelve years before diagnosis.
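A rough sense of how CpG>TpG mutations translate into years comes from a deliberately simple two-rate model, which is an assumption here rather than the study's full model: clonal CpG>TpG mutations accrue at a baseline rate r from birth until the MRCA, and subclonal ones at an accelerated rate a·r from the MRCA until diagnosis at age A. Then n_clonal = r(A − L) and n_subclonal = a·r·L, which gives the latency L = A·n_subclonal / (a·n_clonal + n_subclonal). The function name and counts below are hypothetical, and the model ignores the fact that only subclones above a detection limit are observed.

```python
def mrca_latency_years(age_at_diagnosis: float, n_clonal_cpg: int,
                       n_subclonal_cpg: int, acceleration: float = 1.0) -> float:
    """Years between the MRCA and diagnosis under an assumed two-rate CpG>TpG clock.

    Model: n_clonal = r * (A - L) and n_subclonal = acceleration * r * L,
    where A is the age at diagnosis and L the latency; the rate r cancels out.
    """
    return (age_at_diagnosis * n_subclonal_cpg
            / (acceleration * n_clonal_cpg + n_subclonal_cpg))


# Hypothetical patient diagnosed at 60 with 1,000 clonal and 50 subclonal CpG>TpG mutations.
print(round(mrca_latency_years(60, 1000, 50), 2))                    # 2.86
print(round(mrca_latency_years(60, 1000, 50, acceleration=5.0), 2))  # 0.59
```

With a faster tumour clock, the same number of subclonal mutations corresponds to less elapsed time, which is why the estimated latency shrinks as the acceleration factor grows.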

It was also possible to define when whole genome duplication had taken place. The authors found that approximately 30% of cases had undergone whole genome duplications, and these had typically occurred well before the emergence of the MRCA, between five and ten years before diagnosis. For ovarian cancers, this timescale extended to a median of 14 years, with some cases showing lag times of multiple decades. 
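Given an estimated latency, an event timed in molecular time, such as a whole genome duplication at molecular time t with the MRCA at 1, can be placed in years under the further simplifying assumption that the CpG>TpG clock ran at a constant rate from birth until the MRCA. The event then occurred at age t(A − L), that is L + (1 − t)(A − L) years before diagnosis. The function name and numbers below are hypothetical.

```python
def years_before_diagnosis(molecular_time: float, age_at_diagnosis: float,
                           mrca_latency: float) -> float:
    """Convert a molecular time (0 ~ birth, 1 = MRCA) into years before diagnosis.

    Assumes a constant mutation rate from birth to the MRCA; any change in the
    clock rate before the MRCA would shift the result.
    """
    years_to_mrca = age_at_diagnosis - mrca_latency
    return mrca_latency + (1 - molecular_time) * years_to_mrca


# Hypothetical WGD at molecular time 0.85 in a patient diagnosed at 60,
# with the MRCA arising one year before diagnosis.
print(round(years_before_diagnosis(0.85, 60, 1.0), 2))  # 9.85
```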

From all these data, the authors generated an evolutionary profile of each cancer type: the identity and timing of the driver mutations, copy number changes and mutational signatures it contains, together with real-time estimates of when whole genome duplication occurred and when the most recent common ancestor emerged. Their results show that some tumours, including some particularly intractable types, can take many years, sometimes decades, to develop, providing clear opportunities for future early diagnosis approaches.