Structural basis for retroviral DNA integration



  • Following capture by a retroviral intasome, the nucleosome flexes to permit integration

  • The first structure elucidating the integration machinery of a lentivirus

  • Structural insights into the binding of second-generation HIV integrase strand transfer inhibitors demonstrate the basis of viral drug resistance

Approximately 40 million people, many living in poor and developing countries, are infected with HIV, the retrovirus that causes AIDS. A hallmark of retroviral replication is establishment of the proviral state, wherein a DNA copy of the viral genome is stably integrated into a host cell chromosome. Integration is the fundamental reason retroviruses are able to establish life-long persistent infections.

Integrase (IN) is the viral enzyme responsible for the catalytic steps involved in this process. Inhibitors of HIV IN strand transfer activity are used as part of standard-of-care combination antiretroviral therapy, so understanding their mode of action and the mechanisms of viral resistance to these drugs is of considerable importance for global health. Conversely, integration makes retroviruses ideal vehicles for therapeutic gene delivery. Pharmaceutical companies have invested heavily in the development of HIV-based vectors for gene therapy applications, but an unsolved problem of this technology is uncontrolled integration into sensitive loci of the human genome. Understanding the organisation of the retroviral pre-integration complex and its interactions with chromatin and host factors will inform the development of safer vectors for gene therapy applications.

The target of retroviral integration in vivo is chromosomal DNA, the bulk of which exists in the form of nucleosomes. The nucleosomal structure imposes constraints on the conformation and accessibility of DNA. Moreover, wrapping around a histone octamer results in a relatively smooth bend in nucleosomal DNA, in contrast with the sharply distorted conformation of target DNA in complex with retroviral intasomes. Despite this, numerous studies have established that chromatinised DNA is a preferred target for retroviral integration.

In the previous quinquennium, Peter Cherepanov’s group reported the first structure of the retroviral intasome, and visualised the key intermediates of the integration process using X-ray crystallography. In more recent work, to understand how the integration machinery engages chromatin, Cherepanov and colleagues have reconstituted the complex of the intasome from the prototype foamy virus (PFV) and a mono-nucleosome (Maskell et al, 2015).

Due to pseudosymmetry of the nucleosomal structure, the target DNA in the complex had to be carefully optimised to afford high-resolution structure determination. A DNA sequence able to bind the intasome under stringent conditions was therefore selected from a library of human nucleosomes captured by the intasome. The resulting cryo-EM structure, determined in collaboration with Alessandro Costa, revealed that the nucleosome adapts to present the DNA in a conformation optimal for strand transfer, forming a small loop at the site of integration. In the complex, the intasome forms extensive interactions with both gyres of nucleosomal DNA and with the core histones. These interactions are thought to offset the energy required to shift the nucleosomal DNA along the histone octamer core to create the slack for loop formation. Consequently, the PFV intasome displays highly preferred nucleosomal integration sites at the superhelix ±3.5 locations that overlay the H2A-H2B heterodimers. This structure was the first to illustrate nucleosome flexibility facilitating a biological process, and therefore has important implications for the wider field of chromosome biology.


While PFV presents an ideal model to study the fundamental aspects of retroviral integration, it is highly divergent from HIV-1 and other lentiviruses. To close this gap, Cherepanov and colleagues have developed novel models suitable for studies of lentiviral integration (Ballandras-Colas et al, 2017) and for informing anti-HIV/AIDS drug development (Cook et al, 2020).

HIV-1 IN is notoriously prone to aggregation as a recombinant protein in isolation. To overcome these unfavourable biochemical properties and enable its study in vitro, hyperactive and/or solubilising mutations had been created, but these dramatically change the properties of the protein. Taking a more holistic approach, Cherepanov and colleagues sought to identify an alternative lentiviral IN that would be amenable to structural studies in its wild-type form. They were able to reconstitute and extensively characterise the intasome from maedi-visna virus (MVV), an ovine lentivirus.

Cryo-EM structure of the PFV intasome-nucleosome complex



The resulting cryo-EM reconstruction allowed the building of a complete atomic model of the 0.5-MDa MVV intasome. It comprises a homo-hexadecamer of IN with a tetramer-of-tetramers architecture featuring eight structurally distinct types of IN protomers supporting two catalytically competent subunits. The conserved intasomal core is structurally identical to the simpler PFV intasome, formed between two IN tetramers, with a pair of C-terminal domains from flanking tetramers completing the synaptic interface. The architecture strongly suggests that the lentiviral intasome forms an expanded interface with chromatin, which may help explain the exquisite selectivity of HIV-1 for integration into highly active transcription units. This structure was unexpected, but reconciles the bulk of early HIV-1 IN biochemical and structural data, and provides a useful platform for future design of HIV-1 IN inhibitors.

IN strand transfer inhibitors (INSTIs) represent some of the most effective antimicrobial inhibitors, but despite their remarkable clinical success, the first-in-class INSTI, raltegravir, suffered setbacks due to the emergence of drug resistance. The second-generation compounds dolutegravir and bictegravir proved to be better at combatting the emergence of resistance, and are now in use around the world to treat both drug-naïve and drug-experienced patients who have failed other antiretroviral formulations. However, it had not been possible to explain why these second-generation compounds display improved resistance barriers.

It is a huge privilege to work alongside some of the brightest people in Biomedical Science.

Peter Cherepanov


Not even the most advanced INSTIs are completely immune to viral resistance: among mutations associated with virologic failure, the double amino acid substitution Q148H/G140S in HIV-1 IN causes complete or partial loss of efficacy across the entire INSTI drug class. To address why this should be, and to develop a robust system for drug-related studies, Cherepanov and colleagues screened some 20 primate lentiviral INs and identified the red-capped mangabey simian immunodeficiency virus (SIVrcm) IN as an ideal model for further structural studies. Because the HIV-1 pol gene is originally derived from SIVrcm, the viruses share high IN amino acid sequence identity. Critically, the new model has made it possible to visualise by cryo-EM what is for all intents and purposes the HIV-1 IN active site.

The group refined structures of the SIVrcm intasome in its inhibitor-free state, as well as when bound to dolutegravir and bictegravir. The structure of the SIVrcm intasome in complex with bictegravir was determined to a local resolution of 2.4 Å throughout the active site region, resolving amino acid side chain rotamers and water molecules. The extended scaffolds of the second-generation INSTIs allow these molecules to span the full width of the IN active site, making essential stabilising interactions with the IN backbone. The interactions with magnesium ions, which are nearly covalent in nature, are partly responsible for the extraordinarily tight binding of the strand transfer inhibitors.



The structures also revealed that the chink in the armour of this drug class, exploited by the virus, is the extreme sensitivity of the metal ions to the precise geometry and electronic properties of the ligand chelating cluster. Mutations such as Q148H/G140S work by destabilising the coordination spheres of the Mg2+ ion pair, indirectly affecting drug binding. The insights reported in this work provide a structural framework for refining and improving this highly effective class of anti-HIV/AIDS therapeutics.

Bictegravir in the active site of the SIVrcm intasome
