A big surprise of our iCLIP studies was that transposable elements (TEs) are a major hub for RNP assembly. RNPs assemble on hundreds of thousands of TEs.
Since TEs evolve very quickly, they drive evolutionary changes in RNP assembly, and thereby give rise to new tissue-specific exons and RNA regulatory elements.
This opens the door to understanding how variation in TEs across species, individuals and somatic tissues facilitates evolutionary exploration of new gene functions, or gives rise to diseases.
We collaborate with the Luscombe group to study the RBPs that assemble on TEs and ensure their tight regulation and quality control. We find that sequences derived from TEs are a major driving force for RNP assembly and for the emergence of new transcripts during evolution. We have also shown that mutations can cause disease by perturbing the binding of RBPs to TEs.
Our first studies focused on Alu elements, which entered the genomes of primate ancestors as a wave, and are now present in over a million copies in the human genome, representing about 10% of its sequence. We have shown that antisense Alu-derived sequences (those that are transcribed in an antisense orientation as part of longer genes, mainly within introns) are bound by hnRNP C. These antisense Alus are a common source of splice sites, and thus have the potential to turn into exons, but they contain long uridine-rich motifs that are recognized by hnRNP C. This protein represses inclusion of Alu-derived exons (Alu-exons) by competing with the splicing factor U2AF65 for binding to the uridine tracts. Therefore, loss of hnRNP C leads to inclusion of many thousands of new Alu-exons, while mutations that disrupt the uridine tracts in Alus can release individual Alu-exons from repression by hnRNP C. The resulting inclusion of aberrant Alu-exons can cause a variety of human diseases.
We have shown that the evolution of new exons is driven largely by mutations that gradually shift the balance between the binding of enhancing and repressive RBPs to Alu elements. In the initial stage of Alu-exon evolution, hnRNP C binds very strongly to repress the exons. As a result, the Alu-exons can remain in a harmless cryptic state over long evolutionary periods, during which they accumulate additional mutations. These additional mutations can make the Alu-exons less disruptive, and as a result, the Alu-exons start escaping from repression, as evidenced by their shorter uridine tracts and decreased repression by hnRNP C. These older Alu-exons more often contribute to new cellular functions. We hypothesise that the uridine tracts in Alu elements buy the time needed for mutations to make beneficial changes, rather than disruptive ones, during the evolution of a species. Thus, the uridine tracts act as a ‘rheostat’ for the evolution of Alu-exons, fine-tuning the repressive function of hnRNP C. This ensures that hnRNP C initially prevents the damaging effects of immediate Alu exonization, while later allowing the gradual emergence of Alu-exons that contribute to new functions of the host genes.
As with Alu elements, we found that LINE-derived sequences (LINEs) also contribute to splicing regulation by recruiting dozens of RBPs to introns. LINEs are generally longer than Alus, evolutionarily more ancient, and represent an even larger proportion of the human genome. We find that one of the primary RBPs bound to LINEs is MATR3, which promotes binding of PTBP1 to multivalent binding sites within LINEs. Both RBPs repress splicing and 3’ end processing within and around LINEs. Notably, repressive RBPs preferentially bind to evolutionarily young LINEs, which are located far from exons. These RBPs insulate the LINEs and the surrounding intronic regions from RNA processing. Upon evolutionary divergence, changes in RNA motifs within LINEs lead to gradual loss of their insulation. Hence, older LINEs are located closer to exons, are a common source of tissue-specific exons, and increasingly bind to RBPs that enhance RNA processing. Thus, LINEs are hubs for the assembly of many kinds of repressive RBPs, and also contribute to the evolution of new, lineage-specific transcripts in mammals. The assembly of regulatory RNPs on LINEs is therefore even more complex and combinatorial than on Alus, and LINEs have an even stronger impact on the transcriptome across human tissues.