Genes in the brain are very long and can be transcribed into diverse RNAs.


A big surprise of our iCLIP studies was that transposable elements (TEs) are a major hub for RNP assembly. RNPs assemble on hundreds of thousands of TEs and thereby regulate our RNAs, as explained in the following video.

Over half of our genome is made of TEs, which are the most mobile and variable portion of the genome. We found that they contribute substantially to evolutionary changes in RNP assembly, which can in turn give rise to new tissue-specific exons and RNA regulatory elements. Our studies open the door to understanding how variation in TEs across species, individuals and somatic tissues facilitates evolutionary exploration of new gene functions, or gives rise to disease.

We study the RBPs that assemble on TEs and ensure their tight regulation and quality control. We find that TE-derived sequences are a major driving force for RNP assembly and for the emergence of new transcripts during evolution. We have also shown that mutations can cause disease by perturbing the binding of RBPs to TEs.


hnRNP C represses splicing of an Alu-exon in the PTS gene, and this repression is perturbed in disease. The top panel shows the exon/intron structure of the PTS gene, and the boxes below it show the positions of short interspersed elements (SINEs), which include Alu elements. The middle panel shows the disease-relevant Alu element within the PTS gene at greater resolution. The positions of the two exons that can emerge from this Alu element are indicated schematically: the blank exon is rarely included in wild-type (WT) cells, whereas the grey exon is highly included in disease. The RNA sequence is colour-coded. The uridine tracts (in red) bind hnRNP C, which represses binding of U2AF65. The bottom panel shows, at nucleotide resolution, the two 3′ splice sites that can lead to formation of the Alu-exon. In wild-type cells, hnRNP C binds to the long uridine tracts and represses the binding of U2AF65, so the 3′ splice site marked by the open arrowhead is rarely used. In disease, the hnRNP C binding site is deleted, so U2AF65 binds to the pyrimidine tract upstream of the 3′ splice site marked by the grey arrowhead. This leads to strong inclusion of the Alu-exon, which perturbs expression of the PTS gene and causes disease.

Our first studies focused on Alu elements, which entered the genomes of primate ancestors as a wave and are now present in over a million copies in the human genome, representing about 10% of its sequence. We have shown that antisense Alu-derived sequences (those that are transcribed in an antisense orientation as part of longer genes, mainly within introns) are bound by hnRNP C. These antisense Alus are a common source of splice sites, and thus have the potential to turn into exons, but they contain long uridine-rich motifs that are recognized by hnRNP C. This protein represses inclusion of Alu-derived exons (Alu-exons) by competing with the splicing factor U2AF65 for binding to the uridine tracts. Therefore, loss of hnRNP C leads to inclusion of many thousands of new Alu-exons, while mutations that disrupt the uridine tracts in Alus can release Alu-exons from repression by hnRNP C. This leads to inclusion of aberrant Alu-exons, which can cause a variety of human diseases.

We have shown that evolution of new exons is driven largely by mutations that gradually shift the balance between binding of positive and repressive RBPs to Alu elements. In the initial stage of Alu-exon evolution, hnRNP C binds very strongly to repress the exons. As a result, the Alu-exons can remain in a harmless cryptic state over long evolutionary periods, during which they accumulate additional mutations. These additional mutations can make the Alu-exons less disruptive, and the Alu-exons then start escaping from repression, as evidenced by their shorter uridine tracts and decreased repression by hnRNP C. These older Alu-exons more often contribute to new cellular functions. We hypothesise that the uridine tracts in Alu elements buy the time needed for mutations to make beneficial changes, rather than disruptive ones, during the evolution of a species.

Thus, the uridine tracts act as a ‘rheostat’ for the evolution of Alu-exons, fine-tuning the repressive function of hnRNP C. This ensures that hnRNP C initially prevents the damaging effects of immediate Alu exonization, while later allowing a gradual emergence of Alu-exons that contribute to new functions of the host genes.
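The relationship between uridine-tract length and repression described above can be illustrated with a minimal sketch that scans an RNA sequence for uridine tracts and reports the longest one. This is not the analysis used in our studies: the function names, the toy sequences and the minimum tract length of 4 are all hypothetical choices made for illustration only.

```python
import re

def uridine_tracts(rna, min_len=4):
    """Return (start, length) for every run of at least `min_len` consecutive Us.

    `min_len=4` is an arbitrary illustrative cutoff, not a measured
    binding requirement of hnRNP C.
    """
    seq = rna.upper().replace("T", "U")  # accept DNA-style input as well
    pattern = re.compile(r"U{%d,}" % min_len)
    return [(m.start(), len(m.group())) for m in pattern.finditer(seq)]

def longest_tract(rna, min_len=4):
    """Length of the longest uridine tract, or 0 if none reaches `min_len`."""
    return max((length for _, length in uridine_tracts(rna, min_len)), default=0)

# Made-up toy sequences (not real Alu sequence):
# an intact tract of 10 uridines, and the same region interrupted by cytidines.
intact = "GGCA" + "U" * 10 + "CAGGA"
shortened = "GGCA" + "UUUCUUUCUU" + "CAGGA"

print(uridine_tracts(intact))    # one tract starting at position 4
print(longest_tract(shortened))  # no run reaches the cutoff
```

In this toy example, point mutations that interrupt the tract leave no run of uridines above the chosen cutoff, which mirrors the idea that shortened tracts weaken repression by hnRNP C.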


Figure 1: Evolution of LINE-exons. Initially, young LINEs contain highly multivalent binding sites for repressive RBPs, which insulate the LINE and surrounding intronic RNA from RNA processing. Upon mutations that perturb the repressive binding sites, LINEs gradually become a template for evolution of new exons. Therefore, evolutionarily older elements have a higher probability of generating exons, but these LINE-exons are still largely tissue-specific.

Similar to Alu elements, we found that LINE-derived sequences (LINEs) also contribute to splicing regulation by recruiting dozens of RBPs to introns. LINEs are generally longer than Alus, evolutionarily more ancient, and represent an even larger proportion of the human genome. We find that one of the primary RBPs bound to LINEs is MATR3, which promotes binding of PTBP1 to multivalent binding sites within LINEs. Both RBPs repress splicing and 3′ end processing within and around LINEs. Notably, repressive RBPs preferentially bind to evolutionarily young LINEs, which are located far from exons. These RBPs insulate the LINEs and the surrounding intronic regions from RNA processing. Upon evolutionary divergence, changes in RNA motifs within LINEs lead to gradual loss of this insulation. Hence, older LINEs are located closer to exons, are a common source of tissue-specific exons, and increasingly bind to RBPs that enhance RNA processing. LINEs are therefore hubs for the assembly of many kinds of repressive RBPs, and they also contribute to the evolution of new, lineage-specific transcripts in mammals. The assembly of regulatory RNPs on LINEs is even more complex and combinatorial than on Alus, and it has an even stronger impact on the transcriptome across human tissues.

Leading actors

Julian König, Kathi Zarnack, Mojca Tajnik, Jan Attig, Federico Agostini

Selected publications