Large database for the analysis and prediction of spliced and non-spliced peptide generation by proteasomes


Proteasomes are the main producers of antigenic peptides presented to CD8+ T cells. They can cut proteins and release their fragments, or recombine non-contiguous fragments, thereby generating novel sequences, i.e. spliced peptides. Understanding the driving forces and sequence preferences of both reactions can streamline target discovery in immunotherapies against cancer, infection and autoimmunity. Here, we present a large database of spliced and non-spliced peptides generated by proteasomes in vitro, which is available as a simple CSV file and as a MySQL database. To generate the database, we performed in vitro digestions of 55 unique synthetic polypeptide substrates with different proteasome isoforms and experimental conditions. We measured the samples using three mass spectrometers, filtered and validated putative peptides, and identified 22,333 peptide product sequences (15,028 spliced and 7,305 non-spliced product sequences). Our database and datasets have been deposited in the Mendeley (doi:10.17632/nr7cs764rc.1) and PRIDE (PXD016782) repositories. We anticipate that this unique database can be a valuable resource for predictors of proteasome-catalyzed peptide hydrolysis and splicing, with various future translational applications.
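The distinction between the two product types can be illustrated with a minimal Python sketch. It is not the authors' identification pipeline: the function name `classify_peptide`, the toy substrate sequence and the "unexplained" label are assumptions made for illustration only. A product is treated as non-spliced if it occurs as a contiguous stretch of the substrate, and as spliced if it can be cut into two fragments that each occur in the substrate. Real analyses apply further filtering and validation that this sketch omits.

```python
def classify_peptide(peptide: str, substrate: str) -> str:
    """Classify a product peptide relative to its substrate sequence.

    Returns "non-spliced", "spliced" or "unexplained".
    Simplified illustration, not the published identification method.
    """
    if not peptide:
        raise ValueError("peptide must be non-empty")

    # Non-spliced: the peptide is a contiguous fragment of the substrate.
    if peptide in substrate:
        return "non-spliced"

    # Spliced: try every way of cutting the peptide into two non-empty
    # fragments and check that both fragments occur in the substrate.
    # Because the whole peptide is not contiguous in the substrate,
    # any such pair must have been joined from non-contiguous positions.
    for cut in range(1, len(peptide)):
        left, right = peptide[:cut], peptide[cut:]
        if left in substrate and right in substrate:
            return "spliced"

    return "unexplained"


# Hypothetical toy substrate, not one of the 55 substrates in the database.
substrate = "ACDEFGHIKLMNPQRS"

print(classify_peptide("DEFG", substrate))
print(classify_peptide("CDEKLM", substrate))
print(classify_peptide("LMNCDE", substrate))
print(classify_peptide("WWWW", substrate))
```

Note that very short fragments match many positions, so this check is permissive; it shows only the definition, not a way to validate candidates.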

Journal details

Journal Scientific Data
Volume 7
Issue number 1
Pages 146

