Predicting protein aggregation and chaperone binding
Protein aggregation is mediated by short aggregation-prone polypeptide segments, typically 5-15 amino acids long, that are usually buried inside the folded structure but that, upon exposure by a change of conditions or by mutation, assemble into intermolecular beta-sheet-like structures, thereby nucleating the aggregation of the entire protein. These so-called aggregation-nucleating regions can be detected in the sequence of a protein with reasonable accuracy based on physicochemical principles, and as a result a myriad of prediction algorithms have been developed over the years to perform this task. The first of these aggregation prediction algorithms, TANGO, was developed by us in collaboration with Luis Serrano (CRG) in 2004 (Nature Biotech, 2004). It is publicly available online and remains a widely used tool to this day. Like all current protein aggregation prediction algorithms, TANGO calculates the intrinsic aggregation propensity of an input polypeptide sequence and returns short stretches predicted to have a high propensity to nucleate protein aggregation through the formation of intermolecular beta-sheets. These regions constitute the intrinsic aggregation propensity of the sequence in the absence of globular structure.
Although the three-dimensional relationships that existed in the folded state are no longer relevant during assembly into an intermolecular beta-sheet, they are highly relevant in determining whether a particular region is likely to become exposed in the first place. To estimate the likelihood that a given short polypeptide segment becomes exposed by (partial) protein unfolding, we employ the FoldX force field (NAR, 2005), which calculates the contribution of each amino acid to the thermodynamic stability of the three-dimensional structure of the protein, as well as the effect of mutations. The FoldX force field is a joint development of our lab and that of Luis Serrano at CRG.
(The FoldX package also includes fragment-based protein design tools such as BriX and LoopBrix.)
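
The general idea of scanning a sequence for short stretches with a high intrinsic propensity can be illustrated with a minimal sketch. This is not the TANGO algorithm: as a crude stand-in for its energy function, the example averages Kyte-Doolittle hydropathy values over a sliding window. The function name, window length and threshold are assumptions chosen for illustration only.

```python
# Kyte-Doolittle hydropathy scale. A crude stand-in for aggregation
# propensity, NOT the scoring used by TANGO.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_stretches(seq, window=5, threshold=3.0):
    """Return merged (start, end) spans, 0-based and end-exclusive,
    covering every window whose mean hydropathy is >= threshold.

    `window` and `threshold` are arbitrary illustrative defaults.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - window + 1):
        # Unknown residue codes (e.g. X) contribute 0.0 to the mean.
        mean = sum(KD.get(aa, 0.0) for aa in seq[i:i + window]) / window
        if mean >= threshold:
            hits.append((i, i + window))

    # Merge overlapping or adjacent windows into contiguous stretches.
    merged = []
    for start, end in hits:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    toy = "DKDKVIVILDKDK"  # made-up sequence, not a real protein
    for start, end in hydrophobic_stretches(toy):
        print(start, end, toy[start:end])
```

A purely sequence-based scan like this says nothing about whether the stretch is buried in the native fold, which is why a structure-based stability estimate is needed alongside it.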
Like all algorithms that use averaged physicochemical properties to detect aggregation hot spots, TANGO does not discriminate between amyloid formation and amorphous beta-aggregation. However, amyloid structures are a very specific subclass of aggregates, formed by sequences that allow the intermolecular beta-sheet arrangements to pack in a well-defined three-dimensional structure, resulting in the formation of highly stable amyloid fibrils. The biological properties of these fibrils differ critically from those of amorphous aggregates: amyloid fibrils are highly stable nanowires that are used throughout all kingdoms of life as structural scaffolds, adhesives, water tension modulators, etc. In addition, protein deposits found in association with a range of human diseases are most often enriched in amyloid structure, which is probably also due to its stability. To specifically predict amyloid structure, we developed the Waltz algorithm (Nature Methods, 2010).
We employed our intrinsic aggregation prediction capabilities to analyse the aggregation load of entire proteomes and found that protein structure and protein aggregation are inextricably connected: the requirements for constructing the hydrophobic core of a globular protein partially overlap with those of protein aggregation, in such a manner that most globular proteins contain at least one aggregation-prone sequence. However, protein aggregation is kept at bay by protein folding, the stability of the native structure, the turnover of the protein and the vast molecular machinery of the cell that is dedicated to maintaining proteostasis. We were the first to perform evolutionary analyses of aggregation-prone regions and to discover the enrichment of aggregation gatekeepers at positions flanking the aggregation hot spots (J Mol Biol, 2006).
These residues oppose aggregation by charge repulsion and/or steric hindrance and, moreover, their presence increases the binding of molecular chaperones such as Hsp70. To better understand which types of sequences are detected by molecular chaperones, we are developing the Limbo suite (PLoS Comp Biol, 2009).
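
The notion of gatekeepers flanking an aggregation hot spot can be sketched as a simple lookup around a predicted stretch. The residue set (the charged residues R, K, D and E plus proline), the flank width and the function name are assumptions made for illustration. The sketch is not taken from TANGO, Waltz or Limbo.

```python
# Assumed gatekeeper set for illustration: charged residues (R, K, D, E)
# plus proline. The text above only specifies the mechanisms
# (charge repulsion and/or steric hindrance), not this exact list.
GATEKEEPERS = frozenset("RKDEP")


def flanking_gatekeepers(seq, start, end, flank=3):
    """List (position, residue) for assumed gatekeeper residues found within
    `flank` positions on either side of the hot spot seq[start:end].

    Positions are 0-based; `end` is exclusive. `flank=3` is arbitrary.
    """
    seq = seq.upper()
    left = range(max(0, start - flank), start)
    right = range(end, min(len(seq), end + flank))
    return [(i, seq[i]) for i in (*left, *right) if seq[i] in GATEKEEPERS]


if __name__ == "__main__":
    toy = "GSRKVIVILDPAG"  # made-up sequence; hot spot assumed at 4..9
    print(flanking_gatekeepers(toy, 4, 9))
```

In practice the hot spot coordinates would come from an aggregation predictor, and counting such flanking residues across many proteins is one simple way to look for the kind of enrichment described above.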