Bioinformatics and Protein Evolution

“Nothing in bioinformatics makes sense except in the light of evolution”, wrote Paul G. Higgs and Teresa K. Attwood, adapting Theodosius Dobzhansky’s famous remark.

Have you noticed that the most fundamental procedures in bioinformatics rely on sequence search and alignment? When amino acid sequences are aligned, scoring systems are used to measure how likely the mutational events separating them are. These scoring systems are based on evolutionary models that try to estimate the evolutionary distance between the sequences.
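To make the idea of a scoring system concrete, here is a minimal sketch of a global alignment score computed by dynamic programming (the Needleman-Wunsch recurrence). The function name `global_alignment_score` and the match, mismatch, and gap values are illustrative assumptions, not a real evolutionary model; practical tools replace the match/mismatch rule with a substitution matrix derived from observed mutation frequencies.

```python
def global_alignment_score(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Return the best global alignment score of two sequences.

    The scoring values are toy assumptions chosen for illustration only.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align the two residues
                score[i - 1][j] + gap,       # gap in seq_b
                score[i][j - 1] + gap,       # gap in seq_a
            )
    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("ACD", "ACD"))
    print(global_alignment_score("ACD", "AD"))
```

Under this toy scheme, a higher score means fewer assumed mutational events between the two sequences, which is the intuition that evolution-based substitution matrices refine.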

Interestingly, proteins are composed of small secondary structure elements (or small domains, in larger proteins), which are contiguous sections of the sequence that fold into fairly well-defined 3D structures. Divergent evolution (homology), duplication, and reshuffling of these small elements are therefore an effective way of evolving new, complex proteins. Two facts about proteins make the study of protein evolution even more important:

  • Protein secondary structure tends to be conserved throughout evolution;

  • Distantly related proteins have relatively conserved structures.

The main message here is that evolutionary ideas underlie many of the methods used in bioinformatics, such as aligning sequences, identifying protein families, and establishing homology between proteins in different organisms. Moreover, to create reliable information resources for protein sequences, structures, and domains, we need a good understanding of protein evolution.