Multiple sequence alignments (MSAs) are frequently used in the
study of families of protein sequences or DNA/RNA sequences. They
are a fundamental tool for the understanding of structure,
functionality and ultimately evolution of proteins.
A new algorithm, the Circular Sum (CS) method, is presented for
formally evaluating the quality of an MSA. It is based on the use
of a solution to the Traveling Salesman Problem, which identifies
a circular tour through an evolutionary tree connecting the
sequences in a protein family. With this approach the calculation
of an evolutionary tree and the errors that it would introduce can
be avoided altogether. The algorithm gives an upper bound, the
best score that can possibly be achieved by any MSA for a given
set of protein sequences. Alternatively, if presented with a
specific MSA, the algorithm provides a formal score for the MSA,
which serves as an absolute measure of the quality of the MSA. The
CS measure yields a direct connection between an MSA and the
associated evolutionary tree, which allows us to use the measure
to evaluate evolutionary trees and as a tool for evaluating
different methods for producing MSAs. A brief example of the last
application is provided. Because it weights all evolutionary
events on a tree identically, but does not require the
reconstruction of a tree, the CS algorithm has advantages over the
frequently used sum-of-pairs measures for scoring MSAs, which
weight some evolutionary events more strongly than others.
Compared to other weighted sum-of-pairs measures it has the
advantage that no evolutionary tree must be constructed, because
we can find a circular tour without knowing the tree.
Keywords: multiple sequence alignment, phylogenetic tree, scoring
function, TSP, evolution