Scores and E values

The sequences provided in these exercises are of known genes, so they should give a perfect match to at least one entry in the database. If you increase the number of alignments displayed to 50 or 100, you will begin to find sequences that share only short stretches of similarity. Also, when you examine a new sequence, you may only get matches over limited regions. Are these similarities meaningful? Answering that question often takes considerable experience using these tools.

However, two statistical measures, the Score and the E-value, have been developed to give you some idea of the significance of the similarity between two sequences.

The alignment Score (S) is calculated by assigning a value to each aligned position that matches the database entry and then subtracting a penalty for each mismatch and for each gap introduced to maximize the alignment. The higher the score, the better the match. However, a high score alone does not prove that a match is meaningful: if you use a short sequence as your query, you can get high alignment scores by chance. This brings us to...
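To make the scoring idea concrete, here is a minimal sketch that scores an alignment that has already been computed, using a simple scheme: a reward for identical residues and fixed penalties for mismatches and gaps. The values (match = 2, mismatch = -1, gap = -2) and the function name `alignment_score` are assumptions chosen for illustration; BLASTP itself scores residue pairs with a substitution matrix such as BLOSUM62 and uses affine gap penalties, so real scores will differ.

```python
def alignment_score(aligned_a: str, aligned_b: str,
                    match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Score two already-aligned sequences of equal length ('-' marks a gap).

    The match/mismatch/gap values are illustrative assumptions, not BLAST's.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            score += gap        # penalty for a gap in either sequence
        elif x == y:
            score += match      # reward for an identical residue
        else:
            score += mismatch   # penalty for a mismatched residue
    return score


# Hypothetical example: one mismatch (H/P) and one gap.
print(alignment_score("HEAGAWGHEE", "PEAGAW-HEE"))
```

Eight identities (+16), one mismatch (-1) and one gap (-2) give a score of 13; each additional identical position would add to the score, which is why longer good matches score higher.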

The E-value (Expectation value) is the number of alignments with a score at least as good as the given score that you would expect to find by chance alone when searching a database of this size. The lower the E-value, the more significant the score. A very low E-value, e.g. 10^-100, indicates that there is essentially no chance that the alignment is a chance event, while a high E-value, e.g. 1 or more, suggests that the alignment could well be a chance event. However, if you are searching for distant evolutionary relationships, you may need to make use of matches with relatively high E-values.
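The statistics behind BLAST (Karlin-Altschul) estimate the E-value for a score S as E = K·m·n·e^(-λS), where m is the query length, n is the total length of the database, and λ and K depend on the scoring system. The sketch below simply evaluates that formula. The values of λ (`lam`) and K (`k`), the database size and the query length are assumptions for illustration; BLAST derives λ and K for the scoring matrix and gap penalties actually used and also corrects the lengths for edge effects.

```python
import math


def expect_value(score: float, query_len: int, db_len: int,
                 lam: float = 0.267, k: float = 0.041) -> float:
    """Karlin-Altschul estimate E = K * m * n * exp(-lambda * S).

    lam and k are assumed, illustrative values; BLAST derives them from the
    scoring matrix and gap penalties actually used.
    """
    return k * query_len * db_len * math.exp(-lam * score)


db_len = 10**8   # hypothetical database of 100 million residues
query_len = 300  # hypothetical 300-residue query
for s in (30, 60, 120):
    print(f"S = {s:3d}  E = {expect_value(s, query_len, db_len):.2g}")
```

E falls exponentially as the score rises, and it grows in proportion to the database size: doubling n doubles E for the same score, so the same alignment looks less significant against a larger database.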

Caveat
E-values can be misleading, and you need to bring an informed mind to any sequence analysis. For example, a protein that shows a high degree of similarity over a short region may give a very low E-value for that region even though the two proteins are not closely related.
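BLAST reports local alignments, so a single short, strongly matching segment can produce a good score even when the rest of the two proteins has nothing in common. The sketch below shows this with a plain Smith-Waterman local alignment (BLAST itself uses a faster heuristic, not full Smith-Waterman) and then reports how much of the first sequence the alignment actually covers, which is a quick sanity check on any hit. The scoring values, the function name `local_alignment` and the two "proteins" are all hypothetical.

```python
def local_alignment(a: str, b: str, match: int = 2, mismatch: int = -1,
                    gap: int = -2) -> tuple[int, int, int]:
    """Smith-Waterman local alignment with a linear gap penalty.

    Returns (best_score, start, end) such that a[start:end] is the region of
    `a` covered by the best-scoring local alignment.
    """
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best_score, best_i, best_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(0,                    # start a new local alignment
                          h[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                          h[i - 1][j] + gap,    # gap in b
                          h[i][j - 1] + gap)    # gap in a
            if h[i][j] > best_score:
                best_score, best_i, best_j = h[i][j], i, j

    # Trace back from the best cell until the score drops to zero.
    i, j = best_i, best_j
    while h[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best_score, i, best_i


# Hypothetical proteins that share only a short motif (WCHWCYW).
a = "Q" * 10 + "WCHWCYW" + "Q" * 10
b = "A" * 10 + "WCHWCYW" + "A" * 10
score, start, end = local_alignment(a, b)
print(score, a[start:end], f"{(end - start) / len(a):.0%} of a covered")
```

The alignment finds the shared motif and scores it well, yet it covers only about a quarter of the sequence; a low E-value for such a hit says nothing about the rest of the two proteins.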

Explore Further
If you want to explore this in more detail, try a BLASTP search with a 20-amino-acid stretch of your best-matching sequence, then select just eight of those amino acids and run another BLASTP search. Compare the S and E-values from the two searches!
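One way to think about what you see: the E-value depends on the size of the search space (query length times database length) as well as on the score. The sketch below uses the bit-score form of the same statistics, S' = (λS - ln K) / ln 2 and E = m·n·2^(-S'), to compare a 20-residue and an 8-residue query that reach the same raw score. λ, K, the database size and the raw score are assumed values; BLAST uses effective lengths and its own parameters, so the numbers from a real search will differ, and in practice the shorter query will usually also reach a lower maximum score.

```python
import math

# Assumed, illustrative Karlin-Altschul parameters (not taken from a real search).
LAMBDA = 0.267
K = 0.041


def bit_score(raw_score: float) -> float:
    """Normalized (bit) score: S' = (lambda * S - ln K) / ln 2."""
    return (LAMBDA * raw_score - math.log(K)) / math.log(2)


def e_value(raw_score: float, query_len: int, db_len: int) -> float:
    """E = m * n * 2^(-S'), equivalent to K * m * n * exp(-lambda * S)."""
    return query_len * db_len * 2 ** (-bit_score(raw_score))


db_len = 10**8  # hypothetical database size in residues
raw = 40        # the same raw score S for both searches
for query_len in (20, 8):
    print(f"query {query_len:2d} aa: S = {raw}, E = {e_value(raw, query_len, db_len):.2g}")
```

With the score held fixed, E scales linearly with the query length, so the 20-residue query gets an E-value 2.5 times larger than the 8-residue one.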