Exercise 5: Finding Domains in Protein Sequences

I. INTRODUCTION
Many proteins that have been classified as "globular" (i.e., folded into a compact, globular shape) appear to be composed of several distinct folded regions joined by more extended loops of amino acids. These globular subregions are termed "domains" and can range in size from 20 to 300 amino acids. Some domains have been associated with specific functions (e.g., catalysis of peptide bond cleavage, ATP binding), but such associations must be tentative, since ligand binding or formation of an active site often takes place at the surface where two domains interact. Identifying domains can help us assign a newly discovered open reading frame to a protein family. Domains in a newly discovered protein can be recognized by sequence homology with known domains in well-characterized proteins, but this is still not a precise science. New analysis techniques continue to be introduced, but at present the most user-friendly and visual domain identification program is the SMART domain annotation database.

Paste the full sequence of the protein identified in exercise 4 below.

You can access the SMART Protein Domain database via the server indicated below.

Copy your sequence and paste it into the SMART sequence window, then click the Sequence SMART button. Depending on how busy the SMART server is, it may take a few minutes for a result to be returned. BE PATIENT!!
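
If the sequence you copied still carries a FASTA header line, position numbers, or spacing from a database record, it can be useful to reduce it to bare one-letter residue codes before pasting. The Python function below is a minimal sketch of that cleanup; the name `clean_protein_sequence` and the set of accepted letters are illustrative assumptions, not anything defined by SMART.

```python
import re

# Illustrative assumption: accept the 20 standard residues plus
# B, Z, X (ambiguity codes), U (selenocysteine) and O (pyrrolysine).
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYBZXUO")


def clean_protein_sequence(raw: str) -> str:
    """Return a bare one-letter protein sequence ready to paste.

    Drops FASTA header lines (those starting with '>'), removes
    whitespace and digits, upper-cases the letters, and strips a
    trailing stop '*'. Raises ValueError on any other character.
    """
    lines = [ln for ln in raw.splitlines() if not ln.startswith(">")]
    seq = re.sub(r"[\s\d]", "", "".join(lines)).upper()
    seq = seq.rstrip("*")
    bad = sorted(set(seq) - VALID_RESIDUES)
    if bad:
        raise ValueError(f"unexpected characters in sequence: {bad}")
    return seq
```

For example, a record copied as `">hypothetical_protein example\n1 mktayiakqr qisfvkshfs\n21 rqleerlgli\n"` is reduced to `MKTAYIAKQRQISFVKSHFSRQLEERLGLI`.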

The results page shows an interactive diagram of the domains found in the query sequence. Each domain is drawn with its own color, shape, and annotation.

EMBL SMART

Scroll down the window to see a table that lists each identified domain together with its putative (probable) start and end points in your sequence and the E-value (expectation value) assigned to that identification. The E-value estimates how many matches at least this good would be expected purely by chance, so the smaller the E-value, the less likely it is that the identification is simply due to chance.
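
Once the rows of this table have been transcribed, they can be screened the same way by hand or in code. As a hypothetical sketch, assuming each row is recorded as a domain name, a start, an end and an E-value, the Python below keeps only hits at or below an E-value cutoff and lists them in order along the sequence. The `DomainHit` class, the `confident_hits` function and the 1e-3 cutoff are illustrative choices, not SMART's own reporting thresholds.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class DomainHit:
    name: str      # domain name or abbreviation, as listed in the table
    start: int     # first residue of the domain in the query sequence
    end: int       # last residue of the domain in the query sequence
    evalue: float  # E-value reported for this identification


def confident_hits(hits: Iterable[DomainHit],
                   max_evalue: float = 1e-3) -> List[DomainHit]:
    """Keep hits with E-value <= max_evalue, ordered by start position.

    The default cutoff of 1e-3 is an arbitrary illustrative assumption.
    """
    kept = [h for h in hits if h.evalue <= max_evalue]
    return sorted(kept, key=lambda h: h.start)
```

With three made-up rows, `DomainHit("DomainB", 150, 230, 2e-12)`, `DomainHit("DomainA", 10, 60, 5e-20)` and `DomainHit("DomainC", 300, 340, 0.8)`, the function returns DomainA then DomainB; DomainC is dropped because its E-value is far above the cutoff.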

Clicking on a domain in the figure or in the table will display the domain name or abbreviation, and the amino acid sequence assigned to that domain, at the very bottom of the browser window. On a PC, right-click the image to save it as a PNG file; on a Macintosh, hold down the Control key while pressing the mouse button to save the figure. Give the file a descriptive name with the .png extension. It can be opened in QuickTime, Photoshop, or almost any other image viewer.
Clicking on the domain name will bring up more detailed information on the domain.
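
If you want to pull the domain segment out of your own copy of the sequence using the start and end points from the table, note that residue numbers are normally 1-based and inclusive, while Python strings are indexed from 0. The helper below is a small sketch assuming that numbering convention; the name `domain_segment` is hypothetical.

```python
def domain_segment(sequence: str, start: int, end: int) -> str:
    """Return residues start..end of sequence.

    Assumes start and end are 1-based and inclusive, as is usual for
    residue numbering in protein sequences.
    """
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("domain boundaries fall outside the sequence")
    # Python slices are 0-based and exclude the stop index, so the
    # 1-based inclusive range [start, end] becomes [start - 1 : end].
    return sequence[start - 1:end]
```

For example, `domain_segment("MKTAYIAKQRQISFVKSHFS", 3, 7)` returns `"TAYIA"`.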

Pick out one domain to examine in detail.
What are the characteristics (amino acid sequences) that define that domain?
What kinds of proteins contain this domain?
What is the function of that domain?
How similar is your sequence to the defined domain?
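
One simple way to put a number on the last question, if you have an alignment of your domain segment against a representative sequence of that domain, is percent identity. The Python below is a minimal sketch assuming two equal-length aligned strings with "-" marking gaps; the function name is hypothetical, and counting identity only over gap-free columns is one convention among several (dividing by the full alignment length gives lower values).

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two equal-length aligned sequences.

    Columns where either sequence has a gap are excluded from both the
    match count and the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)
```

For the toy alignment `"MKT-AYIA"` against `"MRTQAYLA"`, seven columns are gap-free and five of them match, giving about 71.4% identity. Identity ignores conservative substitutions (such as I for L), so a sequence can belong to a domain family while sharing only modest identity with its members.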

These tutorials were developed by Dr. Ross S. Feldberg, Dept. of Biology, Tufts University, Medford, MA 02155, with the assistance of a Teaching with Technology grant from the Academic Computing Department at Tufts. Thanks to Anoop Kumar, Abhra Verma and Scott Cordeiro for help in developing this resource. Suggestions, corrections and comments should be sent to Ross.Feldberg@Tufts.edu. (last modified Aug 2005)