An introduction to searching the scientific literature
Finding the Nucleotide Sequence for a Gene
Determining the correct reading frame for an unknown nucleotide sequence
Using BLAST to identify a gene (cont from Exercise 2)
Searching for sequence motifs in a given protein
Finding homologs of a human gene in other organisms

Exercise 1: An introduction to searching the scientific literature.

The most fundamental skill in bioinformatics is the ability to carry out an efficient and comprehensive search of the scientific literature to find out what is known about a specific subject. All of you are familiar with web search engines, and while they can be useful, they also turn up many items that have never undergone scientific peer review. This exercise is therefore NOT a search of the World Wide Web. Instead, it introduces you to searching the published scientific literature using a database such as MEDLINE, Biological Abstracts or Chemical Abstracts. The exercise focuses on the Entrez browser entry point to MEDLINE (PubMed), the database of the National Library of Medicine. Other useful resources include OMIM (Online Mendelian Inheritance in Man) and Books-on-Line, both available via Entrez.

PubMed via Entrez Browser
We will access MEDLINE directly from the National Library of Medicine via the Entrez browser. It is also possible to access MEDLINE, Biological Abstracts or a number of other databases through the Tufts Library system via the OVID browser (Tufts University Libraries > Article Databases > O > OVID). Each browser has its own advantages and disadvantages, which I have summarized in a separate document. We will focus on the Entrez browser since it is available to all users. Some key features of the Entrez browser include:

  1. Free and accessible to everyone (i.e. you don't need a Tufts ID or access via an institutional library to use this)
  2. Clipboard feature allows for easy generation of subsets of hits
  3. Related articles can help broaden a search to capture articles that would not appear in the initial search
  4. Links to the full text of a number of textbooks provide background information
  5. Allows easy connection to the nucleotide and protein databases via the Entrez browser
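Feature 5 works because PubMed is one of several Entrez databases (for example `pubmed`, `nucleotide` and `protein`) that share the same style of query. NCBI also exposes these searches programmatically through its E-utilities. The minimal sketch below only builds an ESearch request URL for a given database and query; it does not contact NCBI. The helper name `esearch_url` and its default `retmax` value are illustrative assumptions, not part of this exercise.

```python
from urllib.parse import urlencode

# Base URL of NCBI's ESearch E-utility.
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build (but do not send) an ESearch request URL.

    db     -- Entrez database name, e.g. "pubmed", "nucleotide" or "protein"
    term   -- query string, written as you would type it in the search box
    retmax -- maximum number of record IDs to return (hypothetical default)
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    # urlencode percent-encodes the query, e.g. spaces become "+".
    return ESEARCH_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    # The same query syntax can be sent to different Entrez databases.
    print(esearch_url("pubmed", "telomere AND aging"))
    print(esearch_url("protein", "UCP2 AND human[Organism]"))
```

Only the `db` parameter changes between a literature search and a sequence search, which is what makes moving from PubMed to the nucleotide and protein databases so easy.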

This linked document is an example of a PubMed search that uses Boolean operators, the Clipboard, Limits, Related Articles, Books and full text online. Additional online help files and tutorials are available from the National Library of Medicine.
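PubMed combines search terms with the Boolean operators AND, OR and NOT (typed in upper case) and evaluates them from left to right, so synonyms joined with OR should be wrapped in parentheses. Many Limits correspond to field tags such as `[Publication Type]` or `[Language]`. The minimal sketch below assembles such a query string from lists of concepts. The function `build_query`, its parameter names and the example terms are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def build_query(
    all_of: Sequence[str],
    any_of: Sequence[str] = (),
    none_of: Sequence[str] = (),
    filters: Sequence[str] = (),
) -> str:
    """Combine search concepts into a PubMed-style Boolean query string.

    all_of  -- terms that must all appear (joined with AND)
    any_of  -- synonyms, at least one of which must appear (joined with OR)
    none_of -- terms to exclude (each preceded by NOT)
    filters -- field-tagged limits, e.g. "english[Language]"
    """
    parts = list(all_of)
    if any_of:
        # Parentheses keep the OR group together; without them the
        # left-to-right evaluation would read "a AND b OR c" as "(a AND b) OR c".
        parts.append("(" + " OR ".join(any_of) + ")")
    parts.extend(filters)
    if not parts:
        raise ValueError("need at least one term to search for")
    query = " AND ".join(parts)
    # Exclusions go last, so they apply to everything before them.
    for term in none_of:
        query += " NOT " + term
    return query


if __name__ == "__main__":
    print(build_query(
        ["insulin", "pancreas"],
        any_of=["mouse", "rat"],
        none_of=["diabetes"],
        filters=["review[Publication Type]", "english[Language]"],
    ))
```

Typing the printed string into the PubMed search box should give the same kind of result as combining the terms and applying the corresponding Limits through the web interface.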

PubMed Tutorial and PubMed Help files

III. EXERCISE 1 (The Literature Search)
Pick one of the four exercises below. Once you have a short list of articles, pick one and view it in Abstract display. Click on Books and you will see that a number of terms in the abstract are now links. Click on one of these to see what information is available.

Exercise 1 (Uncoupling Protein and Diabetes)
UCP2 is a protein found in the inner mitochondrial membrane of some cells. It leaks protons back across the membrane, decreasing the amount of ATP produced. Recently, a role for UCP2 in the development of adult-onset diabetes has been proposed. You already know that there is a correlation between obesity and adult-onset diabetes. Construct a literature search for papers published on UCP2 that also deal with obesity and are limited to research done with humans.

Exercise 2 (Telomeres and Aging)
The ends of chromosomes (telomeres) pose specific problems for the replication of DNA. Recent work has suggested that a process as fundamental as aging may be connected to telomere dysfunction. Construct a search of the literature for those review articles that deal with telomeres and aging. Note five review articles on this topic. Since it is unlikely you know much about telomeres, do a quick search on telomeres in the Books database and look at one entry. List the entry you looked at.

Exercise 3 (Glycolysis in the neonatal heart)
You have been studying glycolysis in biochemistry class and are interested in learning what research has been done in this area that specifically looks at glycolysis in newborn humans and is published in an English-language journal. You would like to focus your search on this pathway in the heart, since each organ differs slightly in its metabolic needs and capacities. Structure a search to pull out this information. Provide me with the total number of hits your search brings up and a list of the titles of the five articles you deem most interesting (based on a quick look through the abstracts).

Exercise 4 (Matrix metalloproteinases and periodontal disease)
Matrix metalloproteinases are a family of extracellular proteolytic enzymes that have been implicated in a variety of pathological conditions including metastatic cancer. Recently, it has been suggested that these enzymes could also play a role in gum disease. Find three articles that address the role of this class of enzymes in periodontal disease.

These tutorials were developed by Dr. Ross S. Feldberg, Dept of Biology, Tufts University, Medford, MA 02155 with the assistance of a Teaching with Technology grant from the Academic Computing Department at Tufts. Thanks to Anoop Kumar, Abhra Verma and Scott Cordeiro for help in developing this resource. Suggestions, corrections and comments should be sent to (last modified Aug 2005)