Welcome to Bioinformatics Basics at Tufts
As the sequencing of the genomes of humans and other species
proceeds, vast amounts of raw data are accumulating in publicly
accessible databases. Knowing what is available, how to
access it and which tools can be used to analyze these data is now a critical
skill for anyone interested in understanding modern bioscience. The
exercises available at this site are designed to give you a very basic
introduction to the databases and some of the methodologies used in
bioinformatic analysis. As of July 2005, the exercises available at
this site are:
Exercise 1: An introduction to searching the scientific
literature. Simple web
searches generally turn up both interesting information and
garbage. As scientists and health professionals you will need to know how to
access the refereed scientific literature (publications which have been reviewed
for accuracy and completeness by other professionals). This exercise will
introduce you to the MEDLINE database accessed via the Entrez Browser and to two other databases available via this browser (Online
Mendelian Inheritance in Man and Books-on-line).
Exercise 2: Finding the Nucleotide Sequence for a Gene.
One problem with the
vast amount of data now accessible is that it is becoming increasingly difficult
to sort through it to find a specific gene of interest. This exercise will
introduce you to the Nucleotide database and the various ways to structure a search for a given gene.
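One way to structure such a search is to restrict each search word to a field, for example a gene name and an organism. The sketch below is an illustration and not part of the original exercise: `build_term` and `build_esearch_url` are hypothetical helper names, BRCA1 is only an example gene, and the URL assumes NCBI's E-utilities `esearch` endpoint with `nuccore` as the name of the Nucleotide database. It only builds the query URL and does not send a request.

```python
from urllib.parse import urlencode

# Assumed base URL of the NCBI E-utilities search service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(gene, organism):
    """Combine a gene name and an organism into one fielded Entrez query."""
    return f"{gene}[Gene Name] AND {organism}[Organism]"


def build_esearch_url(gene, organism, db="nuccore", retmax=20):
    """Return a search URL for the given gene; nothing is sent over the network."""
    params = {"db": db, "term": build_term(gene, organism), "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Illustrative query only; substitute the gene you are looking for.
    print(build_term("BRCA1", "Homo sapiens"))
    print(build_esearch_url("BRCA1", "Homo sapiens"))
```

The same fielded term can be typed directly into the search box of the Nucleotide database; restricting by field usually returns far fewer unrelated records than searching on the gene name alone.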
Exercise 3: Determining the correct
reading frame for a nucleotide sequence.
Many experiments yield a short DNA sequence of
unknown function. Using the available databases it is often possible to assign
this short sequence to a specific gene. Typically the first step in obtaining
such an assignment involves determining the reading frame that the cell uses to
translate this DNA sequence into a protein sequence. This exercise will provide
you with an unknown sequence and you will use web-based tools to determine a
likely reading frame. Save your sequences from this exercise; in the next
exercise you will use them to search the database and determine which gene gave
rise to them.
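The idea behind choosing a reading frame can be sketched in a few lines of Python. This is a minimal illustration, not the web-based tool used in the exercise: it assumes the standard genetic code, translates all six frames (three on each strand), and ranks them by the longest stretch that contains no stop codon, which is only a simple heuristic. The function names and the short example sequence are made up for illustration.

```python
# Standard genetic code, built from the conventional TCAG ordering of bases.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate codon by codon; '*' marks a stop, 'X' an unrecognized codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Translate the three forward (+) and three reverse (-) reading frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def longest_open_stretch(protein):
    """Return the longest run of amino acids not interrupted by a stop."""
    return max(protein.split("*"), key=len)


def rank_frames(dna):
    """Order frame names from longest to shortest stop-free stretch."""
    frames = six_frames(dna)
    return sorted(
        frames,
        key=lambda name: len(longest_open_stretch(frames[name])),
        reverse=True,
    )


if __name__ == "__main__":
    # Short made-up sequence, used only to show the calls.
    example = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
    translations = six_frames(example)
    for name in rank_frames(example)[:2]:
        print(name, translations[name])
```

On a sequence this short the ranking is not meaningful; with real data, a frame that stays open over a long stretch is a reasonable candidate, and the two top-ranked frames are the ones you would carry forward for comparison.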
Exercise 4: Using BLAST to identify a
gene. (continued from Exercise 3) In this exercise,
you will take the best two open reading frames obtained in Exercise 3 and
use them to carry out a similarity search against all the protein sequences
available in the database. We will use the best two open reading frames to allow
us to compare the results obtained in a correct translation with those from an
incorrect translation. This exercise should allow us to assign our unknown
sequence to a specific gene.
Exercise 5: Searching for sequence
motifs in a given protein. Say you have
found an increase in the level of an mRNA coding for an unknown gene under
conditions of low oxygen pressure. You would like to know what protein this mRNA
is coding for, but a similarity search of the databases reveals no obvious
homologs to this protein. Another way to look for possible function is to
determine if short regions in the protein correspond to sequences which have
been recognized to carry out specific functions. Special search engines have
been designed to look for such "motifs" and you will use one of these to examine
an unknown protein.
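To make the notion of a motif concrete, the sketch below scans a protein for a pattern written in PROSITE-style notation, using the well-known N-glycosylation site pattern `N-{P}-[ST]-{P}` (N, then any residue except P, then S or T, then any residue except P). It is a rough illustration and not the search engine used in the exercise: the converter handles only a small subset of the notation, the function names are hypothetical, and the example protein is made up.

```python
import re

# One pattern element: optional '<' anchor, a residue / [allowed] / {excluded} / x,
# an optional repeat count such as (3) or (2,4), and an optional '>' anchor.
ELEMENT = re.compile(
    r"(<?)(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+(?:,\d+)?)\))?(>?)"
)


def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, repeat, end = match.groups()
        if core == "x":                 # x means any residue
            core = "."
        elif core.startswith("{"):      # {P} means any residue except P
            core = "[^" + core[1:-1] + "]"
        if repeat:                      # x(2,4) becomes .{2,4}
            core += "{" + repeat + "}"
        parts.append(("^" if start else "") + core + ("$" if end else ""))
    return "".join(parts)


def find_motifs(protein, pattern):
    """Return (1-based position, matched text) pairs, including overlapping hits."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(protein)]


if __name__ == "__main__":
    # Made-up protein fragment, used only to show the calls.
    print(prosite_to_regex("N-{P}-[ST]-{P}"))
    print(find_motifs("MKNGTAPNPSA", "N-{P}-[ST]-{P}"))
```

A match to a short pattern like this only suggests a possible function; short motifs turn up by chance in many proteins, so hits need to be weighed against what else is known about the sequence.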
Exercise 6: (To be developed) Finding homologs of a
human gene in other organisms.
In recent years we have been able to determine that certain genes are
associated with various human diseases. Often, however, the function of the proteins
coded for by these genes is not clear. One way to better understand the normal
function of these proteins is to find their homologs in simpler organisms which
can be experimentally manipulated. In this exercise you will be given a gene
known to be involved in susceptibility to a human disease and you will
search the Drosophila or yeast databases to determine if a homologous gene has been found
in these organisms.
These tutorials were developed by Dr. Ross
S. Feldberg, Dept of Biology at Tufts University, Medford, MA 02155
with the assistance of a Teaching with Technology grant from the
Academic Computing Department at Tufts University. Thanks to Anoop Kumar, Abha Verma and Scott Cordeiro for development of this instructional resource.
You are free to make use of this site for educational purposes,
but I would appreciate your informing me of its use and would
welcome any comments on its usefulness as a teaching tool or suggestions for how it
might be improved. Please cite Bioinformatic Basics at Tufts as
follows: Feldberg, R.S. 2005 Bioinformatic Basics at Tufts. World
Wide Web electronic publication http://ase.tufts.edu/biology/bioinformatics
Corrections, comments or suggestions are greatly appreciated and
should be sent to firstname.lastname@example.org
(last modified Aug 2005)