Resources for Sequence Similarity Searching

Scientists frequently perform sequence-similarity searching to see if a gene or protein from one organism has a similar counterpart in another organism. For example, to determine the function and biological importance of a new human protein, scientists often identify a similar mouse protein and then use that protein as a model for studying the human protein.

As we know from molecular biology's "central dogma," the order of nucleotides in a gene's DNA sequence determines the order of amino acids in a protein sequence. Each set of three nucleotides (called a codon) in the DNA sequence encodes a particular amino acid or, for a few codons, a stop signal. The Table of Standard Genetic Code shows which codons are associated with which amino acids.
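As a rough illustration of the codon-to-amino-acid mapping described above, the following Python sketch builds the standard genetic code as a lookup table and translates a short DNA sequence. The function name `translate` and the example sequence are hypothetical and chosen only for demonstration; the sketch reads a single forward frame and stops at the first stop codon.

```python
from itertools import product

BASES = "TCAG"
# One-letter amino acids for all 64 codons of the standard genetic code,
# listed in TCAG order (TTT, TTC, TTA, TTG, TCT, ...). '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string codon by codon, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    # Ignore any trailing bases that do not form a complete codon.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical example: ATG (Met), GCC (Ala), TAA (stop).
example_protein = translate("ATGGCCTAA")
print(example_protein)
```

Real sequences may contain ambiguity codes such as N, which this sketch does not handle.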

Because more than one codon can encode the same amino acid, many different nucleotide sequences can translate into the same amino acid sequence. This degeneracy of the genetic code is the reason that similarity searching with amino acid sequences is generally more informative than searching with nucleotide sequences.
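To make the effect of degeneracy concrete, the sketch below compares two made-up DNA sequences that use different codons for the same three amino acids. The sequences and the helper names (`translate`, `identity`, `codons_for`) are hypothetical; the codon table is the standard genetic code.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG codon order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons by the amino acid they encode to show how many synonyms exist.
codons_for = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    codons_for[aa].append(codon)


def translate(dna):
    """Translate complete codons only; no stop-codon handling in this sketch."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def identity(a, b):
    """Fraction of identical positions between two equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Two hypothetical genes that encode Leu-Ser-Arg with different codons.
seq_a = "CTTAGCCGT"
seq_b = "TTGTCAAGA"

print("codons for leucine:", codons_for["L"])
print("nucleotide identity:", identity(seq_a, seq_b))
print("protein identity:", identity(translate(seq_a), translate(seq_b)))
```

The two sequences share few nucleotide positions, yet their translations are identical, which is why a protein-level comparison can detect a relationship that a nucleotide-level comparison would score poorly.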

Users who are new to sequence-similarity searching should check out NCBI's Introduction to Similarity Page, Homology - General Rules, and BLAST Guide's Glossary.

NCBI BLAST

Indication: BLAST (Basic Local Alignment Search Tool) is a set of programs designed to perform similarity searches on all available sequence data. BLAST uses an algorithm developed by the National Center for Biotechnology Information (NCBI) that seeks out local alignment (alignment of some portion of two sequences) as opposed to global alignment (alignment of two sequences over their entire length). By searching for local alignments, BLAST can identify regions of similarity in two sequences. Some similarity searches offered by NCBI include comparing an amino acid sequence to a protein sequence database (blastp), comparing a nucleotide query sequence to a nucleotide sequence database (blastn), and comparing a nucleotide sequence translated in all reading frames to a protein sequence database (blastx).
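The difference between local and global alignment can be sketched with a small dynamic-programming scorer. This is not BLAST's algorithm, which relies on faster heuristics; it is a textbook Smith-Waterman (local) and Needleman-Wunsch (global) score calculation with an arbitrary, assumed scoring scheme (match +2, mismatch -1, gap -2), and the function name `alignment_score` is hypothetical.

```python
def alignment_score(a, b, match=2, mismatch=-1, gap=-2, local=True):
    """Return the best local (Smith-Waterman) or global (Needleman-Wunsch) score."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment must pay for gaps at the start of either sequence.
        for i in range(rows):
            score[i][0] = i * gap
        for j in range(cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                # A local alignment may restart anywhere, so scores never drop below 0.
                cell = max(cell, 0)
            score[i][j] = cell
            best = max(best, cell)
    # Local: best-scoring region anywhere. Global: score over both full lengths.
    return best if local else score[-1][-1]


# Hypothetical sequences sharing the motif GATTACA inside unrelated flanks.
query = "AAAAGATTACA"
subject = "GATTACATTTT"
local_score = alignment_score(query, subject, local=True)
global_score = alignment_score(query, subject, local=False)
print(local_score, global_score)
```

With these inputs the local score reflects only the shared region, while the global score is pulled down by the unmatched flanks, which is the behavior that lets a local search find a similar region inside otherwise dissimilar sequences.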

Examination Guidelines: From the main BLAST page, users can choose among several NCBI services. For service descriptions, click on the question mark to the right of each section title or see the Description of BLAST Services. Clicking on the desired BLAST search option will lead to a search page with a box for entering the query sequence. Accepted input includes a sequence in FASTA format (a single-line description followed by sequence data), a bare sequence (sequence data without the single-line description), or an identifier. The identifier may be an accession number or a GenInfo Identifier (GI number), but it must be entered as a single word without any spaces between characters. For more information about input, see NCBI's Search Format page. Each search or format option on the search page links to Help documentation with a more detailed description of that option. For more on how to use BLAST, see our Sequence Similarity Searching tutorial and NCBI's step-by-step BLAST GUIDE, Query Tutorial for new users, BLAST Tutorial, and BLAST Help.
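The FASTA and bare-sequence input forms described above can be told apart with a few lines of Python. This is a minimal sketch, not NCBI's own input handling: the function name `parse_query` and the example record are hypothetical, and identifiers such as accession numbers are not handled here.

```python
def parse_query(text):
    """Split query text into (description, sequence).

    FASTA input starts with a '>' description line; a bare sequence has none,
    in which case the description is returned as None.
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    description = None
    if lines and lines[0].startswith(">"):
        description = lines[0][1:].strip()
        lines = lines[1:]
    # Sequence data may span several lines; join them into one string.
    sequence = "".join(lines).upper()
    return description, sequence


# Hypothetical FASTA record and the same data as a bare sequence.
fasta_input = ">demo hypothetical protein\nMKT\nAYI\n"
bare_input = "mktayi"
print(parse_query(fasta_input))
print(parse_query(bare_input))
```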

Info Providing: After submitting a BLAST request, users are presented with a Formatting BLAST page that displays the query statement, domain information, request ID number, and format options. After the desired format options are selected, pressing the Format button will pull up the Results of BLAST page. With pairwise alignment (the default alignment view) selected in the format options, the Results page will display an image map graphically depicting the retrieved database sequences (subject sequences) aligned with the query sequence (depicted as the numbered line at the top). Passing the mouse over each line below the query sequence will display a description of that sequence in the text box. Clicking on a line will jump down to the corresponding pairwise alignment between the query sequence and that subject sequence. Below the image map is a list of sequences producing significant alignments. The accession number or identifier for each alignment links to a sequence record. The score links to the corresponding pairwise alignment at the bottom of the Results page. The blue L seen in some results links to a related entry in LocusLink. See the Sequence Similarity Searching tutorial for more on interpreting BLAST results.

PIR FASTA Similarity Search

Indication: The FASTA Similarity Search tool is part of the Protein Information Resource (PIR) collection of protein databases and bioinformatics tools. This similarity-search tool uses the FASTA algorithm, which compares a query sequence to those in the Protein Sequence Database and other PIR databases.

Examination Guidelines: Users can query the database by entering a protein sequence in single-letter amino acid code into the query box or by entering the valid PIR-PSD entry code for a particular protein of interest. See the Demo Search for an example.

Info Providing: Query results are presented in a table that lists more-similar sequences at the top and less-similar sequences toward the bottom. Clicking on the ID number for a result will pull up the database entry for that protein, and clicking on the colored bar on the right will link to the pairwise alignment between the submitted sequence and the subject sequence retrieved from the database.

View Article Sources : http://www.ornl.gov/sci/techresources/Human_Genome/posters/chromosome/geneguide.shtml#compare

14 Mar 2012
