14 Mar 2012

Nucleotide Sequence Databases

The International Nucleotide Sequence Database Collaboration is a coordinated effort among three key sequence repository centers: GenBank at the National Center for Biotechnology Information (NCBI), the European Molecular Biology Laboratory (EMBL), and the DNA Data Bank of Japan (DDBJ). Sequence data are exchanged daily among these three organizations. Although record formats and search systems may differ, the information contained in each record (accession number, sequence data, annotations) is the same in all three databases.

Description: GenBank is an NCBI database that serves as an archive for all publicly available DNA sequences from more than 100,000 different organisms. Submitting scientists retain complete editorial control over their sequences, so they decide on gene symbols (which may not be the official ones) and on what additional information to include. Submitters who wish to modify their sequence records must contact NCBI. As an archival database, GenBank can include redundant entries, even hundreds of records for the same gene, and some entries may contain errors in their sequence data. To address some of the problems associated with this archival database, NCBI developed RefSeq, a curated, nonredundant source of sequence data for genomic DNA, mRNA transcripts, and proteins of major research organisms. Unlike GenBank records, RefSeq records are created, reviewed, and updated by NCBI staff. Each RefSeq entry has a distinct accession number that begins with two letters followed by an underscore; the two-letter prefix indicates the sequence type (for example, NM_ for mRNA and NP_ for protein). For more information about RefSeq, see RefSeq FAQs.
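The accession convention described above can be checked mechanically. The following is a minimal Python sketch, not an NCBI tool: the function name classify_accession and the lookup table are assumptions made for illustration, the table lists only a small subset of RefSeq prefixes, and the accession strings in the usage lines are format examples only.

```python
import re

# Small, incomplete subset of RefSeq prefixes (assumed sufficient for illustration).
REFSEQ_PREFIXES = {
    "NC": "complete genomic molecule",
    "NG": "genomic region",
    "NM": "mRNA transcript",
    "NR": "non-coding RNA",
    "NP": "protein",
}

# Two uppercase letters, an underscore, digits, and an optional ".version" suffix.
_REFSEQ_PATTERN = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def classify_accession(accession):
    """Return the sequence type for a RefSeq-style accession.

    Returns None when the string does not follow the two-letter-plus-underscore
    pattern (for example, a GenBank-style accession with no underscore).
    """
    match = _REFSEQ_PATTERN.match(accession.strip())
    if match is None:
        return None
    prefix = match.group(1)
    return REFSEQ_PREFIXES.get(prefix, "unlisted RefSeq prefix")


# Format examples only; these strings are not meant to name particular records.
print(classify_accession("NM_123456.2"))
print(classify_accession("U49845"))
```

A string without the underscore-delimited prefix is reported as not RefSeq-style, which is one quick way to tell an archival GenBank accession from a curated RefSeq one.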

Exploration Tips: There are several ways to access sequence records at NCBI: text searching with Entrez Nucleotide, BLAST searching, or linking to sequence records from databases and tools such as LocusLink, OMIM, or Map Viewer. Entrez Nucleotide is part of NCBI's Entrez search and retrieval system, which can be used to search several linked databases, such as sequence databases, structure databases, OMIM, genome assemblies, and biomedical literature. In all Entrez databases, users can refine search strategies using the fields available in Limits and Preview/Index, browse the Index terms of a particular field, combine searches using History, and store selected records from different searches on a Clipboard. From the Limits page, users can exclude certain types of sequences (e.g., ESTs) and limit the search by date or by a particular database (e.g., search only RefSeq). The Boolean operators AND, OR, and NOT must be in upper case. Phrase searching using double quotes and truncation using the asterisk (*) as a wild card are also supported. For more information about searching this and other NCBI Entrez databases, see Entrez Help Document. For step-by-step instructions on finding and interpreting sequence records, see our tutorial Accessing records in NCBI sequence databases.
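The query syntax rules above (upper-case Boolean operators, double-quoted phrases, asterisk truncation) can be illustrated with a short Python sketch that only assembles a query string. The helper names phrase, truncate, and combine are hypothetical, the search terms are arbitrary examples, and nothing here contacts NCBI.

```python
BOOLEAN_OPERATORS = ("AND", "OR", "NOT")


def phrase(text):
    """Wrap a multi-word term in double quotes for phrase searching."""
    return f'"{text}"'


def truncate(stem):
    """Append the asterisk wild card to a word stem."""
    return f"{stem}*"


def combine(operator, *terms):
    """Join terms with a Boolean operator, forcing it to upper case."""
    op = operator.upper()
    if op not in BOOLEAN_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    return f" {op} ".join(terms)


# Arbitrary example terms, chosen only to show the syntax.
inner = combine("and", phrase("cystic fibrosis"), truncate("transport"))
query = combine("not", inner, "mouse")
print(query)  # "cystic fibrosis" AND transport* NOT mouse
```

Forcing the operator to upper case inside combine guards against the most common mistake the text warns about, a lower-case "and" or "not".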

Info Provided: Each record returned by a search includes the nucleotide sequence and annotations such as accession numbers, keywords, source organism, and citations for references. Sequence records may also contain the translated amino acid sequence. For more detailed descriptions of the types of information in each sequence record, check the Sample GenBank Record provided by NCBI.
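As a rough picture of the information listed above, a record can be modeled as a small data structure. This Python sketch is an assumption made for illustration: the class name SequenceRecord and its field names are hypothetical and do not reproduce the GenBank flat-file format, and the values below are placeholders rather than real data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SequenceRecord:
    """Hypothetical container for the fields a sequence record provides."""
    accession: str
    sequence: str                     # nucleotide sequence
    organism: str                     # source organism
    keywords: List[str] = field(default_factory=list)
    references: List[str] = field(default_factory=list)  # citations
    translation: Optional[str] = None  # amino acid sequence, if present

    def has_translation(self):
        """True when the record carries a translated amino acid sequence."""
        return self.translation is not None


# Placeholder values only; this is not a real record.
record = SequenceRecord(
    accession="XX000000",
    sequence="ATGGCC",
    organism="hypothetical organism",
)
print(record.has_translation())  # False
```

The translation field is optional because, as noted above, only some records contain the translated amino acid sequence.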

Source: http://www.ornl.gov

