Protein Sequence Databases

Entrez Proteins

Description: Part of the National Center for Biotechnology Information (NCBI) Entrez system, this database includes sequence data compiled from a variety of sources, including Swiss-Prot, Protein Information Resource (PIR), Protein Data Bank (PDB), and Protein Research Foundation (PRF) in Japan. Some protein sequences are translations of coding regions in DNA sequences stored in GenBank and RefSeq.

Search Tips: As with other Entrez databases, users can refine search strategies using fields available in Limits, preview the number of search results for a query, browse Index terms of a particular field, combine searches using History, and store selected records from different searches on the Clipboard. Indexed fields that can be used to narrow a search include accession number, gene name, molecular weight, organism, properties, protein name, and sequence length. Users also can restrict a search to one source database (e.g., retrieve protein sequences from Swiss-Prot only). The Boolean operators AND, OR, and NOT must be entered in uppercase. Phrase searching with double quotes and truncation with the asterisk (*) as a wildcard are also supported. For more information about searching this and other NCBI Entrez databases, see the Entrez Help Document. For step-by-step instructions on finding and interpreting sequence records, see our tutorial on accessing sequence records.
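
To make these query rules concrete, here is a minimal Python sketch that assembles an Entrez Protein query from field-restricted terms, quoted phrases, and a truncated term, joined with uppercase Boolean operators. It then builds (but does not send) a request URL for NCBI's E-utilities esearch service. The endpoint, the [ORGN] and [GENE] field tags, and the helper names field_term, build_query, and esearch_url are assumptions made for illustration, not part of the article; check the current Entrez and E-utilities help pages before relying on them.

```python
from urllib.parse import urlencode

# Assumed E-utilities search endpoint (not described in the article itself).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}


def field_term(value, field):
    """Restrict a term to an Entrez field, quoting it if it is a multi-word phrase."""
    if " " in value and not value.startswith('"'):
        value = f'"{value}"'
    return f"{value}[{field}]"


def build_query(terms, operator="AND"):
    """Join terms with an uppercase Boolean operator, as Entrez requires."""
    op = operator.upper()
    if op not in BOOLEAN_OPERATORS:
        raise ValueError(f"unsupported Boolean operator: {operator!r}")
    return f" {op} ".join(terms)


def esearch_url(query, db="protein", retmax=20):
    """Build an esearch request URL; fetching it is left to the caller."""
    params = urlencode({"db": db, "term": query, "retmax": retmax})
    return f"{ESEARCH_URL}?{params}"


# Human records matching the truncated term kinase* (the operator is uppercased).
query = build_query([field_term("Homo sapiens", "ORGN"), "kinase*"], operator="and")
print(query)
print(esearch_url(query))
```

Passing the printed URL to urllib.request.urlopen would run the search; NCBI's usage guidelines limit how often clients may send such requests, so a real script should throttle itself.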

Information Provided: Search results displayed using the default view will include locus name (a unique name assigned to each record), sequence length, protein description (definition), accession number, database source, keywords, organism, citations to references, comments concerning protein function or associated traits or disorders, information about sequence regions of biological significance, and the amino acid sequence. For detailed descriptions about fields presented in each NCBI sequence record, see the GenBank sample record.
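
The fields listed above appear in the flat-file view of a record, where each line starts with a keyword in the first 12 columns and continuation lines leave that column blank. The following Python sketch pulls a few of those fields and the amino acid sequence out of a short record. The record is invented and heavily abbreviated, and the parser is a simplified illustration that ignores most fields; for real files, a maintained parser such as Biopython's Bio.SeqIO (format "genbank") is the safer choice.

```python
# Hypothetical, abbreviated protein record in the GenBank/GenPept flat-file layout:
# a keyword in the first 12 columns, its value after that.
SAMPLE_RECORD = "\n".join([
    "LOCUS       TOY0001                   12 aa            linear   PRI 01-JAN-2000",
    "DEFINITION  hypothetical example protein,",
    "            used only for illustration.",
    "ACCESSION   TOY0001",
    "SOURCE      Homo sapiens (human)",
    "ORIGIN",
    "        1 mkvlaagivg ll",
    "//",
])

HEADER_FIELDS = {"LOCUS", "DEFINITION", "ACCESSION", "KEYWORDS", "SOURCE"}


def parse_flat_record(text):
    """Pull a few header fields and the amino acid sequence out of one record."""
    record = {}
    last_key = None
    residues = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if in_sequence:
            # Each sequence line is a position number followed by blocks of residues.
            residues.extend(line.split()[1:])
            continue
        key, value = line[:12].strip(), line[12:].strip()
        if key == "ORIGIN":
            in_sequence = True
        elif key:
            last_key = key
            if key in HEADER_FIELDS:
                record[key] = value
        elif value and last_key in record:
            # A blank keyword column means the previous field continues.
            record[last_key] += " " + value
    record["SEQUENCE"] = "".join(residues).upper()
    locus = record.get("LOCUS", "").split()
    if len(locus) >= 2:
        record["LOCUS_NAME"] = locus[0]  # unique locus name
        record["LENGTH"] = int(locus[1])  # sequence length in residues
    return record


parsed = parse_flat_record(SAMPLE_RECORD)
print(parsed["LOCUS_NAME"], parsed["LENGTH"], parsed["DEFINITION"])
print(parsed["SEQUENCE"])
```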

Swiss-Prot/TrEMBL

Description: The protein sequence databases Swiss-Prot and TrEMBL were developed by groups at the Swiss Institute of Bioinformatics (SIB) and the European Bioinformatics Institute (EBI). Swiss-Prot is built on three key criteria: a high level of annotation, minimal redundancy, and a high level of integration with other databases. Swiss-Prot includes as much information as possible in its annotations, and external experts review the current literature and provide comments and updates on different protein groups. This depth of annotation, however, requires considerable time and effort. To keep the collection of protein sequences current, a computer-annotated supplement called TrEMBL (Translation of EMBL) was developed. Translations of nucleotide sequences from the EMBL (European Molecular Biology Laboratory) database are annotated automatically and stored in TrEMBL until they can be fully annotated and integrated into Swiss-Prot.

Search Tips: Swiss-Prot sequence records can be accessed through the NCBI Entrez Proteins database. Users who go to the Swiss-Prot/TrEMBL Web site instead can query the database in several ways: Quick search on the main page (Boolean operators not supported), the Sequence Retrieval System (SRS), full-text search (Boolean operators, phrase searching, and wildcards supported), and advanced search. Forms for searching by accession number or ID, description (entry name, gene name, species, organelle), author, or citation are also provided. To learn more about searching Swiss-Prot, see the Swiss-Prot Documentation section, which includes a downloadable PDF version of the user manual.

Information Provided: Swiss-Prot entries contain two types of data: core data (the sequence, bibliographic references, and a description of the protein's biological origin) and annotation. Detailed annotations in each entry describe protein function, post-translational modifications (e.g., addition of sugars or phosphate groups after mRNA translation), domains and binding sites, secondary structure, quaternary structure (e.g., homodimer, heterodimer), disorders associated with altered protein forms or amounts, variants, and similarities to other proteins.
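
In the Swiss-Prot flat-file format, each line begins with a two-letter code (for example OS for organism, RA for reference authors, CC for comments, FT for features, SQ for the sequence header), and sequence lines start with blanks. The Python sketch below sorts the lines of an entry into the two data types described above. The entry is invented and heavily abbreviated, and the assignment of codes to "core" and "annotation" is an assumption based on this paragraph rather than an official grouping; codes that fit neither (identifiers, names, cross-references) go into "other".

```python
# Hypothetical, heavily abbreviated entry using Swiss-Prot's two-letter line codes.
SAMPLE_ENTRY = "\n".join([
    "ID   TOY_HUMAN               Reviewed;          12 AA.",
    "AC   TOY001;",
    "DE   RecName: Full=Toy protein;",
    "OS   Homo sapiens (Human).",
    "RN   [1]",
    "RA   Doe J.;",
    "CC   -!- FUNCTION: Illustrative only.",
    "FT   DOMAIN          1..12",
    "KW   Phosphoprotein.",
    "SQ   SEQUENCE   12 AA;",
    "     MKVLAAGIVG LL",
    "//",
])

# Assumed mapping from line codes to the two data types.
CORE_CODES = {
    "OS", "OG", "OC", "OX",                          # biological origin
    "RN", "RP", "RC", "RX", "RG", "RA", "RT", "RL",  # bibliographic references
    "SQ",                                            # sequence header
}
ANNOTATION_CODES = {"CC", "FT", "KW"}                # comments, features, keywords


def split_entry(text):
    """Group an entry's lines into core data, annotation, and other lines."""
    groups = {"core": [], "annotation": [], "other": []}
    sequence = []
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-entry marker
            break
        code = line[:2]
        if code == "  ":
            # Sequence lines have no code; residues come in blocks of ten.
            sequence.append(line.replace(" ", ""))
        elif code in CORE_CODES:
            groups["core"].append(line)
        elif code in ANNOTATION_CODES:
            groups["annotation"].append(line)
        else:
            groups["other"].append(line)
    groups["sequence"] = "".join(sequence)
    return groups


entry = split_entry(SAMPLE_ENTRY)
for group in ("core", "annotation", "other"):
    print(group, [line[:2] for line in entry[group]])
print("sequence", entry["sequence"])
```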

Protein Information Resource - Protein Sequence Database (PIR-PSD)

Description: Established in 1984, the Protein Information Resource (PIR) is a division of the National Biomedical Research Foundation associated with Georgetown University Medical Center. In collaboration with the Munich Information Center for Protein Sequences (MIPS) in Germany and the Japan International Protein Information Database (JIPID), PIR developed the PIR-International Protein Sequence Database (PSD). Its mission is to be "the most comprehensive and expertly annotated protein sequence database in the public domain" with the primary objective of achieving "properties of Comprehensiveness, Timeliness, Non-Redundancy, Quality Annotation, and Full Classification."

Search Tips: PIR sequence records can be accessed through the NCBI Entrez Proteins database. Users who go to the PIR-PSD Web site instead have the following search options: search by unique identifier or accession number, basic text search, and advanced text search. Basic text searches do not support the Boolean operators AND, OR, and NOT; a space between terms is interpreted as "and." Advanced searches let users refine a strategy with fields such as Title, Species, Author, Keyword, and Gene Name. In advanced search, terms are case sensitive and must be at least three characters long, and the Boolean operators OR and NOT are supported. Because a space between words is interpreted as "and," users searching for a phrase must join its words with a character (e.g., enter homo-sapiens to search for "homo sapiens"). For more on searching PIR-PSD, see Help Searching PIR Databases, Sample Entry, Demo Search, and FAQs.
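
The advanced-search rules above (case-sensitive terms of at least three characters, a space meaning "and", phrases joined by a character) can be captured in a small helper. This Python sketch is illustrative only: the function names pir_phrase and pir_all_of are hypothetical, and using a hyphen as the joining character simply follows the homo-sapiens example. It leaves case untouched because terms are case sensitive.

```python
MIN_TERM_LENGTH = 3  # advanced-search terms must be at least three characters


def pir_phrase(phrase, joiner="-"):
    """Join a multi-word phrase into a single advanced-search term.

    Case is preserved because advanced-search terms are case sensitive.
    """
    words = phrase.split()
    if not words:
        raise ValueError("empty search phrase")
    term = joiner.join(words)
    if len(term) < MIN_TERM_LENGTH:
        raise ValueError(f"term {term!r} is shorter than {MIN_TERM_LENGTH} characters")
    return term


def pir_all_of(*phrases):
    """Combine phrases so that every one must match; a space acts as AND."""
    return " ".join(pir_phrase(p) for p in phrases)


print(pir_all_of("homo sapiens", "kinase"))
```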

Information Provided: Each record includes protein name; classification and origin; literature references; protein features such as domains and motifs; primary sequence data; and links to related entries in other databases. Users have the option to create submission forms for similarity searching in PIR and NCBI databases. At the top of each record are links to annotation and sequence data within the record and a link to a composition table that summarizes total amino acid composition expressed as percentages. At the bottom of the record are direct links to Protein Data Bank (PDB) structures and sequence similarity alignments associated with the protein.
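
A composition table like the one linked from each PIR record can be computed directly from a sequence. The Python sketch below counts each residue letter and expresses it as a percentage of the total number of residues. It assumes composition by residue count rather than by mass, and it keeps any ambiguous letters (such as X) as their own rows; how PIR itself handles those cases is not stated here. The sequence used is a made-up example.

```python
from collections import Counter


def composition_percent(sequence):
    """Return each residue's share of the sequence as a percentage, by count."""
    # Keep letters only, so spaces, digits, and line breaks in pasted text are ignored.
    residues = [aa for aa in sequence.upper() if aa.isalpha()]
    if not residues:
        raise ValueError("sequence contains no residues")
    counts = Counter(residues)
    total = len(residues)
    return {aa: 100.0 * counts[aa] / total for aa in sorted(counts)}


for residue, percent in composition_percent("MKVLAAGIVGLL").items():
    print(f"{residue}  {percent:5.1f}%")
```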

Source: http://www.ornl.gov

14 Mar 2012
