Belozersky Institute

GeneBee

Russian EMBnet Node

HELP ON SCREENING BY KEYWORDS

GeneBee's SCREENING BY KEYWORDS offers three ways to screen databanks (SWISSPROT, EMBL and PDB) for keywords:

    • search for databank entries whose entire info text contains the query keywords
      (proteinase OR glycoprotein)
    • search for databank entries in which the query keywords occur in one of the following
      fields: ID, DE, KW, OS and OC (DE[proteinase] OR KW[glycoprotein])
    • search for any pattern of symbols anywhere in the info text of entries
      (XX[E.coli]); note that this is the most time-consuming option.

Some definitions:

  • All DATABANKS are built of ENTRIES.
  • Every ENTRY describes one sequence or 3D-structure and consists of a number of FIELDS.
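The three search modes listed above can be pictured with a minimal Python sketch. It assumes a hypothetical in-memory entry (a dict from field code to field text), case-insensitive matching, and that every character other than a letter, digit or underscore acts as a keyword delimiter; these names and rules are illustrative assumptions, not GeneBee's implementation.

```python
import re

# Hypothetical in-memory entry: two-letter field code -> text of that field.
Entry = dict[str, str]

# Fields that can be named in a field-restricted query such as DE[proteinase].
QUERY_FIELDS = {"ID", "DE", "KW", "OS", "OC"}


def keywords(text: str) -> set[str]:
    """Upper-cased keywords of a text: tokens between delimiters that start with a letter.
    Assumption: every character other than a letter, digit or underscore is a delimiter."""
    return {tok.upper() for tok in re.split(r"\W+", text) if tok[:1].isalpha()}


def full_text(entry: Entry) -> str:
    """All info fields joined, one field per line."""
    return "\n".join(entry.values())


def match_anywhere(entry: Entry, word: str) -> bool:
    """Mode 1: the keyword occurs in the entire info text, e.g. proteinase."""
    return word.upper() in keywords(full_text(entry))


def match_field(entry: Entry, field: str, word: str) -> bool:
    """Mode 2: the keyword occurs in one named field, e.g. DE[proteinase]."""
    if field not in QUERY_FIELDS:
        raise ValueError(f"field {field!r} cannot be used in a field query")
    return word.upper() in keywords(entry.get(field, ""))


def match_pattern(entry: Entry, pattern: str) -> bool:
    """Mode 3: any symbol pattern occurs in the info text, e.g. XX[E.coli]."""
    return pattern.upper() in full_text(entry).upper()


entry = {
    "ID": "IPST_HUMAN",
    "DE": "PANCREATIC SECRETORY TRYPSIN INHIBITOR PRECURSOR",
    "KW": "SERINE PROTEASE INHIBITOR; SIGNAL.",
    "OS": "HOMO SAPIENS (HUMAN).",
}
print(match_anywhere(entry, "trypsin"))         # True
print(match_field(entry, "KW", "trypsin"))      # False: TRYPSIN occurs only in DE
print(match_pattern(entry, "sapiens (human)"))  # True
```

The pattern mode scans raw text instead of a keyword set, which is why it can find strings such as "E.coli" and also why it is the slowest of the three.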

SYNTAX

Please note that a keyword, by our definition, is a string of symbols that starts with a letter and is delimited by spaces, periods, commas, asterisks, etc. Thus E.coli, for example, is not considered a keyword, because it contains a period. To screen for such a combination, you have to use the XX option.
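As a sketch of this rule, the hypothetical helpers below check whether a query term would be treated as a single keyword and, if not, rewrite it as an XX pattern. The delimiter set (anything other than letters, digits and underscore) is an assumption.

```python
import re


def is_keyword(term: str) -> bool:
    """True if the term starts with a letter and contains no delimiter
    (assumed: any character other than a letter, digit or underscore)."""
    return term[:1].isalpha() and re.fullmatch(r"\w+", term) is not None


def query_term(term: str) -> str:
    """Return the term unchanged if it is a keyword, otherwise as an XX pattern."""
    return term if is_keyword(term) else f"XX[{term}]"


for term in ["proteinase", "E.coli", "A$b5"]:
    print(query_term(term))  # proteinase, XX[E.coli], XX[A$b5]
```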

Each of these options lets you use three logical operators: AND (or a space), OR and ANDNOT.

  • To select the entries that contain ALL of your query words (in the specified fields, where given), use the logical operator AND or put spaces between the words.

      Examples:

      • DE[proteinase] AND cleave ("proteinase" must occur in the DE field of a selected entry; "cleave" may occur in any of its text info fields);
      • proteinase AND cleave (both words must occur, each in any of the text info fields of a selected entry).
  • To select the entries that contain at least ONE of two or more keywords (or symbol patterns), use the logical operator OR.

      Examples:

      • (DE[proline] DE[endopeptidase]) OR (KW[proteinase] AND DE[cleave]) (again, every word must be present in the specified text info fields, in this case DE and KW);
      • XX[trypsin] OR (glycoprotein lipocalin) (this query looks for the words in any of the text info fields).
  • To select the entries that contain one keyword (or a group of keywords) but not certain others, use the logical operator ANDNOT.

      Examples:

      • proteinase ANDNOT cleave
      • XX[A$b5] ANDNOT cleave
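The three operators behave like set operations on the entries matched by each term. The sketch below assumes a hypothetical inverted index from keyword to entry IDs; it illustrates the logic only and is not GeneBee's implementation.

```python
# Hypothetical inverted index: keyword -> IDs of the entries containing it.
INDEX: dict[str, set[str]] = {
    "PROTEINASE": {"E1", "E2", "E3"},
    "CLEAVE": {"E2", "E4"},
    "GLYCOPROTEIN": {"E3", "E5"},
}


def hits(word: str) -> set[str]:
    """Entries containing the keyword (case-insensitive lookup)."""
    return INDEX.get(word.upper(), set())


# proteinase AND cleave: entries with both words -> intersection.
both = hits("proteinase") & hits("cleave")
# proteinase OR glycoprotein: entries with at least one word -> union.
either = hits("proteinase") | hits("glycoprotein")
# proteinase ANDNOT cleave: first word present, second absent -> difference.
without = hits("proteinase") - hits("cleave")

print(sorted(both), sorted(either), sorted(without))
# ['E2'] ['E1', 'E2', 'E3', 'E5'] ['E1', 'E3']
```

Parenthesized queries compose the same way: (A AND B) OR C becomes `(hits(a) & hits(b)) | hits(c)`. A field-restricted term such as DE[proteinase] would, under the same assumption, look up a separate per-field index.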

DATABANKS

Protein databanks:
  1. SWISSPROT - databank of protein sequences
  2. TrEMBL - databank of translations of the coding sequences (CDSs) in the EMBL Nucleotide Sequence Database
  3. PDB - sequences from the databank of biopolymer 3D structures


Nucleotide databank EMBL:

  1. primates,
  2. rodents,
  3. mammals, except primates and rodents,
  4. vertebrates, except mammals,
  5. invertebrates,
  6. viruses,
  7. synthetic plasmids,
  8. plants, including fungi,
  9. expression tags,
  10. patents,
  11. bacteria,
  12. unannotated sequences.

FIELDS

The three databanks available in GeneBee organize their entries into different fields.
GeneBee makes the names of these fields uniform in order to preserve coherence. Most GeneBee field notations follow SWISSPROT abbreviations.

The abbreviations of the entry fields in the SWISSPROT databank mean the following (the ones marked in red are used in GeneBee outputs):

ID - Identification.
AC - Accession number(s).
DT - Date.
DE - Description.
GN - Gene name(s).
OS - Organism species.
OG - Organelle.
OC - Organism classification.
RN - Reference number.
RP - Reference position.
RC - Reference comments.
RX - Reference cross-references.
RA - Reference authors.
RL - Reference location.
CC - Comments or notes.
DR - Database cross-references.
KW - Keywords.
FT - Feature table data.
SQ - Sequence header; the sequence data follow on lines beginning with blanks.
// - Termination line.

All information that is relevant to the fields GN, OG, RN, RP, RC, RX, RA and DR appears in the field CC in GeneBee outputs.

PDB entry records correspond to the following GeneBee fields (given in parentheses):

HEADER: (CC)
COMPND: (DE)
SOURCE: (OS)
EXPDATA: (CC)
REVDAT: (DT)
REFERENCE: (RL)
JOURNAL: (RL)
AUTHOR: (RA)
TITLE: (RT)
REFN: (RL)
REMARK: (RL)
SEQRES: (CC)
HET: (FT)
FORMUL: (FT)
HELIX: (FT)
SHEET: (FT)
SSBOND: (FT)
CRYSTAL: (FT)
ORIGINAL: (FT)
SCALE: (FT)
END: (//)
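The correspondence table can be expressed as a lookup in Python. This sketch assumes actual PDB record names (EXPDTA, JRNL, CRYST1, ORIGX1..3, SCALE1..3) for the descriptive names in the table, and it guesses how JRNL subrecords divide between RA, RT and RL; the function name is hypothetical.

```python
from typing import Optional

# Assumed mapping from PDB record names to GeneBee field codes, following the table above.
# CRYST1, ORIGX1..3 and SCALE1..3 are looked up by their stem (trailing digits removed).
PDB_TO_GENEBEE = {
    "HEADER": "CC", "COMPND": "DE", "SOURCE": "OS", "EXPDTA": "CC",
    "REVDAT": "DT", "AUTHOR": "RA", "REMARK": "RL", "SEQRES": "CC",
    "HET": "FT", "FORMUL": "FT", "HELIX": "FT", "SHEET": "FT",
    "SSBOND": "FT", "CRYST": "FT", "ORIGX": "FT", "SCALE": "FT", "END": "//",
}

# Assumption: journal subrecords are split between authors, title and citation.
JRNL_TO_GENEBEE = {"AUTH": "RA", "TITL": "RT"}


def genebee_field(pdb_line: str) -> Optional[str]:
    """GeneBee field for one PDB line, or None if the record is not mapped."""
    record = pdb_line[:6].strip()  # the record name occupies columns 1-6
    if record == "JRNL":
        parts = pdb_line.split()
        sub = parts[1] if len(parts) > 1 else ""
        return JRNL_TO_GENEBEE.get(sub, "RL")  # REF, REFN, ... -> RL
    return PDB_TO_GENEBEE.get(record.rstrip("0123456789"))
```

Records outside the table (ATOM, HETATM, TER, CONECT, MASTER) return None here, which matches their absence from the correspondence list.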


PROMPTS
  • The result can be sent to you by e-mail along with its WWW presentation.
  • This prompt allows you to receive the sequences of the entries found. Bear in mind, however, that such a request makes processing much longer.
  • This prompt brings you 3D coordinates obtained from the PDB databank.
  • You may restrict screening to one or two of the three major databanks; by default all three are screened.
  • The EMBL nucleotide databank is indexed by organism and by some technical qualifiers. You may select some of these subdivisions; otherwise all of them are scanned by default.

EXAMPLES

These are slightly shortened examples of the same entry presented in SWISSPROT, PDB, EMBL and GeneBee outputs.

SWISSPROT

See SWISSPROT file description

    ID IPST_HUMAN STANDARD; PRT; 79 AA. 
    AC P00995; 
    DT 21-JUL-1986 (REL. 01, CREATED) 
    DT 01-MAR-1989 (REL. 10, LAST SEQUENCE UPDATE) 
    DT 01-JUN-1996 (REL. 34, LAST ANNOTATION UPDATE) 
    DE PANCREATIC SECRETORY TRYPSIN INHIBITOR PRECURSOR (TUMOR-ASSOCIATED 
    DE TRYPSIN INHIBITOR) (TATI). 
    GN SPINK1 OR PSTI. 
    OS HOMO SAPIENS (HUMAN). 
    OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA; MAMMALIA; 
    OC EUTHERIA; PRIMATES. 
    RN [1] 
    RP SEQUENCE FROM N.A. 
    RX MEDLINE; 88106485. 
    RA HORII A., KOBAYASHI T., TOMITA N., YAMAMOTO T., FUKUSHIGE S.,
    RA MUROTSU T., OGAWA M., MORI T., MATSUBARA K.; 
    RL BIOCHEM. BIOPHYS. RES. COMMUN. 149:635-641(1987). 
    RN [2] 
    RP SEQUENCE FROM N.A. 
    RX MEDLINE; 86050645. 
    RA YAMAMOTO T., NAKAMURA Y., NISHIDE T., EMI M., OGAWA M., MORI T.,
    RA MATSUBARA K.; 
    RL BIOCHEM. BIOPHYS. RES. COMMUN. 132:605-612(1985). 
    RN [3] 
    RP SEQUENCE FROM N.A. 
    RX MEDLINE; 88083571. 
    RA TOMITA N., HORII A., YAMAMOTO T., OGAWA M., MORI T., MATSUBARA K.;
    RL FEBS LETT. 225:113-119(1987). 
    RN [4] 
    RP SEQUENCE OF 24-79. 
    RX MEDLINE; 77133145. 
    RA BARTELT D.C., SHAPANKA R., GREENE L.J.; 
    RL ARCH. BIOCHEM. BIOPHYS. 179:189-199(1977). 
    RN [5] 
    RP SEQUENCE OF 24-46. 
    RX MEDLINE; 83056875. 
    RA HUHTALA M.-L., PESONEN K., KALKKINEN N., STENMAN U.-H.; 
    RL J. BIOL. CHEM. 257:13713-13716(1982). 
    RN [6] 
    RP X-RAY CRYSTALLOGRAPHY (2.3 ANGSTROMS). 
    RX MEDLINE; 92309406. 
    RA HECHT H.J., SZARDENINGS M., COLLINS J., SCHOMBURG D.; 
    RL J. MOL. BIOL. 225:1095-1103(1992). 
    RN [7] 
    RP STRUCTURE BY NMR OF VARIANT WITH LEU-41 AND ARG-44. 
    RX MEDLINE; 93164251. 
    RA KLAUS W., SCHOMBURG D.; 
    RL J. MOL. BIOL. 229:695-706(1993). 
    CC -!- FUNCTION: THIS IS A TRYPSIN INHIBITOR, ITS PHYSIOLOGICAL FUNCTION 
    CC IS TO PREVENT THE TRYPSIN-CATALYSED PREMATURE ACTIVATION OF 
    CC ZYMOGENS WITHIN THE PANCREAS. 
    CC -!- SIMILARITY: TO OTHER KAZAL TYPE INHIBITORS. 
    DR EMBL; M20530; G190694; -. 
    DR EMBL; M22971; G190694; JOINED. 
    DR EMBL; M20528; G190694; JOINED. 
    DR EMBL; M20529; G190694; JOINED. 
    DR EMBL; Y00705; G35766; -. 
    DR EMBL; M11949; G190688; -. 
    DR PIR; A01229; TIHUA. 
    DR PIR; A27484; A27484. 
    DR PIR; S02605; S02605. 
    DR HSSP; P00998; 1CGI. 
    DR MIM; 167790; 11TH EDITION. 
    DR PROSITE; PS00282; KAZAL. 
    KW SERINE PROTEASE INHIBITOR; SIGNAL. 
    FT SIGNAL 1 23 
    FT CHAIN 24 79 PANCREATIC SECRETORY TRYPSIN INHIBITOR. 
    FT DISULFID 32 61 
    FT DISULFID 39 58 
    FT DISULFID 47 79 
    FT ACT_SITE 41 42 REACTIVE BOND. 
    FT CONFLICT 44 44 D -> N (IN REF. 4 AND 5). 
    FT CONFLICT 52 52 N -> D (IN REF. 4). 
    FT CONFLICT 64 64 N -> G (IN REF. 3). 
    SQ SEQUENCE 79 AA; 8507 MW; EF30BB47 CRC32; 
       MKVTGIFLLS ALALLSLSGN TGADSLGREA KCYNELNGCT KIYDPVCGTD GNTYPNECVL 
       CFENRKRQTS ILIQKSGPC 
    // 
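The flat-file layout shown above (a two-letter field code followed by text, sequence lines starting with blanks, and `//` ending the entry) can be read with a few lines of Python. This is a minimal sketch; it assumes undecorated lines, without the display indentation used on this page, and the function name is hypothetical.

```python
from collections import defaultdict


def parse_entry(lines: list[str]) -> tuple[dict[str, list[str]], str]:
    """Group the lines of one flat-file entry by their two-letter field code.
    Lines that start with blanks are sequence data; '//' ends the entry."""
    fields: dict[str, list[str]] = defaultdict(list)
    sequence: list[str] = []
    for raw in lines:
        line = raw.rstrip()
        if not line:
            continue
        if line.startswith("//"):
            break
        if line[0] == " ":
            sequence.append(line.replace(" ", ""))  # drop blanks between blocks of ten
        else:
            fields[line[:2]].append(line[2:].strip())
    return dict(fields), "".join(sequence)


lines = [
    "ID   IPST_HUMAN     STANDARD;      PRT;    79 AA.",
    "DE   PANCREATIC SECRETORY TRYPSIN INHIBITOR PRECURSOR (TUMOR-ASSOCIATED",
    "DE   TRYPSIN INHIBITOR) (TATI).",
    "KW   SERINE PROTEASE INHIBITOR; SIGNAL.",
    "SQ   SEQUENCE   79 AA;  8507 MW;  EF30BB47 CRC32;",
    "     MKVTGIFLLS ALALLSLSGN TGADSLGREA KCYNELNGCT KIYDPVCGTD GNTYPNECVL",
    "     CFENRKRQTS ILIQKSGPC",
    "//",
]
fields, seq = parse_entry(lines)
print(" ".join(fields["DE"]))
print(len(seq))  # 79
```

Multi-line fields such as DE come back as a list of lines, so the caller decides how to join them.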
    

EMBL

See EMBL file description

    ID   HSPSTI01   standard; RNA; HUM; 432 BP.
    XX
    AC   M11949;
    XX
    DT   16-JUL-1988 (Rel. 16, Created)
    DT   06-JUL-1989 (Rel. 20, Last updated, Version 1)
    XX
    DE   Human pancreatic secretory trypsin inhibitor (PSTI) mRNA, complete
    DE   cds.
    XX
    KW   .
    XX
    OS   Homo sapiens (human)
    OC   Eukaryota; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria; Primates;
    OC   Catarrhini; Hominidae; Homo.
    XX
    RN   [1]
    RP   1-432
    RX   MEDLINE; 86050645.
    RA   Yamamoto T., Nakamura Y., Nishide T., Emi M., Ogawa M., Mori T.,
    RA   Matsubara K.;
    RT   "Molecular cloning and nucleotide sequence of human pancreatic secretory
    RT   trypsin inhibitor (PSTI) cDNA";
    RL   Biochem. Biophys. Res. Commun. 132:605-612(1985).
    XX
    DR   SWISS-PROT; P00995; IPST_HUMAN.
    XX
    FH   Key             Location/Qualifiers
    FH
    FT   source          1..432
    FT                   /organism="Homo sapiens"
    FT   CDS             117..356
    FT                   /db_xref="PID:g190688"
    FT                   /db_xref="SWISS-PROT:P00995"
    FT                   /note="pancreatic secretory trypsin inhibitor"
    FT                   /translation="MKVTGIFLLSALALLSLSGNTGADSLGREAKCYNELNGCTKIYDP
    FT                   VCGTDGNTYPNECVLCFENRKRQTSILIQKSGPC"
    XX
    SQ   Sequence 432 BP; 121 A; 96 C; 103 G; 112 T; 0 other;
          ccatggaagt cggaatccgc taaggagtgt gtaacaactc acctgccgaa tcaacaagag
          agacgtggta agtgcggtgc agttttcaac tgacctctgg acgcagaact tcagccatga
          aggtaacagg catctttctt ctcagtgcct tggccctgtt gagtctatct ggtaacactg
          gagctgactc cctgggaaga gaggccaaat gttacaatga acttaatgga tgcaccaaga
          tatatgaccc tgtctgtggg actgatggaa atacttatcc caatgaatgc gtgttatgtt
          ttgaaaatcg gaaacgccag acttctatcc tcattcaaaa atctgggcct tgctgagaac
          caaggttttg aaatcccatc aggtcaccgc gaggcctgac tggccttatt gttgaataaa
          tgtatctgaa ta

    //
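The SQ line of an EMBL entry announces the sequence length and its base composition. The hypothetical helper below recomputes both from the sequence lines and compares them; it assumes that only letters are bases, so spaces and any position numbers on the sequence lines are ignored.

```python
import re
from collections import Counter

# EMBL SQ header as shown above, e.g. "Sequence 432 BP; 121 A; 96 C; 103 G; 112 T; 0 other;"
SQ_RE = re.compile(
    r"Sequence\s+(\d+)\s+BP;\s+(\d+)\s+A;\s+(\d+)\s+C;\s+(\d+)\s+G;\s+(\d+)\s+T;\s+(\d+)\s+other;"
)


def composition_matches(sq_line: str, sequence_lines: list[str]) -> bool:
    """True if the length and base counts announced on the SQ line match the sequence."""
    m = SQ_RE.search(sq_line)
    if m is None:
        raise ValueError("unrecognized SQ line")
    announced = tuple(int(x) for x in m.groups())
    # Keep only letters: spaces and position numbers are skipped.
    seq = "".join(ch for line in sequence_lines for ch in line.lower() if ch.isalpha())
    counts = Counter(seq)
    other = len(seq) - sum(counts[base] for base in "acgt")
    observed = (len(seq), counts["a"], counts["c"], counts["g"], counts["t"], other)
    return observed == announced
```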
    

PDB

See PDB Format Description
    HEADER    SERINE PROTEASE INHIBITOR               27-MAR-92     1HPT   1HPT   2 
    COMPND    HUMAN PANCREATIC SECRETORY TRYPSIN INHIBITOR VARIANT 3       1HPT   3 
    SOURCE    HUMAN (HOMO SAPIENS) PURIFIED FROM PMAMPF-PSTI3 TRANSDUCED   1HPT   4 
    SOURCE   2 (ESCHERICHIA COLI)                                          1HPT   5 
    AUTHOR    H.J.HECHT,M.SZARDENINGS,J.COLLINS,D.SCHOMBURG                1HPT   6 
    REVDAT   1    31-OCT-93 1HPT 0                                         1HPT   7 
    JRNL       AUTH    H.J.HECHT,M.SZARDENINGS,J.COLLINS,D.SCHOMBURG       1HPT   8 
    JRNL       TITL   THREE-DIMENSIONAL STRUCTURE OF A RECOMBINANT         1HPT   9 
    JRNL       TITL 2 VARIANT OF HUMAN PANCREATIC SECRETORY TRYPSIN        1HPT  10
    JRNL       TITL 3 INHIBITOR (KAZAL TYPE)                               1HPT  11 
    JRNL       REF    J.MOL.BIOL. V. 225 1095 1992                         1HPT  12
    JRNL       REFN   ASTM JMOBAK UK ISSN 0022-2836 070                    1HPT  13 
    REMARK   1                                                             1HPT  14 
    REMARK   2                                                             1HPT  15 
    REMARK   2 RESOLUTION. 2.3 ANGSTROMS.                                  1HPT  16 
    REMARK   3                                                             1HPT  17 
    REMARK   3 REFINEMENT.                                                 1HPT  18 
    REMARK   3   PROGRAM X-PLOR                                            1HPT  19 
    REMARK   3   AUTHORS BRUNGER                                           1HPT  20 
    REMARK   3   R VALUE 0.191                                             1HPT  21 
    REMARK   3   RMSD BOND DISTANCES 0.011 ANGSTROMS                       1HPT  22 
    REMARK   3   RMSD BOND ANGLES 2.70 DEGREES                             1HPT  23 
    REMARK   3                                                             1HPT  24 
    REMARK   3   RESOLUTION RANGE 8.0 - 2.3 ANGSTROMS                      1HPT  25 
    REMARK   3   DATA CUTOFF 2.0 SIGMA(F)                                  1HPT  26
    REMARK   3                                                             1HPT  27 
    REMARK   3   NUMBER OF PROTEIN ATOMS 440                               1HPT  28 
    REMARK   3   NUMBER OF SOLVENT ATOMS 31                                1HPT  29 
    REMARK   4                                                             1HPT  30 
    REMARK   4 THE FIRST THREE RESIDUES OF THE INHIBITOR ARE DISORDERED.   1HPT  31 
    SEQRES   1     56 ASP SER LEU GLY ARG GLU ALA LYS CYS TYR ASN GLU LEU  1HPT  32 
    SEQRES   2     56 ASN GLY CYS THR TYR GLU TYR ARG PRO VAL CYS GLY THR  1HPT  33 
    SEQRES   3     56 ASP GLY ASP THR TYR PRO ASN GLU CYS VAL LEU CYS PHE  1HPT  34 
    SEQRES   4     56 GLU ASN ARG LYS ARG GLN THR SER ILE LEU ILE GLN LYS  1HPT  35 
    SEQRES   5     56 SER GLY PRO CYS                                      1HPT  36 
    FORMUL   2   HOH *31(H2 O1)                                            1HPT  37 
    HELIX    1 H1  GLU     34 ARG     44  1                                1HPT  38 
    SHEET    1 AI  3 VAL    23  GLY    25  0                               1HPT  39 
    SHEET    2 AI  3 THR    30  TYR    31 -1                               1HPT  40 
    SHEET    3 AI  3 ILE    50  SER    53 -1                               1HPT  41 
    SSBOND   1 CYS      9    CYS     38                                    1HPT  42 
    SSBOND   2 CYS     16    CYS     35                                    1HPT  43 
    SSBOND   3 CYS     24    CYS     56                                    1HPT  44 
    CRYST1   40.150   40.150   33.910  90.00  90.00  90.00 P 43         4  1HPT  45 
    ORIGX1      1.000000  0.000000  0.000000        0.00000                1HPT  46 
    ORIGX2      0.000000  1.000000  0.000000        0.00000                1HPT  47 
    ORIGX3      0.000000  0.000000  1.000000        0.00000                1HPT  48 
    SCALE1      0.024907  0.000000  0.000000        0.00000                1HPT  49 
    SCALE2      0.000000  0.024907  0.000000        0.00000                1HPT  50 
    SCALE3      0.000000  0.000000  0.029490        0.00000                1HPT  51 
    ATOM     1  N   ASP      1      19.929  22.213  27.289  1.00 58.24     1HPT  52 
    ATOM     2  CA  ASP      1      19.753  20.827  27.679  1.00 58.24     1HPT  53 
    ATOM     3  C   ASP      1      18.679  20.259  26.758  1.00 58.24     1HPT  54 
         ................. 
         ( NOT ALL ATOM LINES ARE SHOWN HERE.) 
    TER    441      CYS     56                                             1HPT 492 
    HETATM 442  O   HOH     60      13.177  12.529  18.907  1.00  4.28     1HPT 493 
         ................. 
         ( NOT ALL HETATM LINES ARE SHOWN HERE.) 
    CONECT  66   65  293                                                   1HPT 524 
    CONECT 121  120  272                                                   1HPT 525 
         ................. 
         ( NOT ALL CONECT LINES ARE SHOWN HERE.) 
    MASTER      18    0    0    1    3    0    0    6  471    1    6    5  1HPT 530 
    END                                                                    1HPT 531
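SEQRES records list the residues of a chain in three-letter code. A minimal sketch that converts them to one-letter code, assuming a protein chain, the 20 standard amino acids (anything else becomes X) and identification columns that start after column 70; the function name is hypothetical.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_to_sequence(lines: list[str]) -> str:
    """One-letter sequence from the SEQRES lines of a single protein chain."""
    residues = []
    for line in lines:
        if not line.startswith("SEQRES"):
            continue
        # Cut off the identification columns, then keep three-letter alphabetic
        # tokens: the record name, serial number, chain ID and count are skipped.
        for tok in line[:70].split():
            if len(tok) == 3 and tok.isalpha():
                residues.append(THREE_TO_ONE.get(tok, "X"))
    return "".join(residues)
```

Applied to the five SEQRES lines of 1HPT above, this yields a 56-residue sequence that can be compared with residues 24-79 of the SWISSPROT entry; the differences reflect that 1HPT is a recombinant variant.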
    

GeneBee

    ID   IPST_HUMAN     STANDARD;      PRT;    79 AA.
    AC   P00995;
    DT   21-JUL-1986 (REL. 01, CREATED)
    DT   01-MAR-1989 (REL. 10, LAST SEQUENCE UPDATE)
    DT   15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
    DE   PANCREATIC SECRETORY TRYPSIN INHIBITOR PRECURSOR (TUMOR-ASSOCIATED
    DE   TRYPSIN INHIBITOR) (TATI).
    GN   SPINK1 OR PSTI.
    OS   HOMO SAPIENS (HUMAN).
    OC   EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA; MAMMALIA;
    OC   EUTHERIA; PRIMATES.
    RN   [1]
    RP   SEQUENCE FROM N.A.
    RX   MEDLINE; 88106485.
    RA   HORII A., KOBAYASHI T., TOMITA N., YAMAMOTO T., FUKUSHIGE S.,
    RA   MUROTSU T., OGAWA M., MORI T., MATSUBARA K.;
    RL   BIOCHEM. BIOPHYS. RES. COMMUN. 149:635-641(1987).
    RN   [2]
    RP   SEQUENCE FROM N.A.
    RX   MEDLINE; 86050645.
    RA   YAMAMOTO T., NAKAMURA Y., NISHIDE T., EMI M., OGAWA M., MORI T.,
    RA   MATSUBARA K.;
    RL   BIOCHEM. BIOPHYS. RES. COMMUN. 132:605-612(1985).
    RN   [3]
    RP   SEQUENCE FROM N.A.
    RX   MEDLINE; 88083571.
    RA   TOMITA N., HORII A., YAMAMOTO T., OGAWA M., MORI T., MATSUBARA K.; 
    RL   FEBS LETT. 225:113-119(1987).
    RN   [4]
    RP   SEQUENCE OF 24-79.
    RX   MEDLINE; 77133145.
    RA   BARTELT D.C., SHAPANKA R., GREENE L.J.;
    RL   ARCH. BIOCHEM. BIOPHYS. 179:189-199(1977).
    RN   [5]
    RP   SEQUENCE OF 24-46.
    RX   MEDLINE; 83056875.
    RA   HUHTALA M.-L., PESONEN K., KALKKINEN N., STENMAN U.-H.;
    RL   J. BIOL. CHEM. 257:13713-13716(1982).
    RN   [6]
    RP   X-RAY CRYSTALLOGRAPHY (2.3 ANGSTROMS).
    RX   MEDLINE; 92309406.
    RA   HECHT H.-J., SZARDENINGS M., COLLINS J., SCHOMBURG D.;
    RL   J. MOL. BIOL. 225:1095-1103(1992).
    RN   [7]
    RP   STRUCTURE BY NMR OF VARIANT WITH LEU-41 AND ARG-44.
    RX   MEDLINE; 93164251.
    RA   KLAUS W., SCHOMBURG D.;
    RL   J. MOL. BIOL. 229:695-706(1993).
    CC   -!- FUNCTION: THIS IS A TRYPSIN INHIBITOR, ITS PHYSIOLOGICAL FUNCTION 
    CC       IS TO PREVENT THE TRYPSIN-CATALYSED PREMATURE ACTIVATION OF
    CC       ZYMOGENS WITHIN THE PANCREAS.
    CC   -!- SIMILARITY: TO OTHER KAZAL TYPE INHIBITORS.
    DR   EMBL; M20530; G190694; -.
    DR   EMBL; M22971; G190694; JOINED.
    DR   EMBL; M20528; G190694; JOINED.
    DR   EMBL; M20529; G190694; JOINED.
    DR   EMBL; Y00705; G35766; -.
    DR   EMBL; M11949; G190688; -.
    DR   PIR; A01229; TIHUA.
    DR   PIR; A27484; A27484.
    DR   PIR; S02605; S02605.
    DR   HSSP; P00998; 1CGI.
    DR   MIM; 167790; -.
    DR   PROSITE; PS00282; KAZAL; 1.
    KW   SERINE PROTEASE INHIBITOR; SIGNAL.
    FT   SIGNAL        1     23
    FT   CHAIN        24     79       PANCREATIC SECRETORY TRYPSIN INHIBITOR.
    FT   DISULFID     32     61 
    FT   DISULFID     39     58
    FT   DISULFID     47     79
    FT   ACT_SITE     41     42       REACTIVE BOND.
    FT   CONFLICT     44     44       D -> N (IN REF. 4 AND 5).
    FT   CONFLICT     52     52       N -> D (IN REF. 4).
    FT   CONFLICT     64     64       N -> G (IN REF. 3).
    SQ   SEQUENCE   79 AA;  8507 MW;  EF30BB47 CRC32;
         MKVTGIFLLS ALALLSLSGN TGADSLGREA KCYNELNGCT KIYDPVCGTD GNTYPNECVL
         CFENRKRQTS ILIQKSGPC
    //

SWISS-PROT database format

ID
This line is always the first line of an entry. It consists of the entry name, the data class, the word 'PRT' denoting the molecule type (PRoTein), and the sequence length. The entry name has the form X_Y, where X is a mnemonic code representing the protein name and Y is a mnemonic species identification code. Two data classes are available:
  • STANDARD - data which are complete to the standards laid down by the SWISS-PROT data bank;
  • PRELIMINARY - data for which only the sequence and bibliographic information have been submitted to thorough checks.
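A sketch of reading the ID line described above with a regular expression. The group names are hypothetical; the pattern assumes upper-case mnemonics of letters and digits, as in the examples on this page.

```python
import re

ID_LINE = re.compile(
    r"^ID\s+(?P<protein>[A-Z0-9]+)_(?P<species>[A-Z0-9]+)\s+"
    r"(?P<data_class>STANDARD|PRELIMINARY);\s+PRT;\s+(?P<length>\d+)\s+AA\."
)


def parse_id(line: str) -> dict:
    """Split an ID line into protein mnemonic, species code, data class and length."""
    m = ID_LINE.match(line)
    if m is None:
        raise ValueError(f"not a SWISS-PROT ID line: {line!r}")
    info = m.groupdict()
    info["length"] = int(info["length"])
    return info


print(parse_id("ID   IPST_HUMAN     STANDARD;      PRT;    79 AA."))
```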
AC
This line lists the accession numbers associated with an entry. Entries will have more than one accession number if they have been merged or split.
DT
These lines show the date on which the entry was created and the dates of its last modifications (of the sequence and of the annotation).
DE
These lines contain general descriptive free-format information about the sequence stored.
GN
This line contains the name(s) of the gene(s) that encode the stored protein sequence. If more than one name has been assigned to an individual locus, the synonyms are listed separated by the word `OR'. If multiple genes encode an identical protein, all the different gene names are listed separated by the word `AND'.
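Following the rule above, a GN line can be split into genes and their synonyms. This minimal sketch assumes that `AND` separates genes and `OR` separates synonyms of one gene; the gene names in the second example are hypothetical.

```python
def parse_gn(gn_lines: list[str]) -> list[list[str]]:
    """Genes as lists of synonyms: 'AND' separates genes, 'OR' separates synonyms."""
    text = " ".join(line.strip() for line in gn_lines).rstrip(".")
    return [[name.strip() for name in gene.split(" OR ")] for gene in text.split(" AND ")]


print(parse_gn(["SPINK1 OR PSTI."]))    # [['SPINK1', 'PSTI']]
print(parse_gn(["GENEA AND GENEB."]))   # [['GENEA'], ['GENEB']]
```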
KW
These lines provide information which can be used to generate cross-reference indexes of the sequence entries based on functional, structural, or other categories.
OS, OG, OC
These fields contain information about source organism.
OS
This line specifies the organism(s) that were the source of the stored sequence. The species designation consists of the Latin genus and species names followed by the English name (in parentheses).
OG
This line indicates whether the gene coding for the protein originates from an organelle, such as the mitochondrion, the chloroplast or a cyanelle, or from a plasmid.
OC
These lines contain taxonomic classification of the source organism. The classification is listed top-down as nodes in a taxonomic tree in which the most general grouping is given first.
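Because the classification is listed top-down and separated by semicolons, the OC lines can be read as an ordered lineage. A minimal sketch with a hypothetical function name:

```python
def parse_oc(oc_lines: list[str]) -> list[str]:
    """Taxonomic lineage from OC lines, most general node first."""
    text = " ".join(line.strip() for line in oc_lines).rstrip(".")
    return [node.strip() for node in text.split(";") if node.strip()]


lineage = parse_oc([
    "EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA; MAMMALIA;",
    "EUTHERIA; PRIMATES.",
])
print(lineage[0], lineage[-1])  # EUKARYOTA PRIMATES
```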
RN, RP, RC, RM, RA, RL
These fields comprise the literature citations within SWISS-PROT. The citations indicate the papers from which the data has been abstracted.
RN
This line gives a sequential number to each reference citation in an entry.
RP
This line describes the extent of the work carried out by the authors of the reference cited.
RC
These lines are used to store comments relevant to the reference cited. The format is 'TOKEN1=TEXT; TOKEN2=TEXT; ... ', where the currently defined tokens are PLASMID, SPECIES, STRAIN, TISSUE and TRANSPOSON. The `SPECIES' token is only used when an entry describes a sequence that is identical in more than one species; similarly, the `PLASMID' token is only used if an entry describes a sequence identical in more than one plasmid.
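The 'TOKEN=TEXT;' format above maps naturally onto a dictionary. A minimal sketch, assuming that the text values themselves contain no semicolons; the values in the example are hypothetical.

```python
def parse_rc(text: str) -> dict[str, str]:
    """Parse 'TOKEN1=TEXT; TOKEN2=TEXT; ...' into a token -> text mapping."""
    comments = {}
    for item in text.split(";"):
        token, sep, value = item.partition("=")
        if sep:  # skip empty pieces, such as the one after the final ';'
            comments[token.strip()] = value.strip()
    return comments


print(parse_rc("STRAIN=K12; TISSUE=LIVER;"))  # {'STRAIN': 'K12', 'TISSUE': 'LIVER'}
```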
RM
This line indicates the Medline Unique ID of a reference.
RA
These lines list the authors of the paper (or other work) cited. All of the authors are included, and are listed in the order given in the paper.
RL
These lines contain the conventional citation information for the reference, which may be one of the following:
  • Journal citations
  • Book citations
  • Unpublished results
  • Unpublished observations
  • Thesis
  • Patent applications
  • Submissions

When a reference is made to a paper that is `in press' at the time the data bank is released, the page range, and possibly the volume number, are given as '0' (zero).

DR
These lines are used as pointers to information related to SWISS-PROT entries and found in data collections other than SWISS-PROT. Each line contains a database identifier followed by entry identifiers in that database.
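As the examples on this page show, a DR line is a semicolon-separated list ending in a period. A minimal sketch that splits it into the database name and its identifiers (function name hypothetical):

```python
def parse_dr(text: str) -> tuple[str, list[str]]:
    """Split the text of a DR line into the database identifier and its entry identifiers."""
    text = text.strip()
    if text.endswith("."):
        text = text[:-1]  # drop only the terminating period
    database, *ids = [part.strip() for part in text.split(";")]
    return database, ids


print(parse_dr("EMBL; M20530; G190694; -."))  # ('EMBL', ['M20530', 'G190694', '-'])
```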
FT
This table describes regions or sites of interest in the sequence. In general the feature table lists post-translational modifications, binding sites, enzyme active sites, local secondary structure or other characteristics reported in the cited references. Sequence conflicts between references are also included in the feature table. Each line of this table consists of feature key, `FROM' and `TO' endpoint specifications, and optionally a description which contains additional information about the feature.
  • If the `FROM' and `TO' specifications are equal, the feature indicated consists of the single amino acid at that position.
  • When a feature is known to extend beyond the end(s) of the sequenced region, the endpoint specification is preceded by `<' for features that continue to the left end (N-terminal direction) or by `>' for features that continue to the right end (C-terminal direction).
  • Unknown endpoints are denoted by `?'.
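The endpoint rules above can be captured in a small parser. This is a sketch under stated assumptions: it takes the text after the FT code, treats `<` and `>` as the markers for endpoints beyond the sequenced region (the SWISS-PROT convention), maps `?` to None, and ignores descriptions continued on following lines. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feature:
    key: str
    start: Optional[int]  # None if the endpoint is unknown ('?')
    end: Optional[int]
    start_beyond: bool    # True if marked '<' or '>': extends past the sequenced region
    end_beyond: bool
    description: str

    @property
    def is_single_residue(self) -> bool:
        return self.start is not None and self.start == self.end


def _endpoint(token: str) -> tuple[Optional[int], bool]:
    """Decode one FROM/TO token such as '32', '<1', '>100' or '?'."""
    beyond = token.startswith(("<", ">"))
    value = token.lstrip("<>")
    return (None if value == "?" else int(value)), beyond


def parse_ft(text: str) -> Feature:
    """Parse the part of an FT line after the 'FT' code: key, FROM, TO, description."""
    parts = text.split(maxsplit=3)
    if len(parts) < 3:
        raise ValueError(f"incomplete FT line: {text!r}")
    start, start_beyond = _endpoint(parts[1])
    end, end_beyond = _endpoint(parts[2])
    description = parts[3].strip() if len(parts) > 3 else ""
    return Feature(parts[0], start, end, start_beyond, end_beyond, description)


print(parse_ft("CHAIN        24     79       PANCREATIC SECRETORY TRYPSIN INHIBITOR."))
```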
SQ
This line marks the beginning of the sequence data and gives a quick summary of its content: the sequence length (AA), the molecular weight (MW) and the checking number (CN). For the checking number, please refer to:
Bairoch A., Biochem. J. 203:527-528(1982).
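A sketch of reading the SQ summary and checking the announced length against the sequence that follows. The regular expression accepts whatever label follows the checking number (the examples on this page show CRC32), and the sketch does not recompute the checking number itself; the function name is hypothetical.

```python
import re

SQ_HEADER = re.compile(r"SEQUENCE\s+(\d+)\s+AA;\s+(\d+)\s+MW;\s+(\S+)\s+(\w+);")


def parse_sq(header: str, sequence: str) -> dict:
    """Read length, molecular weight and checking number from an SQ line and
    verify the announced length against the sequence that follows."""
    m = SQ_HEADER.search(header)
    if m is None:
        raise ValueError(f"unrecognized SQ line: {header!r}")
    info = {"length": int(m[1]), "mw": int(m[2]), "check": m[3], "check_type": m[4]}
    residues = "".join(sequence.split())  # drop the blanks between blocks of ten
    if len(residues) != info["length"]:
        raise ValueError(f"SQ announces {info['length']} AA, found {len(residues)}")
    return info
```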
CC
These lines are free-text comments on the entry and may be used to convey any useful information. Most comment blocks, each of which starts with the mark `-!-', are arranged according to what we designate as `topics'. See also the topics table.
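Since each comment block starts with `-!-` and a topic name, the CC lines can be regrouped by topic. A minimal sketch, assuming the topic ends at the first colon and that a repeated topic simply replaces the earlier block; the function name is hypothetical.

```python
def cc_topics(cc_lines: list[str]) -> dict[str, str]:
    """Collect CC comment blocks by topic. Each block starts with '-!- TOPIC: text';
    following lines continue the current block. Lines before the first '-!-' are skipped."""
    topics: dict[str, str] = {}
    current = None
    for line in cc_lines:
        text = line.strip()
        if text.startswith("-!-"):
            topic, _, body = text[3:].partition(":")
            current = topic.strip()
            topics[current] = body.strip()
        elif current is not None and text:
            topics[current] += " " + text
    return topics


cc = [
    "-!- FUNCTION: THIS IS A TRYPSIN INHIBITOR, ITS PHYSIOLOGICAL FUNCTION",
    "    IS TO PREVENT THE TRYPSIN-CATALYSED PREMATURE ACTIVATION OF",
    "    ZYMOGENS WITHIN THE PANCREAS.",
    "-!- SIMILARITY: TO OTHER KAZAL TYPE INHIBITORS.",
]
print(cc_topics(cc)["SIMILARITY"])  # TO OTHER KAZAL TYPE INHIBITORS.
```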