The SWISS-PROT protein database is constructed by Amos Bairoch of the University of Geneva. SWISS-PROT also contains sequences translated from the EMBL Nucleotide Sequence Database.

Example

ID   TNFA_HUMAN     STANDARD;      PRT;   233 AA.
AC   P01375;
DT   21-JUL-1986 (REL. 01, CREATED)
DT   21-JUL-1986 (REL. 01, LAST SEQUENCE UPDATE)
DT   01-FEB-1995 (REL. 31, LAST ANNOTATION UPDATE)
DE   TUMOR NECROSIS FACTOR PRECURSOR (TNF-ALPHA) (CACHECTIN).
GN   TNFA.
OS   HOMO SAPIENS (HUMAN).
OC   EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA; MAMMALIA;
OC   EUTHERIA; PRIMATES.
RN   [1]
RP   SEQUENCE FROM N.A.
RM   87217060
RA   NEDOSPASOV S.A., SHAKHOV A.N., TURETSKAYA R.L., METT V.A.,
RA   AZIZOV M.M., GEORGIEV G.P., KOROBKO V.G., DOBRYNIN V.N.,
RA   FILIPPOV S.A., BYSTROV N.S., BOLDYREVA E.F., CHUVPILO S.A.,
RA   CHUMAKOV A.M., SHINGAROVA L.N., OVCHINNIKOV Y.A.;
RL   COLD SPRING HARB. SYMP. QUANT. BIOL. 51:611-624(1986).
RN   [2]
RP   SEQUENCE FROM N.A.
RM   85086244
RA   PENNICA D., NEDWIN G.E., HAYFLICK J.S., SEEBURG P.H., DERYNCK R.,
RA   PALLADINO M.A., KOHR W.J., AGGARWAL B.B., GOEDDEL D.V.;
RL   NATURE 312:724-729(1984).
RN   [3]
RP   SEQUENCE FROM N.A.
RM   85137898
RA   SHIRAI T., YAMAGUCHI H., ITO H., TODD C.W., WALLACE R.B.;
RL   NATURE 313:803-806(1985).
CC   -!- FUNCTION: CYTOKINE WITH A WIDE VARIETY OF FUNCTIONS: IT CAN
CC       CAUSE CYTOLYSIS OF CERTAIN TUMOR CELL LINES, IT IS IMPLICATED
CC       IN THE INDUCTION OF CACHEXIA, IT IS A POTENT PYROGEN CAUSING
CC       FEVER BY DIRECT ACTION OR BY STIMULATION OF IL-1 SECRETION, IT
CC       CAN STIMULATE CELL PROLIFERATION & INDUCE CELL DIFFERENTIATION
CC       UNDER CERTAIN CONDITIONS.
CC   -!- SUBUNIT: HOMOTRIMER.
CC   -!- SUBCELLULAR LOCATION: TYPE II MEMBRANE PROTEIN. ALSO EXISTS AS
CC       AN EXTRACELLULAR SOLUBLE FORM.
CC   -!- PTM: THE SOLUBLE FORM DERIVES FROM THE MEMBRANE FORM BY
CC       PROTEOLYTIC PROCESSING.
CC   -!- DISEASE: CACHEXIA ACCOMPANIES A VARIETY OF DISEASES, INCLUDING
CC       CANCER AND INFECTION, AND IS CHARACTERIZED BY GENERAL ILL
CC       HEALTH AND MALNUTRITION.
CC   -!- SIMILARITY: BELONGS TO THE TUMOR NECROSIS FACTOR FAMILY.
DR   EMBL; X02910; HSTNFA.
DR   EMBL; M16441; HSTNFAB.
DR   EMBL; X01394; HSTNFR.
DR   EMBL; M10988; HSTNFAA.
DR   PIR; B23784; QWHUN.
DR   PIR; A44189; A44189.
DR   PDB; 1TNF; 15-JAN-91.
DR   PDB; 2TUN; 31-JAN-94.
DR   MIM; 191160; 11TH EDITION.
DR   PROSITE; PS00251; TNF.
KW   CYTOKINE; CYTOTOXIN; TRANSMEMBRANE; GLYCOPROTEIN; SIGNAL-ANCHOR;
KW   MYRISTYLATION; 3D-STRUCTURE.
FT   PROPEP        1     76
FT   CHAIN        77    233       TUMOR NECROSIS FACTOR.
FT   TRANSMEM     36     56       SIGNAL-ANCHOR (TYPE-II PROTEIN).
FT   LIPID        19     19       MYRISTATE.
FT   LIPID        20     20       MYRISTATE.
FT   DISULFID    145    177
FT   MUTAGEN     108    108       R->W: BIOLOGICALLY INACTIVE.
FT   MUTAGEN     112    112       L->F: BIOLOGICALLY INACTIVE.
FT   MUTAGEN     162    162       S->F: BIOLOGICALLY INACTIVE.
FT   MUTAGEN     167    167       V->A,D: BIOLOGICALLY INACTIVE.
FT   MUTAGEN     222    222       E->K: BIOLOGICALLY INACTIVE.
FT   CONFLICT     63     63       F -> S (IN REF. 5).
SQ   SEQUENCE   233 AA;  25644 MW;  279986 CN;
     MSTESMIRDV ELAEEALPKK TGGPQGSRRC LFLSLFSFLI VAGATTLFCL LHFGVIGPQR
     EEFPRDLSLI SPLAQAVRSS SRTPSDKPVA HVVANPQAEG QLQWLNRRAN ALLANGVELR
     DNQLVVPSEG LYLIYSQVLF KGQGCPSTHV LLTHTISRIA VSYQTKVNLL SAIKSPCQRE
     TPEGAEAKPW YEPIYLGGVF QLEKGDRLSA EINRPDYLDF AESGQVYFGI IAL
//
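
As the example shows, every line of an entry starts with a two-character line code, except the sequence data lines, which start with blanks, and the terminating `//' line. The following is a minimal sketch of grouping the lines of one entry by line code. The function name and the assumption that the data always starts in column 6 are illustrative choices read off the example layout, not part of any official tool.

```python
def group_by_line_code(entry_text):
    """Group the lines of one entry by their two-character line code."""
    groups = {}
    for line in entry_text.splitlines():
        if line.startswith("//"):
            break  # '//' terminates the entry
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("  "):
            # Sequence data lines start with blanks instead of a line code.
            code, data = "  ", line.strip()
        else:
            # Assumption: two-character code, three blanks, then the data.
            code, data = line[:2], line[5:].rstrip()
        groups.setdefault(code, []).append(data)
    return groups


sample = """ID   TNFA_HUMAN     STANDARD;      PRT;   233 AA.
AC   P01375;
GN   TNFA.
OC   EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA; MAMMALIA;
OC   EUTHERIA; PRIMATES.
//"""

groups = group_by_line_code(sample)
print(groups["AC"])       # ['P01375;']
print(len(groups["OC"]))  # 2
```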

ID
This line is always the first line of an entry. It consists of the entry name, the data class, the word 'PRT' indicating the molecule type (PRoTein), and the sequence length. The entry name has the form X_Y, where X is a mnemonic code representing the protein name and Y is a mnemonic species identification code. There are two data classes available,

AC
This line lists the accession numbers associated with an entry. Entries will have more than one accession number if they have been merged or split.

DT
These lines show the date on which the entry was created and the dates on which its sequence and its annotation were last updated.
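
A DT line such as those in the example can be split into a date, a release number, and an event. The sketch below assumes the layout `DD-MMM-YYYY (REL. NN, EVENT)' seen in the example entry. The pattern and the function name are hypothetical.

```python
import re
from datetime import date

MONTHS = {"JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
          "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12}

# Assumed layout, taken from the example entry only.
DT_PATTERN = re.compile(
    r"^DT\s+(\d{2})-([A-Z]{3})-(\d{4}) \(REL\. (\d+), (.+)\)$")


def parse_dt_line(line):
    """Return (date, release number, event) for one DT line."""
    match = DT_PATTERN.match(line.strip())
    if match is None:
        raise ValueError("unrecognised DT line: %r" % line)
    day, month, year, release, event = match.groups()
    return date(int(year), MONTHS[month], int(day)), int(release), event


when, release, event = parse_dt_line(
    "DT   01-FEB-1995 (REL. 31, LAST ANNOTATION UPDATE)")
print(when.isoformat(), release, event)
```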

DE
These lines contain general descriptive free-format information about the sequence stored.

GN
This line contains the name(s) of the gene(s) that encode the stored protein sequence. If more than one name has been assigned to an individual locus, the synonyms are listed separated by the word `OR'. If multiple genes encode an identical protein, all the different gene names are listed separated by the word `AND'.
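
The `OR' and `AND' rules can be sketched as a two-level split: `AND' separates loci and `OR' separates synonyms within a locus. The function below is an illustration only. It takes the text after the GN line code, assumes that neither word occurs inside a gene name, and strips any grouping parentheses. The gene names other than TNFA are made up for the example.

```python
def parse_gn(value):
    """Return a list of loci, each a list of synonymous gene names."""
    text = value.strip().rstrip(".")
    loci = []
    for locus in text.split(" AND "):
        # Synonyms for one locus are separated by the word OR.
        names = [name.strip("() ") for name in locus.split(" OR ")]
        loci.append([name for name in names if name])
    return loci


print(parse_gn("TNFA."))                # [['TNFA']]
print(parse_gn("GENA OR GENB."))        # [['GENA', 'GENB']]
print(parse_gn("GENA1 AND GENA2."))     # [['GENA1'], ['GENA2']]
```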

KW
These lines provide information which can be used to generate cross-reference indexes of the sequence entries based on functional, structural, or other categories.

OS, OG, OC
These fields contain information about the source organism.
OS
This line specifies the organism(s) from which the stored sequence was obtained. The species designation consists of the Latin genus and species names followed by the English name (in parentheses).
OG
This line indicates whether the gene coding for a protein originates from an organelle (the mitochondrion, the chloroplast, or a cyanelle) or from a plasmid.
OC
These lines contain the taxonomic classification of the source organism. The classification is listed top-down as nodes in a taxonomic tree, with the most general grouping given first.
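
As a rough illustration of the OS and OC conventions, the sketch below splits an OS value into its Latin and English names and joins several OC values into one top-down list of taxonomic nodes. Both functions are hypothetical helpers. They take the text after the line code and assume a single organism per OS line.

```python
import re


def parse_os(value):
    """Split 'LATIN NAME (ENGLISH NAME).' into its two parts."""
    match = re.match(r"^(.*?)\s*\((.*)\)\.?$", value.strip())
    if match:
        return match.group(1), match.group(2)
    # No English name in parentheses: return the Latin name only.
    return value.strip().rstrip("."), None


def parse_oc(values):
    """Join OC values and return the taxonomic nodes, most general first."""
    joined = " ".join(v.strip() for v in values).rstrip(".")
    return [node.strip() for node in joined.split(";") if node.strip()]


print(parse_os("HOMO SAPIENS (HUMAN)."))
lineage = parse_oc([
    "EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA; MAMMALIA;",
    "EUTHERIA; PRIMATES.",
])
print(lineage[0], lineage[-1])  # EUKARYOTA PRIMATES
```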

RN, RP, RC, RM, RA, RL
These fields comprise the literature citations within SWISS-PROT. The citations indicate the papers from which the data has been abstracted.
RN
This line gives a sequential number to each reference citation in an entry.
RP
This line describes the extent of the work carried out by the authors of the reference cited.
RC
These lines are used to store comments relevant to the reference cited. The format is 'TOKEN1=TEXT; TOKEN2=TEXT; ... ', where the currently defined tokens are PLASMID, SPECIES, STRAIN, TISSUE, and TRANSPOSON. The `SPECIES' token is only used when an entry describes a sequence which is identical in more than one species; similarly, the `PLASMID' token is only used if an entry describes a sequence identical in more than one plasmid.
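
The 'TOKEN=TEXT;' layout maps naturally onto a dictionary. The following sketch assumes that the text of a token never contains a semicolon. The function name and the sample values (K12, LIVER) are invented for illustration, since the example entry has no RC line.

```python
def parse_rc(value):
    """Turn 'TOKEN1=TEXT; TOKEN2=TEXT;' into a {token: text} dictionary."""
    tokens = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue  # skip the empty piece after the final ';'
        name, sep, text = part.partition("=")
        if not sep:
            raise ValueError("expected TOKEN=TEXT, got %r" % part)
        tokens[name.strip()] = text.strip()
    return tokens


print(parse_rc("STRAIN=K12; TISSUE=LIVER;"))
```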
RM
This line indicates the Medline Unique ID of a reference.
RA
These lines list the authors of the paper (or other work) cited. All of the authors are included, and are listed in the order given in the paper.
RL
These lines contain the conventional citation information for the reference. The following types of citation are used:
  • Journal citations
  • Book citations
  • Unpublished results
  • Unpublished observations
  • Thesis
  • Patent applications
  • Submissions
When a reference is made to a paper which is `in press' at the time the data bank is released, the page range, and possibly the volume number, are given as '0' (zero).

DR
These lines are used as pointers to information that is related to SWISS-PROT entries but held in data collections other than SWISS-PROT. Each line contains a database identifier followed by the entry IDs.
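
A DR line can be split on semicolons into the database identifier and the remaining fields. The sketch below is hypothetical and treats those fields as opaque strings, because in the example entry they are not always identifiers (the PDB lines carry a date and the MIM line an edition).

```python
def parse_dr(line):
    """Return (database identifier, list of remaining fields) for a DR line."""
    body = line[2:].strip().rstrip(".")  # drop the 'DR' code and final period
    fields = [field.strip() for field in body.split(";")]
    return fields[0], fields[1:]


print(parse_dr("DR   EMBL; X02910; HSTNFA."))  # ('EMBL', ['X02910', 'HSTNFA'])
print(parse_dr("DR   PDB; 1TNF; 15-JAN-91."))  # ('PDB', ['1TNF', '15-JAN-91'])
```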

FT
This table describes regions or sites of interest in the sequence. In general the feature table lists post-translational modifications, binding sites, enzyme active sites, local secondary structure, or other characteristics reported in the cited references. Sequence conflicts between references are also included in the feature table. Each line of this table consists of a feature key, `FROM' and `TO' endpoint specifications, and optionally a description which contains additional information about the feature.
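
One feature line can be read as a key, two endpoints, and an optional description. The sketch below splits on whitespace and assumes that both endpoints are plain integers, as they are in the example entry. The `Feature' type and the function name are illustrative.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    key: str
    start: int        # the FROM endpoint
    end: int          # the TO endpoint
    description: str  # empty when the line has no description


def parse_ft(line):
    """Parse one FT line into a Feature."""
    parts = line[2:].split(None, 3)  # key, FROM, TO, rest of the line
    if len(parts) < 3:
        raise ValueError("incomplete FT line: %r" % line)
    description = parts[3].strip() if len(parts) > 3 else ""
    return Feature(parts[0], int(parts[1]), int(parts[2]), description)


chain = parse_ft("FT   CHAIN        77    233       TUMOR NECROSIS FACTOR.")
print(chain.key, chain.end - chain.start + 1)  # CHAIN 157
```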

SQ
This line marks the beginning of the sequence data and gives a quick summary of its content: the sequence length (AA), the molecular weight (MW), and the checking number (CN). For the checking number, please refer to:
Bairoch A., Biochem. J. 203:527-528(1982).
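
The summary on the SQ line can be used as a simple consistency check on the sequence lines that follow it. The sketch below reads the three numbers, joins the sequence blocks, and compares only the length. It does not recompute the molecular weight or the checking number, and its names are hypothetical.

```python
import re

SQ_PATTERN = re.compile(
    r"^SQ\s+SEQUENCE\s+(\d+)\s+AA;\s+(\d+)\s+MW;\s+(\d+)\s+CN;")


def parse_sq(header, sequence_lines):
    """Parse the SQ line and the sequence lines, checking the length."""
    match = SQ_PATTERN.match(header)
    if match is None:
        raise ValueError("unrecognised SQ line: %r" % header)
    length, mw, cn = (int(group) for group in match.groups())
    # Remove the blanks that separate the blocks of residues.
    sequence = "".join("".join(line.split()) for line in sequence_lines)
    if len(sequence) != length:
        raise ValueError("SQ line says %d AA, found %d" % (length, len(sequence)))
    return {"length": length, "mw": mw, "cn": cn, "sequence": sequence}


tnf = parse_sq(
    "SQ   SEQUENCE   233 AA;  25644 MW;  279986 CN;",
    [
        "     MSTESMIRDV ELAEEALPKK TGGPQGSRRC LFLSLFSFLI VAGATTLFCL LHFGVIGPQR",
        "     EEFPRDLSLI SPLAQAVRSS SRTPSDKPVA HVVANPQAEG QLQWLNRRAN ALLANGVELR",
        "     DNQLVVPSEG LYLIYSQVLF KGQGCPSTHV LLTHTISRIA VSYQTKVNLL SAIKSPCQRE",
        "     TPEGAEAKPW YEPIYLGGVF QLEKGDRLSA EINRPDYLDF AESGQVYFGI IAL",
    ],
)
print(tnf["length"], tnf["mw"], tnf["cn"])  # 233 25644 279986
```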

CC
These lines are free text comments on the entry, and may be used to convey any useful information. Most of the comment blocks, each of which starts with the mark `-!-', are arranged according to what we designate as `topics'. See also the topics table.
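
The `-!-' mark makes it possible to regroup the comment lines into blocks. The sketch below takes the text after the CC line code, starts a new block at each `-!-', and treats the text before the first colon as the topic. That last step is an assumption based on the example entry, and the function name is hypothetical.

```python
def parse_cc(values):
    """Group CC values into (topic, text) blocks."""
    blocks = []
    for raw in values:
        text = raw.strip()
        if text.startswith("-!-"):
            topic, sep, rest = text[3:].partition(":")
            if not sep:
                # A block without a topic: keep all of it as text.
                topic, rest = "", text[3:]
            blocks.append([topic.strip(), rest.strip()])
        elif blocks:
            # Continuation line: append to the current block.
            blocks[-1][1] = (blocks[-1][1] + " " + text).strip()
    return [(topic, text) for topic, text in blocks]


comments = parse_cc([
    "-!- SUBUNIT: HOMOTRIMER.",
    "-!- PTM: THE SOLUBLE FORM DERIVES FROM THE MEMBRANE FORM BY",
    "    PROTEOLYTIC PROCESSING.",
])
for topic, text in comments:
    print(topic, "=>", text)
```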