The SWISS-PROT protein database is constructed by Amos Bairoch of the
University of Geneva. In addition to directly curated protein sequences,
SWISS-PROT contains sequences translated from the EMBL Nucleotide Sequence
Database.
Example
ID   TNFA_HUMAN     STANDARD;      PRT;   233 AA.
AC   P01375;
DT   21-JUL-1986 (REL. 01, CREATED)
DT   21-JUL-1986 (REL. 01, LAST SEQUENCE UPDATE)
DT   01-FEB-1995 (REL. 31, LAST ANNOTATION UPDATE)
DE   TUMOR NECROSIS FACTOR PRECURSOR (TNF-ALPHA) (CACHECTIN).
GN   TNFA.
OS   HOMO SAPIENS (HUMAN).
OC   EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA; MAMMALIA;
OC   EUTHERIA; PRIMATES.
RN   [1]
RP   SEQUENCE FROM N.A.
RM   87217060
RA   NEDOSPASOV S.A., SHAKHOV A.N., TURETSKAYA R.L., METT V.A.,
RA   AZIZOV M.M., GEORGIEV G.P., KOROBKO V.G., DOBRYNIN V.N.,
RA   FILIPPOV S.A., BYSTROV N.S., BOLDYREVA E.F., CHUVPILO S.A.,
RA   CHUMAKOV A.M., SHINGAROVA L.N., OVCHINNIKOV Y.A.;
RL   COLD SPRING HARB. SYMP. QUANT. BIOL. 51:611-624(1986).
RN   [2]
RP   SEQUENCE FROM N.A.
RM   85086244
RA   PENNICA D., NEDWIN G.E., HAYFLICK J.S., SEEBURG P.H., DERYNCK R.,
RA   PALLADINO M.A., KOHR W.J., AGGARWAL B.B., GOEDDEL D.V.;
RL   NATURE 312:724-729(1984).
RN   [3]
RP   SEQUENCE FROM N.A.
RM   85137898
RA   SHIRAI T., YAMAGUCHI H., ITO H., TODD C.W., WALLACE R.B.;
RL   NATURE 313:803-806(1985).
CC   -!- FUNCTION: CYTOKINE WITH A WIDE VARIETY OF FUNCTIONS: IT CAN
CC       CAUSE CYTOLYSIS OF CERTAIN TUMOR CELL LINES, IT IS IMPLICATED
CC       IN THE INDUCTION OF CACHEXIA, IT IS A POTENT PYROGEN CAUSING
CC       FEVER BY DIRECT ACTION OR BY STIMULATION OF IL-1 SECRETION, IT
CC       CAN STIMULATE CELL PROLIFERATION & INDUCE CELL DIFFERENTIATION
CC       UNDER CERTAIN CONDITIONS.
CC   -!- SUBUNIT: HOMOTRIMER.
CC   -!- SUBCELLULAR LOCATION: TYPE II MEMBRANE PROTEIN. ALSO EXISTS AS
CC       AN EXTRACELLULAR SOLUBLE FORM.
CC   -!- PTM: THE SOLUBLE FORM DERIVES FROM THE MEMBRANE FORM BY
CC       PROTEOLYTIC PROCESSING.
CC   -!- DISEASE: CACHEXIA ACCOMPANIES A VARIETY OF DISEASES, INCLUDING
CC       CANCER AND INFECTION, AND IS CHARACTERIZED BY GENERAL ILL
CC       HEALTH AND MALNUTRITION.
CC   -!- SIMILARITY: BELONGS TO THE TUMOR NECROSIS FACTOR FAMILY.
DR   EMBL; X02910; HSTNFA.
DR   EMBL; M16441; HSTNFAB.
DR   EMBL; X01394; HSTNFR.
DR   EMBL; M10988; HSTNFAA.
DR   PIR; B23784; QWHUN.
DR   PIR; A44189; A44189.
DR   PDB; 1TNF; 15-JAN-91.
DR   PDB; 2TUN; 31-JAN-94.
DR   MIM; 191160; 11TH EDITION.
DR   PROSITE; PS00251; TNF.
KW   CYTOKINE; CYTOTOXIN; TRANSMEMBRANE; GLYCOPROTEIN; SIGNAL-ANCHOR;
KW   MYRISTYLATION; 3D-STRUCTURE.
FT   PROPEP        1     76
FT   CHAIN        77    233       TUMOR NECROSIS FACTOR.
FT   TRANSMEM     36     56       SIGNAL-ANCHOR (TYPE-II PROTEIN).
FT   LIPID        19     19       MYRISTATE.
FT   LIPID        20     20       MYRISTATE.
FT   DISULFID    145    177
FT   MUTAGEN     108    108       R->W: BIOLOGICALLY INACTIVE.
FT   MUTAGEN     112    112       L->F: BIOLOGICALLY INACTIVE.
FT   MUTAGEN     162    162       S->F: BIOLOGICALLY INACTIVE.
FT   MUTAGEN     167    167       V->A,D: BIOLOGICALLY INACTIVE.
FT   MUTAGEN     222    222       E->K: BIOLOGICALLY INACTIVE.
FT   CONFLICT     63     63       F -> S (IN REF. 5).
SQ   SEQUENCE   233 AA;  25644 MW;  279986 CN;
     MSTESMIRDV ELAEEALPKK TGGPQGSRRC LFLSLFSFLI VAGATTLFCL LHFGVIGPQR
     EEFPRDLSLI SPLAQAVRSS SRTPSDKPVA HVVANPQAEG QLQWLNRRAN ALLANGVELR
     DNQLVVPSEG LYLIYSQVLF KGQGCPSTHV LLTHTISRIA VSYQTKVNLL SAIKSPCQRE
     TPEGAEAKPW YEPIYLGGVF QLEKGDRLSA EINRPDYLDF AESGQVYFGI IAL
//
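As a rough illustration of the layout shown in the example entry, the following Python sketch groups the lines of one entry by their two-letter line code. It assumes, from the example only, that the code occupies the first two columns, that the data start at the sixth column, that sequence data lines begin with blanks, and that a line starting with `//` ends the entry. The function name `group_by_line_code` and the key `"seq"` are made up for this sketch.

```python
from collections import defaultdict

def group_by_line_code(entry_text):
    """Group the lines of one flat-file entry by their two-letter line code."""
    fields = defaultdict(list)
    for line in entry_text.splitlines():
        if not line.strip():
            continue                      # skip empty lines
        if line.startswith("//"):
            break                         # assumed entry terminator
        code = line[:2]
        if code.strip() == "":
            # Sequence data lines carry no line code (assumption from the example).
            fields["seq"].append(line.strip())
        else:
            fields[code].append(line[5:].rstrip())
    return dict(fields)

sample = """\
ID   TNFA_HUMAN     STANDARD;      PRT;   233 AA.
AC   P01375;
OC   EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA; MAMMALIA;
OC   EUTHERIA; PRIMATES.
SQ   SEQUENCE   233 AA;  25644 MW;  279986 CN;
     MSTESMIRDV ELAEEALPKK
//
"""
fields = group_by_line_code(sample)
```

Keeping each line code as a list preserves the order of repeated lines such as OC or RA, which matters when they are later joined back together.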
- ID
- This line is always the first line of an entry.  It consists of the
entry name, the data class, the word 'PRT' giving the molecule type
(PRoTein), and the sequence length.  The entry name has the form X_Y, where
X is a mnemonic code representing the protein name and Y is a mnemonic
species identification code.  Two data classes are available:
-  STANDARD ---- Data which are complete to the standards laid down by
                   the SWISS-PROT data bank
-  PRELIMINARY - Data for which only the sequence and bibliographic
                   information have been submitted to thorough checks
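
The ID line described above could be taken apart roughly as follows. This is a sketch based only on the example line `ID   TNFA_HUMAN     STANDARD;      PRT;   233 AA.`; the regular expression, the function name `parse_id_line`, and the assumption that both mnemonic codes consist of uppercase letters and digits are not part of the format description.

```python
import re

# Pattern inferred from the single example ID line; an assumption, not a specification.
ID_PATTERN = re.compile(
    r"^ID\s+([A-Z0-9]+)_([A-Z0-9]+)\s+(STANDARD|PRELIMINARY);\s+PRT;\s+(\d+) AA\.\s*$"
)

def parse_id_line(line):
    """Split an ID line into protein code, species code, data class and length."""
    match = ID_PATTERN.match(line)
    if match is None:
        raise ValueError("not a recognised ID line: %r" % line)
    protein, species, data_class, length = match.groups()
    return {
        "protein": protein,        # X in the X_Y entry name
        "species": species,        # Y in the X_Y entry name
        "data_class": data_class,  # STANDARD or PRELIMINARY
        "length": int(length),     # sequence length in amino acids
    }

id_info = parse_id_line("ID   TNFA_HUMAN     STANDARD;      PRT;   233 AA.")
```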
 
 
- AC
-  This line lists the accession numbers associated with an entry.  Entries
will have more than one accession number if they have been merged or split.
 
- DT
- These lines show the dates on which the entry was created and on which
its sequence and its annotation were last updated.
 
- DE
-  These lines contain general descriptive free-format information about
the sequence stored.
 
- GN
-  This line contains the name(s) of the gene(s) that encode the stored
protein sequence.  When more than one name has been assigned to an
individual locus, the synonyms are listed, separated by the word `OR'.
When multiple genes encode an identical protein, all the different gene
names are listed, separated by the word `AND'.
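
A possible way to read a GN line is sketched below. How `OR` and `AND` combine on a single line is not described here, so the sketch assumes that `AND` separates loci, that `OR` separates synonyms within a locus, and that any parentheses are only grouping marks. The function name `parse_gn_line` and the gene names `ABC1`, `ABC2`, and `XYZ1` are invented for illustration.

```python
def parse_gn_line(line):
    """Return a list of loci, each given as a list of synonymous gene names."""
    text = line[5:].strip().rstrip(".")
    loci = []
    for group in text.split(" AND "):
        # Assumption: parentheses, if present, only group the synonyms of one locus.
        names = [name.strip() for name in group.strip("() ").split(" OR ")]
        loci.append(names)
    return loci

single = parse_gn_line("GN   TNFA.")              # taken from the example entry
synonyms = parse_gn_line("GN   ABC1 OR ABC2.")    # invented gene names
two_genes = parse_gn_line("GN   ABC1 AND XYZ1.")  # invented gene names
```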
 
- KW
- These lines provide information which can be used to generate
cross-reference indexes of the sequence entries based on functional,
structural, or other categories.
 
- OS, OG, OC
- These fields contain information about source organism.
- OS
-  This line specifies the organism(s) from which the stored sequence was
obtained.  The species designation consists of the Latin genus and species
names followed by the English name (in parentheses).
- OG
-  This line indicates whether the gene coding for a protein originates from
an organelle (a mitochondrion, a chloroplast, or a cyanelle) or from a plasmid.
- OC
-  These lines contain the taxonomic classification of the source organism.
The classification is listed top-down as nodes in a taxonomic tree in which
the most general grouping is given first.
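
Because the classification may run over several OC lines, it is convenient to join them before splitting. The sketch below assumes, from the example entry, that nodes are separated by `;` and that the last node ends with a period; `parse_lineage` is a name chosen for this illustration.

```python
def parse_lineage(oc_lines):
    """Join the OC lines of an entry and return the taxonomic nodes, most general first."""
    text = " ".join(line[5:].strip() for line in oc_lines)
    return [node.strip() for node in text.rstrip(".").split(";") if node.strip()]

lineage = parse_lineage([
    "OC   EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA; MAMMALIA;",
    "OC   EUTHERIA; PRIMATES.",
])
```

The first element of the result is the most general grouping and the last is the most specific one.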
 
 
- RN, RP, RC, RM, RA, RL
- These fields comprise the literature citations within SWISS-PROT.  The
citations indicate the papers from which the data has been abstracted.
- RN
- This line gives a sequential number to each reference citation in an entry.
- RP
-  This line describes the extent of the work carried out by the authors
of the reference cited.
- RC
-  These lines are used to store comments relevant to the reference cited.
The format is 'TOKEN1=TEXT; TOKEN2=TEXT; ... ', where the currently
defined tokens are PLASMID, SPECIES, STRAIN, TISSUE, and TRANSPOSON.  The
`SPECIES' token is only used when an entry describes a sequence which is
identical in more than one species; similarly, the `PLASMID' token is only
used if an entry describes a sequence identical in more than one plasmid.
- RM
-  This line indicates the Medline Unique ID of a reference.
- RA
-  These lines list the authors of the paper (or other work) cited.  All of
the authors are included, and are listed in the order given in the paper.
- RL
-  These lines contain the conventional citation information for the
reference.  The following kinds of reference are distinguished:
- Journal citations
- Book citations
- Unpublished results
- Unpublished observations
- Thesis
- Patent applications
- Submissions
 When a reference is made to a paper which is `in press' at the time the
data bank is released, the page range and, where applicable, the volume
number are given as '0' (zero).
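
The 'TOKEN=TEXT;' layout of the RC lines lends itself to a small dictionary parser. The sketch below checks each token against the five tokens listed above; the function name `parse_rc_line` and the sample values `K12` and `LIVER` are invented, since the example entry contains no RC line.

```python
KNOWN_TOKENS = {"PLASMID", "SPECIES", "STRAIN", "TISSUE", "TRANSPOSON"}

def parse_rc_line(line):
    """Turn one RC line of the form 'TOKEN1=TEXT; TOKEN2=TEXT;' into a dict."""
    comments = {}
    for part in line[5:].split(";"):
        part = part.strip()
        if not part:
            continue                       # ignore the empty piece after the last ';'
        token, _, text = part.partition("=")
        token = token.strip()
        if token not in KNOWN_TOKENS:
            raise ValueError("unknown RC token: %r" % token)
        comments[token] = text.strip()
    return comments

rc = parse_rc_line("RC   STRAIN=K12; TISSUE=LIVER;")   # invented sample line
```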
 
 
- DR
-  These lines are used as pointers to information related to SWISS-PROT
entries and found in data collections other than SWISS-PROT.  Each line
gives a database identifier followed by the identifiers of the corresponding
entry in that database.
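
A DR line can be split into its database identifier and the remaining fields as sketched below. The separator `;` and the closing period are inferred from the example entry, and `parse_dr_line` is a name made up for this sketch. Note that in the example the last field is not always an identifier (for PDB it is a date, for MIM an edition), so the sketch simply returns the remaining fields as strings.

```python
def parse_dr_line(line):
    """Return (database, fields) for one DR cross-reference line."""
    text = line[5:].strip().rstrip(".")
    parts = [part.strip() for part in text.split(";")]
    return parts[0], parts[1:]

embl_ref = parse_dr_line("DR   EMBL; X02910; HSTNFA.")
mim_ref = parse_dr_line("DR   MIM; 191160; 11TH EDITION.")
```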
 
- FT
-  This table describes regions or sites of interest in the sequence.  In
general the feature table lists post-translational modifications, binding
sites, enzyme active sites, local secondary structure, or other characteristics
reported in the cited references.  Sequence conflicts between references are
also included in the feature table.  Each line of this table consists of a
feature key, `FROM' and `TO' endpoint specifications, and optionally a
description which contains additional information about the feature.
- If the `FROM' and `TO' specifications are equal, the feature indicated
consists of the single amino acid at that position.
- When a feature is known to extend beyond the end(s) of the sequenced
region, the endpoint specification will be preceded by < for features which
continue to the left end (N-terminal direction) or by > for features which
continue to the right end (C-terminal direction).
- Unknown endpoints are denoted by `?'.
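
The endpoint rules above can be sketched in Python as follows. The sketch splits FT lines on whitespace rather than on fixed columns and ignores possible continuation lines; both are simplifying assumptions, and the names `parse_endpoint` and `parse_ft_line` are invented for this illustration.

```python
def parse_endpoint(spec):
    """Return (position, open_ended); position is None for an unknown '?' endpoint."""
    if spec == "?":
        return None, False
    open_ended = spec[0] in "<>"           # feature extends beyond the sequenced region
    digits = spec[1:] if open_ended else spec
    return int(digits), open_ended

def parse_ft_line(line):
    """Split one FT line into key, FROM, TO and optional description."""
    parts = line[5:].split(None, 3)        # assumption: whitespace-delimited fields
    start = parse_endpoint(parts[1])
    end = parse_endpoint(parts[2])
    return {
        "key": parts[0],
        "from": start,
        "to": end,
        "description": parts[3].strip() if len(parts) > 3 else "",
        # Equal, known endpoints denote a single amino acid.
        "single_residue": start[0] is not None and start[0] == end[0],
    }

mutagen = parse_ft_line("FT   MUTAGEN     108    108       R->W: BIOLOGICALLY INACTIVE.")
propep = parse_ft_line("FT   PROPEP        1     76")
```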
 
 
- SQ
-  This line marks the beginning of the sequence data and gives a quick
summary of its contents: the sequence length (AA), the molecular weight (MW),
and a checking number (CN).  For the checking number, please refer to:
 Bairoch A., Biochem. J. 203:527-528(1982).
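
As a simple consistency check, the length given on the SQ line can be compared with the number of residues that follow it. The sketch below uses the SQ line and sequence of the example entry; the regular expression and the names `parse_sq_line` and `join_sequence` are assumptions made for this illustration. It does not attempt to recompute the checking number, whose algorithm is given in the reference cited above and not in this document.

```python
import re

# Pattern inferred from the example SQ line; an assumption, not a specification.
SQ_PATTERN = re.compile(r"^SQ\s+SEQUENCE\s+(\d+) AA;\s+(\d+) MW;\s+(\d+) CN;\s*$")

def parse_sq_line(line):
    """Extract sequence length, molecular weight and checking number from an SQ line."""
    match = SQ_PATTERN.match(line)
    if match is None:
        raise ValueError("not a recognised SQ line: %r" % line)
    length, weight, checking = (int(group) for group in match.groups())
    return {"length": length, "mol_weight": weight, "checking_number": checking}

def join_sequence(data_lines):
    """Remove the blanks that split the sequence into blocks and join the lines."""
    return "".join("".join(line.split()) for line in data_lines)

header = parse_sq_line("SQ   SEQUENCE   233 AA;  25644 MW;  279986 CN;")
residues = join_sequence([
    "     MSTESMIRDV ELAEEALPKK TGGPQGSRRC LFLSLFSFLI VAGATTLFCL LHFGVIGPQR",
    "     EEFPRDLSLI SPLAQAVRSS SRTPSDKPVA HVVANPQAEG QLQWLNRRAN ALLANGVELR",
    "     DNQLVVPSEG LYLIYSQVLF KGQGCPSTHV LLTHTISRIA VSYQTKVNLL SAIKSPCQRE",
    "     TPEGAEAKPW YEPIYLGGVF QLEKGDRLSA EINRPDYLDF AESGQVYFGI IAL",
])
length_ok = len(residues) == header["length"]
```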
 
- CC
-  These lines are free text comments on the entry, and may be used to convey
any useful information.  Most comment blocks, each of which starts with the
mark `-!-', are arranged according to what we designate as `topics'.  See
also the topics table.
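
The comment blocks can be separated by looking for the `-!-` mark, as in the sketch below. It assumes, from the example entry, that the topic is the text between `-!-` and the first colon and that lines without the mark continue the previous block; `parse_comment_blocks` is a name invented for this illustration.

```python
def parse_comment_blocks(cc_lines):
    """Return a list of (topic, text) pairs, one per '-!-' comment block."""
    blocks = []
    for line in cc_lines:
        body = line[5:].strip()
        if body.startswith("-!-"):
            # Topic is assumed to end at the first colon of the block.
            topic, _, text = body[3:].strip().partition(":")
            blocks.append([topic.strip(), text.strip()])
        elif blocks:
            blocks[-1][1] += " " + body    # continuation of the current block
    return [(topic, text) for topic, text in blocks]

comments = parse_comment_blocks([
    "CC   -!- SUBUNIT: HOMOTRIMER.",
    "CC   -!- SUBCELLULAR LOCATION: TYPE II MEMBRANE PROTEIN. ALSO EXISTS AS",
    "CC       AN EXTRACELLULAR SOLUBLE FORM.",
])
```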