Example in FASTA format:
>FOSB_HUMAN P53539 homo sapiens (human). fosb protein
MFQAFPGDYDSGSRCSSSPSAESQYLSSVDSFGSPPTAAASQECAGLGEMPGSFVPTVTA
ITTSQDLQWLVQPTLISSMAQSQGQPLASQPPVVDPYDMPGTSYSTPGMSGYSSGGASGS
GGPSTSGTTSGPGPARPARARPRRPREETLTPEEEEKRRVRRERNKLAAAKCRNRRRELT
DRLQAETDQLEEEKAELESEIAELQKEKERLEFVLVAHKPGCKIPYEEGPGPGPLAEVRD
LPGSAPAKEDGFSWLLPPPPPPPLPFQTSQDAPPNLTASLFTHSEVQVLGDPFPVVNPSY
TSSFVLTCPEVSAFAGAQRTSGSDQPSDPLNSPSLLAL
DNA vs. PROTEIN: The program counts the A, C, G, T, U and N characters in the sequence. If they make up 80% or more of the characters, DNA / RNA is assumed; otherwise the sequence is treated as protein.
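The 80% rule above can be sketched in a few lines of Python. This is only an illustration, not GeneBee's own code: the function name guess_sequence_type is hypothetical, and treating the input case-insensitively and skipping non-letter characters (spaces, digits, line breaks) are assumptions not stated in this help page.

```python
# Characters counted as nucleotide symbols, as listed in the rule above.
NUCLEOTIDE_CHARS = set("ACGTUN")


def guess_sequence_type(sequence: str, threshold_percent: int = 80) -> str:
    """Return 'nucleotide' or 'protein' using the 80% rule.

    Assumptions (not from the help page): letters are compared
    case-insensitively and non-letter characters are ignored.
    """
    residues = [ch for ch in sequence.upper() if ch.isalpha()]
    if not residues:
        raise ValueError("sequence contains no letters")
    nucleotide_count = sum(ch in NUCLEOTIDE_CHARS for ch in residues)
    # Integer comparison avoids floating-point rounding at exactly 80%.
    if nucleotide_count * 100 >= threshold_percent * len(residues):
        return "nucleotide"
    return "protein"
```

For instance, a sequence in which 8 of 10 letters are nucleotide symbols sits exactly on the threshold and is classified as nucleotide.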
Graphical alignment - graphical image of the found sequences with the most frequent KW, DE, FT keywords.
FT fragments & DR keywords - graphical image of the found sequences with FT fragments and DR keywords.
Cross-reference map - cross-reference map of the found sequences.
Alignments description - brief description of the found sequences and the first 10 supermotifs.
Statistics - statistics for the most frequent keywords.
So, "On" is recommended.
Strand
This option sets which frames are processed. If 'Only forward' is chosen, the three forward frames are processed when a protein is searched against nucleotide databanks, and only the forward strand is processed when a nucleotide sequence is searched against nucleotide databanks. If 'Both' is chosen, all six frames are processed when a protein is searched against nucleotide databanks, and both the forward and backward strands are processed when a nucleotide sequence is searched against nucleotide databanks.
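As a rough illustration of what the Strand option selects, the sketch below enumerates the strands or reading frames of a nucleotide sequence. It is not GeneBee's implementation: the function names, the "+1" / "-2" style frame labels and the 'forward' / 'both' argument values are hypothetical, and the backward strand is assumed to be the reverse complement.

```python
# Complement table; U is mapped to A so RNA input is also accepted.
_COMPLEMENT = str.maketrans("ACGTUN", "TGCAAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def frames_to_process(nucleotide_seq: str, strand: str = "both",
                      translated: bool = True) -> list:
    """List (label, subsequence) pairs selected by the Strand option.

    translated=True  models a protein vs. nucleotide search (reading frames);
    translated=False models a nucleotide vs. nucleotide search (whole strands).
    """
    seq = nucleotide_seq.upper()
    strands = [("+", seq)]
    if strand == "both":
        strands.append(("-", reverse_complement(seq)))
    elif strand != "forward":
        raise ValueError("strand must be 'forward' or 'both'")

    selected = []
    for sign, strand_seq in strands:
        if translated:
            # Three reading frames per strand, starting at offsets 0, 1 and 2.
            for offset in range(3):
                selected.append((f"{sign}{offset + 1}", strand_seq[offset:]))
        else:
            selected.append((sign, strand_seq))
    return selected
```

With these assumptions, 'forward' yields three frames (or one strand) and 'both' yields six frames (or two strands), matching the counts given above.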
Clusterization type
This option is meaningful only when a nucleotide sequence is searched against protein databanks or a protein is searched against nucleotide databanks. If 'Each frame separately' is chosen, the found motifs are clustered (into supermotifs) separately for each frame. If 'Codirectional jointly' is chosen, motifs found on codirectional frames (forward or backward) are clustered jointly, so a supermotif may contain motifs from 2 or 3 forward (or backward) frames. The reason for this option is probable errors in the query or databank sequences, for example an insertion or deletion that shifts the reading frame.
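The difference between the two clusterization types comes down to how motifs are grouped before supermotifs are built. The sketch below shows only that grouping step and is an assumption-laden illustration: the function name, the mode strings, and the representation of a motif as a (frame label, motif) pair with labels such as "+1" or "-3" are all hypothetical, and the actual supermotif construction in GeneBee is not described here.

```python
from collections import defaultdict


def group_motifs(motifs, mode="each_frame"):
    """Group (frame_label, motif) pairs prior to clustering.

    mode='each_frame'    -> one group per frame ("+1", "+2", ..., "-3")
    mode='codirectional' -> one group per direction ("+" or "-")
    """
    groups = defaultdict(list)
    for frame_label, motif in motifs:
        if mode == "each_frame":
            key = frame_label
        elif mode == "codirectional":
            key = frame_label[0]  # keep only the direction sign
        else:
            raise ValueError("mode must be 'each_frame' or 'codirectional'")
        groups[key].append(motif)
    return dict(groups)
```

In the codirectional mode, motifs from frames "+1" and "+2" land in the same group, which is what allows a single supermotif to span several forward frames.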
Weight Matrices:
There are 3 matrices implemented in GeneBee. You may choose any of them (Dayhoff, Blosum62, or Johnson) at the prompt on the full query page. The default matrix is Dayhoff.
Dayhoff Matrix
(modified 250 PAM matrix from Atlas of Protein Sequence and Structure, v. 5, suppl. 3, pp. 345-358):
A C D E F G H I K L M N P Q R S T V W Y
A 12
C 8 22
D 10 5 14
E 10 5 13 14
F 6 6 4 5 19
G 11 7 11 10 5 15
H 9 7 11 11 8 8 16
I 9 8 8 8 11 7 8 15
K 9 5 10 10 5 8 10 8 15
L 8 4 6 7 12 6 8 12 7 16
M 9 5 7 8 10 7 8 12 10 14 16
N 10 6 12 11 6 10 12 8 11 7 8 12
P 11 7 9 9 5 9 10 8 9 7 8 9 16
Q 10 5 12 12 5 9 13 8 11 8 9 11 10 14
R 8 6 9 9 6 7 12 8 13 7 10 10 10 11 16
S 11 10 10 10 7 11 9 9 10 7 8 11 11 9 10 12
T 11 8 10 10 7 10 9 10 10 8 9 10 10 9 9 11 13
V 10 8 8 8 9 9 8 14 8 12 12 8 9 8 8 9 10 14
W 4 2 3 3 10 3 7 5 7 8 6 6 4 5 12 8 5 4 27
Y 7 10 6 6 17 5 10 9 6 9 8 8 5 6 6 7 7 8 10 20
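A lower-triangular listing like the one above can be read into a symmetric lookup table. The sketch below does this for the first five rows of the Dayhoff matrix and then sums pair scores over two equal-length sequences. It is a generic illustration: the function names are hypothetical, and summing scores position by position is an assumed use of the matrix, not a description of how GeneBee scores motifs.

```python
# First five rows of the Dayhoff listing above (header line, then one row per residue).
DAYHOFF_EXCERPT = """\
A C D E F
A 12
C 8 22
D 10 5 14
E 10 5 13 14
F 6 6 4 5 19
"""


def parse_lower_triangle(text: str) -> dict:
    """Parse a lower-triangular matrix into a symmetric {(a, b): score} dict."""
    rows = [line.split() for line in text.strip().splitlines()]
    columns = rows[0]
    scores = {}
    for row in rows[1:]:
        row_label, values = row[0], row[1:]
        # The k-th value in a row pairs the row residue with the k-th column residue.
        for col_label, value in zip(columns, values):
            scores[(row_label, col_label)] = int(value)
            scores[(col_label, row_label)] = int(value)
    return scores


def ungapped_score(seq_a: str, seq_b: str, scores: dict) -> int:
    """Sum matrix scores over aligned positions of two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(scores[(a, b)] for a, b in zip(seq_a, seq_b))


scores = parse_lower_triangle(DAYHOFF_EXCERPT)
print(ungapped_score("ACE", "ADE", scores))  # A/A + C/D + E/E = 12 + 5 + 14
```

Storing both (a, b) and (b, a) keeps the lookup simple at the cost of a slightly larger dictionary.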
Blosum62 Matrix
Unique Identifier: 93066354 (MEDLINE)
Authors: Henikoff S., Henikoff J. G.
Institution: Howard Hughes Medical Institute, Fred Hutchinson Cancer Research Center, Seattle, WA 98104.
Title: Amino acid substitution matrices from protein blocks.
Source: Proceedings of the National Academy of Sciences of the United States of America. 89(22):10915-9, 1992 Nov 15.
Abstract: Methods for alignment of protein sequences typically measure similarity by using a substitution matrix with scores for all possible exchanges of one amino acid with another. The most widely used matrices are based on the Dayhoff model of evolutionary rates. Using a different approach, we have derived substitution matrices from about 2000 blocks of aligned sequence segments characterizing more than 500 groups of related proteins. This led to marked improvements in alignments and in searches using queries from each of the groups.
A C D E F G H I K L M N P Q R S T V W Y
A 8
C 4 13
D 2 1 10
E 3 0 6 9
F 2 2 1 1 10
G 4 1 3 2 1 10
H 2 1 3 4 3 2 12
I 3 3 1 1 4 0 1 8
K 3 1 3 5 1 2 3 1 9
L 3 3 0 1 4 0 1 6 2 8
M 3 3 1 2 4 1 2 5 3 6 9
N 2 1 5 4 1 4 5 1 4 1 2 10
P 3 1 3 3 0 2 2 1 3 1 2 2 11
Q 3 1 4 6 1 2 4 1 5 2 4 4 3 9
R 3 1 2 4 1 2 4 1 6 2 3 4 2 5 9
S 5 3 4 4 2 4 3 2 4 2 3 5 3 4 3 8
T 4 3 3 3 2 2 2 3 3 3 3 4 3 3 3 5 9
V 4 3 1 2 3 1 1 7 2 5 5 1 2 2 1 2 4 8
W 1 2 0 1 5 2 2 1 1 2 3 0 0 2 1 1 2 1 15
Y 2 2 1 2 7 1 6 3 2 3 3 2 1 3 2 2 2 3 6 11
A C D E F G H I K L M N P Q R S T V W Y
A 16
C 6 26
D 8 0 18
E 9 3 12 18
F 7 5 3 3 20
G 9 2 8 7 1 18
H 7 2 9 7 8 7 22
I 8 2 5 5 10 4 5 18
K 9 1 8 11 4 6 10 5 17
L 6 1 2 4 12 3 5 12 6 17
M 8 5 4 7 9 5 7 12 8 14 21
N 8 2 12 9 6 8 11 5 10 5 6 18
P 9 1 9 8 5 7 5 4 9 7 0 7 20
Q 9 3 9 12 3 7 11 3 11 5 9 9 6 19
R 8 4 6 10 4 7 10 4 13 6 6 8 6 12 20
S 10 2 10 8 5 8 7 5 8 5 5 11 9 9 9 16
T 9 4 8 9 5 6 7 7 10 5 7 10 8 9 8 12 17
V 9 5 5 6 8 4 6 14 6 12 10 4 5 6 5 5 8 17
W 4 1 4 2 13 3 6 6 4 9 9 4 2 2 6 4 0 5 25
Y 6 2 6 6 13 4 9 7 6 7 8 7 3 5 8 6 7 8 12 20
A C D E F G H I K L M N P Q R S T V W Y
A 10
C 0 10
D 0 0 10
E 0 0 0 10
F 0 0 0 0 10
G 0 0 0 0 0 10
H 0 0 0 0 0 0 10
I 0 0 0 0 0 0 0 10
K 0 0 0 0 0 0 0 0 10
L 0 0 0 0 0 0 0 0 0 10
M 0 0 0 0 0 0 0 0 0 0 10
N 0 0 0 0 0 0 0 0 0 0 0 10
P 0 0 0 0 0 0 0 0 0 0 0 0 10
Q 0 0 0 0 0 0 0 0 0 0 0 0 0 10
R 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10
S 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10
T 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10
V 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10
W 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10
Y 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10