MULTIPLE ALIGNMENT
(algorithm)
Multiple Sequence Alignment is the arrangement of several protein or nucleic
acid sequences with postulated gaps so that similar residues (in one-letter code) are
juxtaposed.
GeneBee algorithm of MULTIPLE ALIGNMENT proceeds in four steps:
- construction of pairwise MOTIFS
- construction of MULTIPLE MOTIFS (of thickness exceeding 2)
- forming of SUPERMOTIFS from MULTIPLE MOTIFS
- construction of MULTIPLE ALIGNMENTS from previously obtained MOTIFS and SUPERMOTIFS and
consequent selection of the best alignment.
CONTENTS:
- MULTIPLE ALIGNMENT.
- MOTIFS.
- POWER.
- TOTAL and LOCAL alignments.
- SUPERMOTIFS.
- ALGORITHM.
1. POWER.
1.1. Corrected power.
1.2. Alignment power.
2. CONSTRUCTION OF PAIRWISE MOTIFS.
3. CONSTRUCTION OF MULTIPLE MOTIFS.
4. CONSTRUCTION OF ALIGNMENTS.
4.1. Construction of supermotifs.
4.2. Construction of total DRAFT alignments.
4.2.1. Greedy procedure.
4.2.2. Dynamic programming (Needleman - Wunch).
4.2.3. Direct method.
4.3. Construction of total REFIGNED alignments.
5. MATRICES
GENERAL DEFINITIONS
1.1. MULTIPLE ALIGNMENT.
The multiple alignment of several biopolymer sequences is the arrangement of
several protein or nucleic acid sequences with postulated gaps so that similar residues
(in one-letter code) are juxtaposed. In this program the alignment is first constructed of
the so-called MOTIFS (see below) and later is improved by a dynamic programming procedure.
1.2. MOTIFS.
In some fragments of the set of aligned sequences there exists a similarity between
all or some layers (sequence fragments) of the alignment. These regions of similarity
without gaps are called MOTIFS. MOTIFS are characterized by length (number of letters in
each fragment) and thickness (number of layers in a MOTIF).
1.3. POWER.
The degree of similarity between MOTIF layers is determined in this program by
probabilistic formulas. To estimate it we introduce the notion of the POWER of a MOTIF.
This notion is fundamental for this program. POWER relates to the possibility of receiving
a sum of mismatch weights along an alignment for random sequences of the same length.
Analogous reasoning allows to introduce a notion of the POWER of an ALIGNMENT. In various
procedures of the algorithm either POWER or the traditional sum of mismatch weights serve
as the actual value of an ALIGNMENT.
1.4. TOTAL and LOCAL
alignments.
In the fist step of our algorithm for each pair of sequences all similar fragments are
found by the DotHelix procedure. They form motifs with thickness 2.
Similarities between sequences and motifs of thickness 2 form motifs of thickness 3.
Further, motifs of thickness 4 are obtained from motifs of thickness 2 and 3 by linking
them with suitable sequence fragments or motifs. So the full set of the best motifs could
be constructed.
The draft result of an alignment can be represented as a chain of consistent (with regards
to the order of letters in sequences) motifs, interrupted by the necessary number of gaps.
If an alignment cannot be extended by addition of any new motifs at either end, it is
called TOTAL. If an alignment can be extended at least at one end, it is called LOCAL
alignment.
1.5. SUPERMOTIFS.
Often close neighboring motifs would extend one another if some gaps are introduced. Such
region (called SUPERMOTIF) can have greater similarity between fragments forming it, than
the individual motifs. Supermotifs and motifs are instances of LOCAL alignments. Note,
that in course of search for functionally important regions, the knowledge of motifs and
supermotifs is of greater importance for a biologist than an alignment itself. Thus the
procedure for motif and supermotif determination has an independent value.
1.6. ALGORITHM.
The algorithm proceeds in five steps:
- construction of pairwise MOTIFS;
- construction of multiple MOTIFS (with thickness exceeding 2);
- forming of SUPERMOTIFS from these MOTIFS;
- construction of MULTIPLE ALIGNMENTS from previously obtained MOTIFS
and SUPERMOTIFS and selection of the best alignment.
- refinement of the DRAFT alignment by iterative dynamic programming
procedure
The 4th step can be performed in 2 ways.
- 1) The method is an automated input of MOTIFS and SUPERMOTIFS ordered
by their quality - GREEDY ALGORITHM.
- 2) The method is a variant of dynamic programming, that employs a
similarity measure of pairwise MOTIFS. In this method the quality of a TOTAL alignment is
determined by the sum of mismatch weights accounting for gap penalties.
APPENDIX
1. POWER
As it has been said above, we define the quality of various local similarities
based on probabilistic considerations. The most fundamental is the notion of MOTIF POWER.
It is defined by the following formula:
L
S S ( W{b(r1,l), b(r2,l)} - M )
l 1<r1<r2<K
A = ------------------------------------------- (1)
sqrt (DL)
where
S is sum,
A - POWER of the considered MOTIF,
L - length of a MOTIF, i.e. the common length of fragments forming it,
K - thickness of a MOTIF, that is the number of layers in it,
b(r,l) are the letters (residues) forming the r-th layer,
W{A,C} is the weight of substitution of a letter A to letter C, characterizing the
similarity of corresponding residues,
M and D are user-defined constants.
Increase of power corresponds to increase of similarity between
motif layers.
For random independent Bernoulli sequences M should be mean substitution weight {M(W)}, D
- the variance of sum of weights situated in one column of a motif. Then the power A can
be considered as a random variable which mean equals 0 and variance equals 1. When this
choice of the constants M and D is made, the power A is the normalized sum over all motif
positions of weights for pairwise substitutions of letters situated in one column. The
probabilistic meaning of A is following:
ln{P(A>A0} is equivalent to -(A0 /2)**2,
(we use sign "**" for "raising to power").
Biological sequences, especially similar ones, cannot be assumed to be independent. Thus
in some cases it is useful to set M to values larger than mean of substitution weights. It
allows not to mark as similar substantially non-similar fragments.
The weights W{A,C} form a weight matrix defined by user. Its choice can vary. In
the case of protein sequences one of the most widely used is the Dayhoff matrix in which
weights are set based on substitution frequencies in already known homologous proteins. In
other cases weights correspond to physico-chemical properties of amino acids. Simultaneous use of several matrices in course of the alignment
construction is also possible. In this case the algorithm links together independently
obtained motifs.
The notion of POWER as defined by formula (1) is a base for other
definitions: CORRECTED MOTIF POWER, CONDITIONAL POWER, ALIGNMENT POWER. The latter values
are the ones directly used in this program.
1.1. Corrected power
The power A characterizes the probability to obtain the given sum of weights for an
individual motif taken from Bernoulli sequences. However, as the thickness K increases,
the number of possible motifs becomes larger for a fixed length. Thus for Bernoulli
sequences it is simpler to encounter powerful motifs among thick ones than among thin
ones. So thick motifs would prevail in the list of the most powerful motifs and their
importance would be overstated. In order to avoid that, we perform a correction
substituting the power A for the CORRECTED POWER (A_cor).
The formula for A_cor is based on the following informal reasoning.
Let's assume for simplicity that all sequences have similar length n. The number of
possible relative shifts of sequences, when motifs of thickness k are considered has the
order n**(k-1), and it is approximately n**(k-1) times greater than that for motifs of
thickness 2, for which no correction is performed. Thus the correction should decrease the
corresponding probability P(A) at n**(k-2) times. In order to do that it is necessary to
decrease half of the squared power by (k-2)*ln(n). Thus we get the following formula for
corrected power:
A_cor = sqrt{A**2 - 2*(k-2)*ln(n)}.
1.2. Alignment power
We characterize similarity of sequence fragments participating in an alignment (LOCAL or
TOTAL) by its POWER. It is defined analogously to the MOTIF POWER A (formula 1). When
weights are summed along the alignment path, the weight of a pair of corresponding
residues b(l,r1) and b(l,r2) is taken from the matrix (as in (1)) if this pair occurs in a
motif, and equals the mean weight M(W) otherwise. If one of the symbols b(l,r1) and
b(l,r2) is a gap, then the weight equals the user-defined gap penalty, which in any case
does not exceed M(W). In the formula for the alignment power L is set to the alignment
length.
One of the reasons for introducing the gap penalty is the correction similar to one in the corrected motif
power A_cor. The problem is that the number of possible alignments sharply increases as
the allowed number of gaps increases. Thus for random independent Bernoulli sequences the
maximum of the sum of weights along the alignment path increases, and so the power of
alignments with many gaps is too large. In order to compensate for this effect, the gap
penalty is introduced. Unfortunately, unlike the situation with the corrected motif power,
it is difficult to estimate theoretically the gap penalty, since sums of weights along
different path are strongly dependent.
2. CONSTRUCTION OF PAIRWISE
MOTIFS
As it was stated, the first step of our procedure is the construction of pairwise motifs.
This is performed by DOTHELIX procedure (Leontovich et al, 1993).
Thresholds for power
and lengths of motifs are set by user. Sets of motifs can be
constructed employing different matrices.
When the standard constant M=M(W) is chosen, a
large number of the obtained motifs will be "noise", despite their formal
statistical significance. This fact, obstacling further alignment construction by increase
of the search, is caused by the null hypothesis of independence of the sequence being
aligned (indeed, the very desire to align sequences is an evidence of their dependence!).
Clearly, for dependent (similar) sequences the mean mismatch weight increases as compared
to formula (2). The program allows a user to account for this fact by increasing M (by
setting parameter "Min. homology ratio" in the
interval from 0 to 1: 0.01 is recommended). Experiments demonstrate that this procedure
allows to filter out a majority of noise motifs.
Thus it is possible to find motifs for various M and different
weight matrices. A user can lump together all obtained motifs, and it increases
possibilities of alignment.
The described procedure is fairly slow, since it has to process a
very large number of relative sequence shifts. To accelerate this process, it's possible
to consider only the most "promising" shifts, which are processed further. Of
course, this pre-processing can lead to the loss of some motifs. Such sorting of registers
is performed by a Lipman-Pearson type procedure (Lipman and Pearson, 1985).
Consider it in more detail. First, the letters
(residues) are divided into several classes ("colors").
This coloring is consistent with the weight matrix: substitution weights within a class
should be sufficiently large. For each shift of sequences the following procedure is
performed: for each position of a window (of length, say, 7)
the value t(t-1)/2 is calculated, where 't' is the number of color matches within this
window.
Second, the sum of these values for all window
positions for a given shift is calculated. To select a given shift the ratio of this sum
to maxsum should be more than user-defined value (parameter "Coincidence
ratio").
The described characteristic is more preferable as opposed to the
more widely used number of perfect matches or number of color matches (that is, the sum of
t), since it is sensitive to determination of shifts setting in correspondence locally
similar regions. On the other hand, our procedure is sufficiently fast, since the above
characteristic can be calculated using pre-formed looking tables of all color pairs (the
distance between which does not exceed the window length) for all sequences.
A user-defined number of
shifts is a subject for further processing.
3. CONSTRUCTION OF MULTIPLE
MOTIFS.
Generally speaking, in order to find all possible motifs of different lengths, all
comparison registers should be analyzed, and for each of them the set of non-intersecting
sufficiently powerful motifs should be found, as it is done in DotHelix procedure.
However, this procedure requires a prohibitively large time even for a quite small number
of sequences k, since the number of registers is proportional to n**(k-1), where n is the
average sequence length.
In order to avoid the full register search, we employ a procedure for motif determination.
This procedure is based on the fact that in a sufficiently powerful motif all, or almost
all pairs of layers are significally similar, so a multiple motif should be the union of
pair motifs.
Consider this algorithm in more details. It is sufficient to
describe the procedure of linking of two motifs (or a motif and a sequence fragment). If
the motifs have no common layers, the DotHelix procedure is applied (each motif is
considered as a unit). Linking of a fragment to a motif is performed in the same way. If
motifs have common layers, then (if it's possible) the natural correspondence between
layers is established, and then DotHelix is applied to thus defined comparison register.
Three options of the parameter "How to process motifs" relate to three nested possible ways of
motif construction:
-- to create all pair motifs ("Sequence-sequence");
-- to extend pair motif by one additional layer ("Motif-sequence");
-- to link all possible pairs of existed motifs ("Motif-motif");
In all three cases, prior to processing by DotHelix, motifs are slightly extended in order
to allow a more precise identification of motif boundaries.
The result of this procedure is a set of motifs of various lengths.
Each motif is characterized by the corrected power (A_cor), calculated when the motif was
constructed. Motifs with the power exceeding certain thresholds are retained.
Incomplete thickness of a motif means that only some sequences
possess a good similarity in this region.
Intermediate storage of motifs is based on the
stack principle, so that only some best (with regards to the corrected power) motifs are
retained. When a new motif is constructed, the stack is re-ordered and the worst of the
stored motifs are deleted. It allows to save memory and calculation time.
Speaking metaphorically, the suggested method first selects areas of
potential similarity, based on pairwise comparisons. Then some of these areas become more
"visible" due to their similarity with fragments of other sequences, while the
others are not supported by additional similarities and disappear from the stack.
The accelerating procedure described in Section 2
can be applied to the construction of multiple motifs as well. It can be further
accelerated by considering color matches only in conserved positions of motif (i.e.
positions colored almost entirely in one color). The percentage of one-colored letters in
a conservative position is a user-defined parameter "Min.
letter frequence".
4. CONSTRUCTION OF
ALIGNMENTS (supermotifs and total alignments).
As it was said, each DRAFT alignment (partial or total) is a consistent chain of motifs.
Each aligned sequence consists of fragments similar to fragments of other sequences (and
thus belonging to motifs), fragments not belonging to any motif, and gaps. Since it is
hardly possible that biological sequences can possess "negative" similarity (one
with a negative power), i.e. improbable in a random sequence dissimilarity, it is
reasonable to assume that the order of letters and gaps in motif-free fragments is
unimportant. In this program in such regions first letters are written, and then the
necessary number of gaps is added. The only exclusion from this rule is the beginning of
the sequence, which is indented to the right.
4.1. Construction of
supermotifs
Supermotifs are formed at the first step of construction of the total alignment . Supermotif is defined as a partial alignment, which power exceeds
powers of all its subalignments. In particular, its power exceeds powers of all motifs
belonging to it. In the program two methods for linking of motifs into supermotifs are
implemented: greedy and cluster.
In the greedy method first the most powerful motif is considered,
then it is linked with such motif, that the power of the constructed supermotif exceeds
powers of both motifs belonging to it. Then the obtained supermotif is linked to the third
motif (if it is possible) etc. This procedure is performed while the supermotif power
increases.
In the cluster method the following scheme is employed. Initially
each motif is considered as a supermotif. At each step two such supermotifs are linked,
that the power of the obtained supermotif exceeds powers of its constituents, and it is
the maximum one among all possible linkages.
It should be noted, that not all supermotifs can be constructed by
these procedures. Construction of algorithms that would allow us to construct all existing
supermotifs is a subject of further research.
4.2. Construction of total
DRAFT alignments.
Consider now construction of a total alignment. As it has been mentioned in Section 1, two
methods are currently implemented in the program: greedy and dynamic programming.
4.2.1. Greedy procedure
Construction of total alignments by the greedy procedure
is similar to the greedy construction of supermotifs, but it is not required that the
power of obtained partial alignments necessarily increases. The procedure can be performed
in several steps. First a total alignment of the most powerful motifs and supermotifs is
constructed. Then it is supplemented by a new set of motifs and the alignment is modified
without disturbing already aligned fragments. These new motifs can be obtained by either
softening requirements to the power, or with the use of some other matrix, or allowing a
lower homology threshold, or by combining some of the above-mentioned parameters. Thus we
obtain the second alignment, add to it more motifs, etc. Usually at the first step the
most popular - Dayhoff - - matrix is used.
This algorithm can be generalized in the following way (the derived
procedure would not be a greedy one). First, one of t best (super)motifs is taken, where t
is defined by a user. Then each obtained partial alignment (currently consisting of a
single supermotif) is linked by a new supermotif. This is done by one of $t$ ways so that
the power of the alignment, if possible, increases. The procedure continues until there
are supermotifs that can be still linked to an alignment. If t=1, this is the above
described greedy procedure, If t= number of motifs in the stack, this procedure reduces to
the full search. As t increases, the calculation time also increases, but the quality of
alignment improves.
4.2.2. Dynamic programming
(Needleman - Wunch).
A dynamic programming procedure has been implemented for
construction of a total alignment. First, we describe this method in the case of two
sequences.
We represent an alignment as a chain of motifs. These motifs can
belong to an alignment not only completely, but partially as well. It is taken into
account in our program, but below we for simplicity assume that motifs enter an alignment
entirely. Alignment quality is measured by its "price", that is defined as a sum
of centered weights W(al,bl) of mismatches along the alignment path minus gap penalties.
For each motif it is possible to point out motifs that can precede or follow it in an
alignment. Thus the set of motifs is a partially ordered set. If a partial alignment has
been constructed, further aligning depends only on the most recently aligned fragment.
Thus the dynamic programming principles, and in particular, a Needleman-Wunsch type
procedure can be employed. For each motif we find the maximum value of alignments starting
at the beginning of the sequences and ending at this motif. We do not consider all
transitions from a current motif to subsequent ones: among motifs having close registers
only transition to the nearest one is allowed.
If many sequences are aligned, a complete alignment can also be
represented as a chain of pairwise motifs (since a multiple motif can be decomposed into
pairwise ones). However, unlike the previous case, alignment depends not only on the last
aligned motif, but on some previous motifs. Thus in this case the dynamic programming
principle cannot be applied.
In this connection we introduce the following notion. Each partial
alignment is set in correspondence with its FRONT, defined as the set of the rightmost in
each sequence pairwise motifs belonging to the alignment, and their relative positions in
the alignment. Thus in order to define a front, it is necessary, first, to set for each
sequence such motif, that one of its layers belongs to this sequence, and, second, to
define the position of this motif in the alignment relative to the initial position of the
leftmost motif in this front. The initial position of any frontal motif in a sequence, to
which it is ascribed, should be situated rightwards relative to beginnings in this
sequence of all other frontal motifs. One motif can be ascribed to two sequences (recall
that all motifs are pairwise ones).
It is simple to demonstrate that further construction depends only
on the front of the already aligned part. Thus a dynamic programming procedure is
applicable.
It is clear, that as the number of motifs increases, the number of
fronts also increases - at a very fast rate (roughly as [N/t]**t) , where 'N' - number of
motifs, 't' - - number of sequences). Thus it seems to be necessary to bound the number of
fronts by a reasonable stack size, so that alignment of natural sequences would be
possible.
4.2.3. Direct method.
4.3. Construction of the
REFINED alignment.
The REFINED alignment is constructed by the iterative improvement of
the DRAFT alignment. Each layer (sequence) of the DRAFT alignment is aligned against all
other alignment's layers taken as a single batch by the dynamic programming procedure. The
process of iteration is completed when the quality of alignment doesn't become any better.
The basic multiple motifs were inserted while constructing the DRAFT
alignment. These basic motifs contain consensus (conservative) sub-columns. This is
exactly why the DRAFT alignment is a good starting point for the iterative process of the
improvement of alignments by the consequtive procedures of dynamic programming.
The euristic reasons are following. Let's call the consensus
sub-columns "embryos". As a rule, the largest faction of "embryos"
does correspond to the correct alignment. However, some of the "embryos" can be
incorrect. The task of the iterative improvement is to expand these correct
"embryos"/sub-columns up to the thickness of a complete alignment. During this
process incorrect "embryos" would disappear, because they won't find good
corresondence in the other lines, trying to expand. The residues, participating in these
incorrect "embryos", would be forced to move to the other columns, so that they
could provide the strengthening of the correct "embryos".
5. MATRICES.
There are 3 matrices inplemented in GeneBee. You may
choose any of them - Dayhoff, Blosum62,
or Johnson - at the prompt in the full query page. The default matrix is Dayhoff.
DAYHOFF MATRIX
(modified 250 PAM matrix from Atlas of Protein sequence and structure,v.5, suppl. 3,
pp.345-358):
A C D E F G
H I K L M N P Q R S T
V W Y
A 12
C 8 22
D 10 5 14
E 10 5 13 14
F 6 6 4 5 19
G 11 7 11 10 5 15
H 9 7 11 11 8 8 16
I 9 8 8 8 11 7 8 15
K 9 5 10 10 5 8 10 8 15
L 8 4 6 7 12 6 8 12 7 16
M 9 5 7 8 10 7 8 12 10 14 16
N 10 6 12 11 6 10 12 8 11 7 8 12
P 11 7 9 9 5 9 10 8 9 7 8
9 16
Q 10 5 12 12 5 9 13 8 11 8 9 11 10 14
R 8 6 9 9 6 7 12 8 13 7 10 10 10 11
16
S 11 10 10 10 7 11 9 9 10 7 8 11 11 9 10 12
T 11 8 10 10 7 10 9 10 10 8 9 10 10 9 9 11
13
V 10 8 8 8 9 9 8 14 8 12 12 8
9 8 8 9 10 14
W 4 2 3 3 10 3 7 5 7 8 6
6 4 5 12 8 5 4 27
Y 7 10 6 6 17 5 10 9 6 9 8 8
5 6 6 7 7 8 10 20
BLOSUM62 MATRIX
Unique Identifier: 93066354 (MEDLINE)
Authors: Henikoff S. Henikoff J. G.
Institution: Howard Hughes Medical Institute, Fred Hutchinson Cancer Research
Center, Seattle, WA 98104.
Title: Amino acid substitution matrices from protein blocks.
Source: Proceedings of the National Academy of Sciences of the United States of
America. 89(22):10915-9, 1992 Nov 15.
Abstract:
Methods for alignment of protein sequences typically measure similarity by using a
substitution matrix with scores for all possible exchanges of one amino acid with another.
The most widely used matrices are based on the Dayhoff model of evolutionary rates. Using
a different approach, we have derived substitution matrices from about 2000 blocks of
aligned sequence segments characterizing more than 500 groups of related proteins. This
led to marked improvements in alignments and in searches using queries from each of the
groups.
A C D E F G H I K
L M N P Q R S T V W Y
A 8
C 4 13
D 2 1 10
E 3 0 6 9
F 2 2 1 1 10
G 4 1 3 2 1 10
H 2 1 3 4 3 2 12
I 3 3 1 1 4 0 1 8
K 3 1 3 5 1 2 3 1 9
L 3 3 0 1 4 0 1 6 2 8
M 3 3 1 2 4 1 2 5 3 6
9
N 2 1 5 4 1 4 5 1 4 1
2 10
P 3 1 3 3 0 2 2 1 3 1
2 2 11
Q 3 1 4 6 1 2 4 1 5 2
4 4 3 9
R 3 1 2 4 1 2 4 1 6
2 3 4 2 5 9
S 5 3 4 4 2 4 3 2 4 2
3 5 3 4 3 8
T 4 3 3 3 2 2 2 3 3 3
3 4 3 3 3 5 9
V 4 3 1 2 3 1 1 7 2 5
5 1 2 2 1 2 4 8
W 1 2 0 1 5 2 2 1 1 2
3 0 0 2 1 1 2 1 15
Y 2 2 1 2 7 1 6 3 2 3
3 2 1 3 2 2 2 3 6 11
JOHNSON MATRIX
Unique Identifier: 94016587 (MEDLINE)
Authors: Johnson M. S. Overington J. P.
Institution: Department of Crystallography, Birkbeck College, University of London,
U.K.
Title: A structural basis for sequence comparisons. An evaluation of scoring
methodologies. Source: Journal of Molecular Biology. 233(4):716-38, 1993 Oct 20.
Abstract:
A residue-exchange matrix has been derived that is suitable for comparison of amino acid
sequences. This matrix is based on the tabulation of 207,795 amino acid replacements
observed in 65 homologous sets of structurally aligned three-dimensional structures (235
proteins). The majority of the data is from structural comparisons where there is between
15 and 40% sequence identity. As a result, a scoring matrix such as the one devised here
should provide a sensitive basis for the comparison of amino acid sequences and the search
for homologous sequences in amino acid databases. In order to assess the value of this
matrix we have made a comparative analysis with 12 other published scoring matrices that
have been used for the alignment of protein amino acid sequences. We find that the matrix
derived here is among the better performers in terms of alignment significance, detection
of homologous sequences and the accuracy of alignments.
A C D E F G
H I K L M N P Q R S T
V W Y
A 16
C 6 26
D 8 0 18
E 9 3 12 18
F 7 5 3 3 20
G 9 2 8 7 1 18
H 7 2 9 7 8 7 22
I 8 2 5 5 10 4 5 18
K 9 1 8 11 4 6 10 5 17
L 6 1 2 4 12 3 5 12 6 17
M 8 5 4 7 9 5 7 12 8 14 21
N 8 2 12 9 6 8 11 5 10 5 6 18
P 9 1 9 8 5 7 5 4 9
7 0 7 20
Q 9 3 9 12 3 7 11 3 11 5 9
9 6 19
R 8 4 6 10 4 7 10 4 13 6 6 8
6 12 20
S 10 2 10 8 5 8 7 5 8 5 5 11
9 9 9 16
T 9 4 8 9 5 6 7 7 10 5 7
10 8 9 8 12 17
V 9 5 5 6 8 4 6 14 6 12 10 4
5 6 5 5 8 17
W 4 1 4 2 13 3 6 6 4 9
9 4 2 2 6 4 0 5 25
Y 6 2 6 6 13 4 9 7 6 7 8
7 3 5 8 6 7 8 12 20
Unitary matrix for DNA/RNA:
A C G T U
A 10
C 0 10
G 0 0 10
T 0 0 0 10
U 0 0 0 10 10
Go to SIMPLE
QUERY page
Go to FULL
QUERY page
Go to OUTPUT NOTES
page |