HELP ON SCREENING OF PATTERN OR ALIGNMENT AGAINST PROTEIN DATABANK

ALGORITHM

The method of finding all occurrences of a pattern (possibly inexact) in the PROTEIN databank is almost the same as in the PROSITE screening procedure. The only difference is that a match between a pattern letter and a fragment letter can be understood in a broad sense: as similarity of the two letters according to a weight matrix selected by the user.
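
The broad-sense match can be illustrated with a minimal Python sketch. The scores in TOY_MATRIX, the identity score, the default score, and the threshold are all invented for illustration; they are not taken from any matrix offered by the program, and how the program actually applies the selected matrix is not described here.

```python
# Toy similarity scores, invented for illustration only (not a real weight matrix).
TOY_MATRIX = {
    ("K", "R"): 2,
    ("I", "L"): 2,
    ("D", "E"): 2,
    ("K", "D"): -1,
}


def score(a, b, matrix=TOY_MATRIX, identity=4, default=-1):
    """Similarity score of two residues; the lookup ignores the order of a and b."""
    if a == b:
        return identity
    return matrix.get((a, b), matrix.get((b, a), default))


def similar(a, b, threshold=1):
    """Treat two letters as matching when their score reaches the threshold (assumed rule)."""
    return score(a, b) >= threshold
```

With these toy values, similar("K", "R") holds while similar("K", "D") does not, so a pattern letter K would also accept R in the fragment.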

The screening algorithm is described in Biochemistry (Moscow), 1995 (Brodsky L. et al., v.60, No. 8, pp. 1221 - 1229). It is based on the idea that a matching sequence fragment contains a significantly large number of letter pairs whose members are separated by the same distance as in the pattern. This number has to be large in comparison with the corresponding number for a random sequence. The method can therefore detect rather weak correspondence between a fragment and the pattern, including fragments whose length differs from the pattern length.
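
The pair-distance idea can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the published algorithm: the pattern is taken as a plain string of letters, only exact letter identity is used, each distinct (letter, letter, distance) triple is counted once, and the random background is estimated by shuffling the fragment. The function names are hypothetical.

```python
import random


def pair_set(seq):
    """All distinct (letter, letter, distance) triples occurring in seq."""
    return {(seq[i], seq[j], j - i)
            for i in range(len(seq)) for j in range(i + 1, len(seq))}


def shared_pairs(pattern, fragment):
    """Number of distinct letter pairs found at the same distance in both strings."""
    return len(pair_set(pattern) & pair_set(fragment))


def excess_over_random(pattern, fragment, trials=200, seed=0):
    """Observed shared-pair count minus its mean over shuffled copies of the fragment."""
    rng = random.Random(seed)
    observed = shared_pairs(pattern, fragment)
    letters = list(fragment)
    total = 0
    for _ in range(trials):
        rng.shuffle(letters)  # shuffling keeps composition, destroys order
        total += shared_pairs(pattern, "".join(letters))
    return observed - total / trials
```

A clearly positive excess suggests that the fragment shares more same-distance pairs with the pattern than its letter composition alone would explain. Because distances are compared pair by pair, a fragment with a short insertion still keeps the pairs that lie on the same side of the insertion.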

Screening by a query alignment is based on the same principle: the alignment is treated as a pattern that defines the letter frequencies at every position.
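
As a minimal sketch of how an alignment defines letter frequencies at every position, the Python function below computes per-column frequencies. The function name is hypothetical, and ignoring gap characters ("-") is an assumption; the program's own treatment of gaps is not described here.

```python
from collections import Counter


def column_frequencies(alignment):
    """Per-column letter frequencies for a list of equal-length aligned sequences."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*alignment):
        # Gap characters are skipped (an assumption made for this sketch).
        counts = Counter(ch for ch in column if ch != "-")
        total = sum(counts.values())
        profile.append({ch: n / total for ch, n in counts.items()} if total else {})
    return profile
```

For the alignment ["KAW", "RAW", "KGW", "KA-"], the first column gives K with frequency 0.75 and R with 0.25, and the last column gives W with frequency 1.0 because the gap is not counted.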

An example pattern:

[RK]-x-A-x(3,5)-{YWL}-x(2)-W

Here:

  • the symbol "x" means that any letter may occur at the given position;
  • a number after "x" gives the number of positions with arbitrary residues (two numbers give the allowed range for the number of such positions);
  • a single capital letter at a position means that only this residue is allowed there. Several capital letters in square brackets mean that any residue from the set is allowed at the position;
  • one or several capital letters in braces mean that any residue except those in the braces is allowed at the position.
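
The pattern syntax described above maps naturally onto regular expressions. The Python sketch below translates a pattern into a regular expression for exact matching only; it does not reproduce the inexact, weight-matrix-based screening performed by the program, and the function name prosite_to_regex is hypothetical.

```python
import re

# One pattern element: a [set], a {excluded set}, or a single letter,
# optionally followed by a repeat count such as (2) or (3,5).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a pattern such as [RK]-x-A-x(3,5) into a regular expression."""
    parts = []
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            body = "."                        # any residue
        elif core.startswith("{"):
            body = "[^" + core[1:-1] + "]"    # any residue except the listed ones
        else:
            body = core                       # single letter or [set]
        if low is not None:
            body += "{" + low + ("," + high if high else "") + "}"
        parts.append(body)
    return "".join(parts)


regex = prosite_to_regex("[RK]-x-A-x(3,5)-{YWL}-x(2)-W")
match = re.search(regex, "MKGAQQQSTTW")
```

Here regex is "[RK].A.{3,5}[^YWL].{2}W", and the search finds the fragment KGAQQQSTTW starting at the second residue of the test sequence.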

PARAMETERS

In the dialog window you should set the following parameters of the program: