RNA secondary structure prediction

REFERENCES:

Brodsky L.I., Vasiliev A.V., Kalaidzidis Ya.L., Osipov Yu.S., Tatuzov R.L., Feranchuk S.I. GeneBee: the program package for biopolymer structure analysis, 1992, Dimacs, 8, 127-139.
Brodsky L.I., Ivanov V.V., Kalaidzidis Ya.L., Leontovich A.M., Nikolaev V.K., Feranchuk S.I., Drachev V.A. GeneBee-NET: Internet-based server for analyzing biopolymers structure, 1995, Biochemistry, 60, 8, 923-928.

ALGORITHM

If there is no significant multiple alignment that includes the sequence of interest, the secondary structure is built using the energy model alone (only the potential energy of the system is minimized). This method very often gives unreliable results.

If a multiple alignment is given, information on its conservative positions and on compensatory changes in some of them is used: stems that include such positions are given a better chance of being included in the resulting secondary structure.
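As an illustration of how a pair of alignment columns might be labelled, here is a minimal Python sketch. It is not the GeneBee implementation: the function name `classify_pair`, the inclusion of G-U wobble pairs, and the rule "more than one distinct pair type means compensatory" are all assumptions made for the example. The `min_fraction` argument plays the role of the "Conservativity" parameter described in the parameter list.

```python
# Watson-Crick pairs plus G-U wobble; counting G-U as complementary is an assumption.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def classify_pair(col_i, col_j, min_fraction=0.8):
    """Label two alignment columns as 'conservative', 'compensatory', or None.

    col_i and col_j hold the characters of two columns, one per sequence.
    """
    observed = [
        (a.upper().replace("T", "U"), b.upper().replace("T", "U"))
        for a, b in zip(col_i, col_j)
    ]
    paired = [p for p in observed if p in PAIRS]
    # Too few sequences support the pairing: no label.
    if not observed or len(paired) / len(observed) < min_fraction:
        return None
    # Same pair everywhere: conservative. Different pairs that all remain
    # complementary: compensatory (a simplification of the real criterion).
    return "compensatory" if len(set(paired)) > 1 else "conservative"
```

For example, `classify_pair("GGG", "CCC")` gives `"conservative"`, while `classify_pair("GAC", "CUG")` gives `"compensatory"` because the three sequences carry three different complementary pairs at those positions.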

The dialog window for setting the program parameters looks the same for a single sequence and for an alignment. The result can be sent to you by e-mail along with its WWW demonstration. The following parameters are set:

Method of finding the optimal structure: dynamic programming (NW) or greedy (Greedy is suggested);

Energy threshold for helices used in the construction (suggested value: -4 kcal/mol);

"Premium" coefficient by which the free energy is increased for conservativeness of a pair of complementary positions (2 kcal/mol) - (Conserve factor);

"Premium" coefficient by which the free energy is increased for a pair of complementary positions that has compensatory changes in the alignment (4 kcal/mol) - (Compensate factor);

"Demanded" coefficient of free-energy increase when two local secondary structures are grouped into a cluster (2 kcal/mol) - (Cluster factor);

"Demanded" fraction of correct pairs along the alignment needed to consider two given positions "conservative" or "compensatory" complementary (0.8) - (Conservativity);

Number of variants tried for inclusion into the structure at every step of the greedy algorithm (2) - (Greedy Parameter);

Borders of the sequence or alignment zone for which the secondary structure will be built (Part of Sequence: From - To);

Number of the sequence for which the secondary structure will be predicted (Treated sequence).

The input is either a sequence (in one-letter code format), for example:

tggcacaagc gccgcaaaac cgggggcaag agaaagccct
accacaagaa gcggaagtat gagttggggc gcccagctgc
caacaccaag ttggcccccg ccgcatccac acagtccgtg
tgcggggagg taacaagaaa taccgtgccc tgaggttgga
tggaggagca gttccagcag ggcaagcttc ttggtgagaa
ggcgtgcatc gcttcaaggc cgggacagtg tggccgagca
gatggctatg tgctagaggg caaagagttg gagttctatc
ttaggaaaat caaggcccgc

or an alignment, which should be represented as follows:

CLWRNA     AACCTGGTTGATCCTGCCAGTAGTCATATGCTTGTCTCAGAGATTAAGCCATGCATGTC
EHIRRNA    AACCTGGTTGATCCTGCCAGTAGTCATATGCTTGTCTCAGAGATTAAGCCATGCATGTC
FSLRRNA    AACCTGGTTGATCCTGCCAGTAGTCATATGCTTGTCTCAGAGATTAAGCCATGCATGTC

CLWRNA     TAAGTACATACCTTA---CGGTGAAACCGCGAATGGCTCATTAAATCAGCTATGGTTCC
EHIRRNA    TAAGTACATACCTTCA--CGGTGAAACCGCGAATGGCTCATTAAATCAGCTATGGTTCC
FSLRRNA    TAAGTACAAACCTTTAAACGGTGAAACCGCGAATGGCTCATTAAATCAGCTATGGTTCC

CLWRNA     TTGGATCGTACATACTACATGGATAACTGTAGTAATTCTAGAGCTAATACAT
EHIRRNA    TTGGATCGTACATTGTACATGGATAACTGTACTAATTCTAGAGCTAATACAT
FSLRRNA    TTAGATCGTACATACTACATGGATAACTGTAGTAATTCTAGAGCTAATACAT
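The alignment layout above (blocks of "NAME SEQUENCE" lines separated by blank lines) can be read into one row per sequence with a few lines of Python. This is only a sketch of a reader for the format as shown; the function name `parse_alignment` is hypothetical, and the actual server may accept or reject inputs differently.

```python
def parse_alignment(text):
    """Join interleaved blocks of 'NAME  SEQUENCE' lines into one row per name."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank lines and anything that is not NAME + chunk
        name, chunk = parts
        rows[name] = rows.get(name, "") + chunk
    # All aligned rows must have the same length, gaps ('-') included.
    if len({len(seq) for seq in rows.values()}) > 1:
        raise ValueError("aligned rows differ in length")
    return rows
```

Because Python dictionaries keep insertion order, the first key of the result corresponds to the first sequence of the alignment.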

Note that, for an alignment, the RNA secondary structure is predicted by default for the first sequence of the alignment. A different sequence can be selected with the "Treated sequence" parameter (sophisticated version of the query form).

The algorithm is as follows. First, all possible ways of fitting together different pieces of the sequence (or of the alignment, treated as a rigid whole) are examined.
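The enumeration step can be pictured as a search for all maximal runs of stacked complementary pairs in a single sequence. The sketch below is a simple quadratic scan written for illustration, not the GeneBee code; the name `find_helices`, the minimum helix length, the minimum loop size, and the use of G-U wobble pairs are assumptions.

```python
# Watson-Crick pairs plus G-U wobble; counting G-U as complementary is an assumption.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_helices(seq, min_len=3, min_loop=3):
    """Return maximal helices as (i, j, length) with 0-based positions.

    A helix (i, j, length) consists of the pairs (i + k, j - k) for k < length.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)

    def can_pair(i, j):
        # At least min_loop unpaired bases must separate the two positions.
        return 0 <= i < j < n and j - i > min_loop and (seq[i], seq[j]) in PAIRS

    helices = []
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            # Start only at the outermost pair of a helix.
            if not can_pair(i, j) or can_pair(i - 1, j + 1):
                continue
            length = 1
            while can_pair(i + length, j - length):
                length += 1
            if length >= min_len:
                helices.append((i, j, length))
    return helices
```

For the toy hairpin `"GGGAAAACCC"` the only helix of length 3 or more is `(0, 9, 3)`, that is, the pairs (0, 9), (1, 8) and (2, 7).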

At the next step, locally optimal secondary structures are built from the helices found (the helices are joined by hierarchical cluster analysis). In particular, significant pseudoknots can be found at this step.

Then the final structure is constructed. This is done by optimizing not the real energy but a model energy of the structure. The model energy includes contributions from conservative and compensatory pairs, weighted by the corresponding coefficients. After the final calculations, the pairs included in the final structure are highlighted on the stack map. Then the graphical model of the RNA structure is built.
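A rough sketch of the idea of a model energy combined with a greedy choice of helices is given below. Several points are assumptions rather than facts about GeneBee: that the Conserve and Compensate factors act as bonuses that lower the energy, that the energy threshold is applied to the model energy, and that two helices are compatible whenever they share no position. The class `Helix` and both function names are hypothetical, and the "Greedy Parameter" (number of variants tried per step) and the cluster step are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Helix:
    pairs: list              # list of (i, j) base-pair positions
    energy: float            # free energy in kcal/mol (negative is stable)
    n_conservative: int = 0  # pairs labelled conservative in the alignment
    n_compensatory: int = 0  # pairs labelled compensatory in the alignment


def model_energy(h, conserve_factor=2.0, compensate_factor=4.0):
    """Free energy lowered by a bonus per conservative or compensatory pair."""
    return (h.energy
            - conserve_factor * h.n_conservative
            - compensate_factor * h.n_compensatory)


def greedy_structure(helices, threshold=-4.0):
    """Pick helices in order of model energy, skipping those that reuse a base."""
    used = set()
    chosen = []
    for h in sorted(helices, key=model_energy):
        if model_energy(h) > threshold:
            break  # every remaining helix is weaker than the threshold
        positions = {p for pair in h.pairs for p in pair}
        if positions & used:
            continue  # a base cannot take part in two pairs
        chosen.append(h)
        used |= positions
    return chosen
```

With these assumptions a helix with a weak real energy can still win over a stronger one if the alignment supports it: a helix of -3 kcal/mol with one compensatory pair gets a model energy of -7 kcal/mol and is preferred to a competing helix of -6 kcal/mol with no alignment support.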

The output window is divided into two parts: the lower graphical "structure" frame and the upper text frame, which gives a detailed description of the local stack zones on the sequence. The "STRUCTURE" window shows the secondary structure of the selected sequence. Complementary pairs of its hairpin zones are shown in yellow (or cyan), white, or green. These colors mean, respectively, compensatory changes for the given pair of complementary positions of the alignment, conservativeness of that pair, or a complementary pair that exists only in the treated sequence.