GEA is a system for the analysis of micro-array experiment
results. It includes modules for background correction and normalization
(NM stage), principal component analysis (PCA stage), clusterization
(CL stage), quality control (QC stage), and main vector determination
(MV stage). The procedures use standard techniques as well as new methods
and approaches developed mainly by L.I. Brodsky, A.M. Leontovich and
other members of the GeneBee group.
The input for each GEA project consists of a set of probes
in standard Affymetrix format (one probe per file), and the
output of each stage consists of a set of tables (one table per file).
For many tables, the information can be visualized using the developed
The first stage of each GEA project (NM stage) is background correction
and normalization. This stage includes procedures for artifact reduction
and signal rescaling.
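As a rough illustration of what background correction and signal
rescaling mean in general, the sketch below subtracts a crude background
estimate and rescales one probe to a common median. This is a generic
textbook-style example, not the GEA procedure: the function names
background_correct and rescale_to_median, the use of the minimum
intensity as the background, and the target median of 100 are all
assumptions made for the example.

```python
from statistics import median

def background_correct(intensities, floor=1.0):
    """Subtract a crude background estimate (here: the minimum intensity,
    an assumption) and clamp the result at `floor` to keep values positive."""
    background = min(intensities)
    return [max(x - background, floor) for x in intensities]

def rescale_to_median(intensities, target_median=100.0):
    """Multiply all intensities by one factor so that their median
    equals `target_median` (hypothetical common scale)."""
    factor = target_median / median(intensities)
    return [x * factor for x in intensities]

# Toy intensities for a single probe (made-up numbers).
probe = [120.0, 150.0, 300.0, 95.0, 210.0]
corrected = background_correct(probe)
normalized = rescale_to_median(corrected)
```

Applying the same target median to every probe puts all probes on a
comparable scale before the later stages.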
In order to improve the quality of background correction and normalization,
the procedures are initially applied to the overlapping regions of the probes,
and then the resulting information is combined using smoothing
Principal component analysis (PCA stage) is a standard step of the analysis
of micro-array experiment results. It is based on the evaluation of
eigenvalues and eigenvectors in the gene space and the analysis of
eigenvectors that correspond to the greatest eigenvalues.
In GEA, PCA is combined with an initial (rough) clustering to make the
analysis more meaningful.
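The eigenvalue computation behind PCA can be sketched as follows. The
example builds the covariance matrix between genes (one row per probe,
one column per gene) and finds the eigenvector of the greatest eigenvalue
by power iteration. It is a minimal sketch of standard PCA only; it does
not reproduce the combination with rough clustering used in GEA, and the
function names covariance_matrix and leading_eigenpair as well as the toy
data are assumptions made for the example.

```python
import math

def covariance_matrix(rows):
    """Sample covariance between columns (genes); each row is one probe."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
            for i in range(m)]

def mat_vec(matrix, vec):
    """Product of a square matrix (list of rows) with a vector."""
    return [sum(a * v for a, v in zip(row, vec)) for row in matrix]

def leading_eigenpair(matrix, iterations=100):
    """Power iteration for the greatest eigenvalue of a symmetric,
    non-zero matrix. The start vector is arbitrary; it must not be
    orthogonal to the leading eigenvector."""
    vec = [float(i + 1) for i in range(len(matrix))]
    for _ in range(iterations):
        product = mat_vec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in product))
        vec = [x / norm for x in product]
    # Rayleigh quotient of the unit vector gives the eigenvalue estimate.
    eigenvalue = sum(v * p for v, p in zip(vec, mat_vec(matrix, vec)))
    return eigenvalue, vec

# Toy data: 4 probes x 3 genes; gene 2 is exactly twice gene 1.
rows = [[2.0, 4.0, 1.0],
        [4.0, 8.0, 1.5],
        [6.0, 12.0, 0.5],
        [8.0, 16.0, 1.0]]
cov = covariance_matrix(rows)
eigenvalue, vector = leading_eigenpair(cov)
```

In this toy data the first principal direction is dominated by the two
proportional genes, which is the kind of structure the PCA stage is meant
to expose.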
The clusterization stage (CL stage) is designed to form a nested structure
of gene clusters. Clusters can be treated as sets of genes with similar
patterns, and the nested structure enables the multi-level analysis
(from global analysis to the analysis of small sets of genes with
very similar behavior).
The use of complex mathematical models (different metrics, different
clusterization schemes) makes it possible to obtain very good results and
to adjust the stage to each individual task.
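A nested structure of clusters can be illustrated with ordinary
agglomerative clustering, where every merge produces a cluster that
contains the smaller clusters formed before it. The sketch below uses
average linkage with a Euclidean metric passed in as a parameter. It is a
generic example and not the clusterization scheme implemented in GEA; the
names euclidean and nested_clusters and the gene profiles are
hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nested_clusters(profiles, metric=euclidean):
    """Average-linkage agglomerative clustering.

    profiles: dict mapping gene name -> expression profile.
    Returns the merges as (members, distance) tuples, ordered from the
    tightest clusters to the single global cluster."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Mean pairwise distance between the two clusters.
                total = sum(metric(profiles[a], profiles[b])
                            for a in clusters[i] for b in clusters[j])
                dist = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        merges.append((merged, dist))
    return merges

# Toy profiles (made-up numbers): two pairs of similar genes.
profiles = {
    "g1": [1.0, 2.0, 3.0],
    "g2": [1.1, 2.1, 2.9],
    "g3": [5.0, 1.0, 0.5],
    "g4": [5.2, 0.9, 0.4],
}
merges = nested_clusters(profiles)
```

Reading the merge list from the end gives the global view, and reading
it from the start gives the small sets of genes with very similar
behavior. Passing a different metric function changes the notion of
similarity without changing the scheme.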
The quality control stage (QC stage) makes it possible to correct for
complicated artificial influences on the experiment results and to
determine clusters of genes that should be excluded from the biological
analysis because these clusters are artifacts.
The idea behind this stage is the analysis of geometrical properties
of gene clusters, particularly, the cluster localization on each probe.
The main vector determination stage (MV stage) selects a relatively small
subset of genes that are expected to have the maximum connection with
the studied biological conditions.
The advantage of this stage is that no ordering of probes with
respect to the studied biological condition is required: the analysis
is performed in the space of all probe pairs. This analysis uses
ideas from sparse representation theory and includes modern methods of
approximation theory, e.g., greedy expansion with the correction
of the expanding elements.
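The general idea of a greedy expansion can be sketched with plain
matching pursuit: at each step the element most correlated with the
current residual is selected and its contribution is subtracted. This is
a standard textbook algorithm shown for orientation only. It omits the
correction of the expanding elements mentioned above and is not the GEA
implementation; the function names dot and matching_pursuit and the toy
vectors are assumptions made for the example.

```python
import math

def dot(a, b):
    """Inner product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))

def matching_pursuit(target, dictionary, n_terms):
    """Greedy expansion of `target` over the elements of `dictionary`.

    dictionary: dict mapping name -> non-zero vector.
    Returns (expansion, residual), where expansion is a list of
    (name, coefficient) pairs in the order they were selected."""
    residual = list(target)
    expansion = []
    for _ in range(n_terms):
        # Pick the element with the largest normalized correlation
        # with the current residual.
        name, atom = max(
            dictionary.items(),
            key=lambda kv: abs(dot(residual, kv[1])) / math.sqrt(dot(kv[1], kv[1])),
        )
        coeff = dot(residual, atom) / dot(atom, atom)
        if coeff == 0.0:
            break  # nothing left that any element can explain
        residual = [r - coeff * a for r, a in zip(residual, atom)]
        expansion.append((name, coeff))
    return expansion, residual

# Toy example (made-up vectors): the target is 2*a + 0.5*b.
dictionary = {
    "a": [1.0, 1.0, 0.0],
    "b": [0.0, 0.0, 1.0],
    "c": [1.0, 0.0, 0.0],
}
target = [2.0, 2.0, 0.5]
expansion, residual = matching_pursuit(target, dictionary, n_terms=2)
```

The result is sparse in the sense that only a few elements receive
non-zero coefficients, which mirrors the selection of a relatively small
subset of genes.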