ConSeq FAQ
1. What is the minimum number of sequences required to get reliable results?
2. What are the advantages of using phylogenetic trees?
3. For what kind of proteins is it advisable to use ConSeq?
4. What should I do if I find problems uploading external MSA files?
5. Is it possible for the extreme grades 1 and 9 to be unoccupied? Which conditions give this result?
6. Can I download the core algorithm of ConSeq?
7. What is the maximum run time for ConSeq?
1. What is the minimum number of sequences required to get reliable results?
There is no exact answer, because the result also depends on how much the
sequences vary. Nevertheless, as a rule of thumb, we recommend a minimum of 10
homologues. If PSI-BLAST finds fewer than that, you can try raising the E-value
cutoff, for example from the default 0.001 to 0.01. However, the additional
sequences found may be phylogenetically distant from the query sequence, which
can reduce the quality of the generated MSA. On the other hand, if too many
homologues are used, the conservation signal may weaken because of background
noise added to the MSA. Thus, a better solution is to supply your own MSA file.
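As a quick check before submitting your own MSA, the rule of thumb above can be tested with a few lines of Python. This is only a minimal sketch: it assumes the alignment is in FASTA format and that the first record is the query sequence, and the function names are made up for this illustration (they are not part of ConSeq).

```python
import io

MIN_HOMOLOGUES = 10  # rule-of-thumb minimum quoted in the answer above


def count_fasta_records(handle):
    """Count sequences in a FASTA stream by counting '>' header lines."""
    return sum(1 for line in handle if line.startswith(">"))


def has_enough_homologues(handle, minimum=MIN_HOMOLOGUES):
    """Return (ok, n_homologues), assuming the first record is the query."""
    n_records = count_fasta_records(handle)
    n_homologues = max(n_records - 1, 0)  # do not count the query itself
    return n_homologues >= minimum, n_homologues


if __name__ == "__main__":
    # Hypothetical three-sequence alignment: the query plus two homologues.
    demo = io.StringIO(">query\nMKT-A\n>hom1\nMKTLA\n>hom2\nMRT-A\n")
    ok, n = has_enough_homologues(demo)
    print(f"{n} homologues; meets rule of thumb: {ok}")
```

For a file on disk, pass an open file handle instead of the `io.StringIO` object. Keep in mind that the count alone says nothing about how diverse the sequences are.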
2. What are the advantages of using phylogenetic trees?
- All databases over-represent certain
families or species (HIV, human, etc.) to some degree.
Phylogenetic trees deal better with this redundancy than methods that analyze
the MSA directly, because they weight clusters of closely-related sequences
differently. This clustering process diminishes the influence of redundant
sequences.
- The phylogenetic tree simulates the
evolutionary process better than an MSA. This allows us to identify
more accurately the mutations that could have occurred in the history of a
family of homologous sequences.
- The phylogenetic tree reconstruction provides
us with additional in-silico mutational
data for the calculation. This data increases the statistical reliability
of the calculations.
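The effect of redundancy described in the first point can be seen in a toy example. The sketch below is not how Rate4Site computes its scores; it only compares an unweighted per-column count with the same count after collapsing identical sequences, which is a crude stand-in for down-weighting a tight cluster. The alignment and the function names are invented for the illustration.

```python
from collections import Counter


def majority_fraction(column):
    """Fraction of sequences carrying the most common residue in a column."""
    counts = Counter(column)
    return counts.most_common(1)[0][1] / len(column)


def get_column(msa, i):
    """Extract alignment column i as a list of residues."""
    return [seq[i] for seq in msa]


# Toy alignment: four identical copies of one sequence plus two distinct ones.
msa = ["ACDK", "ACDK", "ACDK", "ACDK", "ACEK", "ACFK"]
unique = sorted(set(msa))  # collapse exact duplicates

# Column 2 looks fairly conserved when the duplicates dominate the count...
naive = majority_fraction(get_column(msa, 2))
# ...but is fully variable once each distinct sequence is counted once.
collapsed = majority_fraction(get_column(unique, 2))

print(f"unweighted: {naive:.2f}, duplicates collapsed: {collapsed:.2f}")
```

A tree-based method goes further than this, since it accounts for how closely related the sequences are rather than only for exact duplicates.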
3. For what kind of proteins is it advisable to use ConSeq?
ConSeq can be used for all proteins and domain sequences. However, for
proteins with determined structures it is advisable to use the ConSurf server http://consurf.tau.ac.il/.
For transmembrane (TM)
proteins it is advisable to consider the conservation results only. The
buried/exposed prediction for the TM regions is inaccurate, and the
distinction between structural and functional residues that relies on it
is therefore misleading.
4. What should I do if I find problems uploading external MSA files?
- Check your MSA file in a simple
text editor (e.g. Notepad on Windows). It is very common that MSA files
downloaded from the web contain unnecessary characters. Eliminate them,
and save your file as text only.
- We have found an
incompatibility between the text formats of PC / Unix and Mac machines. If
you are running the ConSeq server from a Mac
platform and you get repeated error messages, we recommend that you
save your file using Word as an "MS-DOS" text file. This format
should be compatible with DOS and Unix text files.
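If you prefer to clean the file with a script rather than by hand, the two steps above can be approximated as follows. This is a minimal sketch under stated assumptions: it treats every non-ASCII byte and every control character other than newline and tab as "unnecessary", and it converts both Windows (CRLF) and old Mac (CR) line endings to Unix (LF). The function name is hypothetical, and you should inspect the result before uploading it.

```python
import re


def clean_msa_text(raw: bytes) -> str:
    """Return plain ASCII text with Unix line endings."""
    # Drop non-ASCII bytes (e.g. a byte-order mark or non-breaking spaces).
    text = raw.decode("ascii", errors="ignore")
    # Normalize CRLF first, then any remaining lone CR, to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Remove remaining control characters, keeping newline and tab.
    return re.sub(r"[^\x20-\x7E\n\t]", "", text)


if __name__ == "__main__":
    # Hypothetical input mixing CRLF and CR endings and a stray 0xA0 byte.
    sample = b">seq1\r\nAC-GT\r>seq2\rAC\xa0GT\n"
    print(clean_msa_text(sample))
```

To process a real file, read it in binary mode (`open(path, "rb").read()`), pass the bytes to the function, and write the returned text to a new file so the original is kept.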
5. Is it possible for the extreme grades 1 and 9 to be unoccupied? Which conditions give this result?
- Grades 1-8 can be unoccupied,
although this will occur rarely, such as when ConSeq
finds few homologues. Grade 9 is always occupied by at least one residue.
6. Can I download the core algorithm of ConSeq?
- Rate4Site, the core algorithm of ConSeq, is a program for detecting conserved amino-acid sites by computing the relative evolutionary rate for each site in the multiple sequence alignment (MSA).
- Downloading the program (Unix machine):
Rate4Site.tar.gz
- Opening and compiling the program:
>gunzip Rate4Site.tar.gz
>tar -xvf Rate4Site.tar
>g++ -o r4s.exe -O3 -ftemplate-depth-250 *.cpp
- Running Rate4Site:
For help:
>r4s.exe -h
Example:
>r4s.exe -a '' -s -Qc -Mj -z
7. What is the maximum run time for ConSeq?
- A ConSeq run is automatically terminated after 96 hours.