ConSeq FAQ

1. What is the minimum number of sequences required to get reliable results?

2. What are the advantages of using phylogenetic trees?

3. For what kind of proteins is it advisable to use ConSeq?

4. What should I do if I find problems uploading external MSA files?

5. Is it possible for the extreme grades 1 and 9 to be unoccupied? Which conditions give this result?

6. Can I download the core algorithm of ConSeq?

7. What is the maximal ConSeq run time?

 

1. What is the minimum number of sequences required to get reliable results?

There is no exact answer, because the variance between the sequences matters. Nevertheless, as a rule of thumb, we recommend a minimum of 10 homologues. If PSI-BLAST finds fewer than that, you can try raising the E-value cutoff, for example from the default 0.001 to 0.01. However, the additional sequences found may be phylogenetically distant from the query sequence, which can affect the quality of the generated MSA. On the other hand, if too many homologues are used, the conservation signal may weaken because of background noise added to the MSA. Thus, a better solution is to supply your own MSA file.
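
Before supplying your own MSA, it can help to check how many sequences it contains. The following is a minimal sketch, assuming the MSA is in FASTA format (one header line starting with ">" per sequence) and that the rule of thumb of 10 is applied to the total number of sequences in the file. The function names are hypothetical and are not part of ConSeq.

```python
MIN_SEQUENCES = 10  # rule of thumb quoted in the answer above


def count_fasta_sequences(text):
    """Count sequences in FASTA text by counting header lines that start with '>'."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


def has_enough_sequences(text, minimum=MIN_SEQUENCES):
    """Return True if the FASTA text holds at least `minimum` sequences."""
    return count_fasta_sequences(text) >= minimum
```

To use it, read your alignment file into a string and pass it to has_enough_sequences. The count says nothing about how divergent the sequences are, so it is only a first check.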
 

2. What are the advantages of using phylogenetic trees?

  • All databases over-represent certain families or species (HIV, Human, etc.) to some degree. Phylogenetic trees deal with redundancy better than methods that analyze the MSA directly, by weighting clusters of closely related sequences differently. This clustering process diminishes the influence of redundant sequences.
     
  • The phylogenetic tree simulates the evolutionary process better than an MSA. This allows us to identify more accurately the mutations that could have occurred in the history of a family of homologous sequences.
     
  • The phylogenetic tree reconstruction provides us with additional in-silico mutational data for the calculation. This data increases the statistical reliability of the calculations.

3. For what kind of proteins is it advisable to use ConSeq?

ConSeq can be used for all proteins and domain sequences. However, for proteins with determined structures it is advisable to use the ConSurf server http://consurf.tau.ac.il/.

For transmembrane (TM) proteins it is advisable to consider the conservation results only. The buried/exposed prediction for the TM regions is inaccurate and can therefore mislead the distinction between structural and functional residues.

4. What should I do if I find problems uploading external MSA files?

  • Check your MSA file in a simple text editor (e.g. Notepad on Windows). It is very common that MSA files downloaded from the web contain unnecessary characters. Eliminate them, and save your file as text only. 
  • We have found an incompatibility between the text formats of PC/Unix and Mac machines. If you are running the ConSeq server from a Mac and you get repeated error messages, we recommend that you save your file from Word as an "MS-DOS" text file. This format should be compatible with DOS and Unix text files.
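
The two checks above can also be scripted. The following is a minimal sketch, assuming that the "unnecessary characters" are non-ASCII bytes and non-printable control characters, and that the platform incompatibility comes from line endings (carriage returns on Mac and DOS files versus line feeds on Unix). The function name is hypothetical and is not part of ConSeq.

```python
def clean_msa_text(raw):
    """Return MSA text with Unix line endings and without stray characters.

    `raw` is the file content as bytes.
    """
    # Drop any byte that is not plain ASCII (for example a byte-order mark).
    text = raw.decode("ascii", errors="ignore")
    # Convert DOS (CR LF) and old Mac (CR) line endings to Unix (LF).
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Keep newlines, tabs and printable characters; drop other control characters.
    return "".join(ch for ch in text if ch in "\n\t" or ch.isprintable())
```

To use it, read the MSA file in binary mode, pass the bytes to clean_msa_text, and write the result to a new text file before uploading. Inspect the cleaned file afterwards, since the assumptions above may not match every MSA format.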

5. Is it possible for the extreme grades 1 and 9 to be unoccupied? Which conditions give this result?

  • Grades 1-8 can be unoccupied, although this occurs rarely, for example when ConSeq finds few homologues. Grade 9 is always occupied by at least one residue.

6. Can I download the core algorithm of ConSeq?

  • Rate4Site, the core algorithm of ConSeq, is a program for detecting conserved amino-acid sites by computing the relative evolutionary rate for each site in the multiple sequence alignment (MSA).

  • Downloading the program (Unix machine):

  •  Rate4Site.tar.gz

  • Opening and compiling the program:

  •  >gunzip Rate4Site.tar.gz

     >tar -xvf Rate4Site.tar

     >g++ -o r4s.exe -O3 -ftemplate-depth-250 *.cpp

  • Running Rate4Site:

  •  For help:

     >r4s.exe -h

     Example:

     >r4s.exe -a '' -s -Qc -Mj -z

7. What is the maximal ConSeq run time?

  • A ConSeq run is automatically terminated after 96 hours.