Predictions
We used the ConSeq server to predict functional and structural residues in
111 domains and proteins of unknown function and structure from the Pfam
database (Bateman et al., 2002) and from the SWISS-PROT database (O'Donovan
et al., 2002). We selected 96 protein families from the Pfam database that
contain at least 30 members in their full or seed alignments, and uploaded
their MSAs as external inputs to the ConSeq server. We chose SWISS-PROT query
sequences when possible; when the MSA contained only TrEMBL sequences, the
query sequence was chosen arbitrarily. In addition, we selected 15 human
proteins of unknown function from the SWISS-PROT database. Their MSAs were
generated automatically with 3 PSI-BLAST iterations, between 10 and 100
homologous sequences, and an E-value below 0.001 (see Methodology).
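The homologue-selection criteria above can be sketched in Python as follows. This is only an illustration, not the server's code: the `Hit` record and the `select_homologues` function are hypothetical names, and keeping the lowest E-value hits when more than 100 pass the cutoff is an assumption, since the text does not say how the set is truncated.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    """One homologue returned by a homology search (hypothetical record)."""
    seq_id: str
    e_value: float


def select_homologues(
    hits: List[Hit],
    max_e_value: float = 0.001,  # E-value cutoff quoted in the text
    min_hits: int = 10,          # fewer homologues than this: no MSA is built
    max_hits: int = 100,         # upper bound on homologues in the MSA
) -> Optional[List[Hit]]:
    """Return the hits to align, or None when too few pass the cutoff."""
    passing = sorted(
        (h for h in hits if h.e_value < max_e_value),
        key=lambda h: h.e_value,
    )
    if len(passing) < min_hits:
        return None
    # Assumption: when more than max_hits pass, keep the most significant ones.
    return passing[:max_hits]
```

A caller would treat a `None` result as "not enough homologues to build a reliable MSA" and skip the query.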
In 16 cases the proteins or domains contain transmembrane (TM) segments. The
solvent accessibility predictions for these regions are unreliable (see
Methodology), and consequently the identification of functional and
structural residues in these segments is misleading. We identified the
segments according to the SWISS-PROT annotation of the query sequences.
Click
here to view the ConSeq Predictions Database
Example Pfam: DUF289
The DUF289 domain has an average length
of 353 residues and is found in a family of 41 proteins from different eukaryotic
species (numerous sequences are from C. elegans). We carried out
a ConSeq analysis using the human bestrophin domain VMD2_HUMAN as a query
sequence. Bestrophin, also known as vitelliform macular dystrophy protein, is
a 585-residue protein predicted from the open reading frame of the VMD2 gene.
Defects in VMD2 are the cause of Best macular dystrophy (BMD), also known as vitelliform
macular dystrophy type 2. BMD is an autosomal dominant disease characterized by
typical "egg-yolk" macular lesions due to abnormal accumulation of lipofuscin
within and beneath the retinal pigment epithelium cells. Progression of the
disease leads to destruction of the retinal pigment epithelium and vision loss.
Thus far, the function of bestrophin is unknown (Marquardt et al., 1998; Bakall
et al., 1999).
Bestrophin is predicted to be a TM
protein with four TM spans. The neural network algorithm predicts the relative surface accessibility of
soluble proteins only. Thus, the buried/exposed predictions for the TM
segments are inaccurate, and the structural/functional classification is
meaningless for these segments. We therefore focus only on the soluble part of
the domain.
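As a rough illustration of how TM segments can be excluded from the analysis, the Python sketch below overwrites the burial calls inside annotated TM segments with a mask symbol, so that later steps can skip them. The function name `mask_tm_segments` and the toy input are hypothetical; segment coordinates are assumed to be 1-based and inclusive, as in the annotation listed for the query sequences.

```python
from typing import List, Tuple


def mask_tm_segments(
    burial: str,
    segments: List[Tuple[int, int]],
    mask: str = "B",
) -> str:
    """Replace burial calls ('e'/'b') inside TM segments with a mask symbol.

    `segments` holds (start, end) pairs, 1-based and inclusive.
    """
    masked = list(burial)
    for start, end in segments:
        if start < 1 or end > len(burial) or start > end:
            raise ValueError(f"segment {start}-{end} is outside the sequence")
        for i in range(start - 1, end):
            masked[i] = mask
    return "".join(masked)


# Toy example: a 10-residue burial string with one TM segment at positions 4-6.
print(mask_tm_segments("eebebeeebb", [(4, 6)]))
```

Masking rather than deleting the positions keeps the residue numbering of the query sequence intact.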
The ConSeq results, presented in Figure 5 below, reveal that 37 residues are
predicted to be functionally important and 20 residues to have a structural
role. ConSeq assigned above-average conservation grades (color grades 6-9) to
the positions of 40 of the 49 missense mutations that have been reported to
cause VMD2 disease in this domain (http://www.uni-wuerzburg.de/humangenetics/vmd2.html - December 2002 version).
Sixteen of these 49 residues evolve particularly slowly: 10 of them are
predicted to be functional residues and 6 are predicted to have an important
structural role.
Figure 5: ConSeq predictions for VMD2_HUMAN and its close homologues in the
DUF289 family. The MSA was obtained from the Pfam database and contains 43
homologous sequences. The sequence of the query protein is shown with the
evolutionary rate at each site color-coded onto it (see Legend). The residues
of the query sequence are numbered starting from 1. The row below the sequence
lists the predicted burial status of each site ("b" - buried, "e" - exposed).
The bottom row marks structurally and functionally important residues as "s"
and "f", respectively. The four TM segments according to Bakall et al. (1999)
are marked as green boxes.
ConSeq Results

1          11         21         31         41
MTITYTSQVA NARLGSFSRL LLCWRGSIYK LLYGEFLIFL LCYYIIRFIY
eebebeeebb eeebebbbeb bbebeebbbe bbbeebBBBB BBBBBBBBBB
ff s s fs f f BBBB BBBBBBBBBB

51         61         71         81         91
RLALTEEQQL MFEKLTLYCD SYIQLIPISF VLGFYVTLVV TRWWNQYENL
BBbbeeeeee bbeebbebbe eebeBBBBBB BBBBBBBBBB Bebbebbeeb
BB BBBB B BBBBBBBBBB Bfs

101        111        121        131        141
PWPDRLMSLV SGFVEGKDEQ GRLLRRTLIR YANLGNVLIL RSVSTAVYKR
ebbeebbbbb bbbbeeeeee bebbeebbbe bbbbbbbbbb eebbeebeee
sf ff f s f s f

151        161        171        181        191
FPSAQHLVQA GFMTPAEHKQ LEKLSLPHNM FWVPWVWFAN LSMKAWLGGR
beebebbbeb ebbeeeeeee beebeeeeee bbbbbbbbbb bbeebeeeee
sff f f s s

201        211        221        231        241
IRDPILLQSL LNEMNTLRTQ CGHLYAYDWI SIPLVYTQVV TVAVYSFFLT
beeebbbeeb beebeebeee bebbbebebb bbbbbbbBBB BBBBBBBBBB
f f s sssBBB BBBBBBBBBB

251        261        271        281        291
CLVGRQFLNP AKAYPGHELD LVVPVFTFLQ FFFYVGWLKV AEQLINPFGE
BBBBeeeeee eeeeeeeeBB BBBBBBBBBB BBBBBbbbeb beebbbeeee
BBBBff BB BBBBBBBBBB BBBBBss fs f sffff

301        311        321        331        341
DDDDFETNWI VDRNLQVSLL AVDEMHQDLP RMEPDMYWNK PEPQPPYTAA
eeeebebbbb beeebebbbb bbbeeeeeee ebeeeebeee eeeeeeeeee
ffff f s fff s f f

351        361        371
SAQFRRASFM GSTFNISLNK EEMEFQPN
eeeeeeeeee eebeebeeee eebeeeee
f

Legend:
The conservation scale: 1 2 3 4 5 6 7 8 9 (1 = variable, 5 = average, 9 = conserved)
e - An exposed residue according to the neural network algorithm.
b - A buried residue according to the neural network algorithm.
f - A predicted functional residue (highly conserved and exposed).
s - A predicted structural residue (highly conserved and buried).
X - Insufficient data - the calculation for this site was performed on less than 10% of the sequences.
BB - Transmembrane segment
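The legend's rules for "f" and "s" can be expressed as a short Python sketch. The legend only says "highly conserved" and gives no numeric cutoff, so the `min_grade` parameter (default 8) is an assumption made for illustration, and the function names `classify` and `annotate` are hypothetical rather than part of ConSeq.

```python
from typing import Sequence


def classify(grade: int, burial: str, min_grade: int = 8) -> str:
    """Return 'f', 's', 'B', or ' ' for one residue.

    grade     -- conservation grade on the 1-9 scale (9 = conserved)
    burial    -- 'e' (exposed), 'b' (buried) or 'B' (transmembrane segment)
    min_grade -- assumed cutoff for "highly conserved"; not given in the legend
    """
    if burial == "B":
        return "B"  # TM residues are not classified
    if burial not in ("e", "b"):
        raise ValueError(f"unknown burial symbol: {burial!r}")
    if grade < min_grade:
        return " "  # not conserved enough to be flagged
    return "f" if burial == "e" else "s"


def annotate(grades: Sequence[int], burial: str, min_grade: int = 8) -> str:
    """Build the bottom annotation row for a whole sequence."""
    if len(grades) != len(burial):
        raise ValueError("grades and burial must have the same length")
    return "".join(classify(g, b, min_grade) for g, b in zip(grades, burial))
```

Exposing the cutoff as a parameter makes it easy to see how sensitive the "f" and "s" assignments are to the chosen definition of "highly conserved".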
REFERENCES
1. Bakall, B., Marknell, T., Ingvast, S., Koisti, M.J., Sandgren, O., Li, W., Bergen, A.A.B., Andreasson, S., Rosenberg, T., Petrukhin, K. and Wadelius, C. The mutation spectrum of the bestrophin protein - functional implications. Hum. Genet., 104, 383-389, 1999.
2. Bateman, A., Birney, E., Cerruti, L., Durbin, R., Etwiller, L., Eddy, S.R., Griffiths-Jones, S., Howe, K.L., Marshall, M. and Sonnhammer, E.L. The Pfam Protein Families Database. Nucleic Acids Res., 30(1), 276-280, 2002.
3. Mizuguchi, K., Deane, C.M., Blundell, T.L. and Overington, J.P. HOMSTRAD: a database of protein structure alignments for homologous families. Protein Sci., 7, 2469-2471, 1998.
4. Marquardt, A., Stoehr, H., Passmore, L.A., Kraemer, F., Rivera, A. and Weber, B.H.F. Mutations in a novel gene, VMD2, encoding a protein of unknown properties cause juvenile-onset vitelliform macular dystrophy (Best's disease). Hum. Mol. Genet., 7, 1517-1525, 1998.
5. O'Donovan, C., Martin, M.J., Gattiker, A., Gasteiger, E., Bairoch, A. and Apweiler, R. High-quality protein knowledge resource: SWISS-PROT and TrEMBL. Brief. Bioinform., 3, 275-284, 2002.
ConSeq Predictions Database
Prediction of functionally and
structurally important residues in proteins and domains of unknown function
from the Pfam database (version 7.8)
ConSeq Results | No. of Homologues | Average length | Seq. ID% | Description (link to Pfam)
ConSeq (DUF11_full) | 62 | 75.0 | 25 | Domain of unknown function DUF11
ConSeq (DUF11_seed) | 35 | 75.0 | 25 | Domain of unknown function DUF11
ConSeq (DUF130_full) | 38 | 125.2 | 30 | Domain of unknown function DUF130
ConSeq (DUF139_full) | 287 | 17.1 | 38 | Cysteine rich repeat DUF139
ConSeq (DUF139_seed) | 136 | 17.1 | 38 | Cysteine rich repeat DUF139
ConSeq (DUF140_full) | 39 | 232.5 | 30 | Domain of unknown function DUF140
ConSeq (DUF141_full) | 55 | 115.3 | 23 | Domain of unknown function DUF141
ConSeq (DUF141_seed) | 39 | 115.3 | 23 | Domain of unknown function DUF141
ConSeq (DUF143_full) | 39 | 102.4 | 30 | Domain of unknown function DUF143
ConSeq (DUF145_full) | 65 | 306.8 | 21 | Chlamydia protein of unknown function DUF145
ConSeq (DUF146_full) | 34 | 501.7 | 23 | Integral membrane protein of unknown function DUF146; TM segments in query seq.: 34-54, 63-83, 107-127, 147-167
ConSeq (DUF147_full) | 32 | 122.3 | 33 | Domain of unknown function DUF147
ConSeq (DUF148_full) | 42 | 119.4 | 22 | Domain of unknown function DUF148
ConSeq (DUF149_full) | 40 | 90.1 | 33 | Uncharacterized, YbaB family DUF149
ConSeq (DUF149_seed) | 30 | 90.1 | 33 | Uncharacterized, YbaB family DUF149
ConSeq (DUF150_full) | 31 | 134.8 | 31 | Uncharacterized, YhbC family DUF150
ConSeq (DUF152_full) | 31 | 220.8 | 29 | Uncharacterized, YfiH family DUF152
ConSeq (DUF154_full) | 35 | 116.1 | 46 | Uncharacterized DUF154
ConSeq (DUF158_full) | 32 | 225.4 | 23 | Uncharacterized LmbE-like protein DUF158
ConSeq (DUF161_full) | 106 | 82.7 | 20 | Uncharacterized, YitT family DUF161
ConSeq (DUF161_seed) | 95 | 82.7 | 20 | Uncharacterized, YitT family DUF161
ConSeq (DUF173_full) | 33 | 217.1 | 23 | Uncharacterised DUF173
ConSeq (DUF175_full) | 34 | 317.2 | 31 | Uncharacterized, YceG family DUF175
ConSeq (DUF19_full) | 47 | 149.0 | 32 | Domain of unknown function DUF19
ConSeq (DUF194_full) | 44 | 196.2 | 25 | Uncharacterized protein, DegV family DUF194
ConSeq (DUF205_full) | 32 | 187.2 | 34 | Domain of unknown function DUF205; TM segments in query seq.: 4-24, 53-73, 82-102, 112-132, 138-158
ConSeq (DUF209_full) | 33 | 204.3 | 32 | Uncharacterized, YhhW family DUF209
ConSeq (DUF21_full) | 99 | 187.5 | 22 | Domain of unknown function DUF21
ConSeq (DUF215_full) | 40 | 136.7 | 33 | Domain of unknown function DUF215
ConSeq (DUF216_full) | 87 | 211.2 | 22 | Domain of unknown function DUF216
ConSeq (DUF220_full) | 30 | 99.6 | 55 | Domain of unknown function DUF220
ConSeq (DUF221_full) | 35 | 383.9 | 28 | Domain of unknown function DUF221; TM segments in query seq.: 23-43, 103-123, 148-168, 347-367, 392-412, 435-455, 481-501, 540-560, 575-595, 599-619, 642-662, 666-686
ConSeq (DUF223_full) | 68 | 88.6 | 31 | Domain of unknown function DUF223
ConSeq (DUF227_full) | 77 | 216.4 | 22 | Domain of unknown function DUF227
ConSeq (DUF227_seed) | 36 | 216.4 | 22 | Domain of unknown function DUF227
ConSeq (DUF23_full) | 74 | 346.9 | 18 | Domain of unknown function DUF23
ConSeq (DUF23_seed) | 46 | 346.9 | 18 | Domain of unknown function DUF23
ConSeq (DUF230_full) | 41 | 128.3 | 29 | Poxvirus proteins of unknown function DUF230
ConSeq (DUF231_full) | 50 | 166.2 | 32 | Arabidopsis proteins of unknown function DUF231
ConSeq (DUF233_full) | 33 | 126.5 | 21 | Odorant binding protein DUF233
ConSeq (DUF238_full) | 31 | 82.3 | 28 | Archaebacterial proteins of unknown function DUF238
ConSeq (DUF239_full) | 49 | 177.9 | 36 | Arabidopsis proteins of unknown function DUF239
ConSeq (DUF24_full) | 88 | 90.6 | 33 | Transcriptional regulator DUF24
ConSeq (DUF246_full) | 42 | 331.4 | 38 | Plant protein family DUF246
ConSeq (DUF248_full) | 40 | 465.5 | 43 | Putative methyltransferase DUF248
ConSeq (DUF256_Full) | 30 | 189.6 | 41 | Protein of unknown function, DUF256
ConSeq (DUF258_full) | 34 | 151.2 | 34 | Protein of unknown function DUF258
ConSeq (DUF26_full) | 210 | 53.3 | 28 | Domain of unknown function DUF26
ConSeq (DUF26_seed) | 76 | 53.3 | 28 | Domain of unknown function DUF26
ConSeq (DUF260_full) | 48 | 104.1 | 36 | Protein of unknown function DUF260
ConSeq (DUF279_full) | 86 | 169.7 | 21 | SNF7 family DUF279
ConSeq (DUF279_seed) | 39 | 169.7 | 21 | SNF7 family DUF279
ConSeq (DUF28_full) | 53 | 231.1 | 37 | Domain of unknown function DUF28
ConSeq (DUF289_full) | 34 | 352.6 | 36 | Putative membrane protein DUF289
ConSeq (DUF290_full) | 57 | 116.8 | 32 | Transthyretin-like family DUF290
ConSeq (DUF290_seed) | 36 | 116.8 | 32 | Transthyretin-like family DUF290
ConSeq (DUF295_seed) | 37 | 296.1 | 37 | Protein of unknown function DUF295
ConSeq (DUF296_full) | 49 | 120.6 | 35 | Domain of unknown function DUF296
ConSeq (DUF296_seed) | 37 | 120.6 | 35 | Domain of unknown function DUF296
ConSeq (DUF301_full) | 103 | 40.7 | 98 | Domain of unknown function DUF301
ConSeq (DUF32_full) | 59 | 227.5 | 32 | Domain of unknown function DUF32
ConSeq (DUF32_seed) | 35 | 227.5 | 32 | Domain of unknown function DUF32
ConSeq (DUF321_full) | 133 | 19.6 | 51 | Protein of unknown function DUF321
ConSeq (DUF321_seed) | 32 | 19.6 | 51 | Protein of unknown function DUF321
ConSeq (DUF322_full) | 30 | 107.3 | 30 | Protein of unknown function DUF322
ConSeq (DUF323_full) | 33 | 243.4 | 27 | Domain of unknown function DUF323
ConSeq (DUF326_full) | 32 | 23.6 | 30 | Domain of Unknown Function DUF326
ConSeq (DUF34_full) | 57 | 226.1 | 25 | Domain of unknown function DUF34
ConSeq (DUF35_full) | 41 | 119.0 | 25 | Domain of unknown function DUF35
ConSeq (DUF37_full) | 38 | 67.7 | 45 | Domain of unknown function DUF37
ConSeq (DUF387_full) | 40 | 171.4 | 29 | Putative transcriptional regulators (Ypuh-like) DUF387
ConSeq (DUF390_full) | 30 | 601.5 | 66 | Protein of unknown function DUF390
ConSeq (DUF395_full) | 36 | 64.5 | 24 | YeeE/YedE family DUF395; TM segments in query seq.: 290-310, 318-338
ConSeq (DUF395_seed) | 32 | 64.5 | 24 | YeeE/YedE family DUF395
ConSeq (DUF398_full) | 52 | 77.9 | 22 | Domain of unknown function DUF398
ConSeq (DUF398_seed) | 47 | 77.9 | 22 | Domain of unknown function DUF398
ConSeq (DUF40_full) | 79 | 221.7 | 27 | Domain of unknown function DUF40
ConSeq (DUF40_seed) | 46 | 221.7 | 27 | Domain of unknown function DUF40
ConSeq (DUF457_full) | 33 | 100.7 | 26 | Predicted membrane-bound metal-dependent hydrolase DUF457; TM segments in query seq.: 82-102, 154-174
ConSeq (DUF536_full) | 32 | 45.7 | 40 | Protein of unknown function DUF536
ConSeq (DUF558_full) | 43 | 107.5 | 29 | Protein of unknown function DUF558
ConSeq (DUF558_seed) | 36 | 107.5 | 29 | Protein of unknown function DUF558
ConSeq (DUF58_full) | 37 | 182.6 | 21 | Protein of unknown function DUF58
ConSeq (DUF588_full) | 39 | 142.2 | 23 | Domain of unknown function DUF588
ConSeq (DUF588_seed) | 34 | 142.2 | 23 | Domain of unknown function DUF588
ConSeq (DUF59_full) | 64 | 76.8 | 29 | Domain of unknown function DUF59
ConSeq (DUF6_full) | 748 | 126.5 | 15 | Integral membrane protein DUF6; Too many sequences
ConSeq (DUF6_seed) | 106 | 126.5 | 15 | Integral membrane protein DUF6; TM segments in query seq.: 176-196, 217-237, 244-264, 269-287
ConSeq (DUF614_full) | 32 | 114.3 | 30 | Protein of unknown function DUF614
ConSeq (DUF665_full) | 35 | 68.1 | 22 | Putative RNA binding domain DUF665
ConSeq (DUF665_seed) | 31 | 68.1 | 22 | Putative RNA binding domain DUF665
ConSeq (DUF704_full) | 31 | 138.7 | 18 | Domain of unknown function DUF704
ConSeq (DUF72_full) | 30 | 232.4 | 25 | Protein of unknown function DUF72
ConSeq (DUF81_full) | 117 | 245.7 | 19 | Domain of unknown function DUF81; TM segments in query seq.: 12-28, 31-51, 85-105, 112-132, 157-177, 198-218
ConSeq (DUF81_seed) | 34 | 245.7 | 19 | Domain of unknown function DUF81
ConSeq (UPF0004_full) | 90 | 95.9 | 33 | Uncharacterized protein family UPF0004
ConSeq (UPF0004_seed) | 35 | 95.9 | 33 | Uncharacterized protein family UPF0004
ConSeq (UPF0005_full) | 86 | 248.0 | 21 | Uncharacterised protein family UPF0005; TM segments in query seq.: 39-59, 69-89, 98-118, 121-141, 152-172, 176-196, 209-229
ConSeq (UPF0016_full) | 45 | 73.3 | 37 | Uncharacterized protein family UPF0016; TM segments in query seq.: 228-248, 267-287, 299-319
ConSeq (UPF0020_full) | 44 | 188.1 | 27 | Putative RNA methylase family UPF0020
ConSeq (UPF0021_full) | 91 | 200.1 | 24 | PP-loop family UPF0021
ConSeq (UPF0021_seed) | 36 | 200.1 | 24 | PP-loop family UPF0021
ConSeq (UPF0027_full) | 32 | 394.8 | 43 | Uncharacterized protein family UPF0027
ConSeq (UPF0029_full) | 38 | 112.4 | 39 | Uncharacterized protein family UPF0029
ConSeq (UPF0031_full) | 48 | 239.9 | 29 | Uncharacterized protein family UPF0031
ConSeq (UPF0032_full) | 64 | 205.6 | 26 | MttB family UPF0032; TM segments in query seq.: 24-44, 76-96, 116-136, 157-177, 194-210, 212-221
ConSeq (UPF0033_full) | 40 | 72.3 | 29 | Uncharacterized protein family UPF0033
ConSeq (UPF0040_full) | 47 | 71.3 | 25 | Domain of unknown function UPF0040
ConSeq (UPF0047_full) | 30 | 117.3 | 35 | Uncharacterised protein family UPF0047
ConSeq (UPF0051_full) | 85 | 223.3 | 31 | Uncharacterized protein family UPF0051
ConSeq (UPF0054_full) | 44 | 96.9 | 38 | Uncharacterized protein family UPF0054
ConSeq (UPF0057_full) | 34 | 51.3 | 41 | Uncharacterized protein family UPF0057; TM segments in query seq.: 6-24, 37-57
ConSeq (UPF0066_full) | 32 | 117.7 | 38 | Uncharacterised protein family UPF0066
ConSeq (UPF0073_full) | 58 | 211.6 | 23 | Uncharacterised protein family (Hly-III) UPF0073; TM segments in query seq.: 29-49, 62-82, 102-122, 125-145, 152-172, 175-195, 199-219
ConSeq (UPF0073_seed) | 38 | 211.6 | 23 | Uncharacterised protein family (Hly-III) UPF0073
ConSeq (UPF0074_full) | 54 | 120.4 | 28 | Uncharacterized protein family UPF0074
ConSeq (UPF0079_full) | 40 | 116.3 | 35 | Uncharacterised P-loop hydrolase UPF0079
ConSeq (UPF0081_full) | 41 | 133.0 | 30 | Uncharacterised protein family UPF0081
ConSeq (UPF0118_full) | 88 | 323.6 | 18 | Domain of unknown function UPF0118; TM segments in query seq.: 19-39, 72-92, 156-176, 217-237, 240-260, 281-301, 310-330
ConSeq (UPF0118_seed) | 32 | 323.6 | 18 | Domain of unknown function UPF0118
ConSeq (UPF0126_full) | 42 | 86.1 | 30 | domain UPF0126; TM segments in query seq.: 88-108, 112-132, 148-168
ConSeq (UPF0126_seed) | 40 | 86.1 | 30 | domain UPF0126
ConSeq (UPF0153_full) | 33 | 96.4 | 24 | Uncharacterised protein family UPF0153