Predictions

We used the ConSeq server to predict functional and structural residues in 111 domains and proteins of unknown function and structure from the Pfam database (Bateman et al., 2002) and the SWISS-PROT database (O'Donovan et al., 2002). We selected 96 protein families from the Pfam database that contain at least 30 members in their full or seed alignments, and uploaded their MSAs to the ConSeq server as external inputs. We chose SWISS-PROT query sequences when possible; when an MSA contained only TrEMBL sequences, the query sequence was chosen arbitrarily. In addition, we selected 15 human proteins of unknown function from the SWISS-PROT database. For these proteins, the MSAs were generated automatically using 3 PSI-BLAST iterations, at least 10 and at most 100 homologous sequences, and an E-value lower than 0.001 (see Methodology).
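The homologue-selection criteria above (E-value lower than 0.001, at most 100 and at least 10 sequences) can be sketched as a simple filter. This is a minimal illustration and not the ConSeq server's own code: the `Hit` type, the `select_homologues` function, and the choice to rank hits by E-value before capping the list at 100 are assumptions made for the example.

```python
from typing import List, NamedTuple, Optional


class Hit(NamedTuple):
    """One PSI-BLAST hit (hypothetical container used only for this sketch)."""
    seq_id: str
    e_value: float


def select_homologues(
    hits: List[Hit],
    max_e_value: float = 0.001,
    max_hits: int = 100,
    min_hits: int = 10,
) -> Optional[List[Hit]]:
    """Keep hits with an E-value below the cutoff, best first, capped at max_hits.

    Returns None when fewer than min_hits sequences pass the cutoff, which is
    taken here to mean that no MSA should be built.
    Ranking by E-value is an assumption; the text does not say how the
    top 100 sequences are chosen.
    """
    passing = [h for h in hits if h.e_value < max_e_value]
    kept = sorted(passing, key=lambda h: h.e_value)[:max_hits]
    return kept if len(kept) >= min_hits else None
```

A query with only a handful of significant hits returns `None` under this sketch, whereas a query with hundreds of hits is truncated to the 100 most significant ones.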

In 16 cases the proteins or domains contain transmembrane (TM) segments. The solvent accessibility predictions for these regions are unreliable (see Methodology), so the identification of functional and structural residues in these segments is misleading. We identified the TM segments according to the SWISS-PROT annotation of the query sequences.

 


 

Example Pfam: DUF289

The DUF289 domain has an average length of 353 residues and is found in a family of 41 proteins from different eukaryotic species (numerous sequences are from C. elegans). We carried out a ConSeq analysis using the human bestrophin domain VMD2_HUMAN as a query sequence. Bestrophin, also known as vitelliform macular dystrophy protein, is a 585-residue protein predicted from the open reading frame of the VMD2 gene. Defects in VMD2 are the cause of Best macular dystrophy (BMD), also known as vitelliform macular dystrophy type 2. BMD is an autosomal dominant disease characterized by typical "egg-yolk" macular lesions due to abnormal accumulation of lipofuscin within and beneath the retinal pigment epithelium cells. Progression of the disease leads to destruction of the retinal pigment epithelium and vision loss. Thus far, the function of bestrophin is unknown (Marquardt et al., 1998; Bakall et al., 1999).

Bestrophin is predicted to be a TM protein with four TM spans. The neural network algorithm predicts the relative surface accessibility of soluble proteins only. Thus, the buried/exposed predictions for the TM segments are inaccurate, and the structural/functional classification is meaningless for these segments. We therefore focus only on the soluble part of the domain.
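Restricting the analysis to the soluble part amounts to masking every residue that falls inside an annotated TM segment. The sketch below shows one way to do this for a per-residue annotation string; the function names and the use of "B" as the mask character are assumptions for illustration, and segment ranges are taken to be 1-based and inclusive.

```python
from typing import List, Tuple


def parse_tm_segments(text: str) -> List[Tuple[int, int]]:
    """Parse a range list such as '34-54, 63-83' into [(34, 54), (63, 83)]."""
    segments = []
    for part in text.split(","):
        start, end = part.strip().split("-")
        segments.append((int(start), int(end)))
    return segments


def mask_tm(annotation: str, segments: List[Tuple[int, int]], mask_char: str = "B") -> str:
    """Overwrite positions inside TM segments (1-based, inclusive) with mask_char.

    Positions beyond the end of the annotation string are ignored.
    """
    chars = list(annotation)
    for start, end in segments:
        for pos in range(start, min(end, len(chars)) + 1):
            chars[pos - 1] = mask_char
    return "".join(chars)
```

For example, `mask_tm("eebbeebbee", parse_tm_segments("3-5"))` replaces residues 3 to 5 with the mask character and leaves the remaining burial predictions untouched.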

The ConSeq results, presented in the figure below, reveal that 37 residues are predicted to be functionally important and 20 residues to have a structural role. ConSeq assigned above-average conservation grades (color grades 6-9) to 40 of the 49 missense mutations that have been reported to cause VMD2 disease in this domain (http://www.uni-wuerzburg.de/humangenetics/vmd2.html - December 2002 version). Sixteen of these 49 residues evolve particularly slowly; 10 of them are predicted to be functional residues, while 6 are predicted to have an important structural role.

 


Figure 5: ConSeq predictions for VMD2_HUMAN and its close homologues in the DUF289 family. The MSA was obtained from the Pfam database and contains 43 homologous sequences. The sequence of the query protein is given with the evolutionary rates at each site color-coded onto it (see Legend). The residues of the query sequence are numbered starting from 1. The next row lists the predicted burial status of each site ("b" for buried, "e" for exposed). The lowest row shows structurally and functionally important residues as "s" and "f", respectively. The four TM segments according to Bakall et al. are marked as green boxes.


ConSeq Results

1          11         21         31         41         
MTITYTSQVA NARLGSFSRL LLCWRGSIYK LLYGEFLIFL LCYYIIRFIY
eebebeeebb eeebebbbeb bbebeebbbbbbeebBBBB BBBBBBBBBB
ff  s   s                  fs      f BBBB BBBBBBBBBB          

51         61         71         81         91         
RLALTEEQQL MFEKLTLYCD SYIQLIPISF VLGFYVTLVV TRWWNQYENL
BBbbeeeeee bbeebbebbe eebeBBBBBB BBBBBBBBBB Bebbebbeeb
BB                        BBBB B BBBBBBBBBB Bfs       

101        111        121        131        141        
PWPDRLMSLV SGFVEGKDEQ GRLLRRTLIR YANLGNVLIL RSVSTAVYKR
ebbeebbbbb bbbbeeeeee bebbeebbbbbbbbbbbbb eebbeebeee
  sf                      ff   s          f  s     f

151        161        171        181        191        
FPSAQHLVQA GFMTPAEHKQ LEKLSLPHNM FWVPWVWFAN LSMKAWLGGR
beebebbbeebbeeeeeee beebeeeeee bbbbbbbbbb bbeebeeeee
sff        f     f                  s  s              

201        211        221        231        241        
IRDPILLQSL LNEMNTLRTQ CGHLYAYDWI SIPLVYTQVV TVAVYSFFLT
beeebbbeeb beebeebeee bebbbebebb bbbbbbbBBB BBBBBBBBBB
                  f          f     s sssBBB BBBBBBBBBB

251        261        271        281        291        
CLVGRQFLNP AKAYPGHELD LVVPVFTFLQ FFFYVGWLKV AEQLINPFGE
BBBBeeeeee eeeeeeeeBB BBBBBBBBBB BBBBBbbbeb beebbbeeee
BBBBff             BB BBBBBBBBBB BBBBBss f f   sffff

301        311        321        331        341        
DDDDFETNWI VDRNLQVSLL AVDEMHQDLP RMEPDMYWNK PEPQPPYTAA
eeeebebbbb beeebebbbb bbbeeeeeee ebeeeebeee eeeeeeeeee
ffff f s    fff        s           f                

351        361        371        
SAQFRRASFM GSTFNISLNK EEMEFQPN
eeeeeeeeee eebeebeeee eebeeeee
            f                 

 

Legend:

The conservation scale:

 1  2  3  4  5  6  7  8  9

1 = Variable, 5 = Average, 9 = Conserved

e - An exposed residue according to the neural network algorithm.
b - A buried residue according to the neural network algorithm.
f - A predicted functional residue (highly conserved and exposed).
s - A predicted structural residue (highly conserved and buried).
X - Insufficient data: the calculation for this site was performed on fewer than 10% of the sequences.

BB   - Transmembrane segment
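The "f" and "s" labels in the legend combine a conservation grade with a burial prediction. The sketch below expresses that rule for a whole sequence; the function names and the `min_grade` threshold that stands in for "highly conserved" (default 9) are assumptions, since the exact cutoff is not stated in this section.

```python
from typing import Sequence


def classify_residue(grade: int, burial: str, min_grade: int = 9) -> str:
    """Return 'f' (functional), 's' (structural), or ' ' (no prediction).

    grade     -- conservation grade on the 1-9 scale (9 = most conserved)
    burial    -- 'e' for exposed, 'b' for buried
    min_grade -- assumed cutoff for "highly conserved" (hypothetical default)
    """
    if grade < min_grade:
        return " "
    if burial == "e":
        return "f"  # highly conserved and exposed
    if burial == "b":
        return "s"  # highly conserved and buried
    return " "      # anything else, e.g. a masked TM position


def annotate(grades: Sequence[int], burial: str, min_grade: int = 9) -> str:
    """Build the bottom annotation row from per-residue grades and burial states."""
    return "".join(classify_residue(g, b, min_grade) for g, b in zip(grades, burial))
```

Calling `annotate([9, 9, 5, 1], "ebeb")` labels the first residue as functional and the second as structural, and leaves the two less conserved positions blank.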





REFERENCES

 

1. Bakall, B., Marknell, T., Ingvast, S., Koisti, M.J., Sandgren, O., Li, W., Bergen, A.A.B., Andreasson, S., Rosenberg, T., Petrukhin, K. and Wadelius, C. The mutation spectrum of the bestrophin protein - functional implications. Hum. Genet., 104, 383-389, 1999.

2. Bateman, A., Birney, E., Cerruti, L., Durbin, R., Etwiller, L., Eddy, S.R., Griffiths-Jones, S., Howe, K.L., Marshall, M. and Sonnhammer, E.L. The Pfam Protein Families Database. Nucleic Acids Research, 30(1), 276-280, 2002.

3. Mizuguchi, K., Deane, C.M., Blundell, T.L. and Overington, J.P. HOMSTRAD: a database of protein structure alignments for homologous families. Protein Sci., 7, 2469-2471, 1998.

4. Marquardt, A., Stoehr, H., Passmore, L.A., Kraemer, F., Rivera, A. and Weber, B.H.F. Mutations in a novel gene, VMD2, encoding a protein of unknown properties cause juvenile-onset vitelliform macular dystrophy (Best's disease). Hum. Mol. Genet., 7, 1517-1525, 1998.

5. O'Donovan, C., Martin, M.J., Gattiker, A., Gasteiger, E., Bairoch, A. and Apweiler, R. High-quality protein knowledge resource: SWISS-PROT and TrEMBL. Brief. Bioinform., 3, 275-284, 2002.



ConSeq Predictions Database

Prediction of functionally and structurally important residues in proteins and domains of unknown function from the Pfam database (version 7.8)

ConSeq Results          No. of homologues  Average length  Seq. ID%  Description (link to Pfam)
ConSeq (DUF11_full)     62                 75.0            25        Domain of unknown function DUF11
ConSeq (DUF11_seed)     35                 75.0            25        Domain of unknown function DUF11
ConSeq (DUF130_full)    38                 125.2           30        Domain of unknown function DUF130
ConSeq (DUF139_full)    287                17.1            38        Cysteine rich repeat DUF139
ConSeq (DUF139_seed)    136                17.1            38        Cysteine rich repeat DUF139
ConSeq (DUF140_full)    39                 232.5           30        Domain of unknown function DUF140
ConSeq (DUF141_full)    55                 115.3           23        Domain of unknown function DUF141
ConSeq (DUF141_seed)    39                 115.3           23        Domain of unknown function DUF141
ConSeq (DUF143_full)    39                 102.4           30        Domain of unknown function DUF143
ConSeq (DUF145_full)    65                 306.8           21        Chlamydia protein of unknown function DUF145
ConSeq (DUF146_full)    34                 501.7           23        Integral membrane protein of unknown function DUF146
    TM segments in query seq.: 34-54, 63-83, 107-127, 147-167
ConSeq (DUF147_full)    32                 122.3           33        Domain of unknown function DUF147
ConSeq (DUF148_full)    42                 119.4           22        Domain of unknown function DUF148
ConSeq (DUF149_full)    40                 90.1            33        Uncharacterized, YbaB family DUF149
ConSeq (DUF149_seed)    30                 90.1            33        Uncharacterized, YbaB family DUF149
ConSeq (DUF150_full)    31                 134.8           31        Uncharacterized, YhbC family DUF150
ConSeq (DUF152_full)    31                 220.8           29        Uncharacterized, YfiH family DUF152
ConSeq (DUF154_full)    35                 116.1           46        Uncharacterized DUF154
ConSeq (DUF158_full)    32                 225.4           23        Uncharacterized LmbE-like protein DUF158
ConSeq (DUF161_full)    106                82.7            20        Uncharacterized, YitT family DUF161
ConSeq (DUF161_seed)    95                 82.7            20        Uncharacterized, YitT family DUF161
ConSeq (DUF173_full)    33                 217.1           23        Uncharacterised DUF173
ConSeq (DUF175_full)    34                 317.2           31        Uncharacterized, YceG family DUF175
ConSeq (DUF19_full)     47                 149.0           32        Domain of unknown function DUF19
ConSeq (DUF194_full)    44                 196.2           25        Uncharacterized protein, DegV family DUF194
ConSeq (DUF205_full)    32                 187.2           34        Domain of unknown function DUF205
    TM segments in query seq.: 4-24, 53-73, 82-102, 112-132, 138-158
ConSeq (DUF209_full)    33                 204.3           32        Uncharacterized, YhhW family DUF209
ConSeq (DUF21_full)     99                 187.5           22        Domain of unknown function DUF21
ConSeq (DUF215_full)    40                 136.7           33        Domain of unknown function DUF215
ConSeq (DUF216_full)    87                 211.2           22        Domain of unknown function DUF216
ConSeq (DUF220_full)    30                 99.6            55        Domain of unknown function DUF220
ConSeq (DUF221_full)    35                 383.9           28        Domain of unknown function DUF221
    TM segments in query seq.: 23-43, 103-123, 148-168, 347-367, 392-412, 435-455, 481-501, 540-560, 575-595, 599-619, 642-662, 666-686
ConSeq (DUF223_full)    68                 88.6            31        Domain of unknown function DUF223
ConSeq (DUF227_full)    77                 216.4           22        Domain of unknown function DUF227
ConSeq (DUF227_seed)    36                 216.4           22        Domain of unknown function DUF227
ConSeq (DUF23_full)     74                 346.9           18        Domain of unknown function DUF23
ConSeq (DUF23_seed)     46                 346.9           18        Domain of unknown function DUF23
ConSeq (DUF230_full)    41                 128.3           29        Poxvirus proteins of unknown function DUF230
ConSeq (DUF231_full)    50                 166.2           32        Arabidopsis proteins of unknown function DUF231
ConSeq (DUF233_full)    33                 126.5           21        Odorant binding protein DUF233
ConSeq (DUF238_full)    31                 82.3            28        Archaebacterial proteins of unknown function DUF238
ConSeq (DUF239_full)    49                 177.9           36        Arabidopsis proteins of unknown function DUF239
ConSeq (DUF24_full)     88                 90.6            33        Transcriptional regulator DUF24
ConSeq (DUF246_full)    42                 331.4           38        Plant protein family DUF246
ConSeq (DUF248_full)    40                 465.5           43        Putative methyltransferase DUF248
ConSeq (DUF256_Full)    30                 189.6           41        Protein of unknown function, DUF256
ConSeq (DUF258_full)    34                 151.2           34        Protein of unknown function DUF258
ConSeq (DUF26_full)     210                53.3            28        Domain of unknown function DUF26
ConSeq (DUF26_seed)     76                 53.3            28        Domain of unknown function DUF26
ConSeq (DUF260_full)    48                 104.1           36        Protein of unknown function DUF260
ConSeq (DUF279_full)    86                 169.7           21        SNF7 family DUF279
ConSeq (DUF279_seed)    39                 169.7           21        SNF7 family DUF279
ConSeq (DUF28_full)     53                 231.1           37        Domain of unknown function DUF28
ConSeq (DUF289_full)    34                 352.6           36        Putative membrane protein DUF289
ConSeq (DUF290_full)    57                 116.8           32        Transthyretin-like family DUF290
ConSeq (DUF290_seed)    36                 116.8           32        Transthyretin-like family DUF290
ConSeq (DUF295_seed)    37                 296.1           37        Protein of unknown function DUF295
ConSeq (DUF296_full)    49                 120.6           35        Domain of unknown function DUF296
ConSeq (DUF296_seed)    37                 120.6           35        Domain of unknown function DUF296
ConSeq (DUF301_full)    103                40.7            98        Domain of unknown function DUF301
ConSeq (DUF32_full)     59                 227.5           32        Domain of unknown function DUF32
ConSeq (DUF32_seed)     35                 227.5           32        Domain of unknown function DUF32
ConSeq (DUF321_full)    133                19.6            51        Protein of unknown function DUF321
ConSeq (DUF321_seed)    32                 19.6            51        Protein of unknown function DUF321
ConSeq (DUF322_full)    30                 107.3           30        Protein of unknown function DUF322
ConSeq (DUF323_full)    33                 243.4           27        Domain of unknown function DUF323
ConSeq (DUF326_full)    32                 23.6            30        Domain of Unknown Function DUF326
ConSeq (DUF34_full)     57                 226.1           25        Domain of unknown function DUF34
ConSeq (DUF35_full)     41                 119.0           25        Domain of unknown function DUF35
ConSeq (DUF37_full)     38                 67.7            45        Domain of unknown function DUF37
ConSeq (DUF387_full)    40                 171.4           29        Putative transcriptional regulators (Ypuh-like) DUF387
ConSeq (DUF390_full)    30                 601.5           66        Protein of unknown function DUF390
ConSeq (DUF395_full)    36                 64.5            24        YeeE/YedE family DUF395
    TM segments in query seq.: 290-310, 318-338
ConSeq (DUF395_seed)    32                 64.5            24        YeeE/YedE family DUF395
ConSeq (DUF398_full)    52                 77.9            22        Domain of unknown function DUF398
ConSeq (DUF398_seed)    47                 77.9            22        Domain of unknown function DUF398
ConSeq (DUF40_full)     79                 221.7           27        Domain of unknown function DUF40
ConSeq (DUF40_seed)     46                 221.7           27        Domain of unknown function DUF40
ConSeq (DUF457_full)    33                 100.7           26        Predicted membrane-bound metal-dependent hydrolase DUF457
    TM segments in query seq.: 82-102, 154-174
ConSeq (DUF536_full)    32                 45.7            40        Protein of unknown function DUF536
ConSeq (DUF558_full)    43                 107.5           29        Protein of unknown function DUF558
ConSeq (DUF558_seed)    36                 107.5           29        Protein of unknown function DUF558
ConSeq (DUF58_full)     37                 182.6           21        Protein of unknown function DUF58
ConSeq (DUF588_full)    39                 142.2           23        Domain of unknown function DUF588
ConSeq (DUF588_seed)    34                 142.2           23        Domain of unknown function DUF588
ConSeq (DUF59_full)     64                 76.8            29        Domain of unknown function DUF59
ConSeq (DUF6_full)      748                126.5           15        Integral membrane protein DUF6
    Too many sequences
ConSeq (DUF6_seed)      106                126.5           15        Integral membrane protein DUF6
    TM segments in query seq.: 176-196, 217-237, 244-264, 269-287
ConSeq (DUF614_full)    32                 114.3           30        Protein of unknown function DUF614
ConSeq (DUF665_full)    35                 68.1            22        Putative RNA binding domain DUF665
ConSeq (DUF665_seed)    31                 68.1            22        Putative RNA binding domain DUF665
ConSeq (DUF704_full)    31                 138.7           18        Domain of unknown function DUF704
ConSeq (DUF72_full)     30                 232.4           25        Protein of unknown function DUF72
ConSeq (DUF81_full)     117                245.7           19        Domain of unknown function DUF81
    TM segments in query seq.: 12-28, 31-51, 85-105, 112-132, 157-177, 198-218
ConSeq (DUF81_seed)     34                 245.7           19        Domain of unknown function DUF81
ConSeq (UPF0004_full)   90                 95.9            33        Uncharacterized protein family UPF0004
ConSeq (UPF0004_seed)   35                 95.9            33        Uncharacterized protein family UPF0004
ConSeq (UPF0005_full)   86                 248.0           21        Uncharacterised protein family UPF0005
    TM segments in query seq.: 39-59, 69-89, 98-118, 121-141, 152-172, 176-196, 209-229
ConSeq (UPF0016_full)   45                 73.3            37        Uncharacterized protein family UPF0016
    TM segments in query seq.: 228-248, 267-287, 299-319
ConSeq (UPF0020_full)   44                 188.1           27        Putative RNA methylase family UPF0020
ConSeq (UPF0021_full)   91                 200.1           24        PP-loop family UPF0021
ConSeq (UPF0021_seed)   36                 200.1           24        PP-loop family UPF0021
ConSeq (UPF0027_full)   32                 394.8           43        Uncharacterized protein family UPF0027
ConSeq (UPF0029_full)   38                 112.4           39        Uncharacterized protein family UPF0029
ConSeq (UPF0031_full)   48                 239.9           29        Uncharacterized protein family UPF0031
ConSeq (UPF0032_full)   64                 205.6           26        MttB family UPF0032
    TM segments in query seq.: 24-44, 76-96, 116-136, 157-177, 194-210, 212-221
ConSeq (UPF0033_full)   40                 72.3            29        Uncharacterized protein family UPF0033
ConSeq (UPF0040_full)   47                 71.3            25        Domain of unknown function UPF0040
ConSeq (UPF0047_full)   30                 117.3           35        Uncharacterised protein family UPF0047
ConSeq (UPF0051_full)   85                 223.3           31        Uncharacterized protein family UPF0051
ConSeq (UPF0054_full)   44                 96.9            38        Uncharacterized protein family UPF0054
ConSeq (UPF0057_full)   34                 51.3            41        Uncharacterized protein family UPF0057
    TM segments in query seq.: 6-24, 37-57
ConSeq (UPF0066_full)   32                 117.7           38        Uncharacterised protein family UPF0066
ConSeq (UPF0073_full)   58                 211.6           23        Uncharacterised protein family (Hly-III) UPF0073
    TM segments in query seq.: 29-49, 62-82, 102-122, 125-145, 152-172, 175-195, 199-219
ConSeq (UPF0073_seed)   38                 211.6           23        Uncharacterised protein family (Hly-III) UPF0073
ConSeq (UPF0074_full)   54                 120.4           28        Uncharacterized protein family UPF0074
ConSeq (UPF0079_full)   40                 116.3           35        Uncharacterised P-loop hydrolase UPF0079
ConSeq (UPF0081_full)   41                 133.0           30        Uncharacterised protein family UPF0081
ConSeq (UPF0118_full)   88                 323.6           18        Domain of unknown function UPF0118
    TM segments in query seq.: 19-39, 72-92, 156-176, 217-237, 240-260, 281-301, 310-330
ConSeq (UPF0118_seed)   32                 323.6           18        Domain of unknown function UPF0118
ConSeq (UPF0126_full)   42                 86.1            30        domain UPF0126
    TM segments in query seq.: 88-108, 112-132, 148-168
ConSeq (UPF0126_seed)   40                 86.1            30        domain UPF0126
ConSeq (UPF0153_full)   33                 96.4            24        Uncharacterised protein family UPF0153


 

Prediction of functionally and structurally important residues in human proteins of unknown function from the SWISS-PROT database

ConSeq Results    No. of homologues  Length  Description (link to SWISS-PROT)
DMWD_HUMAN        100                553     Dystrophia myotonica (DMWD_HUMAN)
GGE7_HUMAN        11                 117     GAGE-7 protein (GGE7_HUMAN)
IFT4_HUMAN        100                490     Interferon-induced protein with tetratricopeptide repeats 4 (IFT4_HUMAN)
LBH2_HUMAN        24                 425     Abhydrolase domain containing protein 2 (LBH2_HUMAN)
    TM segments in query seq.: 198-218
MAG9_HUMAN        35                 315     Melanoma-associated antigen 9 (MAG9_HUMAN)
PSGA_HUMAN        100                424     Pregnancy-specific beta-1-glycoprotein 10 [Precursor] (PSGA_HUMAN)
VCXA_HUMAN        30                 186     Variable charge protein on X chromosome with eight repeats (VCXA_HUMAN)
MAG8_HUMAN        28                 234     Melanoma-associated antigen 8 (MAG8_HUMAN)
PRPL_HUMAN        100                276     Salivary proline-rich protein PO [Fragment] (PRPL_HUMAN)
BENE_HUMAN        10                 153     BENE protein (BENE_HUMAN)
    TM segments in query seq.: 22-42, 59-79, 97-117, 131-151
NIF1_HUMAN        10                 340     Nuclear LIM interactor-interacting factor 1 (NIF1_HUMAN)
FKG2_HUMAN        32                 173     Apoptosis inhibitor FKSG2 (FKG2_HUMAN)
CG51_HUMAN        13                 469     Protein CGI-51 (CG51_HUMAN)
PINL_HUMAN        32                 100     PIN1-like protein (PINL_HUMAN)
RN15_HUMAN        100                465     RING finger protein 15 (RN15_HUMAN)
TLX2_HUMAN        100                284     T-cell leukemia homeobox protein 2 (TLX2_HUMAN)