Abstract: |
The Protein Data Bank (PDB) contains more than 140,000 protein structures, and approximately one-third of them contain metal ions. Exploring the interactions between proteins and metal ions is therefore valuable, and identifying metal ion-binding sites is key to understanding the biological relevance of metal ion-binding proteins. Our previous studies demonstrated that local structural alignment can predict twelve types of metal ion-binding sites: Ca2+, Cu2+, Fe3+, Mg2+, Mn2+, Zn2+, Cd2+, Fe2+, Ni2+, Hg2+, Co2+, and Cu+. A web server was then established to predict metal ion-binding residues and dock the metal ions. For these 12 types of metal ion-binding sites, our approach yielded accuracies from 92.9% to 95.1%, and for 8 of the 12 ions the sensitivity was greater than 60% when the specificity was set above 95%. For the binding sites of Cu2+, Fe3+, and Fe2+, the prediction sensitivity exceeded 85%. However, for the binding sites of Ca2+, Mg2+, Cd2+, and Hg2+, the prediction sensitivity was lower than 50%. We therefore explore the reasons for these differences in prediction performance by examining the sequence and structure similarity among metal ion-binding sites in proteins.
The structures of protein-metal ion complexes were collected from the Protein Data Bank (PDB). All protein sequences were screened with Cluster Database at High Identity with Tolerance (CD-HIT) to remove redundant homologous proteins, which divided the sequences into several clusters. Metal ion-binding sites were extracted from the protein complexes of each protein chain, both between clusters (inter-cluster) and within clusters (intra-cluster). The fragment transformation method, a local structural alignment method, was used to calculate the sequence and structure similarity for each pairwise alignment. Hierarchical clustering was then used to group the metal ion-binding sites according to their sequence and structure similarity. Finally, the relationships among the groups were analyzed to reveal tendencies in protein-metal ion interactions. By examining the groups produced by hierarchical clustering, we can identify the sequence and structure similarity of specific metal ion-binding sites in proteins and better understand the interactions between proteins and different metal ions. |
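
The grouping step described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the abstract does not state how sequence and structure similarity are combined, which linkage criterion is used, or where the tree is cut. The equal weighting, the average-linkage rule, the `cutoff` value, and the toy similarity matrices below are all assumptions made for illustration, and the function names `combined_distance` and `average_linkage` are hypothetical.

```python
from itertools import combinations


def combined_distance(seq_sim, struct_sim, weight=0.5):
    """Turn two similarity matrices (values in [0, 1]) into one distance matrix.

    The weighted average and the default weight are assumptions of this sketch.
    """
    n = len(seq_sim)
    return [
        [1.0 - (weight * seq_sim[i][j] + (1.0 - weight) * struct_sim[i][j])
         for j in range(n)]
        for i in range(n)
    ]


def average_linkage(dist, cutoff):
    """Agglomerative clustering with average linkage.

    Repeatedly merges the two closest clusters and stops once the smallest
    average inter-cluster distance exceeds `cutoff`.
    """
    clusters = [[i] for i in range(len(dist))]

    def avg(a, b):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: avg(clusters[p[0]], clusters[p[1]]))
        if avg(clusters[x], clusters[y]) > cutoff:
            break
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)


# Toy data for four hypothetical binding sites (not results from the study):
# sites 0 and 1 resemble each other, as do sites 2 and 3.
seq_sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
struct_sim = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.9],
    [0.2, 0.1, 0.9, 1.0],
]

dist = combined_distance(seq_sim, struct_sim)
groups = average_linkage(dist, cutoff=0.5)
print(groups)  # [[0, 1], [2, 3]]
```

In practice the pairwise similarities would come from the local structural alignment of each pair of binding sites. A lower `cutoff` yields more, tighter groups, while a higher one merges sites with weaker resemblance.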