KMS of Academy of Mathematics and Systems Sciences, CAS
Prediction of protein-RNA binding sites by a random forest method with combined features | |
Liu, Zhi-Ping1; Wu, Ling-Yun2 | |
2010-07-01 | |
Source Publication | BIOINFORMATICS
ISSN | 1367-4803 |
Volume | 26 ; Issue: 13 ; Pages: 1616-1622 |
Abstract | Motivation: Protein-RNA interactions play a key role in a number of biological processes, such as protein synthesis, mRNA processing, mRNA assembly, ribosome function and eukaryotic spliceosomes. As a result, reliable identification of the RNA binding sites of a protein is important for functional annotation and site-directed mutagenesis. Accumulated experimental data on protein-RNA interactions reveal that an RNA binding residue with different neighboring amino acids often exhibits different preferences for its RNA partners, which in turn can be assessed by the interaction interdependence between the amino acid fragment and the RNA nucleotide. Results: In this work, we propose a novel classification method to identify RNA binding sites in proteins by combining a new interaction feature (interaction propensity) with other sequence- and structure-based features. Specifically, the interaction propensity represents the binding specificity of a protein residue to the interacting RNA nucleotide by considering its two-sided neighborhood in a protein residue triplet. The sequence- and structure-based features of the residues are combined to discriminate the interaction propensity of amino acids with RNA. We predict RNA-interacting residues in proteins with a random forest classifier. The experiments show that our method detects the annotated protein-RNA interaction sites with high accuracy. It achieves an accuracy of 84.5%, an F-measure of 0.85 and an AUC of 0.92 in predicting RNA binding residues on a dataset of 205 non-homologous RNA binding proteins, and it outperforms several existing RNA binding residue predictors, such as RNABindR, BindN, RNAProB and PPRint, as well as alternative machine learning methods, such as support vector machine, naive Bayes and neural network, in the comparison study. Furthermore, we provide some biological insights into the roles of sequences and structures in protein-RNA interactions by both evaluating the importance of features for their contributions to predictive accuracy and analyzing the binding patterns of interacting residues. |
DOI | 10.1093/bioinformatics/btq253 |
Language | English |
Funding Project | Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences[2009CSP002] ; National Natural Science Foundation of China[10631070] ; National Natural Science Foundation of China[60873205] ; Ministry of Science and Technology of China[2006CB503905] |
WOS Research Area | Biochemistry & Molecular Biology ; Biotechnology & Applied Microbiology ; Computer Science ; Mathematical & Computational Biology ; Mathematics |
WOS Subject | Biochemical Research Methods ; Biotechnology & Applied Microbiology ; Computer Science, Interdisciplinary Applications ; Mathematical & Computational Biology ; Statistics & Probability |
WOS ID | WOS:000278967500006 |
Publisher | OXFORD UNIV PRESS |
Document Type | Journal article |
Identifier | http://ir.amss.ac.cn/handle/2S8OKBNM/9548 |
Collection | Institute of Applied Mathematics |
Corresponding Author | Chen, Luonan |
Affiliation | 1. Chinese Acad Sci, Shanghai Inst Biol Sci, SIBS Novo Nordisk Translat Res Ctr PreDiabet, Key Lab Syst Biol, Shanghai 200031, Peoples R China ; 2. Chinese Acad Sci, Acad Math & Syst Sci, Beijing 100190, Peoples R China |
Recommended Citation GB/T 7714 | Liu, Zhi-Ping, Wu, Ling-Yun, Wang, Yong, et al. Prediction of protein-RNA binding sites by a random forest method with combined features[J]. BIOINFORMATICS, 2010, 26(13): 1616-1622. |
APA | Liu, Zhi-Ping, Wu, Ling-Yun, Wang, Yong, Zhang, Xiang-Sun, & Chen, Luonan. (2010). Prediction of protein-RNA binding sites by a random forest method with combined features. BIOINFORMATICS, 26(13), 1616-1622. |
MLA | Liu, Zhi-Ping, et al. "Prediction of protein-RNA binding sites by a random forest method with combined features". BIOINFORMATICS 26.13 (2010): 1616-1622. |
Files in This Item: | There are no files associated with this item. |
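As a rough illustration of the "interaction propensity" idea mentioned in the abstract, the sketch below scores how often a residue triplet contacts a given RNA nucleotide relative to what independence would predict. The abstract does not give the paper's formula, so the log-odds definition, the function name `triplet_propensity`, and the toy contact list are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter
from math import log


def triplet_propensity(contacts):
    """Log-odds score for each observed (triplet, nucleotide) contact.

    `contacts` is an iterable of (residue_triplet, nucleotide) pairs.
    The score is log(P(triplet, nuc) / (P(triplet) * P(nuc))), an assumed
    definition chosen for illustration only.
    """
    contacts = list(contacts)
    total = len(contacts)
    joint = Counter(contacts)
    triplet_counts = Counter(t for t, _ in contacts)
    nucleotide_counts = Counter(n for _, n in contacts)
    return {
        (t, n): log(
            (count / total)
            / ((triplet_counts[t] / total) * (nucleotide_counts[n] / total))
        )
        for (t, n), count in joint.items()
    }


# Hypothetical toy data: the central residue of each triplet contacts the nucleotide.
toy_contacts = [("GRK", "A"), ("GRK", "A"), ("ALV", "U"), ("ALV", "U")]
scores = triplet_propensity(toy_contacts)
```

A positive score means the triplet and nucleotide co-occur more often than chance under this definition; in a pipeline like the one the abstract describes, such scores would be one feature among the sequence- and structure-based features passed to the classifier.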
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.