Probe Efficient Feature Representation of Gapped K-mer Frequency Vectors from Sequences Using Deep Neural Networks
Cao, Zhen [1,2]; Zhang, Shihua [1,2,3]
2020-03-01
Source Publication: IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
ISSN: 1545-5963
Volume: 17, Issue: 2, Pages: 657-667
Abstract: Gapped k-mer frequency vectors (gkm-fvs) have been proposed for extracting sequence features. Coupled with a support vector machine (gkm-SVM), gkm-fvs have been used to achieve effective sequence-based predictions. However, the heavy computation required for a large kernel matrix prevents gkm-SVM from using large amounts of data, and it is unclear how to combine gkm-fvs with other data sources in the context of a string kernel. On the other hand, the high dimensionality, collinearity, and sparsity of gkm-fvs hinder the use of many traditional machine learning methods without a kernel trick. Therefore, we proposed gkm-DNN, a flexible and scalable framework that learns feature representations from high-dimensional gkm-fvs using deep neural networks (DNNs). We first proposed a more concise version of gkm-fvs, which significantly reduces their dimension. Then, for the first time, we implemented an efficient method to calculate the gkm-fv of a given sequence. Finally, we adopted a DNN model with gkm-fvs as inputs to achieve efficient feature representation and perform a prediction task. Here, we took transcription factor binding site prediction as an illustrative application, applied gkm-DNN to 467 small and 69 big human ENCODE ChIP-seq datasets to demonstrate its performance, and compared it with the state-of-the-art method gkm-SVM.
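To make the notion of a gapped k-mer frequency vector concrete, here is a minimal Python sketch of the plain, naive definition: slide a window of length `l` over the sequence and, for every choice of `k` informative positions, count the pattern obtained by masking the other `l - k` positions as gaps. This is not the concise gkm-fv variant or the efficient calculation method described in the abstract, whose details are not given in this record. The function name `gapped_kmer_counts`, the parameter names `l` and `k`, their default values, the `-` gap symbol, and the skipping of windows with non-ACGT characters are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(seq, l=4, k=2):
    """Count gapped k-mers in a DNA sequence (naive illustrative version).

    Each length-l window contributes one pattern per choice of k kept
    positions; the remaining l - k positions are written as '-' (gaps).
    """
    seq = seq.upper()
    counts = Counter()
    # Every way to choose which k of the l window positions stay informative.
    masks = list(combinations(range(l), k))
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        # Assumption: skip windows containing ambiguous bases such as N.
        if any(base not in "ACGT" for base in window):
            continue
        for kept in masks:
            pattern = "".join(
                window[i] if i in kept else "-" for i in range(l)
            )
            counts[pattern] += 1
    return counts
```

For example, `gapped_kmer_counts("AAAA", l=3, k=2)` sees two windows (`AAA` twice), so each of the three patterns `AA-`, `A-A`, and `-AA` is counted twice. Under this naive definition the full feature space has C(l, k) * 4^k possible patterns, which illustrates why the dimensionality and sparsity mentioned in the abstract become a concern as `l` and `k` grow.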
Keyword: DNA ; Bioinformatics ; Kernel ; Feature extraction ; Support vector machines ; Genomics ; Task analysis ; Bioinformatics ; machine learning ; gapped k-mer ; deep neural network ; transcription factor binding site prediction
DOI: 10.1109/TCBB.2018.2868071
Indexed By: SCI
Language: English
Funding Project: National Natural Science Foundation of China[61621003] ; National Natural Science Foundation of China[11661141019] ; National Natural Science Foundation of China[61422309] ; National Natural Science Foundation of China[61379092] ; Strategic Priority Research Program of the Chinese Academy of Sciences (CAS)[XDB13040600] ; Ten Thousand Talent Program for Young Top-notch Talent ; Key Research Program of the Chinese Academy of Sciences[KFZD-SW-219] ; CAS Frontier Science Research Key Project for Top Young Scientist[QYZDB-SSW-SYS008]
WOS Research Area: Biochemistry & Molecular Biology ; Computer Science ; Mathematics
WOS Subject: Biochemical Research Methods ; Computer Science, Interdisciplinary Applications ; Mathematics, Interdisciplinary Applications ; Statistics & Probability
WOS ID: WOS:000524236800025
Publisher: IEEE COMPUTER SOC
Document Type: Journal article
Identifier: http://ir.amss.ac.cn/handle/2S8OKBNM/51125
Collection: Academy of Mathematics and Systems Science, Chinese Academy of Sciences
Corresponding Author: Zhang, Shihua
Affiliation:
1. Chinese Acad Sci, NCMIS, CEMS, RCSDS, Acad Math & Syst Sci, Beijing 100190, Peoples R China
2. Univ Chinese Acad Sci, Sch Math Sci, Beijing 100049, Peoples R China
3. Chinese Acad Sci, Ctr Excellence Anim Evolut & Genet, Kunming 650223, Yunnan, Peoples R China
Recommended Citation
GB/T 7714: Cao, Zhen, Zhang, Shihua. Probe Efficient Feature Representation of Gapped K-mer Frequency Vectors from Sequences Using Deep Neural Networks[J]. IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2020, 17(2): 657-667.
APA: Cao, Zhen, & Zhang, Shihua. (2020). Probe Efficient Feature Representation of Gapped K-mer Frequency Vectors from Sequences Using Deep Neural Networks. IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 17(2), 657-667.
MLA: Cao, Zhen, et al. "Probe Efficient Feature Representation of Gapped K-mer Frequency Vectors from Sequences Using Deep Neural Networks." IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 17.2 (2020): 657-667.