Text clustering using frequent itemsets
Zhang, Wen (1); Yoshida, Taketoshi (3); Tang, Xijin (2); Wang, Qing (1)
2010-07-01
Journal: KNOWLEDGE-BASED SYSTEMS
ISSN: 0950-7051
Volume: 23, Issue: 5, Pages: 379-388
Abstract: Frequent itemsets originate from association rule mining. Recently, they have been applied to text mining tasks such as document categorization and clustering. In this paper, we study text clustering using frequent itemsets. The contribution of this paper is threefold. First, we review existing methods of document clustering using frequent patterns. Second, we propose a new method called Maximum Capturing for document clustering. Maximum Capturing includes two procedures: constructing document clusters and assigning cluster topics. We develop three versions of Maximum Capturing based on three similarity measures. We propose a normalization process based on frequency sensitive competitive learning for Maximum Capturing to merge cluster candidates into a predefined number of clusters. Third, experiments are carried out to evaluate the proposed method in comparison with the CFWS, CMS, FTC and FIHC methods. Experimental results show that, in clustering, Maximum Capturing performs better than the other methods mentioned above. In particular, Maximum Capturing with representation using individual words and similarity measure using asymmetrical binary similarity achieves the best performance. Moreover, topics produced by Maximum Capturing distinguish clusters from each other and can be used as labels of document clusters. (C) 2010 Elsevier B.V. All rights reserved.
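The abstract names two building blocks, frequent itemsets and asymmetrical binary similarity, without giving formulas. The following is a minimal illustrative sketch of those two ideas only, not the paper's Maximum Capturing implementation. It assumes the standard textbook definition of asymmetric binary similarity (the Jaccard coefficient, which ignores features absent from both documents). The function names, the toy documents, and the `min_support` value are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(docs, min_support, max_size=2):
    """Return word sets that occur together in at least min_support documents.

    Brute-force enumeration for illustration; real miners (Apriori,
    FP-growth) prune the search space instead.
    """
    counts = Counter()
    for words in docs:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(words), size):
                counts[frozenset(combo)] += 1
    return {itemset for itemset, n in counts.items() if n >= min_support}


def asymmetric_binary_similarity(a, b):
    """Jaccard coefficient (assumed definition): shared / present in either."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy corpus: each document is a set of words (hypothetical data).
docs = [
    {"text", "clustering", "itemset"},
    {"text", "clustering", "topic"},
    {"stock", "market", "price"},
    {"stock", "market", "index"},
]

frequent = frequent_itemsets(docs, min_support=2)

# Represent each document by the frequent itemsets it contains.
doc_itemsets = [{s for s in frequent if s <= d} for d in docs]

# Pairwise similarity over all document pairs.
similarity = {
    (i, j): asymmetric_binary_similarity(doc_itemsets[i], doc_itemsets[j])
    for i, j in combinations(range(len(docs)), 2)
}

# The most similar pair is a natural seed for a cluster candidate.
best_pair = max(similarity, key=similarity.get)
```

In this toy corpus the two text-mining documents share all of their frequent itemsets, as do the two finance documents, while cross-topic pairs share none. Grouping the highest-similarity pairs first is one plausible reading of "constructing document clusters", but the record does not spell out the paper's actual procedure.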
Keywords: Document clustering; Frequent itemsets; Maximum capturing; Similarity measure; Competitive learning
DOI: 10.1016/j.knosys.2010.01.011
Language: English
Funding: National Natural Science Foundation of China [90718042]; National Natural Science Foundation of China [60873072]; National Natural Science Foundation of China [60903050]; National Hi-Tech R&D Plan of China [2007AA010303]; National Hi-Tech R&D Plan of China [2007AA01Z186]; National Hi-Tech R&D Plan of China [2007AA01Z179]; National Basic Research Program [2007CB310802]; Foundation of Young Doctors of Institute of Software, Chinese Academy of Sciences [ISCAS2009-DR03]
WOS research area: Computer Science
WOS category: Computer Science, Artificial Intelligence
WOS accession number: WOS:000278881300002
Publisher: ELSEVIER SCIENCE BV
Document type: Journal article
Item identifier: http://ir.amss.ac.cn/handle/2S8OKBNM/10220
Collection: Institute of Systems Science
Corresponding author: Zhang, Wen
Author affiliations:
1. Chinese Acad Sci, Inst Software, Lab Internet Software Technol, Beijing 100190, Peoples R China
2. Chinese Acad Sci, Inst Syst Sci, Acad Math & Syst Sci, Beijing 100190, Peoples R China
3. Japan Adv Inst Sci & Technol, Sch Knowledge Sci, Tatsunokuchi, Ishikawa 9231292, Japan
Recommended citation:
GB/T 7714: Zhang, Wen, Yoshida, Taketoshi, Tang, Xijin, et al. Text clustering using frequent itemsets[J]. KNOWLEDGE-BASED SYSTEMS, 2010, 23(5): 379-388.
APA: Zhang, Wen, Yoshida, Taketoshi, Tang, Xijin, & Wang, Qing. (2010). Text clustering using frequent itemsets. KNOWLEDGE-BASED SYSTEMS, 23(5), 379-388.
MLA: Zhang, Wen, et al. "Text clustering using frequent itemsets". KNOWLEDGE-BASED SYSTEMS 23.5 (2010): 379-388.