CSpace
关联规则挖掘的取样误差量化模型和快速估计算法
Alternative Title: Quantitative Model and Fast Estimation Algorithm of Sampling Error for Association Rule Mining
贾彩燕 (Jia Caiyan)1; 陆汝钤 (Lu Ruqian)2
2006-01-01
Source Publication: 计算机学报 (Chinese Journal of Computers)
ISSN: 0254-4164
Volume: 29; Issue: 4; Pages: 625-634
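Abstract: In association rule mining, existing methods for quantifying sampling error and for estimating it quickly have shortcomings. To address them, this paper proposes a new triple-based model for quantifying sampling error and, on the basis of experimental observation and theoretical analysis, gives a fast estimation algorithm for sampling error: the cardinal error interval estimation method. Both theoretical analysis and experimental results show that the approach accurately and effectively measures the difference between the frequent-pattern information contained in the sample set and that contained in the original dataset. The cardinal error interval estimation method also estimates the sampling error accurately and quickly, and it can be flexibly embedded in the various sampling methods used for association rule mining. Its core idea can further be used to improve the efficiency of distributed and parallel association rule mining methods.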
Other Abstract: Sampling is a simple and effective technique for improving the efficiency and scalability of association rule mining algorithms. However, little research has addressed how to define the degree of error it introduces into the algorithm's outcome, i.e., a quantitative model for measuring the sampling error, or how to estimate that error efficiently. In this paper, based on a systematic analysis, the authors point out the deficiencies of current results in this field, give a novel, flexible quantitative model for measuring the sampling error, and propose a highly efficient computational method for estimating it, the interval estimation algorithm of cardinal error, grounded in experimental observation and theoretical analysis. Both theoretical analysis and experiments show that the model measures the error between the sample set and the original dataset effectively and accurately, and that the interval estimation algorithm of cardinal error estimates how well the sample set represents the original dataset efficiently and accurately. Moreover, the interval estimation algorithm of cardinal error can be conveniently embedded in sampling algorithms to speed them up, and it is useful for distributed and parallel association rule mining algorithms.
Keyword: association rules; frequent itemsets; sampling error; cardinal error; PAC learning
Indexed By: CSCD
Language: Chinese
CSCD ID: CSCD:2357772
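Document Type: Journal article
Identifier: http://ir.amss.ac.cn/handle/2S8OKBNM/53562
Collection: Academy of Mathematics and Systems Science, Chinese Academy of Sciences
Affiliation: 1. Beijing Jiaotong University
2. Academy of Mathematics and Systems Science, Chinese Academy of Sciences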
Recommended Citation
GB/T 7714: 贾彩燕, 陆汝钤. 关联规则挖掘的取样误差量化模型和快速估计算法[J]. 计算机学报, 2006, 29(4): 625-634.
APA: 贾彩燕, & 陆汝钤. (2006). 关联规则挖掘的取样误差量化模型和快速估计算法. 计算机学报, 29(4), 625-634.
MLA: 贾彩燕, et al. "关联规则挖掘的取样误差量化模型和快速估计算法". 计算机学报 29.4 (2006): 625-634.
Files in This Item:
There are no files associated with this item.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.