Article Information

  • Title: A Genetic Programming Approach for Record Deduplication
  • Authors: L.CHITRA DEVI; S.M.HANSA; DR.G.N.K.SURESH BABU
  • Journal: International Journal of Innovative Research in Computer and Communication Engineering
  • Print ISSN: 2320-9798
  • Online ISSN: 2320-9801
  • Year: 2013
  • Volume: 1
  • Issue: 4
  • Publisher: S&S Publications
  • Abstract: This article discusses how genetic programming (GP) can be used for record deduplication. Several systems that rely on the integrity of their data to offer high-quality services, such as digital libraries and e-commerce brokers, may be affected by the existence of duplicates, quasi-replicas, or near-duplicate entries in their repositories. For this reason, private and government organizations have made a huge effort to develop effective methods for removing replicas from large data repositories: clean, replica-free repositories not only allow the retrieval of higher-quality information but also lead to a more concise data representation and to potential savings in the computational time and resources needed to process the data. In this work, we extend the results of a GP-based approach to record deduplication that we proposed earlier by performing a comprehensive set of experiments on its parameterization setup. Our experiments show that some parameter choices can improve the results by up to 30%. Thus, the results obtained can serve as guidelines for the most effective way to set up the parameters of our GP-based approach to record deduplication.
  • Keywords: Genetic Programming; DBMS; Duplication; Optimisation
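The abstract names the technique but not its mechanics. As a rough illustration only, the Python sketch below shows one common way a GP-based deduplication function can be evolved; it is not the authors' implementation. It assumes that each candidate record pair is described by evidence values, one per (field, similarity function) combination; that GP evolves an arithmetic expression tree over those values; that a pair is labeled a duplicate when the tree's output exceeds a fixed threshold; and that fitness is the F1 score on labeled pairs. The field names, the two similarity functions (difflib's SequenceMatcher ratio and token-set Jaccard), the toy records, and every keyword default of evolve (population size, generations, crossover and mutation rates, maximum tree depth, tournament size, threshold) are hypothetical. They only stand in for the kind of parameterization setup the abstract says was studied.

```python
import random
from difflib import SequenceMatcher

# Hypothetical evidence: similarity functions applied to each field of a record pair.
def seq_sim(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()  # difflib matching ratio

def jaccard_sim(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

FIELDS = ("name", "city")  # hypothetical record schema
SIMS = (seq_sim, jaccard_sim)
N_FEAT = len(FIELDS) * len(SIMS)

def features(r1: tuple, r2: tuple) -> list:
    # One evidence value per (field, similarity function) combination.
    return [sim(a, b) for a, b in zip(r1, r2) for sim in SIMS]

OPS = {"+": lambda x, y: x + y, "-": lambda x, y: x - y, "*": lambda x, y: x * y,
       "/": lambda x, y: x / y if abs(y) > 1e-9 else 1.0}  # protected division

# Trees are tuples: (op, left, right), ("x", feature_index) or ("c", constant).
def random_tree(max_d: int):
    if max_d == 0 or random.random() < 0.3:
        return ("x", random.randrange(N_FEAT)) if random.random() < 0.8 else ("c", random.random())
    return (random.choice(list(OPS)), random_tree(max_d - 1), random_tree(max_d - 1))

def evaluate(tree, x) -> float:
    if tree[0] in OPS:
        return OPS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))
    return x[tree[1]] if tree[0] == "x" else tree[1]

def depth(tree) -> int:
    return 1 + max(depth(tree[1]), depth(tree[2])) if tree[0] in OPS else 0

def nodes(tree, path=()):
    # Yield (index path, subtree) for every node, root first.
    yield path, tree
    if tree[0] in OPS:
        yield from nodes(tree[1], path + (1,))
        yield from nodes(tree[2], path + (2,))

def put(tree, path, sub):
    # Return a copy of tree with the node at `path` replaced by `sub`.
    if not path:
        return sub
    i = path[0]
    return tree[:i] + (put(tree[i], path[1:], sub),) + tree[i + 1:]

def f1(tree, pairs, threshold: float) -> float:
    # A pair is predicted to be a duplicate when the tree's value exceeds threshold.
    preds = [(evaluate(tree, x) > threshold, dup) for x, dup in pairs]
    tp = sum(p and d for p, d in preds)
    fp = sum(p and not d for p, d in preds)
    fn = sum(d and not p for p, d in preds)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def evolve(pairs, pop_size=50, generations=20, crossover_rate=0.8, mutation_rate=0.1,
           max_depth=4, tournament=3, threshold=1.0, seed=0):
    # All keyword defaults are hypothetical GP settings, not those used in the article.
    random.seed(seed)
    def score(t):
        return f1(t, pairs, threshold)
    def pick():  # tournament selection from the current population
        return max(random.sample(pop, tournament), key=score)
    pop = [random_tree(max_depth) for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        nxt = [best]  # elitism: the best tree so far always survives
        while len(nxt) < pop_size:
            child = pick()
            if random.random() < crossover_rate:  # subtree crossover
                _, sub = random.choice(list(nodes(pick())))
                path, _ = random.choice(list(nodes(child)))
                child = put(child, path, sub)
            if random.random() < mutation_rate:  # subtree mutation
                path, _ = random.choice(list(nodes(child)))
                child = put(child, path, random_tree(2))
            nxt.append(child if depth(child) <= max_depth else pick())  # limit bloat
        pop = nxt
        best = max(pop, key=score)
    return best, score(best)

RAW = [  # hypothetical labeled record pairs (name, city), for illustration only
    (("John Smith", "New York"), ("Jon Smith", "New York"), True),
    (("Mary Jones", "Boston"), ("Mary Jones", "Boston MA"), True),
    (("Ann Lee", "Chicago"), ("Anne Lee", "Chicago"), True),
    (("John Smith", "New York"), ("Mary Jones", "Boston"), False),
    (("Ann Lee", "Chicago"), ("Alan Leigh", "Denver"), False),
    (("Jon Smith", "Boston"), ("Ann Lee", "New York"), False),
]
PAIRS = [(features(a, b), dup) for a, b, dup in RAW]

if __name__ == "__main__":
    tree, fit = evolve(PAIRS)
    print("best tree:", tree)
    print("training F1:", round(fit, 3))
```

Running the script prints the best tree found and its F1 on the six toy pairs. A parameter study of the sort the abstract describes would call evolve repeatedly with different keyword settings (and several seeds) and compare F1 on held-out pairs rather than on the training pairs used here.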
© National Center for Philosophy and Social Sciences Documentation. All rights reserved.