Article Information

  • Title: Graph Theoretic and Genetic Algorithm-Based Model for Web Content Mining
  • Authors: Moses Akinjide Adelola; Sunday Olumide Adewale; Gabriel Babatunde Iwasokun
  • Journal: International Journal of Computer Science Issues
  • Print ISSN: 1694-0784
  • Electronic ISSN: 1694-0814
  • Year of publication: 2016
  • Volume: 13
  • Issue: 6
  • Publisher: IJCSI Press
  • Abstract: The World Wide Web (WWW) is arguably the largest and most heterogeneous repository of data, and it continues to grow in size and complexity. As it grows, retrieving the required web pages and information has become a herculean task for web users because of information overload. Worse still, existing web content retrieval techniques have not been efficient enough in terms of speed and accuracy. This paper presents a Graph Theoretic (GT) and Genetic Algorithm (GA)-based technique for mining web documents. The technique uses graph representations of document content to address the problems of initialization, convergence to local minima, and failure to handle large datasets. It works in three phases, namely content extraction, preprocessing, and database formulation, and uses the Maximum Common Sub-graph (MCS) to calculate the distance between clusters. Results of a web-based experimental study, run on a 2 GHz Pentium 4 machine with 1 GB RAM under the Windows 7 operating system, with a web scraper (import.io) as the front end and PHP 6 and MySQL 5 as back ends, show the applicability of the new technique and its superiority over some existing ones.
  • Keywords: Web mining; graph theory; genetic algorithm; knowledge discovery
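
The abstract says that a Maximum Common Sub-graph (MCS) is used to calculate distances, but it does not give the formula. The sketch below is only an illustration, not the authors' implementation. It assumes the widely used MCS-based distance d(G1, G2) = 1 - |mcs(G1, G2)| / max(|G1|, |G2|), and it assumes that every term appears as exactly one uniquely labelled node, so the MCS reduces to the shared nodes and shared edges. The function names (build_graph, mcs_size, mcs_distance) and the adjacent-term edge rule are hypothetical choices made for this example.

```python
def build_graph(tokens):
    """Build a simple document graph (assumed representation).

    Nodes are the unique terms; a directed edge links each term to the
    term that immediately follows it in the text.
    """
    nodes = set(tokens)
    edges = set(zip(tokens, tokens[1:]))
    return nodes, edges


def graph_size(graph):
    """Size of a graph, taken here as node count plus edge count."""
    nodes, edges = graph
    return len(nodes) + len(edges)


def mcs_size(g1, g2):
    """Size of the maximum common subgraph under unique node labels.

    With unique labels, the MCS is just the intersection of the node sets
    and of the edge sets; no subgraph-isomorphism search is needed.
    """
    nodes1, edges1 = g1
    nodes2, edges2 = g2
    return len(nodes1 & nodes2) + len(edges1 & edges2)


def mcs_distance(g1, g2):
    """MCS-based distance in [0, 1]; 0 means identical graphs."""
    largest = max(graph_size(g1), graph_size(g2))
    if largest == 0:
        return 0.0  # two empty graphs are treated as identical
    return 1.0 - mcs_size(g1, g2) / largest


if __name__ == "__main__":
    doc_a = build_graph("web content mining with graphs".split())
    doc_b = build_graph("web content mining with genetic algorithms".split())
    print(mcs_distance(doc_a, doc_b))
```

A distance of this kind could serve as the dissimilarity measure inside a clustering fitness function, but how the paper combines it with the genetic algorithm is not stated in the abstract.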