
Article Information

  • Title: TEXT MINING ALGORITHM DISCOTEX (DISCOVERY FROM TEXT EXTRACTION) WITH INFORMATION EXTRACTION
  • Authors: Dr. T. LALITHA; S. MEENAKSHI
  • Journal: Journal of Theoretical and Applied Information Technology
  • Print ISSN: 1992-8645
  • Electronic ISSN: 1817-3195
  • Publication year: 2014
  • Volume: 64
  • Issue: 2
  • Publisher: Journal of Theoretical and Applied
  • Abstract: Text mining looks for patterns in unstructured text. The related task of Information Extraction (IE) locates specific items in natural-language documents. This paper presents a framework for text mining, called DISCOTEX (Discovery from Text EXtraction), which uses a learned information extraction system to transform text into more structured data that is then mined for interesting relationships. The initial version of DISCOTEX integrates an IE module acquired by an IE learning system and a standard rule induction module. In addition, rules mined from a database extracted from a corpus of texts are used to predict additional information to extract from future documents, thereby improving the recall of the underlying extraction system. Encouraging results are presented on applying these techniques to a corpus of computer job announcement postings from an Internet newsgroup.
  • Keywords: Knowledge Discovery; Data Mining; Text Mining; Information Extraction; Discovered Knowledge
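
The pipeline outlined in the abstract (extract structured records from text, mine rules over them, then use the rules to predict fillers the extractor missed) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the keyword lexicon `LEXICON`, the functions `extract`, `mine_rules`, and `extract_with_rules`, the thresholds, and the sample postings are all hypothetical. A simple keyword matcher stands in for the learned IE module, and single-antecedent association rules stand in for the rule induction module.

```python
from itertools import permutations

# Hypothetical keyword lexicon standing in for a learned IE module.
LEXICON = {
    "language": ["java", "sql", "c++"],
    "platform": ["windows", "unix"],
    "area": ["database", "web"],
}

def extract(text):
    """Toy extractor: return the set of (slot, filler) pairs found in text."""
    lowered = text.lower()
    return {(slot, kw) for slot, kws in LEXICON.items() for kw in kws if kw in lowered}

def mine_rules(records, min_support=2, min_confidence=0.8):
    """Mine single-antecedent rules a -> b over sets of (slot, filler) items."""
    items = set().union(*records)
    rules = []
    for a, b in permutations(sorted(items), 2):
        n_a = sum(1 for r in records if a in r)
        n_ab = sum(1 for r in records if a in r and b in r)
        # Keep the rule only if it is both frequent and reliable enough.
        if n_ab >= min_support and n_ab / n_a >= min_confidence:
            rules.append((a, b))
    return rules

def extract_with_rules(text, rules):
    """Add fillers predicted by mined rules to the directly extracted ones."""
    found = extract(text)
    predicted = {b for a, b in rules if a in found}
    return found | predicted

# Made-up postings, used only to exercise the sketch.
postings = [
    "Seeking SQL developer with database experience",
    "Database administrator, SQL required, Unix",
    "SQL and database skills, Windows",
    "Java web developer on Unix",
]
records = [extract(p) for p in postings]
rules = mine_rules(records)

new_posting = "SQL programmer needed"
baseline = extract(new_posting)
augmented = extract_with_rules(new_posting, rules)
```

On this toy data the only rules that pass both thresholds link `("language", "sql")` and `("area", "database")` in each direction, so `augmented` contains the `database` filler that `baseline` misses. That is the recall gain the abstract refers to, shown here at toy scale only.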