Journal: International Journal of Signal Processing, Image Processing and Pattern Recognition
Print ISSN: 2005-4254
Publication year: 2008
Volume: 1
Issue: 1
Publisher: SERSC
Abstract: The main purpose of communication is to transfer information from one part of the world to another. Information is typically stored as documents or files created as requirements arise, and this ad hoc creation and storage leaves them unstructured. As a consequence, data retrieval and modification become difficult. Frequently required data should follow a consistent pattern; otherwise, problems such as retrieval of erroneous data, modification anomalies, and long retrieval times may increase. One solution to these problems is unstructured document categorization, in which the collected unstructured documents are categorized according to given constraints. This paper reviews techniques that have appeared in the literature for unstructured document categorization, including text and data mining, genetic algorithms, lexical chaining, and binarization.
Keywords: Unstructured Documents; Categorization; Text and Data Mining; Genetic Algorithm; Lexical Chaining; Binarization