Journal: International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Print ISSN: 2278-1323
Publication year: 2014
Volume: 3
Issue: 11
Pages: 3633-3640
Publisher: Shri Pannalal Research Institute of Technology
Abstract: Authorship Attribution (AA) deals with identifying the author of an anonymous text from a known set of authors. The Authorship Attribution problem can be viewed as a classification problem. The steps involved in Authorship Attribution are data preprocessing for vector representation of the text, feature extraction for quantitative representation of the text, feature selection to reduce the dimensionality of the feature space, classification algorithms for pattern generation, and finally author identification for the given unknown document. There are four categories of features: lexical, character, syntactic, and semantic. In this paper, character-level and lexical features are considered for feature extraction. The dimensionality of the feature space is reduced using the chi-square measure. Classifiers such as Naive Bayes, K-Nearest Neighbour, Support Vector Machine, and Decision Tree are used to learn from the training document set and to identify the author of an unknown text. The performance of these classifiers in combination with character and lexical features in the context of AA is empirically evaluated on Telugu texts.
Keywords: Authorship attribution; Text preprocessing; Stemming; Feature extraction; Machine learning classifier
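
The pipeline summarized in the abstract (character-level feature extraction, chi-square feature selection, then classification) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the trigram size `n=3`, the cutoff `k`, the choice of a multinomial Naive Bayes with Laplace smoothing, and the toy English strings are all hypothetical. The chi-square score here is computed on document-level presence or absence of each n-gram and takes the maximum over authors, which is one common variant and may differ from the paper's. Note also that slicing a Python `str` works on Unicode code points, so for Telugu text an n-gram may split a grapheme cluster.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (by Unicode code point)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def chi_square_scores(docs, labels, n=3):
    """Score each n-gram by its largest chi-square value over all authors,
    using document-level presence/absence counts (an assumed variant)."""
    total = len(docs)
    presence = [set(char_ngrams(doc, n)) for doc in docs]
    docs_per_author = Counter(labels)
    docs_per_feature = Counter(f for feats in presence for f in feats)
    scores = {}
    for feat, n_feat in docs_per_feature.items():
        best = 0.0
        for author, n_auth in docs_per_author.items():
            # 2x2 contingency table: feature present/absent vs. author/others
            a = sum(1 for feats, lab in zip(presence, labels)
                    if lab == author and feat in feats)
            b = n_feat - a            # present, other authors
            c = n_auth - a            # absent, this author
            d = total - a - b - c     # absent, other authors
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            if denom:
                best = max(best, total * (a * d - b * c) ** 2 / denom)
        scores[feat] = best
    return scores


def select_features(docs, labels, k, n=3):
    """Keep the k n-grams with the highest chi-square score."""
    scores = chi_square_scores(docs, labels, n)
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def train_naive_bayes(docs, labels, vocab, n=3):
    """Accumulate per-author counts of the selected n-grams."""
    counts = {author: Counter() for author in set(labels)}
    for doc, author in zip(docs, labels):
        for gram, freq in char_ngrams(doc, n).items():
            if gram in vocab:
                counts[author][gram] += freq
    return counts, Counter(labels), vocab, n


def predict_author(model, text):
    """Return the author with the highest smoothed log-posterior."""
    counts, priors, vocab, n = model
    grams = char_ngrams(text, n)
    total_docs = sum(priors.values())
    best_author, best_score = None, -math.inf
    for author, gram_counts in counts.items():
        denom = sum(gram_counts.values()) + len(vocab)  # Laplace smoothing
        score = math.log(priors[author] / total_docs)
        for gram, freq in grams.items():
            if gram in vocab:
                score += freq * math.log((gram_counts[gram] + 1) / denom)
        if score > best_score:
            best_author, best_score = author, score
    return best_author


# Toy usage with made-up documents and author labels (hypothetical data).
docs = ["the cat sat", "the cat ran", "a dog barked", "a dog slept"]
labels = ["A", "A", "B", "B"]
vocab = select_features(docs, labels, k=10)
model = train_naive_bayes(docs, labels, vocab)
print(predict_author(model, "the cat"))
```

The same selected vocabulary could feed any of the other classifiers named in the abstract; Naive Bayes is used here only because it fits in a few lines without external libraries.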