Article Information

  • Title: Grammar Based Pre-Processing for PPM
  • Authors: William J. Teahan; Nojood O. Aljehane
  • Journal: International Journal of Computer Science & Information Technology (IJCSIT)
  • Print ISSN: 0975-4660
  • Online ISSN: 0975-3826
  • Year: 2017
  • Volume: 9
  • Issue: 1
  • Page: 1
  • Publisher: Academy & Industry Research Collaboration Center (AIRCC)
  • Abstract: In this paper, we apply grammar-based pre-processing prior to using the Prediction by Partial Matching (PPM) compression algorithm. This achieves significantly better compression for different natural language texts compared to other well-known compression methods. Our method first generates a grammar based on the most common two-character sequences (bigraphs) or three-character sequences (trigraphs) in the text being compressed. Then, in a pre-processing phase prior to compression, it substitutes these sequences with the respective non-terminal symbols defined by the grammar. This leads to significantly improved compression for various natural languages (a 5% improvement for American English, 10% for British English, 29% for Welsh, 10% for Arabic, 3% for Persian and 35% for Chinese). We describe further improvements using a two-pass scheme in which the grammar-based pre-processing is applied again in a second pass through the text. We then apply the algorithms to the files in the Calgary Corpus and also achieve significantly improved compression, between 11% and 20%, when compared with other compression algorithms, including a grammar-based approach, the Sequitur algorithm.
  • Keywords: CFG; Grammar-based; Preprocessing; PPM; Encoding.
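The bigraph pre-processing described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It handles bigraphs only (the trigraph variant is omitted), counts overlapping character pairs, and substitutes greedily from left to right. It uses characters from the Unicode Private Use Area (starting at U+E000) as non-terminal symbols, assuming they never occur in the input. The function names (`preprocess`, `postprocess`, etc.) and the default of 100 rules per pass are hypothetical. The PPM coder itself is not shown; the pre-processed string would be passed to it in place of the original text.

```python
from collections import Counter

# Assumption (hypothetical): non-terminals are Unicode Private Use Area
# characters starting at U+E000, which must not appear in the input text.
PUA_START = 0xE000


def top_bigraphs(text: str, n: int) -> list[str]:
    """Return the n most frequent overlapping two-character sequences."""
    counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
    return [pair for pair, _ in counts.most_common(n)]


def build_grammar(text: str, n: int, first_symbol: int) -> dict[str, str]:
    """Map each frequent bigraph to a fresh non-terminal (rule: symbol -> bigraph)."""
    return {pair: chr(first_symbol + k)
            for k, pair in enumerate(top_bigraphs(text, n))}


def substitute(text: str, grammar: dict[str, str]) -> str:
    """Greedily replace grammar bigraphs, scanning left to right."""
    out, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in grammar:
            out.append(grammar[pair])  # emit the non-terminal
            i += 2
        else:
            out.append(text[i])        # copy the character unchanged
            i += 1
    return "".join(out)


def expand(text: str, grammar: dict[str, str]) -> str:
    """Undo substitute(): rewrite each non-terminal as its bigraph."""
    rules = {sym: pair for pair, sym in grammar.items()}
    return "".join(rules.get(ch, ch) for ch in text)


def preprocess(text: str, n: int = 100, passes: int = 2):
    """Apply the substitution repeatedly; later passes may pair earlier non-terminals."""
    grammars, next_symbol = [], PUA_START
    for _ in range(passes):
        grammar = build_grammar(text, n, next_symbol)
        next_symbol += len(grammar)    # keep symbols unique across passes
        text = substitute(text, grammar)
        grammars.append(grammar)
    return text, grammars  # text would now be handed to a PPM coder


def postprocess(text: str, grammars: list[dict[str, str]]) -> str:
    """Recover the original text by expanding the passes in reverse order."""
    for grammar in reversed(grammars):
        text = expand(text, grammar)
    return text


if __name__ == "__main__":
    sample = "the cat sat on the mat with the hat"
    encoded, grammars = preprocess(sample, n=5, passes=2)
    assert postprocess(encoded, grammars) == sample
    print(len(sample), "->", len(encoded), "symbols")
```

Each pass adds one layer of rules, so decoding undoes the passes in reverse order. In the second pass, a rule may pair a first-pass non-terminal with another symbol, which is how a two-pass scheme can capture sequences longer than two characters.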