Basic article information

  • Title: CORPORA AS DATA SOURCES FOR THE UP-GRADING OF MORPHOLOGICAL TAGGING
  • Author: Klára Osolsobě
  • Journal: Časopis pro Moderní Filologii
  • Print ISSN: 0008-7386
  • Electronic ISSN: 2336-6591
  • Publication year: 2015
  • Volume: 97
  • Issue: 2
  • Pages: 136-145
  • Language: English
  • Publisher: Univerzita Karlova, Filozofická fakulta
  • Abstract: Adjectives ending in -oucí/-ící are regularly derived from verbs and hence are not usually listed in any of the Czech monolingual dictionaries. On the level of automatic morphological analysis (the dictionary) of Czech, they should be generated from verbal roots and tagged as verbal adjectives (POS tag: AG.*). Data from Czech corpora reveal a) inconsistencies in tagging and b) gaps in the dictionary. The main cause of both kinds of insufficiency is the existence of variants on the level of the verbal forms from which the verbal adjectives are potentially derived. Consequently, text corpora are a significant source of knowledge about the formation and use of adjectives ending in -oucí/-ící, knowledge that can be important for both a) automatic morphological analysis of Czech and b) theoretical description of Czech grammar (derivational morphology). Our goal is to present a corpus-based study of the Czech gerund, i.e. verbal adjectives in -oucí/-ící. The link between the inflectional and the word-formation variants will be demonstrated using material from the SYN corpus (2.6 billion tokens of written Czech) and the large web corpus czTenTen12 (5.2 billion tokens of Czech text from the Internet, cleaned and deduplicated).
  • Keywords: gerund/deverbal adjective; POS tagging; automatic morphological analysis; variant; derivational morphology
  • Other keywords (Czech): verbální adjektivum; morfologické značkování; automatická morfologická analýza; varianta; slovotvorba