Automatic Transformation of the Thai Categorial Grammar Treebank to Dependency Trees

Publication: Contribution to book/anthology/report · Conference contribution in proceedings · Research · Peer-reviewed

  • Christian Rishøj
  • Taneth Ruangrajitpakorn
  • Prachya Boonkwan
  • Thepchai Supnithi
A method for deriving an approximately labeled dependency treebank from the Thai Categorial Grammar Treebank has been implemented. The method involves a lexical dictionary for assigning dependency directions to the CG types associated with the grammatical entities in the CG bank, falling back on a generic mapping of CG types in case of unknown words. Currently, all but a handful of the trees in the Thai CG bank can unambiguously be transformed into directed dependency trees. Dependency labels can optionally be assigned with a learned classifier, which in a preliminary evaluation with a very small training set achieves 76.5% label accuracy. In the process, a number of annotation errors in the CG bank were identified and corrected. Although rather limited in its coverage, excluding e.g. long-distance dependencies, topicalisations and longer sentences, the resulting treebank is believed to be sound in terms of structural annotational consistency and a valuable complement to the scarce Thai language resources in existence.
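The lookup-with-fallback idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the names `LEXICAL_DIRECTIONS`, `generic_direction` and `direction`, the dictionary entry, the Steedman-style notation (result to the left of the slash), and above all the fallback rule itself, which uses the common heuristic that a functor of the form X/X or X\X is a modifier and therefore a dependent of its argument, while any other functor heads its argument.

```python
# Hypothetical lexical dictionary: (word, CG type) -> role of the word
# relative to its argument. The entry is a placeholder, not real data.
LEXICAL_DIRECTIONS = {
    ("example_word", "np/np"): "head",
}


def strip_parens(cg_type):
    """Remove outer parentheses only when they enclose the whole type."""
    while cg_type.startswith("(") and cg_type.endswith(")"):
        depth = 0
        for i, ch in enumerate(cg_type):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            if depth == 0 and i < len(cg_type) - 1:
                return cg_type  # first "(" closes early, e.g. "(a)/(b)"
        cg_type = cg_type[1:-1]
    return cg_type


def split_functor(cg_type):
    """Split at the outermost slash into (result, slash, argument).

    Returns None for atomic types such as "np".
    """
    depth = 0
    for i in range(len(cg_type) - 1, -1, -1):
        ch = cg_type[i]
        if ch == ")":
            depth += 1
        elif ch == "(":
            depth -= 1
        elif ch in "/\\" and depth == 0:
            return strip_parens(cg_type[:i]), ch, strip_parens(cg_type[i + 1:])
    return None


def generic_direction(cg_type):
    """Assumed fallback: modifiers (X/X, X\\X) depend on their argument."""
    parts = split_functor(strip_parens(cg_type))
    if parts is None:
        return None  # atomic type: takes no argument
    result, _slash, argument = parts
    return "dependent" if result == argument else "head"


def direction(word, cg_type):
    """Prefer the lexical dictionary; fall back on the generic mapping."""
    return LEXICAL_DIRECTIONS.get((word, cg_type), generic_direction(cg_type))
```

Under these assumptions, a transitive-verb type such as `(s\np)/np` makes the word the head of its argument, whereas an unknown word typed `np\np` is treated as a dependent. The dictionary entry shows how a lexical override would take precedence over the generic rule.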
Original language: English
Title: Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP)
Publisher: Association for Computational Linguistics
Publication date: Nov. 2011
Status: Published - Nov. 2011

ID: 34349668