ParlaMint: Comparable Corpora of European Parliamentary Data

Institut for Nordiske Studier og Sprogvidenskab

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed

Documents

Fulltext
Publisher's published version, 7.72 MB, PDF document

Tomaž Erjavec
Maciej Ogrodniczuk
Petya Osenova
Andrej Pančur
Nikola Ljubešić
Tommaso Agnoloni
Starkaður Barkarson
María Calzada Pérez
Çağrı Çöltekin
Matthew Coole
Roberts Darģis
Luciana D. de Macedo
Jesse de Does
Katrien Depuydt
Sascha Diwersy
Matyáš Kopp
Tomas Krilavičius
Giancarlo Luxardo
Maarten Marx
Vaidas Morkevičius
Paul Rayson
Orsolya Ring
Michał Rudolf
Kiril Simov
Steinþór Steingrímsson
István Üveges
Ruben van Heusden
Giulia Venturi

This paper outlines the ParlaMint project from the perspective of its goals, tasks, participants, results and application potential. The project produced language corpora from the sessions of the national parliaments of 17 countries, almost half a billion words in total. The corpora are split into COVID-related subcorpora (from November 2019) and reference corpora (to October 2019). The corpora are uniformly encoded according to the ParlaMint schema with the same Universal Dependencies linguistic annotations. Samples of the corpora and conversion scripts are available from the project’s GitHub repository. The complete corpora are openly available via the CLARIN.SI repository for download, and through the NoSketch Engine and KonText concordancers as well as through the Parlameter interface for exploration and analysis.

Original language	English
Title	Proceedings of CLARIN Annual Conference 2021
Publisher	CLARIN ERIC
Publication date	2021
Pages	19-24
Status	Published - 2021

Center for Sprogteknologi
