The ParlaMint corpora of parliamentary proceedings

Institut for Nordiske Studier og Sprogvidenskab

Publication: Contribution to journal › Journal article › Research › peer-reviewed

Documents

Full text
Publisher's published version, 2.13 MB, PDF document

Tomaž Erjavec
Maciej Ogrodniczuk
Petya Osenova
Nikola Ljubešić
Kiril Simov
Andrej Pančur
Michał Rudolf
Matyáš Kopp
Starkaður Barkarson
Steinþór Steingrímsson
Çağrı Çöltekin
Jesse de Does
Katrien Depuydt
Tommaso Agnoloni
Giulia Venturi
María Calzada Pérez
Luciana D. de Macedo
Giancarlo Luxardo
Matthew Coole
Paul Rayson
Vaidas Morkevičius
Tomas Krilavičius
Roberts Darģis
Orsolya Ring
Ruben van Heusden
Maarten Marx
Darja Fišer

This paper presents the ParlaMint corpora, which contain transcriptions of the sessions of 17 European national parliaments and total half a billion words. The corpora are uniformly encoded, contain rich metadata about 11 thousand speakers, and are linguistically annotated following the Universal Dependencies formalism and with named entities. Samples of the corpora and conversion scripts are available from the project’s GitHub repository. The complete corpora are openly available for download via the CLARIN.SI repository, and for online exploration and analysis through the NoSketch Engine and KonText concordancers and the Parlameter interface.

Original language	English
Journal	Language Resources and Evaluation
Volume	57
Pages (from-to)	415-448
ISSN	1574-020X
DOI	https://doi.org/10.1007/s10579-021-09574-0
Status	Published - 2023

ID: 291220591

Center for Sprogteknologi
