The Danish CLARIN project

The University of Copenhagen, heading the Danish CLARIN consortium, was given a three-year grant of 15 million DKK (approx. 2 million €) to construct a Danish research infrastructure for the humanities, integrating written, spoken, and visual records into a coherent and systematic digital repository. The project ran from January 2008 until the end of 2010.

The infrastructure was named Centre for Danish Language resources and Technology Infrastructure for the Humanities. The partners were eight leading Danish humanities institutions: four universities, a university library, a museum, and two government research institutions. Because the University of Copenhagen took part with three departments, the consortium had ten members:

University of Copenhagen – KU – with three departments from the Faculty of Humanities:

1) Centre for Language Technology – CST – co-ordinator of the project

2) Danish National Research Foundation Centre for Language Change in Real Time – LANCHART

3) Department of Scandinavian Studies and Linguistics – INSS

4) University of Southern Denmark – SDU

5) University of Aarhus – AU

6) Copenhagen Business School – CBS

7) The Royal Library – KB

8) The National Museum of Denmark – NatMus

9) Society for Danish Language and Literature – DSL

10) Danish Language Council – DSN

Mission and Vision

The vision was to create a researcher’s toolbox by establishing a number of digital Danish text, speech, and visual resources with associated tools, and to integrate these resources into a web-based research environment. This would provide much-needed support for the Danish humanities and enhance their possibilities for European collaboration. The Danish CLARIN project also improved the conditions for Danish language technology research and development by starting a structured approach to a Danish BLARK.

The Danish CLARIN project followed standards and recommendations developed in the preparatory phase of the European CLARIN project. It is important to note, however, that the Danish project was not granted as a preparatory-phase project running in parallel with the European one. It was an independent Danish investment in a national infrastructure that stood alone as a contribution to the Danish research enterprise. For this reason it was essential for the consortium to design the work packages so that they delivered not only the technical infrastructure but also as many types of content as possible.

Work Packages

The work was organized in five thematically defined main work packages: one covered coordination and management, three were dedicated to making content available, and one focused on the technical infrastructure.

WP1 Coordination and Technical management

This work package covered the overall co-ordination and management of the project. It also addressed issues such as copyright and the financing of the deployment phase following construction.

WP2 Basic written language resources

This work package collected and annotated existing written language resources: contemporary and old, general language and specialised sublanguages, literary and professional, as well as parallel corpora with Danish as one of the languages. WP2 had six sub-work packages, in which seven of the ten consortium members collaborated in different combinations.

WP3 Spoken language resources and tools

This work package collected and annotated three different spoken language corpora and developed associated tools. WP3 had three sub-work packages, which involved four of the ten consortium members.

WP4 Technological resources

This work package developed new technological resources and modified existing ones; technological resources were defined as constructed data resources. The work comprised traditional and electronic dictionaries, as well as dictionaries and semantic word nets meant for computer systems. It also comprised the linking between different dictionaries as well as between dictionaries and corpora. WP4 had two sub-work packages, in which three of the ten consortium members collaborated.

WP5 Technical infrastructure

The task of this work package was to provide the technical framework for the infrastructure, including a single web user interface to serve as the Danish CLARIN platform. This platform gave access to all the tools and text resources of the infrastructure, as well as a personal workspace, communication facilities, user authentication and rights management, and search and retrieval facilities. WP5 had two sub-work packages, to which all ten consortium members contributed. The three main technological centres responsible for the interoperability and functionality of the technical platform were The Royal Library (KB), the Society for Danish Language and Literature (DSL), and the Centre for Language Technology (CST).