The benchmark project
The development of a benchmark dataset for evaluating large language models (LLMs) for Danish
The project develops a benchmark dataset for evaluating the intrinsic reasoning capabilities of LLMs that process Danish. The evaluation datasets are based on existing semantic dictionaries for Danish, such as The Danish Thesaurus, the wordnet DanNet, The Danish FrameNet Lexicon, The Danish Sentiment Lexicon, and The Central Word Register, and are partly derived semi-automatically from these resources.
Partner
The Society for Danish Language and Literature
Dataset
The datasets are available on GitHub and are continuously updated.
Morten Mikkelsen (30 August 2024). Chatbotter skal også forstå sprogets danske sjæl [Chatbots must also understand the Danish soul of the language]. Kristeligt Dagblad. Interview with Bolette S. Pedersen and Nathalie Hau Sørensen.
Pedersen, B. S., Sørensen, N. C. H., Olsen, S., & Nimb, S. (2024). Evaluering af sprogforståelsen i danske sprogmodeller – med udgangspunkt i semantiske ordbøger [Evaluating language understanding in Danish language models – based on semantic dictionaries]. NyS – Nydanske Sprogstudier, 65, 8–40.
Pedersen, B. S., Sørensen, N. C. H., Olsen, S., Nimb, S., & Gray, S. (2024). Towards a Danish Semantic Reasoning Benchmark – Compiled from Lexical-Semantic Resources for Assessing Selected Language Understanding Capabilities of Large Language Models. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (p. 16356). ELRA and ICCL.
- Workshop on NLU benchmark datasets for Danish, Centre for Language Technology, March 12, 2024.
- Benchmarking Workshop, Agency of Digital Government, September 20, 2024.
Participants
Internal
Name | Title | Phone
---|---|---
Bolette Sandford Pedersen | Professor, Deputy Head of Department | +4535329078
Dorte Haltrup Hansen | Academic Research Staff | +4535329070
Nina Skovgaard Schneidermann | Research Assistant | +4535331600
Simon Gray | Academic Research Officer | +4535337688
Sussi Olsen | Academic Research Staff | +4535329064
Funding
Title: Compiling a Danish Benchmark Dataset for Assessing Selected Reasoning Capabilities of Large Language Models
Project period: 1 February 2024 – 1 February 2026