Research
The CST staff carries out research in a range of language technology fields that interact in various ways. Much of this research is carried out through external projects.
Language research infrastructure
Language resources and the infrastructure around them are important prerequisites for the development of language technology – and for the research that makes use of it. A research infrastructure comprising language data will support the digital humanities and thus promote cross-disciplinary collaboration and broader knowledge sharing in both research and education.
Current (in bold) and former projects:
CLARA - Common Language Resources and their Applications, a Marie Curie Initial Training Network (ITN).
CLARIN EU - a European Research Infrastructure Consortium (ERIC). (In Danish only)
DASISH - a data service infrastructure for the social sciences and humanities. (In Danish only)
DIGHUMLAB - a Danish research infrastructure initiative in the humanities. (In Danish only)
DK-CLARIN - was a Danish research infrastructure integrating written, spoken, and visual records.
METANORD - was an EU project with the aim of developing language resources for the less-resourced languages of the Baltic and Nordic countries.
META-SHARE - an open language resource exchange facility devoted to the sustainable sharing and dissemination of language resources.
MULINCO - Multilingual Corpus of the University of Copenhagen (In Danish only).
Language Technology Applications
Computer-based applications play a substantial role in many human language technologies and services. At the Centre for Language Technology, research is carried out in machine translation, automatic question answering, information retrieval, and e-learning.
Current (in bold) and former projects:
ESICT - Development of an IT system that answers health-related questions. (In Danish only)
Let's MT! - was an EU project on statistical machine translation (in Danish only).
MELFA - Mobile e-Learning for Africa. The project was aimed at South Africans with reading difficulties.
MELFO - Mobile e-learning for people with dyslexia (in Danish only).
SDMT - Statistical dependency-based machine translation (In Danish only).
Tvärsök – Information retrieval across the Nordic languages (in Swedish only)
Multimodal Communication
Multimodal communication studies the interaction of different communication modalities – spoken and written language, gestural behavior, body movements and facial expressions. The purpose of this research is to understand the complex nature of human communication and to apply this knowledge to the development of more natural human-computer interfaces.
Current (in bold) and former projects:
Mumin - A Nordic network for multimodal interfaces
NOMCO - A collaborative Nordic analysis of multimodal spoken language corpora.
Staging - Interaction about and with virtual worlds (in Danish only)
VKK – A project about the complex connections between verbal and bodily communication (in Danish only)
Language processing and resources
Research on and development of formal grammars, computational lexicons, wordnets, and annotated corpora for natural language processing, of Danish in particular, are central ongoing activities at the Centre. The resources and processing cover several linguistic levels, such as syntax, semantics, and discourse.
Current (in bold) and former projects:
Semantic Processing Across Domains - Semantic processing and domain adaptation including annotation and processing of Danish (in Danish only)
STO – On the development of a language technology lexical database for Danish (in Danish only)
DAD - The Abstract Det was a project developing a formal model of the use of Danish pronominal abstract anaphora. (in Danish only)
DanNet - was a research and development project on a Danish lexical-semantic wordnet. (in Danish only)
Leksikalsk disambiguering – About a logic-based approach to the handling of linguistic polysemy (in Danish only)
Machine Learning
Statistical learning techniques can be used to induce natural language processing models from linguistically annotated text corpora. The development of such techniques is a core research area for CST and is important for several ongoing projects.
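To make the idea of inducing a model from annotated text concrete, here is a minimal sketch in Python: a most-frequent-tag (unigram) part-of-speech tagger learned from a tiny annotated Danish corpus. The corpus, the tag names, and the functions `train_unigram_tagger` and `tag` are hypothetical illustrations, not CST's tools or methods; research systems use much larger corpora and richer statistical models.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: each sentence is a list of (word, tag) pairs.
TRAINING_CORPUS = [
    [("en", "DET"), ("hund", "NOUN"), ("løber", "VERB")],
    [("katten", "NOUN"), ("ser", "VERB"), ("hunden", "NOUN")],
    [("hunden", "NOUN"), ("løber", "VERB")],
    [("en", "DET"), ("løber", "NOUN"), ("sover", "VERB")],
]


def train_unigram_tagger(corpus):
    """Induce a word -> most frequent tag table from annotated sentences.

    Also returns the overall most frequent tag, used for unseen words.
    """
    counts = defaultdict(Counter)  # word -> Counter of tags seen with it
    all_tags = Counter()           # tag frequencies over the whole corpus
    for sentence in corpus:
        for word, tag_label in sentence:
            counts[word.lower()][tag_label] += 1
            all_tags[tag_label] += 1
    model = {word: tags.most_common(1)[0][0] for word, tags in counts.items()}
    default_tag = all_tags.most_common(1)[0][0]
    return model, default_tag


def tag(tokens, model, default_tag):
    """Tag each token with its most frequent training tag, or the default."""
    return [(token, model.get(token.lower(), default_tag)) for token in tokens]


if __name__ == "__main__":
    model, default_tag = train_unigram_tagger(TRAINING_CORPUS)
    print(tag(["En", "løber", "ser", "fuglen"], model, default_tag))
```

The ambiguous word "løber" (verb "runs" or noun "runner") is tagged VERB because it occurs twice as a verb and once as a noun in the toy data, and the unseen word "fuglen" falls back to the corpus-wide most frequent tag, NOUN. Such a baseline ignores context entirely, which is exactly what more sophisticated learned models are designed to address.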
Current (in bold) and former projects:
Let's MT! - was an EU project on statistical machine translation (in Danish only).
LOWLANDS - Developing robust learning algorithms for language technology, focusing on languages and domains for which little linguistically annotated data exists.
Semantic Processing Across Domains - Semantic processing and domain adaptation including annotation and processing of Danish.
SDMT - Statistical dependency-based machine translation (In Danish only).
VKK – A project about the complex connections between verbal and bodily communication (in Danish only)
What is language technology?
Language technology is an interdisciplinary subject that combines the study of language with the study of information technology.