Besides using Computational Linguistics as an auxiliary science for data annotation, the Dutch Language Institute (INT) also conducts research in computational linguistic applications, focusing on inclusive language technologies, machine translation and applications of large language models.
Inclusive language technologies
Inclusive language technologies aim to make digital content and communication accessible to a wide range of users, including those with cognitive, sensory, or linguistic challenges. Our research focuses on text simplification, readability, and the use of easy and plain language. We have developed expertise in augmentative and alternative communication (AAC), particularly through text-to-pictograph translation, and have worked extensively on sign language translation technologies. In addition, we have strong expertise in machine translation and its evaluation.
Application of large language models
We are also well equipped to work on the training and application of language models. Examples include specializing Dutch language models for part-of-speech tagging in historical Dutch and deriving Dutch meaning structures from running text. Our knowledge also extends to generative large language models (LLMs). At the institute, we have the expertise to train LLMs, create relevant datasets and benchmarks, and evaluate the models, all with a clear focus on the Dutch language. We investigate how this powerful technology can fit within ongoing projects and our own infrastructure, asking questions such as: Can LLMs generate complex queries in a formal query language? Can they assist lexicographers in writing dictionary definitions and linking multiple data sources? And more generally, how can LLMs be employed to assist researchers in the Social Sciences and Humanities?
We bring our combined linguistic and computational expertise to projects and consortia, and we are always happy to be involved in new research project proposals.
Further information and links:
Project involvement:
- 2025-…: LLMs4SSH CLARIN K-center (https://llms4ssh.clarin-pl.eu/)
Role: supporting researchers in Social Sciences and Humanities in using LLMs
- 2024-…: SSHOC-NL (https://sshoc.nl/)
Role: supporting researchers in Social Sciences and Humanities in selecting enrichment tools for their specific use case and training them to evaluate the tools for their use case
- 2024-…: GPT-NL (https://gpt-nl.nl/)
Role: involvement as data creator and data provider
- 2021-2023: SignON (https://signon-project.eu/)
Role: hosting infrastructure, building datasets, building and evaluating components of a sign language machine translation system
Publications:
- 2025: Duidelijke Taal (proceedings not available yet)
- 2024: Fietje (https://www.clinjournal.org/clinj/article/view/213)
- 2024: GEITje 7B Ultra: A Conversational Model for Dutch (https://arxiv.org/abs/2412.04092)
- 2024: Less is Enough: Less-Resourced Multilingual AMR Parsing (https://aclanthology.org/2024.isa-1.11/)
- 2023: Language Resources for Dutch Large Language Modelling (https://arxiv.org/abs/2312.12852)
Contact persons:
Vincent Vandeghinste (Inclusive NLP and MT) and Bram Vanroy (Large Language Models and evaluation)