Below, you’ll find an overview of recent and ongoing projects. You can also have a look at our older projects.
SSHOC-NL (2024-2029)
SSHOC-NL (Social Science and Humanities Open Cloud for the Netherlands) is the follow-up project to CLARIAH+. It aims to create a consortium of research infrastructures that provides an ecosystem of services, data and tools for the social sciences and humanities. The consortium is led by ODISSEI, the Dutch national infrastructure for social sciences, and CLARIAH, the Dutch national infrastructure for humanities. Within this project, the INT will focus, among other things, on the infrastructure for deploying machine learning and AI for data enrichment. [More information]
Spread the News (2020-2025)
The research project Spread the new(s). Understanding standardisation of Dutch through 17th-century newspapers, funded by NWO Open Competition SSH and conducted at Radboud University and INT, investigates which (socio)linguistic factors determine the functional implementation of a standard language. The INT is providing the technical facilities for this project, including the retrieval and enrichment of a corpus of 17th-century newspapers digitised by volunteers. [More information]
GCND (2020-2024)
The INT is a partner in the Spoken Corpus of Southern Dutch Dialects project, which runs from 2020 to 2024 and is led by UGent. The project aims to open up a collection of dialect recordings from 768 places in Belgium, France and the southern Netherlands, recorded between 1963 and 1976. [More information]
Duidelijke Taal (2023-2024)
In co-operation with the Taalunie (Union for the Dutch Language), a pilot project on automatic conversion of documents into simple language will be set up in 2023. The pilot will lead in 2024 to a demo system using state-of-the-art techniques from artificial intelligence.
SignON (2021-2024)
The INT is involved as a consortium partner in the SignON project, funded for three years from spring 2021 within the framework of the European Commission’s Horizon 2020 programme. The main goal of this project is to set up automatic translation services between sign languages and so-called spoken languages. The sign languages at the top of the agenda of this Research and Innovation Action are Flemish, Dutch, Spanish, British and Irish Sign Language. The spoken languages are Dutch, Spanish, Irish and English. The consortium has a strong Belgian-Dutch component. The consortium partners from Belgium are VRT, KU Leuven, UGent, the Flemish Sign Language Centre and the European Union for the Deaf. Participating from the Netherlands are INT, the Taalunie, Radboud University Nijmegen and Tilburg University, with Beeld en Geluid as a third party. The project is led by Dublin City University.
INT’s task consists mainly of setting up the infrastructure for this research and collecting sign language corpora. This has already resulted in a number of publications and resources being made available for both Flemish Sign Language (VGT) and Dutch Sign Language (NGT). These efforts will continue in 2023. [More information]
CLARIAH-Vlaanderen (2021-2024)
The INT is involved as a third party in the Flemish research infrastructure project CLARIAH-VL: Advancing the open humanities service infrastructure. INT’s main task is to provide the infrastructure needed to set up the Digital Text Analysis Dashboard & Pipeline. The aim of this infrastructure is to allow Digital Humanities researchers to annotate texts automatically, without needing a technical background, using a cloud-based system to which texts can be uploaded.
Via CLARIAH-VL, INT is involved as a data supplier in a Tier-I project at the Flemish Supercomputer Centre to train contextual language models (e.g. SpanBERT) based on the corpora of contemporary Dutch available to INT. [More information]
CLARIAH+ Nederland (2019-2023)
The follow-up project to CLARIAH (Common Lab for Research in the Arts and Humanities) will run from 2019 to 2023. Among other things, the INT is involved in improving the infrastructure for historical Dutch, expanding the corpus search engine BlackLab to parallel corpora and dependency treebanks, developing tools for making persistent user annotations in corpus search results, creating a more user-friendly digitisation workflow and curating dialect dictionary data. [More information]
Using CoBaLT and GaLAHaD for historical corpus annotation (2023) is a project within CLARIAH in which the INT will evaluate CoBaLT, a tool for interactive corpus annotation, the GaLAHaD platform for linguistic annotation of historical Dutch, and various tools for tagging and lemmatising historical texts.
SABeD (2021-2023)
The Spoken Academic Belgian Dutch project, funded by KU Leuven, aims (1) to compile a corpus of academic spoken Dutch, (2) to investigate the effectiveness of speech technology for automatic transcription of spoken texts, (3) to subsequently develop a word frequency list of academic spoken Dutch and (4) to develop a vocabulary test of academic spoken Dutch.
The INT is a third party in this application, and will ensure the inclusion of the corpus in the CLARIN infrastructure, both as a download for research and searchable online, similar to how the Spoken Dutch Corpus is currently available in the OpenSonar application. Furthermore, the INT cooperates in the annotation process (expertise, video data pre-processing, conversion of PDFs and slideshows to text files, etc.). [More information]
Clasabed (2022-2023) (Clariah.nl tools in SABeD) is a related project funded by CLARIAH-NL. It involves evaluating several tools from the CLARIN infrastructure for their usability in annotating and analysing the corpus data of the SABeD project.
ParlaMint II (2021-2023)
ParlaMint is a CLARIN-ERIC-funded project contributing to the creation of comparable and uniformly annotated multilingual corpora of parliamentary sessions. ParlaMint I created and made corpora available for 17 languages. ParlaMint II will improve the XML schema and validation, extend the existing corpora until at least July 2022, add corpora for new languages, further enhance the corpora with additional metadata and improve the usability of the corpora. The INT is responsible for the data of the Belgian federal parliament. [More information]