Time setter story: Jesse van Oort on large language models

Theme: Artificial intelligence

This is a time when innovation is crucial: to make our world safer and our lives healthier, and to combat climate change. TNO employees make their mark on our time. In this series, we share stories of our time setters. Jesse van Oort is a scientist innovator and data-acquisition lead for GPT-NL at TNO. Since early 2024, he has been working on GPT-NL: a Dutch language model founded on transparency and European values. In doing so, he works closely with SURF and the Netherlands Forensic Institute.

Artificial Intelligence

Artificial intelligence is evolving rapidly from a vague promise into a game changer. Large language models, such as those from OpenAI, can summarise texts, translate between languages, write code, and even rewrite policies. That’s great news for healthcare, government, education, and business. Yet it is also a source of discomfort: almost all major models have been developed by American or Chinese tech companies, which remain tight-lipped about where they get their data from and what happens behind the scenes.

AI has become widespread within organisations, even though users often do not know how the models actually work. ‘Being kept in the dark doesn’t feel right when it comes to critical infrastructure or when you’re working with sensitive data’, says Jesse. ‘That’s why we’re building our own reliable large language model (LLM).’


AI helps us analyse faster, design smarter and develop new solutions to some of society’s biggest challenges. But at TNO, it’s not only about what AI can do. It’s also about the questions it raises. How do you harness the power of AI without losing control of your data? How do you innovate while remaining responsible? How do you ensure algorithms are transparent, reliable and explainable? These are dilemmas that our time setters, the innovators of TNO, face every day. We spoke to Saskia Lensink and Jesse van Oort, who are working on GPT-NL, a Dutch language model for organisations that is built around transparency, privacy and European values.

Strengthening autonomy

Jesse explains how an LLM works. ‘At its core, the model predicts which word is likely to follow the words that precede it. The model learns that by processing huge amounts of text. The idea is: if you feed the model enough data, it will eventually know “everything”. But texts contain biases, fake news, errors, copyrighted material, and sometimes personal data. And we don’t want a model that just regurgitates the whole of the internet, but a system that handles data carefully.’
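To make the idea of next-word prediction concrete, here is a minimal sketch in Python. It is not how GPT-NL or any real LLM is built: it is a toy model that only counts which word follows which in a tiny, made-up training text, whereas real models use neural networks trained on vastly more data. The function names `train_bigram` and `predict_next` and the example corpus are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follow = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follow[prev][nxt] += 1
    return follow


def predict_next(follow, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = follow.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical toy corpus; a real model learns from huge amounts of text.
corpus = "the model reads text . the model predicts words . the model reads data"
model = train_bigram(corpus)
print(predict_next(model, "model"))  # prints "reads"
```

The toy model can only repeat patterns that occur in its training text, which illustrates why the choice and quality of that text matter so much.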

With GPT-NL, the development partners offer an alternative that puts reliability and data security at the centre. ‘We make sure that data stays within organisations as much as possible, while the model simply connects documents and data sources. That way, you benefit from strategic autonomy, increase control, and comply with European regulations such as the AI Act and the GDPR from the get-go. Thanks to GPT-NL, the Netherlands is strengthening its own knowledge, technology, and autonomy in language modelling.’

Responsible digitalisation

Potential customers in sectors such as government, healthcare, and education can use the model to optimise internal processes and tackle problems facing society. ‘The fact that it is not an all-encompassing global model keeps GPT-NL more compact and energy-efficient, which fits in with European ambitions on sustainable and responsible digitalisation.’

That social ambition also aligns with what is personally important to Jesse himself. ‘At TNO, which is not primarily profit-driven, there is room to develop new projects in areas such as sustainable IT and responsible AI. My personal ambition is to make the most positive contribution to society possible. And TNO allows me to do exactly that.’


Data collection: an intensive process

How do you develop this kind of model in practice? Jesse explains, ‘We started in spring 2024 and thought we would have a working model by the end of that year. But collecting datasets and the legal due diligence surrounding them took much longer than anticipated. Which texts can you use for a fair model? And how can you be sure you’re allowed to use that data? Moreover, a lot of organisations treat their data as valuable private property. Sometimes you have to convince them that the value really lies in the ability to pool that data. By making clear-cut agreements and showing that you don’t just scrape data from the net, you gradually earn their trust.’

To streamline the process, the developers set up a content board with partners from both the public and private sectors. ‘We asked specific questions like: what rights do data owners have? What will they allow the model to be used for? How do we share any revenue?’

Milestone reached

One major milestone reached last year was the conclusion of a collaboration agreement with NDP Nieuwsmedia, the umbrella organisation for almost all major Dutch news media. ‘This gave us instant access to an archive of 25 years of news articles’, Jesse says.

‘High-quality Dutch texts help enormously in building a reliable model. The fact that these parties are willing to share their data with us shows that we can do things differently.’ The team also built a comprehensive curation pipeline.

‘All the data has to pass legal checks and go through quality filters and systems that encrypt or delete personal data’, Jesse explains. ‘You never get data that comes out completely clean, but we know where the risks are and have taken appropriate measures for that.’
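The three stages Jesse describes (legal check, quality filter, handling of personal data) can be sketched as a small Python pipeline. This is an illustrative sketch only, not the GPT-NL pipeline: the source names, the word-count and letter-ratio thresholds, and the regular expressions are all assumptions, and personal data is replaced with a placeholder here rather than encrypted or deleted.

```python
import re

# Simplified, hypothetical patterns; real detection of personal data is much broader.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def passes_quality(text, min_words=5, min_letter_ratio=0.6):
    """Crude quality filter: enough words, and mostly letters (assumed thresholds)."""
    if len(text.split()) < min_words:
        return False
    letters = sum(ch.isalpha() for ch in text)
    return letters / len(text) >= min_letter_ratio


def redact_personal_data(text):
    """Replace e-mail addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def curate(documents, licensed_sources):
    """Yield cleaned texts that pass the legal check and the quality filter."""
    for doc in documents:
        if doc["source"] not in licensed_sources:  # legal check: agreed sources only
            continue
        if not passes_quality(doc["text"]):
            continue
        yield redact_personal_data(doc["text"])


# Hypothetical example documents.
docs = [
    {"source": "news_archive",
     "text": "Contact the editor at jan@example.nl for more details on this story."},
    {"source": "scraped_forum",
     "text": "This text comes from an unlicensed source and is dropped."},
    {"source": "news_archive", "text": "Too short."},
]
result = list(curate(docs, licensed_sources={"news_archive"}))
print(result)
```

Only the first document survives, with the e-mail address replaced. As the quote notes, such filters never produce perfectly clean data, so the simple patterns above would miss many real cases.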

Ambition

Jesse continues to work on GPT-NL. Where will the model be in two years’ time? ‘It will be the go-to model for organisations that highly value transparency and trustworthiness. That would mean a great deal for the Dutch AI sector: we get to keep knowledge, infrastructure, and talent within the country, while showing Europe how generative AI can be done: based on diligence, trust, and the public good.’

Want to become a time setter too? Check out vacancies in AI


Contact us


Saskia Lensink

Position:
Consultant & Business Developer
Saskia Lensink works as a consultant and business developer and specializes in language and speech technologies. She applies her knowledge of NLP and ASR in various projects, and is active in a diverse set of consortia and networks to promote sovereign and high-performing European large language models.

- Location:
  Den Haag - New Babylon
