Stef van Buuren

Professor of Statistical Analysis of Incomplete Data

During my career in both TNO and academia, I have pioneered quantitative algorithms for replacing a missing value with a distribution.

Research area

Missing values are the data we do not see. Missing data may lead us to misunderstand the world, draw incorrect conclusions and make poor decisions. In practice, data are always incomplete. How, then, can we draw valid conclusions?

Well, imagine what the complete data would look like, determine what is missing and why, and then try to recreate the missing data from what we know. Of course, this recreation can never be perfect, so we need to represent these new synthetic data as distributions instead of point values. Done correctly, this approach allows us to avoid systematic errors in our judgment.

During my career in both TNO and academia, I have pioneered quantitative algorithms for replacing a missing value with a distribution. These methods learn plausible values from the observed data. The Multivariate Imputation by Chained Equations (MICE) algorithm is the de facto standard for completing and analysing data in many fields.

Investigators both within and outside TNO and across many sciences rely on MICE. I apply MICE and related methodologies in many TNO projects, especially in child growth, development, healthy living, and projects for the World Health Organisation and the Bill & Melinda Gates Foundation.

Recent results

  • The Global Scales of Early Development (GSED) project led by the World Health Organisation bases new instruments for measuring child development on the D-score, an innovation by TNO.
  • To ease co-development, we organised development of novel R software in three new GitHub organisations: amices, D-score and growthcharts.
  • The second edition of Flexible Imputation of Missing Data now comes with a free, complete online version, including all R code to calculate the results.
  • MICE software is downloaded about 60,000 times per month. The 2011 MICE paper is approaching 10,000 citations.
  • The Joint Automatic Measurement and Evaluation System (JAMES) webservice handles about one million requests per month.
  • We have separate gateways for D-score work at Gates Open Research.
  • The new package brokenstick on CRAN excels at combining, analysing and predicting individual health trajectories.
  • The shinyMice offers interactive diagnostics for missing data imputation.
  • Highly-cited TNO researcher (5000+ cites per year).

PhD supervision

  • Mingyang Cai (expected 2022)
  • Hanne Oberman (new, expected 2027)
  • Thom Volker (new, expected 2027)

Top publications

Leiden - Schipholweg

Schipholweg 77
2316 ZL Leiden

Postal address

P.O. Box 3005
NL-2301 DA Leiden