Dr Serena Oggero
- Dark Web Solutions
- Cyber Security and Resilience
- National Security
Best-in-class AI algorithms depend on very large amounts of representative training data, sometimes as many as 100 million items, and all too often that much data is simply not available. Small datasets, however, can lead to unreliable outcomes. It is therefore important to develop algorithms that can cope with this limitation.
AI on small datasets
We offer various methods for dealing effectively with small datasets, including transfer learning, online learning, and the use of high-fidelity models to generate simulated data. All of these reduce the need for training data.
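One of the methods above, transfer learning, can be illustrated with a minimal sketch: a representation is first learned from a large pool of related data, and only a lightweight classifier is then fitted on the handful of labeled examples available for the actual task. The datasets, the use of PCA as a stand-in for pretraining, and all sizes below are illustrative assumptions, not part of the original text.

```python
# Hedged sketch of representation transfer: learn features from plentiful
# related data, then train a small classifier on very few labeled samples.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# Large pool of (synthetic) data from the same domain.
X_pool, y_pool = make_classification(n_samples=5000, n_features=50,
                                     n_informative=10, random_state=0)

# Only a small labeled set is available for the actual task.
X_small, y_small = X_pool[:40], y_pool[:40]
X_test, y_test = X_pool[4000:], y_pool[4000:]

# "Pretrain": learn a low-dimensional representation from the large pool
# (labels are not used here, so this mimics unsupervised pretraining).
pca = PCA(n_components=10, random_state=0).fit(X_pool)

# "Fine-tune": fit a lightweight classifier on the 40 labeled samples,
# expressed in the transferred representation.
clf = LogisticRegression(max_iter=1000).fit(pca.transform(X_small), y_small)
acc = clf.score(pca.transform(X_test), y_test)
```

The design choice is that only the final, low-capacity classifier sees the scarce labels; the millions-of-parameters representation problem is solved on data that is cheap to obtain.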
Modern machine-learning algorithms have millions of parameters and deliver highly accurate predictions when trained with large datasets. Unfortunately, they perform much worse when trained on small datasets, yet often only small datasets are available. Obtaining sufficient data is difficult, time-consuming and expensive, and legal and ethical constraints further limit how much data can be collected. For rare events, it may even be impossible to obtain sufficient data.
Running AI applications on small datasets involves reliability and performance risks, and bias can also occur. This raises several challenges:
1. Developing effective algorithms with small datasets that are reliable, unbiased and safe.
2. Combining small datasets with existing model-based approaches.
3. Coping with the issues of missing data and unreliable and changing data sources.
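Challenge 2 above, combining small datasets with existing model-based approaches, can be sketched as follows: a simple simulator (a hypothetical noisy linear sensor model, assumed here for illustration) supplies abundant synthetic samples that augment a handful of real measurements before a regressor is trained.

```python
# Hedged sketch: augment a small measured dataset with samples drawn from
# a model-based simulator, so the learner sees far more training points
# than were actually measured.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def simulate(n):
    """Hypothetical high-fidelity model: y = 3x + 2 plus sensor noise."""
    x = rng.uniform(0, 10, size=(n, 1))
    y = 3 * x[:, 0] + 2 + rng.normal(0, 0.5, size=n)
    return x, y

# Only 5 real measurements are available...
X_real, y_real = simulate(5)
# ...but the simulator can supply many more synthetic ones.
X_sim, y_sim = simulate(500)

X_train = np.vstack([X_real, X_sim])
y_train = np.concatenate([y_real, y_sim])

model = Ridge().fit(X_train, y_train)
pred = model.predict([[4.0]])[0]  # the underlying model gives 3*4 + 2 = 14
```

In practice the simulator would encode real domain knowledge (physics, known system dynamics), which is exactly what makes a tiny measured dataset sufficient.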
Learning from small and limited datasets allows us to leverage the benefits of current Artificial Intelligence developments without the unaffordable effort of annotating very large datasets.