TP7: Data Centric AI and Infrastructures


  • Donato Malerba (Università di Bari)
  • Antonella Poggi (La Sapienza Università di Roma)

Spoke 3: Resilient AI
Spoke 5: High quality AI
Spoke 6: Symbiotic AI
Spoke 9: Green-aware AI

The recent success of AI is certainly due to the ability to use massive amounts of data. At the same time, today's increasingly powerful learning algorithms are much hungrier for data. This raises two needs. The first is a data-centric approach, where the focus is on labeling, managing, slicing, augmenting, and curating the data. The second is data infrastructures that can scale storage as the volume of data grows, ensuring adequate storage capacity, IOPS (input/output operations per second), and reliability. The key idea is that data is the primary arbiter of success or failure and is therefore the critical focus of iterative development, which requires sufficient computing resources, including GPUs alongside CPUs, to achieve power efficiency.

The main objective of TP7 is to coordinate the data-related aspects of AI processes that are addressed in FAIR. This will be achieved by supporting exchange and interaction among the Spokes involved in the TP, thus encouraging synergies towards a twofold common goal: the development of principles and methodologies for data-centric AI development, with research focusing on data curation, and the development of tools and solutions for the efficient and secure handling of large datasets as input to AI processes.