FAIR AI through FAIR data in EMBL-EBI’s BioImage Archive

Abstract number: 49

Presentation Form: Poster

DOI: 10.22443/rms.elmi2024.49

Corresponding Email: [email protected]

Session

Poster Session

Authors

Matthew Hartley (1), Aybüke Yoldaş (1), Teresa Zulueta (1)

Affiliations

1. EMBL-EBI

Keywords

AI, data management, open data, FAIR, BioImage Archive, MIFA

Abstract text

Advances in AI techniques have significantly enhanced our capacity to derive valuable insights from biological images, and our reliance on these techniques will only increase over time. This trend poses twin challenges for the use of image data: first, understanding the reproducibility and domain of applicability of current models, and second, developing new ones. These challenges are linked: training, validating, and benchmarking AI models hinges on access to vast quantities of well-curated data, and the data used to train current AI models has immense potential for reuse. Realizing this potential requires standardised ways to share and reuse model training data.

The BioImage Archive, EMBL-EBI’s repository for life sciences images, serves as a valuable source of data for AI applications. Recent enhancements to the archive have focused on making this data FAIR (Findable, Accessible, Interoperable, and Reusable) to support effective reuse. To this end, we held a community workshop that produced specific guidelines, abbreviated as MIFA, which cover Metadata (essential additional information to support effective annotation use), Incentives (strategies to promote annotation sharing), Formats (preferred annotation file formats), and Accessibility (methods to enhance annotation accessibility).

The BioImage Archive has integrated these guidelines into its deposition system, enabling direct deposition of segmentations and other image annotations used in training AI models. These annotations can be linked to related models hosted in the BioImage Model Zoo, a community repository of ready-to-use AI models, thereby enhancing the traceability of both models and their training data. The Archive now provides “AI-ready” datasets, a curated collection of data ready for model training. Further accumulation of data through this mechanism will enhance the FAIRness of AI models and foster a robust ecosystem for reuse as we progress towards training larger and more powerful models to unlock future scientific value from our imaging data.
