EuBiFlow: A flexible software to create and execute Nextflow-based image processing workflows on OME-Zarr data.

Abstract number
139
Presentation Form
Poster
DOI
10.22443/rms.elmi2024.139
Corresponding Email
[email protected]
Session
Poster Session
Authors
Bugra Özdemir (1)
Affiliations
1. Euro-BioImaging ERIC Bio-Hub, European Molecular Biology Laboratory (EMBL) Heidelberg
Keywords
  • Workflow
  • Python
  • OME-Zarr
  • Nextflow
  • Image Processing
  • Open Science
  • FAIR Data

Abstract text

Summary:

We introduce EuBiFlow, a Python-based high-level workflow manager for constructing and concurrently executing image processing pipelines. Its key features are i) usability as a Python library, ii) OME-Zarr support, iii) Nextflow integration and iv) template-based extensibility. In library mode, EuBiFlow allows direct interaction with, and introspection into, individual OME-Zarr objects.

Introduction:

As bioimaging technologies advance, accompanied by a continuous proliferation of relevant bioimage analysis software, the demand for high-throughput image processing solutions grows. In particular, there is a need for tools that enable the user to construct custom workflows tailored to their particular problems, and that can mediate the concurrent execution of those workflows over multiple (and possibly remote) datasets. Such tools hold great potential for efficiently tackling complex image processing tasks.

Here we present EuBiFlow, a versatile workflow manager built on Python and Nextflow that aims to meet these needs while addressing the challenges associated with distributed image processing.

Method:

EuBiFlow can operate as a command line tool or as a Python library. Through a user-friendly API, it lets users create, import, modify, export and execute custom workflows by connecting pre-existing task modules (named “tools”).
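
As a rough illustration of what connecting tools into a workflow can look like, the sketch below chains plain Python callables into a linear pipeline. The `Workflow` class, its `add_tool` and `run` methods, and the two toy tools are hypothetical names invented for this example; they are not EuBiFlow's actual API.

```python
from typing import Any, Callable, List, Tuple


class Workflow:
    """Hypothetical linear workflow: each tool receives the previous tool's output."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.steps: List[Tuple[Callable[..., Any], dict]] = []

    def add_tool(self, tool: Callable[..., Any], **params: Any) -> "Workflow":
        # Store the tool with its parameters; return self so calls can be chained.
        self.steps.append((tool, params))
        return self

    def run(self, data: Any) -> Any:
        # Feed the output of each step into the next one.
        for tool, params in self.steps:
            data = tool(data, **params)
        return data


# Toy "tools" operating on a 2D image stored as nested lists.
def subtract_background(image, offset=0):
    return [[max(px - offset, 0) for px in row] for row in image]


def threshold(image, level=1):
    return [[1 if px >= level else 0 for px in row] for row in image]


workflow = (
    Workflow("segment")
    .add_tool(subtract_background, offset=10)
    .add_tool(threshold, level=5)
)
mask = workflow.run([[12, 30], [9, 15]])
```

Storing the steps as data instead of calling them immediately is what makes a workflow of this kind easy to modify or export before it is executed.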

EuBiFlow uses the OME-Zarr data model to represent image data internally as a Python object. In library mode, users can interact directly with this OME-Zarr object, which can read, write and modify individual OME-Zarr pyramids. One advantage of adopting OME-Zarr is the rich data representation offered by its layout and specification, which natively support multiple resolution layers as well as label images and certain types of image analysis results. As a workflow runs, EuBiFlow reads, updates and writes the OME-Zarr data via this object, which incorporates the modifications made at each step and thus maintains data integrity throughout the job. A second advantage is access performance: OME-Zarr allows data to be streamed in chunks and therefore promises significant performance gains in workflow execution, which is inherently IO-intensive.
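
To make the notion of multiple resolution layers concrete, the sketch below builds the kind of `multiscales` metadata that the OME-Zarr (OME-NGFF) version 0.4 specification stores alongside an image pyramid. The helper name `build_multiscales`, the zyx axis order and the choice to halve the resolution only in y and x are assumptions made for this example, and the code is independent of EuBiFlow.

```python
import json
from typing import Dict, List


def build_multiscales(name: str, base_scale: List[float], n_levels: int) -> Dict:
    """Build OME-Zarr v0.4-style `multiscales` metadata for a zyx image.

    Assumption: each pyramid level halves the resolution in y and x only.
    """
    axes = [
        {"name": "z", "type": "space", "unit": "micrometer"},
        {"name": "y", "type": "space", "unit": "micrometer"},
        {"name": "x", "type": "space", "unit": "micrometer"},
    ]
    datasets = []
    for level in range(n_levels):
        factor = 2 ** level
        # Physical pixel size grows as the resolution is reduced.
        scale = [base_scale[0], base_scale[1] * factor, base_scale[2] * factor]
        datasets.append({
            "path": str(level),  # one array per resolution level
            "coordinateTransformations": [{"type": "scale", "scale": scale}],
        })
    return {
        "multiscales": [
            {"version": "0.4", "name": name, "axes": axes, "datasets": datasets}
        ]
    }


metadata = build_multiscales("example", base_scale=[1.0, 0.25, 0.25], n_levels=3)
print(json.dumps(metadata, indent=2))
```

Each entry in `datasets` points to one array of the pyramid, so a reader can pick the coarsest level that is sufficient for a task and avoid loading full-resolution data.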

With Nextflow as its workflow orchestrator, EuBiFlow inherits Nextflow's capabilities, such as tunable concurrency, container support and direct compatibility with HPC clusters. In the longer term, EuBiFlow aims to integrate popular image processing libraries and tools, ensure OME-Zarr support for them and enable their execution as Nextflow processes. When integrating Python libraries, care is taken to use Dask-based processing wherever feasible, with a view to scalability and performance on large volumes of image data.
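
The general pattern behind chunk-wise processing with a tunable level of concurrency can be sketched with the Python standard library alone. The example below is a stand-in that uses `concurrent.futures` instead of Dask or Nextflow, and the helpers `chunk_slices` and `chunk_sum` are hypothetical names invented for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Iterator, Sequence, Tuple


def chunk_slices(shape: Sequence[int], chunks: Sequence[int]) -> Iterator[Tuple[slice, ...]]:
    """Yield one tuple of slices per chunk of an array with the given shape."""
    ranges = [range(0, size, step) for size, step in zip(shape, chunks)]
    for starts in product(*ranges):
        # Clip the last chunk along each axis to the array boundary.
        yield tuple(
            slice(start, min(start + step, size))
            for start, step, size in zip(starts, chunks, shape)
        )


def chunk_sum(image, region):
    """Toy per-chunk task: sum the pixels inside one 2D chunk."""
    rows, cols = region
    return sum(sum(row[cols]) for row in image[rows])


image = [[x + 10 * y for x in range(6)] for y in range(4)]  # 4 x 6 toy image
regions = list(chunk_slices((4, 6), (2, 4)))

# max_workers plays the role of a tunable concurrency limit.
with ThreadPoolExecutor(max_workers=2) as pool:
    partial_sums = list(pool.map(lambda region: chunk_sum(image, region), regions))

total = sum(partial_sums)
```

Because every chunk is processed independently, the same task can be scheduled on more workers without changing the per-chunk code, which is the property that chunked formats and schedulers such as Dask exploit.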

Finally, EuBiFlow also offers a user-friendly template-based system for easily wrapping custom Python functions as EuBiFlow tools, expanding its functionality to suit diverse user needs.
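
One common way to wrap a custom Python function as a reusable tool is a registering decorator that records the function together with its tunable parameters. The sketch below illustrates that general idea only; `TOOL_REGISTRY`, the `tool` decorator and the `invert` function are hypothetical and do not describe EuBiFlow's template system.

```python
import inspect
from typing import Any, Callable, Dict

TOOL_REGISTRY: Dict[str, Dict[str, Any]] = {}  # hypothetical registry of tools


def tool(name: str) -> Callable:
    """Hypothetical decorator that registers a function as a named tool."""

    def register(func: Callable) -> Callable:
        signature = inspect.signature(func)
        # Treat every parameter with a default value as a tunable option.
        options = {
            param.name: param.default
            for param in signature.parameters.values()
            if param.default is not inspect.Parameter.empty
        }
        TOOL_REGISTRY[name] = {
            "func": func,
            "options": options,
            "doc": inspect.getdoc(func),
        }
        return func  # the function itself stays usable as before

    return register


@tool("invert")
def invert(image, max_value=255):
    """Invert pixel intensities."""
    return [[max_value - px for px in row] for row in image]


entry = TOOL_REGISTRY["invert"]
result = entry["func"]([[0, 255], [100, 5]], **entry["options"])
```

Deriving the options from the function signature means the author of a custom function writes ordinary Python, while the surrounding system still learns which parameters it can expose.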

Discussion and Conclusion:

EuBiFlow is being developed to address the demand for efficient high-throughput image processing tools. Through features such as OME-Zarr support and Nextflow integration, EuBiFlow prioritises scalability, performance and data integrity. Its API is designed to simplify the construction and execution of custom workflows and to accelerate the processing of large image collections. Although still in the early stages of development, EuBiFlow has the potential to become a valuable and widely adopted tool within the community.