Big data, no problem: the IPP, a user-friendly tool for dealing with terabyte-scale image data on high-performance compute clusters

Abstract number
140
Presentation Form
Poster
DOI
10.22443/rms.elmi2024.140
Corresponding Email
[email protected]
Session
Poster Session
Authors
Nicholas Condon (1), Nishanthi Dasanayaka Mudiyanselage (2), Mark Endri (2), James Springfield (1)
Affiliations
1. Institute for Molecular Bioscience, The University of Queensland
2. Research Computing Centre, The University of Queensland
Keywords

Image Processing, High Performance Compute, Big Data, Computing

Abstract text

Taking the image is just the beginning: processing, quantification and visualisation are all accepted as standard requirements for publication, but they can be difficult when dealing with the very large datasets modern light microscopes produce. We have created the Image Processing Portal (IPP), an intuitive web-based GUI for scalable High-Performance Compute (HPC) processing of image data.

Core facilities are being equipped with camera-based microscopes such as spinning disc confocals, automated timelapse widefields and light-sheet systems that can routinely produce files exceeding 1 TB in size. While storage is becoming cheaper, processing, quantifying and visualising these large datasets is not trivial, and the required skills are rarely taught outside dedicated computer science programs. Universities and large research institutes often have large multi-node HPC clusters; however, unless the user possesses the required command-line and computer science experience, these resources are inaccessible to the budding microscopist. The IPP simplifies navigating complex code and negotiating with the scheduler for the correct and most efficient resource allocation, using wizard-like workflows and metadata reading to populate fields: the user selects data, chooses the relevant pipeline, quickly confirms the input information and submits the job with one click. Our platform includes file management tools, file converters, batch deconvolution (with multiple engines), custom macro execution and, soon, HPC-backed tiling and stitching, all with easy-to-use menus and interfaces. Because the IPP is linked to parallel filesystems and large on-demand virtual desktops, processed data can be quickly viewed and fed into downstream interactive programs such as Fiji, Napari, Imaris or Arivis.
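As an illustration only (the IPP's internal job builder is not shown in this abstract), the kind of scheduler negotiation the portal hides behind its wizard can be sketched as rendering a batch script from the confirmed form fields. The field names, resource defaults and the SLURM target below are assumptions for the sketch, not the IPP's actual implementation:

```python
# Hypothetical sketch: translating wizard-confirmed job fields into a
# scheduler submission script. SLURM directives, defaults and field
# names are assumptions, not the IPP's real code.

def build_slurm_script(job_name, command, cpus=16, mem_gb=64, hours=4):
    """Render a minimal SLURM batch script from user-confirmed job fields."""
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --time={hours}:00:00",
        "",
        command,
        "",
    ])

# Example: a one-click submission would write this script to disk and
# pass it to sbatch on the user's behalf.
script = build_slurm_script("decon_batch",
                            "run_deconvolution --input /scratch/data")
print(script)
```

In a portal setting, the point is that none of these directives are typed by the user: metadata read from the selected files populates the form, and the script is generated and submitted in a single step.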

Many workflows for microscopy data processing benefit from greater computing resources, but a desktop workstation PC can only realistically contain so much hardware, and with ever-tightening budgets new approaches are needed. Users of the IPP have seen real reductions in processing time of 10-fold (deconvolution) and 15-fold (stitching). By using scripting to automate mundane tasks such as file transfers and opening and saving outputs, the IPP reduces the number of interactive steps required, compared to a workstation PC, to one: building the job and submitting it.

The IPP is an open-source platform built by our core facility and available for others to deploy at their institutions, to help alleviate the big-data bottlenecks being experienced in light microscopy.