Virtually Attend FOSDEM 2026

HPC, Big Data & Data Science Track

2026-02-01T09:00:00+01:00

Scientific models are today limited by compute resources, forcing approximations driven by feasibility rather than theory. They consequently miss important physical processes and decision-relevant regional details. Advances in AI-driven supercomputing — specialized tensor accelerators, AI compiler stacks, and novel distributed systems — offer unprecedented computational power. Yet, scientific applications such as ocean models, often written in Fortran, C++, or Julia and built for traditional HPC, remain largely incompatible with these technologies. This gap hampers performance portability and isolates scientific computing from rapid cloud-based innovation for AI workloads.

In this talk we present Reactant.jl, a free and open-source optimising compiler framework for the Julia programming language, based on MLIR and XLA. Reactant.jl preserves high-level semantics (e.g. linear algebra operations), enabling aggressive cross-function, high-level optimisations, and generating efficient code for a variety of backends (CPU, GPU, TPU and more). Furthermore, Reactant.jl combines with Enzyme to provide high-performance multi-backend automatic differentiation.

As a practical demonstration, we will show the integration of Reactant.jl with Oceananigans.jl, a state-of-the-art GPU-based ocean model. We show how the model can be seamlessly retargeted to thousands of distributed TPUs, unlocking orders-of-magnitude increases in throughput. This opens a path for scientific modelling software to take full advantage of next-generation AI and cloud hardware — without rewriting the codebase or sacrificing high-level expressiveness.

2026-02-01T09:30:00+01:00

In the pursuit of reproducible, scalable bioinformatics workflows, tools like Snakemake, Nextflow, and Galaxy have become indispensable. Yet, deploying them on high-performance computing (HPC) systems — where SLURM reigns as the dominant batch scheduler — remains fraught with challenges. This talk recounts the development of the official SLURM plugin for Snakemake (https://doi.org/10.12688/f1000research.29032.3; https://doi.org/10.5281/zenodo.16922261), a journey shaped less by code and more by the idiosyncrasies of HPC environments. From GPU and MPI support to threaded applications, the plugin had to accommodate diverse computational needs — but the real hurdles lay in administrative policies: login nodes off-limits, partition naming chaos, and cluster-specific layouts and policies that defy standardization. I’ll share how the plugin evolved to meet the needs of data analysts — from Santa Cruz to Okinawa, from Stellenbosch to Uppsala. Whether you’re a bioinformatician, a CERN data analyst, a workflow developer, or an HPC admin, this talk offers a look at the messy, human side of making reproducibility work in real-world HPC landscapes.
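To give a flavour of how such cluster-specific details surface to users, here is a minimal sketch of a Snakemake rule declaring SLURM resources for the plugin, written in Snakemake's Python-based rule DSL. The resource keys, partition, and account values are illustrative assumptions taken from common plugin conventions; check the plugin documentation for the exact names supported on your cluster.

```
# Minimal sketch; resource key names and values are assumptions, not site defaults.
rule map_reads:
    input:
        "reads/{sample}.fastq.gz",
    output:
        "mapped/{sample}.bam",
    threads: 8
    resources:
        slurm_partition="gpu",       # partition names differ wildly between sites
        slurm_account="my_project",  # accounting is often mandatory on HPC systems
        runtime=120,                 # walltime in minutes
        mem_mb=32000,
    shell:
        "map-tool --threads {threads} {input} > {output}"
```

Invoked with something like `snakemake --executor slurm --jobs 100`, the plugin translates these per-rule resources into batch job submissions.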

2026-02-01T10:00:00+01:00

Wherever research software is developed and used, it needs to be installed, tested in various ways, benchmarked, and set up within complex workflows. Typically, in order to perform such tasks, either individual solutions are implemented - imposing significant restrictions due to the lack of portability - or the necessary steps are performed manually by developers or users, a time-consuming process, highly susceptible to errors. Furthermore, particularly in the field of high-performance computing (HPC), where large amounts of data are processed and the computer systems used are unique worldwide, not only performance, scalability, and efficiency of the applications are important, but so are modern research software engineering (RSE) principles such as reproducibility, reusability, and documentation.

With these challenges and requirements in mind, JUBE [1] (Jülich Benchmarking Environment) has been developed at the Jülich Supercomputing Centre (JSC), enabling automated and transparent scientific workflows. JUBE is a generic, lightweight, configurable environment to run, monitor and analyze application execution in a systematic way. It is a free, open-source software implemented in Python that operates on a "definition-based" paradigm where the “experiment” is described declaratively in a configuration file (XML or YAML). The JUBE engine is responsible for translating this definition into shell scripts, job submission files, and directory structures. Due to its standardized configuration format, it simplifies collaboration and usability of research software. JUBE also complements the Continuous Integration and Continuous Delivery (CI/CD) capabilities, leading to Continuous Benchmarking.

To introduce and facilitate JUBE’s usage, the documentation includes a tutorial with simple and advanced examples, an FAQ page, a description of the command line interface, and a glossary with all accepted keywords [2]. In addition, a dedicated Carpentries course offers an introduction to the JUBE framework [3] (basic knowledge of the Linux shell and either XML or YAML are beneficial when getting started with JUBE). A large variety of scientific codes and standard HPC benchmarks have already been automated using JUBE and are also available open-source [4].

In this presentation, an overview of JUBE will be provided, including its fundamental concepts, current status, and roadmap of future developments (external code contributions are welcome). Additionally, three illustrative use cases will be introduced to offer a comprehensive understanding of JUBE's practical applications:

  • benchmarking as part of the procurement of JUPITER, Europe’s first exascale supercomputer;
  • a complex scientific workflow for energy system modelling [5];
  • continuous insight into HPC system health by regular execution of applications, and the subsequent graphical presentation of their results.

JUBE is a well-established software, which has already been used in several national and international projects and on numerous and diverse HPC systems [6-13]. Besides being available via EasyBuild [14] and Spack [15], further software has been built up based on JUBE [16,17]. Owing to its broad scope and range of applications, JUBE is likely to be of interest to audiences in the HPC sector, as well as those involved in big data and data science.

[1] https://github.com/FZJ-JSC/JUBE
[2] https://apps.fz-juelich.de/jsc/jube/docu/index.html
[3] https://carpentries-incubator.github.io/hpc-workflows-jube/
[4] https://github.com/FZJ-JSC/jubench
[5] https://elib.dlr.de/196232/1/2023-09_UNSEEN-Compendium.pdf
[6] MAX CoE: https://max-centre.eu/impact-outcomes/key-achievements/benchmarking-and-profiling/
[7] RISC2: https://risc2-project.eu/?p=2251
[8] EoCoE: https://www.eocoe.eu/technical-challenges/programming-models/
[9] DEEP: https://deep-projects.eu/modular-supercomputing/software/benchmarking-and-tools/
[10] DEEP-EST: https://cordis.europa.eu/project/id/754304/reporting
[11] IO-SEA: https://cordis.europa.eu/project/id/955811/results
[12] EPICURE: https://epicure-hpc.eu/wp-content/uploads/2025/07/EPICURE-BEST-PRACTICE-GUIDE-Power-measurements-in-EuroHPC-machines_v1.0.pdf
[13] UNSEEN: https://juser.fz-juelich.de/record/1007796/files/UNSEEN_ISC_2023_Poster.pdf
[14] EasyBuild: https://github.com/easybuilders/easybuild-easyconfigs/tree/develop/easybuild/easyconfigs/j/JUBE
[15] Spack: https://packages.spack.io/package.html?name=jube
[16] https://github.com/edf-hpc/unclebench
[17] https://dl.acm.org/doi/10.1145/3733723.3733740

2026-02-01T10:30:00+01:00

Content

High-frequency wave simulations in 3D (with, e.g., finite elements) involve systems with hundreds of millions of unknowns (up to 600M in our runs), prompting the use of massively parallel algorithms. In the harmonic regime, we favor Domain Decomposition Methods (DDMs) where local problems are solved in smaller regions (subdomains) and the full solution of the PDE is recovered iteratively. This requires each rank to own a portion of the mesh and to have a view on neighboring partitions (ghost cells or overlaps). In particular, the Optimized Restricted Additive Schwarz algorithm requires assembling matrices at the boundary of overlaps, which requires creating additional elements after the partitioning.
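For reference, the (optimized) restricted additive Schwarz preconditioner alluded to above takes, in the usual domain-decomposition notation, the form

M^{-1}_{ORAS} = \sum_{i=1}^{N} \tilde{R}_i^{T} B_i^{-1} R_i,

where R_i restricts a global vector to the i-th overlapping subdomain, \tilde{R}_i is its "restricted" counterpart weighted by a partition of unity on the overlap, and B_i is the local matrix assembled on subdomain i with optimized (e.g. Robin/absorbing) transmission conditions on its interfaces (for plain RAS, B_i is simply R_i A R_i^T). Assembling B_i with these transmission conditions is precisely why extra elements and matrix contributions are needed at the boundary of the overlaps. This is the textbook formulation, given here only as a sketch of the method the talk builds on.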

During the last two years, I pushed our in-house FEM code (GmshFEM) to run increasingly large jobs, from 8 MPI ranks on a laptop, through local and national clusters, up to more than 30,000 ranks on LUMI. Each milestone provided its own challenges in the parallel implementation: as the problem size increases, simple global reductions can go from being a minor synchronization to being a major bottleneck, redundant information in partitioned meshes can eat hundreds of gigabytes of RAM, and load-balancing issues can become dominant.

In this talk, I will describe how we tackled these challenges and how the future versions of Gmsh will take into account these issues. In particular, the next version of the MSH file format will be optimized to reduce data duplication across subdomains. I will also present the new API for querying information about partitioned meshes, such as retrieving elements in overlapping regions.

About Gmsh

Gmsh (https://gmsh.info/) is an open-source (GPL-2) finite element mesh generator widely used in scientific and engineering applications. It provides a graphical interface, a scripting language for automation, and language bindings (C/C++, Fortran, Python, Julia). In this work, Gmsh serves as the front-end mesh generator for large-scale distributed FEM simulations using our in-house solver GmshFEM (https://gitlab.onelab.info/gmsh/fem).
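As a small illustration of the Python bindings mentioned above (a minimal sketch, not the workflow used for the LUMI runs), the following partitions a simple mesh and asks Gmsh to create ghost cells; the option name is taken from Gmsh's documented mesh options, but availability may depend on the Gmsh version.

```python
import gmsh

gmsh.initialize()
gmsh.model.add("box")
gmsh.model.occ.addBox(0, 0, 0, 1, 1, 1)   # unit cube geometry
gmsh.model.occ.synchronize()

# Create ghost cells when partitioning (option as documented; version-dependent).
gmsh.option.setNumber("Mesh.PartitionCreateGhostCells", 1)

gmsh.model.mesh.generate(3)    # 3D tetrahedral mesh
gmsh.model.mesh.partition(8)   # split into 8 partitions, e.g. one per MPI rank
gmsh.write("box.msh")          # MSH file carrying partition/ghost information
gmsh.finalize()
```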

2026-02-01T11:00:00+01:00

As the computing needs of the world have grown, the need for parallel systems has grown to match. However, the programming languages used to target those systems have not had the same growth. General parallel programming targeting distributed CPUs and GPUs is frequently locked behind low-level and unfriendly programming languages and frameworks. Programmers must choose between parallel performance with low-level programming or productivity with high-level languages.

Chapel is a programming language for productive parallel programming that scales from laptops to supercomputers. This talk will focus on the ways that Chapel addresses the above gap, giving programmers used to high-level languages like Python access to distributed parallel performance. Chapel has long been open source, and recently became one of the many amazing projects hosted under the High Performance Software Foundation.

The talk will include a description of Chapel and its performance as well as a few examples of Chapel programs. I will also present Arkouda, an exploratory data science tool for massive scales of data. Arkouda is built in Chapel and completely closes the accessibility gap, giving Python programmers access to supercomputer-scale data analysis.
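To give a flavour of the Python-facing side, here is a minimal Arkouda sketch. The server address, port, and array sizes are placeholders, and the snippet assumes an arkouda_server instance (backed by Chapel on the cluster) is already running.

```python
import arkouda as ak

# Connect to a running arkouda_server; host/port are placeholders.
ak.connect("localhost", 5555)

# Arrays live server-side; the Python objects are lightweight handles.
a = ak.randint(0, 100, 10**8)
b = ak.randint(0, 100, 10**8)

c = a + b          # elementwise op executed in parallel on the server
print(c.sum())     # reduction on the server, scalar returned to Python

ak.disconnect()
```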

2026-02-01T11:30:00+01:00

With the rapid acceleration of ML/AI research in the last couple of years, the already energy-hungry HPC platforms have become even more demanding. A major part of this energy consumption is due to users’ workloads, and it is only with the participation of end users that the overall energy consumption of the platforms can be reduced. However, most HPC platforms do not provide energy-consumption or performance metrics out of the box, which in turn does little to encourage end users to optimize their workloads.

The Compute Energy & Emissions Monitoring Stack (CEEMS) has been designed to address this issue. CEEMS can report energy consumption and equivalent emissions of user workloads in real time for SLURM (HPC), OpenStack (cloud) and Kubernetes platforms alike. It leverages the Linux perf subsystem and eBPF to monitor the performance metrics of applications, which helps end users rapidly identify bottlenecks in their workflows and consequently optimize them to reduce their energy and carbon footprint. CEEMS supports eBPF-based continuous profiling and is the first monitoring stack to do so on HPC platforms. Another advantage of CEEMS is that it systematically monitors all jobs on the platform without end users having to modify their workflows or codes.

Besides CPU energy usage, CEEMS reports energy usage and performance metrics of workloads on NVIDIA and AMD GPU accelerators. CEEMS is built around prominent open-source tools in the observability ecosystem, like Prometheus and Grafana. It is designed to be extensible and allows HPC center operators to easily define energy estimation rules for user workloads based on the underlying hardware. CEEMS monitors I/O and network metrics in a file-system-agnostic manner, allowing it to work with any parallel file system used by HPC platforms. The talk will conclude by showing how CEEMS monitoring is used on the Jean-Zay HPC platform, which has more than 2000 nodes and a daily churn of around 20k jobs.

2026-02-01T12:00:00+01:00

ECMWF manages petabytes of meteorological data critical for weather and climate research. But traditional storage formats pose challenges for machine learning, big-data analytics, and on-demand workflows.

We present a Zarr store implementation that creates virtual views of ECMWF’s Fields Database (FDB), enabling users to access GRIB data as if it were a native Zarr dataset. Unlike existing approaches such as VirtualiZarr or Kerchunk, our solution leverages the domain-specific MARS language to define virtual Zarr v3 stores directly from scientific requests, bridging GRIB and Zarr for dynamic, cloud-native access.

This work is developed as part of the WarmWorld Easier project, aiming to make climate and weather data more interoperable and accessible for the scientific community. By combining the efficiency of FDB with the flexibility of Zarr, we unlock new possibilities for HPC, big-data analytics, and machine learning pipelines.

In this talk, we will explore the architecture, discuss performance considerations, and demonstrate how virtual Zarr views accelerate integration in open-source workflows.

This session will:

  • Explain the motivation behind creating virtual Zarr views of ECMWF’s Fields Database.
  • Detail the design and implementation of a custom Zarr Store that translates Zarr access patterns into MARS requests (a purely illustrative sketch follows below).
  • Discuss performance trade-offs and scalability in HPC contexts.
  • Showcase real-world examples of how this approach may support data science workflows, machine learning, and distributed computing.
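To make the "Zarr access patterns into MARS requests" idea concrete, here is a purely hypothetical sketch of the translation step. The chunk-key layout, request fields, and values are illustrative assumptions only and do not reflect the actual store implementation presented in the talk.

```python
# Hypothetical sketch: translating a Zarr chunk key into a MARS request.
def chunk_key_to_mars_request(key: str) -> dict:
    # Assume keys of the form "<param>/<time>.<lat>.<lon>", e.g. "2t/4.0.0".
    param, chunk = key.split("/")
    t_idx, _lat_idx, _lon_idx = (int(i) for i in chunk.split("."))
    return {
        "class": "od",       # MARS keywords; values here are placeholders
        "stream": "oper",
        "type": "fc",
        "param": param,
        "date": "2026-02-01",
        "step": t_idx * 6,   # e.g. one chunk per 6-hourly forecast step
    }

print(chunk_key_to_mars_request("2t/4.0.0"))
# {'class': 'od', 'stream': 'oper', 'type': 'fc', 'param': '2t',
#  'date': '2026-02-01', 'step': 24}
```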

2026-02-01T12:30:00+01:00

Over the last five years, we ran an HPC system for life sciences on top of OpenStack, with a deployment pipeline built from Ansible and manual steps (see our FOSDEM 2020 talk). It worked—but it wasn’t something we could easily rebuild from scratch or apply consistently to other parts of our infrastructure.

As we designed our new HPC system (coming online in early 2026), we set ourselves a goal: treat the cluster as something we can declare and then recreate, not pet and nurture. The result is a “zero‑touch” style pipeline where a new node can go from “just racked” to “in SLURM and running jobs” with no manual intervention.

In this talk, we walk through the end‑to‑end workflow:

  • NetBox as DCIM and source of truth: racking a server and adding it to NetBox is the trigger; MACs, serials and IPs are automatically imported from vendor tools and IPAM/DNS into our automation.
  • Using Tofu/Terragrunt (instead of OpenStack's Heat orchestration service) to provision OpenStack/Ironic, the SLURM infrastructure and the network fabric across three environments (dev plus two interchangeable prod clusters for blue/green rollouts).
  • Image‑based deployment with Packer and Ansible: we split roles into “install” and “configure”. Packages and heavy setup are baked into images, while an ansible-init service runs locally on first boot to apply configuration and join the cluster.
  • Making nodes self‑sufficient, including fetching the secrets they need via short‑lived credentials and a minimal external dependency chain.
  • The pitfalls: cloud‑init bugs in non‑standard setups, weirdness with multiple datasources and host types, and how we worked around them.

Come and see how we built a reproducible HPC/Big-Data cluster on open‑source tooling, reusing as much of the stack as possible for the rest of our infrastructure.

About the speakers: Ümit Seren and Leon Schwarzäugl are HPC systems engineers at the Vienna BioCenter, home to three life science institutes. Over the past years, they have helped design, deploy and operate an OpenStack‑based HPC cluster and are now leading the automation and deployment architecture of the new HPC system coming online in 2026. Their interests include bare‑metal automation, reproducible infrastructure, high‑throughput computing and making complex systems easier to operate and debug.

2026-02-01T13:00:00+01:00

Bioinformatics is an interdisciplinary scientific field that deals with large amounts of biological data. The advent of transformer models applied to this field brought very interesting scientific innovations, including the introduction of Protein Language Models (PLMs) and Antibody Language Models (AbLMs). The complexity of training or fine-tuning PLMs/AbLMs, along with inference tasks, requires a non-trivial amount of GPU resources and a disciplined approach, where DevOps and MLOps methodologies fit very well. In this session we will present a series of tasks related to fine-tuning PLMs/AbLMs for classification of SARS-CoV-2's spike proteins. We will highlight how Kubernetes can be used to execute large numbers of computationally intensive tasks on GPU hosts, including best practices for sharing NVIDIA GPUs (MIG, time slicing, MPS) as part of an open source stack orchestrated with Apache Airflow. While these methodologies can be applied to any Kubernetes cluster, including on hyperscalers, this talk is meant to facilitate the (re)use of on-prem hardware infrastructure, presenting a fully open-source stack that can be easily deployed and maintained on bare metal.
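As a hedged sketch of how such GPU tasks can be orchestrated (the image name, MIG resource profile, and training script are illustrative placeholders, not the pipeline from the linked repositories), an Airflow DAG might launch a fine-tuning pod on a MIG slice roughly like this:

```python
from datetime import datetime

from airflow import DAG
# Import path varies with provider version; newer cncf.kubernetes providers
# expose the operator under `operators.pod`, older ones under `operators.kubernetes_pod`.
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator
from kubernetes.client import models as k8s

with DAG(
    dag_id="ablm_finetune_demo",       # hypothetical DAG name
    start_date=datetime(2026, 1, 1),
    schedule=None,                     # triggered manually (Airflow 2.4+ keyword)
    catchup=False,
) as dag:
    finetune = KubernetesPodOperator(
        task_id="finetune_spike_classifier",
        name="finetune-spike-classifier",
        image="registry.example.org/ablm-train:latest",  # placeholder image
        cmds=["python", "finetune.py"],                  # placeholder entry point
        # Request a MIG slice instead of a full GPU; the exact resource name
        # depends on the GPU model and the MIG profile configured on the node.
        container_resources=k8s.V1ResourceRequirements(
            limits={"nvidia.com/mig-1g.10gb": "1"},
        ),
        get_logs=True,
    )
```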

Source code: https://github.com/alexpilotti/bbk-mres https://github.com/alexpilotti/bbk-mres-airflow

Overview of the scientific research made possible by this pipeline: https://cloudba.se/NeBzX

2026-02-01T13:10:00+01:00

When you run LLMs or large-scale ML training on HPC clusters, traditional monitoring falls short. GPU utilization at 95% tells you nothing about model quality. Memory bandwidth looks healthy while your inference latency silently degrades. Your job scheduler reports success while concept drift erodes prediction accuracy. This talk introduces a practical observability framework specifically designed for AI workloads on HPC infrastructure, what I call "Cognitive SLIs" (Service Level Indicators for AI systems). I'll cover three critical gaps in current HPC monitoring:

1. Model-aware metrics that matter
2. GPU observability beyond utilization
3. Energy and cost accountability

The demo shows a complete stack built with open source tools: VictoriaMetrics with custom AI-specific exporters, Grafana dashboards designed for ML engineers (not just sysadmins), and OpenTelemetry instrumentation patterns for PyTorch/JAX workloads.
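As an example of the kind of instrumentation pattern referred to above (metric names and attributes are illustrative choices, not a prescribed schema), a PyTorch inference path can record model-aware metrics with the OpenTelemetry metrics API roughly as follows; a MeterProvider exporting to VictoriaMetrics/Prometheus is assumed to be configured elsewhere at application start-up.

```python
import time

import torch
from opentelemetry import metrics

# Assumes a MeterProvider (exporting via OTLP or a Prometheus exporter) is
# already configured; names below are illustrative.
meter = metrics.get_meter("ai.workload.demo")

inference_latency = meter.create_histogram(
    "inference.latency", unit="ms", description="End-to-end inference latency"
)
tokens_processed = meter.create_counter(
    "inference.tokens", description="Tokens processed per request"
)

def observed_inference(model, batch, n_tokens, model_name="demo-model"):
    """Run one inference step and record model-aware metrics."""
    start = time.perf_counter()
    with torch.no_grad():
        output = model(batch)
    inference_latency.record(
        (time.perf_counter() - start) * 1000.0, attributes={"model": model_name}
    )
    tokens_processed.add(n_tokens, attributes={"model": model_name})
    return output
```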

Attendees will leave with the following resources:

1) Architecture patterns for instrumenting HPC AI workloads
2) VictoriaMetrics recording rules and alerting strategies for ML metrics
3) Grafana dashboard templates (GitHub repo provided)
4) An understanding of how AI Act logging requirements intersect with HPC operations

2026-02-01T13:20:00+01:00

JAX is an open-source Python package for high-performance numerical computing. It provides a familiar NumPy-style interface, but with the advantages of allowing computations to be dispatched to accelerator devices such as graphics and tensor processing units, and supporting transformations to automatically differentiate, vectorize and just-in-time compile functions. While extensively used in machine learning applications, JAX's design also makes it ideal for scientific computing tasks such as simulating numerical models and fitting them to data.

This talk will introduce JAX's interface and computation model, and discuss my experiences in developing two open-source software tools that exploit JAX as a key dependency: S2FFT, a Python package providing Fourier-like transforms for spherical data, and Mici, a Python package implementing algorithms for fitting probabilistic models to data. I will also introduce the Python Array API standard and explain how it can be used to write portable code which works across JAX, NumPy and other array backends.
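To illustrate the computation model described above, here is a minimal JAX example combining automatic differentiation with just-in-time compilation, followed by a small function written against the Python Array API standard; the `__array_namespace__` entry point assumes recent NumPy/JAX releases that implement the standard.

```python
import jax
import jax.numpy as jnp

def loss(w, x, y):
    """Mean-squared error of a linear predictor."""
    return jnp.mean((x @ w - y) ** 2)

grad_loss = jax.jit(jax.grad(loss))      # reverse-mode AD, compiled via XLA

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (1024, 16))
w = jnp.zeros(16)
y = x @ jnp.ones(16)

for _ in range(200):
    w = w - 0.1 * grad_loss(w, x, y)     # plain gradient descent

# Array API standard: dispatch to whichever backend produced the array
# (requires a JAX/NumPy version exposing __array_namespace__).
def rms(a):
    xp = a.__array_namespace__()
    return xp.sqrt(xp.mean(a**2))

print(rms(x), rms(jnp.asarray([3.0, 4.0])))
```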

2026-02-01T13:35:00+01:00

Data science tools have come far, with Project Jupyter at the core. But what if we could greatly boost their performance, without leaving the Python ecosystem?

Introducing Zasper, an IDE for Jupyter notebooks built in Go that uses up to 5× less CPU and 40× less RAM, and is also blazingly fast.

2026-02-01T13:45:00+01:00

ROCm™ has been AMD’s software foundation for both high-performance computing (HPC) and AI workloads and continues to support the distinct needs of each domain. As these domains increasingly converge, ROCm™ is evolving into a more modular and flexible platform. Soon, the distribution model will shift to a core SDK with domain-specific add-ons—such as HPC—allowing users to select only the components they need. This reduces unnecessary overhead while maintaining a cohesive and interoperable stack.

To support this modularity, AMD is transitioning to TheRock, an open-source build system that enables component-level integration, nightly and weekly builds, and streamlined delivery across the ROCm™ stack. TheRock is designed to handle the complexity of building and packaging ROCm™ in a way that’s scalable and transparent for developers. It plays a central role in how ROCm™ is assembled and delivered, especially as the platform moves toward more frequent and flexible release cycles.

In this talk, we’ll cover the entire development and delivery pipeline—from the consolidation into three super-repos to how ROCm™ is built, tested, and shipped. This includes an overview of the development process, the delivery mechanism, TheRock’s implementation, and the testing infrastructure. We’ll also explain how contributors can engage with ROCm™—whether through code, documentation, or domain-specific enhancements—making it easier for developers to help shape the platform.

Online resources:

  • TheRock: https://github.com/ROCm/TheRock
  • rocm-libraries: https://github.com/ROCm/rocm-libraries
  • rocm-systems: https://github.com/ROCm/rocm-systems

Most projects are under MIT license.

Speaker: JP Lehr, Senior Member of Technical Staff, ROCm™ GPU Compiler, AMD

© 2026 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, ROCm, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies. LLVM is a trademark of LLVM Foundation. The OpenMP name and the OpenMP logo are registered trademarks of the OpenMP Architecture Review Board.

2026-02-01T14:00:00+01:00

The High Performance Software Foundation (HPSF) is a hub for open-source, high performance software with a growing set of member organizations and projects across the US, Europe, and Asia. It aims to advance portable software for diverse hardware by increasing adoption, aiding community growth, and enabling development efforts. It also fosters collaboration through working groups such as Continuous Integration, Benchmarking, and Binary distribution.

This talk will give an overview of HPSF and an update on its latest activities. We’ll cover new member projects and organizations, share plans for the European HPSF Community Summit 2026 and HPSFCon 2026, and describe how HPSF is supporting member projects and building collaborations that advance the community, including project support and outreach activities.

Find out how you can benefit from joining or collaborating with HPSF, and help to improve the HPC open source world.

2026-02-01T14:30:00+01:00

Are HPC users autonomous? How much flexibility does one have when deploying software on a supercomputer? How close is it to one’s laptop development environment? How have EasyBuild, Spack, Guix, and Apptainer helped improve the situation in the past decade?

In this talk, I will look at the situation with lucidity. While Spack and EasyBuild enable software deployment by users, their primary user base appears to be HPC system administrators. Thus most HPC admins let users bring their own Singularity/Apptainer images when their needs are not satisfied—effectively “giving up” on complex deployment.

Brave and fearless, the Guix-HPC effort has not given up on the goal of putting reproducible package management in the hands of users, with successes and disappointments. I will report on our experience with Tier-2 supercomputers now providing Guix, and on ongoing work with French national supercomputers (“Tier-1”) as part of NumPEx, the French national program for HPC.

We will look back at the set of challenges overcome in past years—from supporting rootless execution of the build daemon, to making the bring-your-own-MPI approach viable and to enhancing support for CPU micro-architecture optimizations—and those yet to come.

2026-02-01T15:00:00+01:00

Abstract

Spack is a flexible multi-language package manager for HPC, Data Science, and AI, designed to support multiple versions, configurations, and compilers of software on the same system. Since the last FOSDEM, the Spack community has reached a major milestone with the release of Spack v1.0, followed closely by v1.1. This talk will provide a comprehensive overview of the "What's New" in these releases, highlighting the changes that improve robustness, performance, and user experience. We will cover among other things the shift to modeling compilers as dependencies, the package repository split, and the new jobserver-aware parallel installer.

Description

With the release of Spack v1.0 in July 2025 and v1.1 in November 2025, the project has introduced significant architectural changes and new features requested by the community. In this talk, we will dive into the key features introduced across these releases:

  • Compilers as dependencies. Spack has fulfilled an old promise from FOSDEM 2018. Compilers are modeled as first-class dependencies, dependency resolution is more accurate, and binary distribution and ABI compatibility checks are more robust.
  • The separation of the package repository from the core tool, together with the introduction of a versioned Package API, allows users to pin the package repository version independently from Spack itself and enables regular package repository releases.
  • Parallel builds with a new user interface. Spack has a new scheduler that coordinates parallel builds using the POSIX jobserver protocol, allowing efficient resource sharing across all build processes. The decades-old jobserver protocol is experiencing a major renaissance, adopted recently by Ninja v1.13 (July 2025) and the upcoming LLVM 22 release. We’ll talk about how this enables composable parallelism across make, ninja, cargo, GCC, LLVM, Spack, and other tools.

Expected Prior Knowledge / Intended Audience

This talk is aimed at Research Software Engineers (RSEs), HPC system administrators, and Data Scientists who use or manage software stacks. Familiarity with Spack is helpful but not strictly required; the talk will be accessible to anyone interested in package management and software reproducibility in scientific computing.

Links

  • Spack Website: https://spack.io
  • Spack GitHub: https://github.com/spack/spack
  • Spack Packages: https://github.com/spack/spack-packages

2026-02-01T15:30:00+01:00

A few years ago, the European Environment for Scientific Software Installations (EESSI) was introduced at FOSDEM as a pilot project for improving software distribution and deployment everywhere, from HPC environments to cloud environments, or even a personal workstation or a Raspberry Pi. Since then, it has gained wide adoption across dozens of HPC systems in Europe, being installed natively on EuroHPC systems and becoming a component within the EuroHPC Federation Platform.

This session will highlight the progress EESSI has made, including the addition of new CPU and GPU targets, broader support for modern computing technologies, and much more software: 600+ unique software projects are now shipped with it (over 3,500 if you count the individual Python packages and R libraries included). EESSI's capabilities have expanded significantly, turning it into a key service for managing and deploying software across a wide range of infrastructures.

We will provide an overview of the current status of EESSI, focusing on its new capabilities, the integration with tools like Spack and Open OnDemand, as well as its growing software ecosystem. Through a live hands-on demo, we will showcase how EESSI is being used in real-world HPC environments and cloud systems, and discuss the future direction of the platform. Looking ahead, we will cover upcoming features and improvements that will continue to make EESSI a solid enabler for HPC software management in Europe and beyond.

2026-02-01T16:00:00+01:00

GPU vendors provide highly optimized libraries for math operations such as fast Fourier transforms and linear algebra (FFT, (sparse) BLAS/LAPACK, …) that run on their devices. OpenMP, in turn, is a popular, vendor-agnostic method for parallelization on the CPU, but increasingly also for offloading calculations to the GPU.

This talk shows how OpenMP can be used to reduce vendor-specific code, make calling such libraries more convenient, and combine OpenMP offloading with them. While the presentation illustrates the use with the GNU Compiler Collection (GCC), the functionality is a generic feature of OpenMP 5.2, extended in 6.0, and is supported by multiple compilers.

In terms of OpenMP features, the 'interop' directive provides the interoperability support; the 'declare variant' directive with the 'adjust_args' and 'append_args' clauses makes it possible to write neater code; and means for memory allocation, memory transfer, and running code blocks on the GPU (the 'target' construct) complete the required feature set.

  • The OpenMP specification, current, past and future version, errata and example documents can be found at https://www.openmp.org/specifications/; a list of compilers and tools for OpenMP is at https://www.openmp.org/resources/openmp-compilers-tools/
  • GCC's OpenMP documentation is available at https://gcc.gnu.org/onlinedocs/libgomp/ (API routines, implementation status, …) and, in particular, the supported interop foreign runtimes are documented at https://gcc.gnu.org/onlinedocs/libgomp/Offload-Target-Specifics.html; GCC supports offloading to Nvidia and AMD GPUs. GCC has supported OpenMP interop since GCC 15, including most of the OpenMP 6.0 additions as well as the Fortran API routines.

2026-02-01T16:30:00+01:00

Evaluating and discussing what makes different types of accelerators interesting for which types of workloads, and the mental model most appropriate for choosing them.

Why it's sometimes a good idea to ignore them all and just use a CPU, all the way to when FPGAs become interesting as a means of doing more science.