Virtually Attend FOSDEM 2026

LLVM Track

2026-01-31T15:00:00+01:00

A word of welcome by the LLVM Dev room organizers.

2026-01-31T15:05:00+01:00

LLVM has recently gained support for an ELF implementation of the AArch64 Pointer Authentication ABI (PAuthABI) for a Linux Musl target. This talk will cover:

  • An introduction to the PAuthABI and its LLVM support.
  • How to experiment with it on any Linux machine using qemu-aarch64 emulation.
  • How to adapt the Linux Musl target to a bare-metal target using LLVM libc.

The AArch64 Pointer Authentication Code instructions are currently deployed on Linux to protect the return address on hardware that supports them. This limited use case can be deployed in an ABI-neutral way and run on existing hardware. The PAuthABI, based on Apple's Arm64E, takes the hardware and software backwards-compatibility gloves off and uses pointer authentication for code pointers such as function pointers and vtables.

The main challenge in adapting PAuthABI support to bare metal is the initialization of global pointers, which on Linux is handled by the dynamic loader. We need to build our own signer that runs before main.

2026-01-31T15:25:00+01:00

Ever been debugging a production issue and wished you'd added just one more log statement? Now you have to rebuild, wait for CI, deploy... all that time wasted. We've all been there, cursing our past selves.

We've integrated LLVM's XRay into ClickHouse to solve this. It lets us hot-patch running production systems to inject logging, profiling, and even deliberate delays into any function. No rebuild required.

XRay reserves space at function entry/exit that can be atomically patched with custom handlers at runtime. We built three handler types: LOG to add the trace points you forgot, SLEEP to reproduce (or prevent) timing-sensitive bugs, and PROFILE for deterministic profiling to complement our existing sampling profiler. The performance overhead when inactive is negligible.

Control is simple. Send a SQL query such as SYSTEM INSTRUMENT ADD LOG `QueryMetricLog::startQuery` 'This message will be logged at the start of the function' to patch the function instantly. Results show up in system.trace_log. Remove it just as easily when you're done.

I'll cover the integration challenges (ELF parsing, thread safety, atomic patching), performance numbers (4-7% binary size increase, near-zero runtime cost), and real production war stories.

Resources: the issue describing the task, and the PR that added the XRay integration.

2026-01-31T15:50:00+01:00

Over the past two years, the LLVM community has been building a general-purpose GPU offloading library. While still in its early stages, this library aims to provide a unified interface for launching kernels across different GPU vendors. The long-term vision is to enable diverse projects—ranging from OpenMP® to SYCL™ and beyond—to leverage a common GPU offloading infrastructure.

Developing this new library alongside the existing OpenMP® offloading infrastructure has introduced several interesting challenges, as both share the same plugin system. This is particularly evident in the implementation of the OpenMP® Tools Interface (OMPT).

In this talk, we’ll explore the journey so far:

  • Project history – how the effort started and evolved.
  • Current architecture – the organization of the offloading library today.
  • API design – what the interface looks like and how it works.
  • Plugins – the lower-level components that make vendor-specific integration possible.
  • Challenges – issues encountered in the current OMPT implementation.

2026-01-31T16:15:00+01:00

LLVM’s ORC JIT [1] is a powerful framework for just-in-time compilation of LLVM IR. However, when applied to large codebases, ORC often exhibits a surprisingly high front-load ratio: we have to parse all IR modules before execution even reaches main(). This diminishes the benefits of JITing and contributes to phenomena such as the “time to first plot” latency in Julia, one of ORC’s large-scale users [2].

The llvm-autojit plugin [3] is a new experimental compiler extension for automatic just-in-time compilation with ORC. The project has reached a proof-of-concept state in which basic C, C++ and Rust programs build and run successfully. It integrates easily with build systems such as CMake, make and cargo, making it practical to apply to real-world projects.

In this talk, we will examine the front-loading issue in ORC and explain how llvm-autojit mitigates it. Attendees will learn about pass plugins, LLVM IR code transformations, call graphs, and runtime libraries, and will see how to experiment with ORC-based JITing in their own projects.

[1] https://llvm.org/docs/ORCv2.html
[2] https://discourse.julialang.org/t/time-to-first-plot-clarification/58534
[3] https://github.com/weliveindetail/llvm-autojit

2026-01-31T16:40:00+01:00

Every new AI workload seems to need new hardware. Companies spend months designing NPUs (neural processing units), then more months building compilers for them—only to discover the hardware doesn't efficiently run their target workloads. By the time they iterate, the algorithm has moved on.

We present a work-in-progress approach that generates NPU hardware directly from algorithm specifications using MLIR and CIRCT. Starting from a computation expressed in MLIR's Linalg dialect, our toolchain automatically generates synthesizable SystemVerilog for custom NPU architectures and connects it to a RISC-V control host with an optimized memory hierarchy.

This "algorithm-first" hardware generation inverts the traditional flow: instead of designing hardware and then hoping the compiler can use it effectively, we generate hardware that is provably optimal for specific Linalg operations. The approach enables rapid exploration of the hardware/algorithm co-design space: change the algorithm, regenerate the hardware, and immediately see the impact on area, power, and performance. In this talk, we'll demonstrate:

  • Live generation of NPU RTL from Linalg operations
  • The MLIR dialect stack that bridges high-level algorithms to CIRCT hardware representations
  • Performance comparisons between generated hardware and handmade open-source NPUs
  • Open questions around generalization vs. specialization trade-offs

This work aims to make hardware generation accessible to compiler engineers and algorithm researchers, not just hardware designers. We'll discuss both the potential and limitations of this approach, and where the research needs to go next.

Target audience: compiler engineers, hardware architects, ML systems researchers. Basic familiarity with MLIR is helpful but not required.

2026-01-31T17:05:00+01:00

WebAssembly support in Swift started as a community project and became an official part of Swift 6.2. As Swift on WebAssembly matures, developers need robust debugging tools to match. This talk presents our work adding native debugging support for Swift targeting Wasm in LLDB. WebAssembly has some unique characteristics, such as its segmented memory address space, and we'll explore how we made that work with LLDB's architecture. Additionally, we'll cover how extensions to the GDB remote protocol enable debugging across various Wasm runtimes, including the WebAssembly Micro Runtime (WAMR), JavaScriptCore (JSC), and WasmKit.

2026-01-31T17:30:00+01:00

llvm-mingw is a mingw toolchain (a freely redistributable toolchain targeting Windows) built entirely with LLVM components instead of their GNU counterparts, intended as a drop-in replacement for existing GNU-based mingw toolchains. Initially the project mainly aimed at targeting Windows on ARM, but the toolchain supports all of i686, x86_64, armv7 and aarch64, and is also used by projects that don't target ARM.

In this talk I will describe how the project got started, and how I made a working toolchain for Windows on ARM64 before one even existed publicly.

https://github.com/mstorsjo/llvm-mingw/

2026-01-31T17:55:00+01:00

C++ remains central to high-performance and scientific computing, yet interactive workflows for the language have historically been fragmented or unavailable. Developers rely on REPL-driven exploration, rapid iteration, rich visualisation, and debugging, but C++ long lacked incremental execution, notebook integration, browser-based execution, and JIT debugging. With the introduction of clang-repl, LLVM now provides an upstream incremental compilation engine built on Clang, the IncrementalParser, and the ORC JIT.

This talk presents how the Project Jupyter, Clang/clang-repl, and Emscripten communities collaborated to build a complete, upstream-aligned interactive C++ environment. Xeus-Cpp embeds clang-repl as a native C/C++ Jupyter kernel across Linux, macOS, and Windows, enabling widgets, plots, inline documentation, and even CUDA/OpenMP use cases. Xeus-Cpp-Lite extends this model to the browser via WebAssembly and JupyterLite, compiling LLVM and Clang to WASM and using wasm-ld to dynamically link shared wasm modules generated per cell at runtime.

To complete the workflow, Xeus-Cpp integrates LLDB-DAP through clang-repl’s out-of-process execution model, enabling breakpoints, stepping, variable inspection, and full debugging of JIT-generated code directly in JupyterLab.

The talk will detail how clang-repl, ORC JIT, wasm-ld, LLDB, and LLDB-DAP come together to deliver a modern, sustainable interactive C++ workflow on both desktop and browser platforms, with live demonstrations of native and WebAssembly execution along the way.

LLVM components involved: clang, clang-repl, ORC JIT, wasm-ld, LLDB, LLDB-DAP.

Target audience: researchers, educators, students, C/C++ practitioners.

Note: please check out the demos/links added to the Resource section; these demos will be shown live in the talk.

2026-01-31T18:20:00+01:00

This year, systemd had a breakup with its bad practice of including unused headers all over the codebase. This resulted in:

  • A 33% speedup in from-scratch build times
  • A 50% reduction in runtime for our build test CI jobs
  • Thousands of lines of code removed from the codebase

I'll present how I went about this work using clang-include-cleaner, clang-tidy, and ClangBuildAnalyzer, including the challenges I faced:

  • A scalable way to organize source to minimize unused headers
  • Macros
  • Different build configurations that change which headers a source file uses, due to #ifdef conditionals
  • Missing features in clang-tidy and clang-include-cleaner (and my contributions to LLVM to implement those)

https://github.com/systemd/systemd
https://github.com/llvm/llvm-project
https://github.com/aras-p/ClangBuildAnalyzer

2026-01-31T18:45:00+01:00

Cross-compiling C and C++ is still a tedious process. It usually involves carefully crafted sysroots, Docker images and specific CI machine setups. The process becomes even more complex when supporting multiple libcs and libc versions, or architectures whose sysroots are hard or impossible to generate.

In this talk, we present toolchains_llvm_bootstrapped, an open-source Bazel module that replaces sysroots with a fully hermetic, self-bootstrapping C/C++ cross-compilation toolchain based on LLVM.

We dive into how the project wires together three Bazel toolchains:

  • A raw LLVM toolchain based on prebuilt LLVM binaries that cross-compiles all target runtimes from source: CRT objects, libc (glibc or musl), libstdc++/libc++, libunwind, compiler-rt, etc.
  • A runtime-enabled toolchain that uses those freshly built runtimes to hermetically compile your application code.
  • An optional self-hosted toolchain used to build LLVM entirely from source (pre-release, patched, or local branches), which is then used for the two previous stages; all in a single Bazel invocation.

We also showcase unique use cases enabled by this approach:

  • Cross-compiling to any target, entirely from source, with little to no configuration.
  • Whole-program sanitizer setups that are almost impossible with prebuilt sysroots.
  • Targeting arbitrary versions of glibc.
  • Setup-free remote execution for cross-compilation tasks.
  • Applying patches to LLVM, building a new toolchain, and testing it against real-world projects, without manual bootstrapping steps.

Project source code: https://github.com/cerisier/toolchains_llvm_bootstrapped