Virtually Attend FOSDEM 2026

Kernel Track

2026-02-01T09:00:00+01:00

When a kernel component like a storage driver misbehaves in production, developers face a difficult choice. They either have too little information to solve the bug or they enable slow console-level debug logs that ruin performance. This talk introduces a per-component binary logging mechanism designed to support verbose logging in production with negligible run-time cost.

We achieve this efficiency by moving the heavy lifting to build time. Using preprocessor macros, we emit parameter serialization stubs and save location-specific format strings in a separate side table. At run time, the hot path only records a location ID, a timestamp, and the raw parameters. No format expansion occurs until the logs are read. We support high concurrency using a mostly lock-free multi-level allocator that allows dozens of CPUs to write simultaneously.
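To make this concrete, here is a hypothetical sketch of the mechanism (the names, record layout, and side-table format are illustrative, not the actual patch set): the macro moves the format string out of the hot path at build time, and the run-time record stays at eight bytes plus raw arguments.

    /* Hypothetical sketch; real names and layout belong to the patch set. */

    /* Hot-path entry: a location ID plus a compressed timestamp (8 bytes),
     * followed by the raw, unformatted parameters in the ring. */
    struct clog_record {
            unsigned int loc_id;   /* index into the build-time format table */
            unsigned int ts_delta; /* compressed timestamp */
    };

    void clog_emit(unsigned int loc_id, unsigned long arg); /* writes to a per-CPU ring */

    /* Build time: the format string is placed in a separate section (the
     * side table); the call site passes only an ID and the raw argument.
     * __COUNTER__ stands in for a globally unique location ID here. */
    #define CLOG(fmt, arg)                                                     \
            do {                                                               \
                    static const char __clog_fmt[]                             \
                            __attribute__((used, section(".clog_fmt"))) = fmt; \
                    clog_emit(__COUNTER__, (unsigned long)(arg));              \
            } while (0)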

We also introduce a significant architectural change by adding a single TLS pointer to struct task_struct. This tags each thread with a private logging buffer. If a thread stalls or deadlocks, the tag remains attached to the buffer. This allows post-mortem analysis to reveal the exact context-specific history of that thread.

Unlike ftrace's trace_printk(), which dumps everything into a single global ring, our logger maintains one ring per component context. This allows you to capture exactly the data you need for a specific file system or operation. The memory footprint is minimal: each record is only eight bytes, saving 16 bytes per entry compared to the standard bprint_entry. This efficiency reduces memory accesses and facilitates a truly production-ready binary logging infrastructure.

We can finally keep verbose logging active at all times. This ensures that when a crash or deadlock occurs, the high-fidelity history needed to solve it is already waiting in memory.

2026-02-01T09:20:00+01:00

For years, Ahmad’s ideal has been simple: unpack a rootfs on a server, mount it over NFS (or usb9pfs), boot directly into it, and everything just works™.

But as secure boot becomes the default on many embedded systems, squeezing in a network-booted kernel is getting harder and often falls outside the supported boot flow entirely.

Fortunately, some recent improvements in the kernel build system pave the way for a far less invasive netboot setup. This talk gives a quick tour of the key pieces:

  • The image.fit target for arm64 introduced in v6.10
  • The modules-cpio-pkg target introduced in v6.19
  • Initramfs that bind mounts its modules over the rootfs
  • Optional concatenation of multiple initramfs in the bootloader

In ten minutes, you’ll see how these changes raise the netboot FITness of Linux, so you can keep printk-debugging to your heart’s content.

2026-02-01T09:40:00+01:00

The Linux kernel driver model has grown over the years and acquired several different mechanisms for passing device configuration data to platform drivers. This configuration can come from firmware (device tree, ACPI) or from the kernel code itself (board files, MFD, auxiliary drivers).

For a less experienced driver developer, the different APIs used to access device properties can be quite confusing and lead to questions: should I use the OF routines? Maybe fwnode or the generic device properties? What are software nodes in this model, and what even is a device property? How are devices linked according to their provider-consumer relationships, and how is their probe order ensured, if at all?
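To make the contrast concrete, here is a minimal hedged sketch (the "timeout-ms" property name is made up): the OF routine ties the code to device tree, while the generic device property call works unchanged for device tree, ACPI, and software nodes.

    #include <linux/of.h>
    #include <linux/platform_device.h>
    #include <linux/property.h>

    static int demo_probe(struct platform_device *pdev)
    {
            u32 timeout;

            /* DT-only: needs an of_node, fails on ACPI systems. */
            if (of_property_read_u32(pdev->dev.of_node, "timeout-ms", &timeout))
                    return -EINVAL;

            /* Generic: resolves the same property from DT, ACPI,
             * or a software node attached to the device. */
            if (device_property_read_u32(&pdev->dev, "timeout-ms", &timeout))
                    return -EINVAL;

            return 0;
    }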

This talk will discuss the history and evolution of device properties: from legacy, custom platform data structures, through the introduction of the Open Firmware (OF) API and its generalization to firmware nodes alongside other fwnode implementations, up to the generic device property API. It will also touch on device links (devlinks) and how they tie into this model.

The goal of this beginner/intermediate level talk is to give a clear picture of how device configuration should be handled in the kernel.

2026-02-01T10:00:00+01:00

A new RFC for Netfilter/nftables recently arrived on the netfilter-devel mailing list [1], introducing flexible math operation support for network packet fields. This could solve some migration problems from iptables to nftables and, in addition, enable other use cases.

This demo will quickly show how it works with simple real-world scenarios.

[1] https://lore.kernel.org/netfilter-devel/20250923152452.3618-1-fmancera@suse.de/

2026-02-01T10:20:00+01:00

Tracing complex systems often requires insights from both the kernel and userspace. While tools like Linux's ftrace excel at kernel-level observability and LTTng provides low-overhead userspace tracing, unifying these disparate data sources for a holistic view remains a challenge: using LTTng for kernel tracing requires an out-of-tree kernel module, which can be a barrier for many users.

This talk introduces bt2-ftrace-to-ctf - a new open-source project designed to bridge this gap. Our solution processes a trace.dat file from ftrace (kernel side) and an LTTng-UST trace (userspace side), then aligns and rewrites the traces in the Common Trace Format (CTF), as used by LTTng. The resulting output is directly consumable by tools like Trace Compass, enabling comprehensive, synchronized analysis of system behavior across all layers without the need for custom kernel modules.

The project consists of two key components:

  • A Babeltrace2 plugin: This plugin allows babeltrace2 to directly read trace-cmd's trace.dat files, providing a standardized interface for ftrace data. It includes source and sink components for flexible data handling and metadata emission.
  • A trace.dat to CTF converter: This utility uses the Babeltrace2 plugin to transform ftrace data into an LTTng-like kernel trace in CTF format. Crucially, it can also combine this kernel trace with an existing LTTng userspace trace, producing a single, unified trace directory.

In the talk, we will give an overview of the tool and discuss the challenges encountered during its implementation.

Project: https://github.com/siemens/bt2-ftrace-to-ctf (MIT, LGPL-2.1-or-later)

2026-02-01T10:40:00+01:00

Creating filesystem images typically requires mounting, copying files, and hoping your build environment doesn't introduce non-determinism. New capabilities in mkfs.xfs solve both problems: you can now populate an XFS filesystem directly from a directory tree at creation time, no mount required. I'll cover the implementation approach, discuss the design, and show how to use it. Useful for distributions, embedded systems, and anyone who needs verifiable filesystem artifacts.

Reference commits: https://git.kernel.org/pub/scm/fs/xfs/xfsprogs-dev.git/commit/?h=for-next&id=8a4ea72724930cfe262ccda03028264e1a81b145

https://git.kernel.org/pub/scm/fs/xfs/xfsprogs-dev.git/commit/?h=for-next&id=4a54700b4385bbedadfc71ee5bb45b0fc37fabb7

2026-02-01T11:00:00+01:00

Correctness of operating system kernel code is very important. Testing is helpful, but does not always thoroughly uncover all issues. In the Whisper team at Inria, we are exploring the possibility of applying formal verification, using Frama-C, to Linux kernel code. This entails writing specifications, constructing loop invariants, and checking correctness with the support of an SMT solver. This talk will report on the opportunities and challenges encountered.
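To give a flavor of what this entails, here is a minimal, self-contained illustration (not code from the talk) of a trivial helper annotated in ACSL, the specification language that Frama-C checks with the help of an SMT solver:

    /*@ requires n > 0 && \valid_read(buf + (0 .. n-1));
        assigns \nothing;
        ensures \forall integer k; 0 <= k < n ==> \result >= buf[k];
    */
    static unsigned char buf_max(const unsigned char *buf, unsigned long n)
    {
            unsigned long i;
            unsigned char max = buf[0];

            /*@ loop invariant 1 <= i <= n;
                loop invariant \forall integer k; 0 <= k < i ==> max >= buf[k];
                loop assigns i, max;
                loop variant n - i;
            */
            for (i = 1; i < n; i++)
                    if (buf[i] > max)
                            max = buf[i];
            return max;
    }

Frama-C's WP plugin turns the requires/ensures pair and the loop invariant into proof obligations and discharges them with the solver; on real kernel code, finding invariants the solver can handle is exactly where the challenges begin.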

2026-02-01T11:20:00+01:00

Most SoCs provide a PWM controller, which is used mainly to drive LEDs, a display backlight, or a fan. Less often, a PWM controls a motor.

The motor use case has a higher demand for exact control of the produced output. In the development cycle for Linux 6.13, the preferred abstraction for a PWM driver changed to be able to fulfill these needs.

After a quick introduction about what a PWM actually does, Uwe (who is also the Linux PWM subsystem maintainer) will present the new API with its requirements and the hardware and software he uses to develop and test a driver.
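As a rough preview, a consumer of the new interface might look like the sketch below. This assumes the waveform-style API merged during the v6.13 cycle (struct pwm_waveform and pwm_set_waveform_might_sleep()); the talk is the authoritative source for the exact semantics.

    #include <linux/pwm.h>

    /* Hedged sketch: drive a 20 kHz, 50% duty-cycle output. */
    static int demo_set_wave(struct pwm_device *pwm)
    {
            struct pwm_waveform wf = {
                    .period_length_ns = 50000, /* 20 kHz */
                    .duty_length_ns   = 25000, /* 50% duty cycle */
                    .duty_offset_ns   = 0,     /* active edge at period start */
            };

            /* exact = false lets the hardware round to the nearest
             * supported setting; motor control may want to check the
             * rounding first via pwm_round_waveform_might_sleep(). */
            return pwm_set_waveform_might_sleep(pwm, &wf, false);
    }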

2026-02-01T11:40:00+01:00

Sheaves are a new percpu caching layer for the SLUB allocator. To some extent they are a return to the SLAB percpu arrays (and to the magazines in Bonwick's original paper), but they avoid the pitfalls of the SLAB implementation, attempting to combine the best of the SLAB and SLUB approaches.
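Conceptually (an illustrative toy, not the actual sheaves code), such a percpu caching layer serves allocations from a small CPU-local array and only falls back to the shared slow path when the array runs dry:

    /* Toy percpu-array ("magazine") cache, for illustration only. */
    #define TOY_SHEAF_CAPACITY 32

    struct toy_sheaf {
            unsigned int size;                 /* objects currently cached */
            void *objects[TOY_SHEAF_CAPACITY];
    };

    void *slow_path_alloc(void);               /* shared allocator fallback */

    /* Fast path: no locks and no remote-CPU traffic, just a pop from
     * the CPU-local array. */
    static void *toy_alloc(struct toy_sheaf *cpu_sheaf)
    {
            if (cpu_sheaf->size)
                    return cpu_sheaf->objects[--cpu_sheaf->size];
            return slow_path_alloc();
    }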

In 6.18, sheaves were merged and enabled for the maple node and VMA caches. There is ongoing work to fully convert all caches in 7.0. This talk will discuss the status, explain the tradeoffs involved, and present some results and lessons learned.

2026-02-01T12:00:00+01:00

This talk is a follow-up to the LPC 2025 talk "seccomp listeners for nested containers" from the "Containers and Checkpoint/Restore" MC [1].

I'll give an update on the patch set's progress on LKML and an overview of the technical challenges. If it is merged upstream by the time of this talk at FOSDEM, I'll show a demo of the feature and give a detailed overview of the implementation and potential future improvements.

[1] https://lpc.events/event/19/contributions/2241/

2026-02-01T12:30:00+01:00

TPMs have been present in modern laptops and servers for some time now, but their adoption is quite low. While operating systems do provide some security features based on TPMs (think of BitLocker on Windows or dm-verity on Linux), third-party applications and libraries usually do not have TPM integrations.

One of the main reasons for low TPM adoption is that interfacing with TPMs is quite hard: there are competing TPM software stacks (Intel vs. IBM), key formats lack standardization (currently being worked on), and many operating systems are not set up from the start to make the TPM easily available (the TPM device file is owned by root or requires a privileged group for access). Even with a proper software stack, the application may have to deal with low-level TPM communication protocols, which are hard to get right.

In this presentation we will explore a better integration of TPMs with some Linux kernel subsystems, in particular the kernel keystore and the cryptographic API. We will see how this allows the Linux kernel to expose hardware-based security to third-party applications in an easy-to-use manner, encapsulating the TPM communication complexities as well as providing higher-level, use-case-based security primitives.
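As context for what is already possible today, the kernel keyring exposes TPM-backed "trusted" keys, where all TPM communication stays inside the kernel; a minimal userspace sketch (requires a TPM, a kernel with trusted-keys support, and linking with -lkeyutils):

    #include <stdio.h>
    #include <string.h>
    #include <keyutils.h>

    int main(void)
    {
            /* Ask the kernel to generate a 32-byte key sealed by the TPM;
             * userspace never sees or speaks the TPM wire protocol. */
            key_serial_t key = add_key("trusted", "kmk", "new 32",
                                       strlen("new 32"),
                                       KEY_SPEC_USER_KEYRING);
            if (key < 0) {
                    perror("add_key");
                    return 1;
            }
            printf("trusted key serial: %d\n", (int)key);
            return 0;
    }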

2026-02-01T13:00:00+01:00

The basic concept of Secure Boot is now well established (and widely used in Linux for nearly 15 years). Most people now decline to take ownership of their systems (by replacing the CA keys in the UEFI db variable) and instead pivot trust away from the UEFI variables to the MoK (Machine Owner Key) ones, which can be updated from the operating system. Thus, if you want to secure boot your own kernel, you usually create a signing key and load it into MoK.

This talk will quickly go over the history, how you take ownership, what problems you encounter, how you create and install your own signing keys in MoK, and how the kernel imports all these keys into the keyring system and uses them (verifying module signatures and IMA signed policy), including the differences between the machine and secondary_trusted keyrings. We'll also discuss some of the more recent innovations, like adding SBAT to the rather problematic UEFI revocation story and how it works.

2026-02-01T13:30:00+01:00

User-mode Linux (UML) has been developed and maintained in Linus' tree for decades and is well used by kernel developers as an instant way to virtualize within userspace processes, without relying on a hypervisor (e.g., qemu/kvm) or software partitioning (e.g., namespaces). Recently, the unit testing framework for the kernel tree, KUnit, adopted UML as an underlying infrastructure, which brings more opportunities to use UML. However, this testing capability of KUnit+UML is currently limited to MMU-full code, while a certain portion of the kernel tree lives under ifndef CONFIG_MMU. As a result, nommu code lacks the chance of testability and often suffers regressions in the rapid development cycle of the Linux kernel.

This talk introduces yet another extension to UML, based on an architecture without MMU emulation, in order to exercise nommu code with the wealth of testing frameworks implemented on KUnit+UML. The kernel is configured with CONFIG_MMU=n, and we've implemented different syscall hook and handling mechanisms with different interactions with the host processes. With that, existing userspace programs (we've used an Alpine Linux image with patched busybox/musl-libc) can run on this UML instance in a nommu environment. As a bonus of the different implementation of the host interactions, we got speedups in several workloads we've tested, including lmbench and netperf/iperf3 benchmarks.

I will briefly overview the implementation and its comparison with the original UML architecture, and share several measurement results obtained during development. We will also share the status of the upstreaming work we have proposed [*1].

*1: https://lore.kernel.org/all/cover.1762588860.git.thehajime@gmail.com/

2026-02-01T14:00:00+01:00

At Chainguard, we want to re-use binary objects across Linux kernel builds of different major versions. For us this is useful for FIPS certification of individual kernel components while still allowing us to build new kernels rather than pinning the entire kernel forever. To achieve this, we performed a number of experiments with the kernel build system and spoke to other kernel developers about their efforts to achieve the same thing. I will discuss approaches to re-using binary objects, the limits of each, and how the Linux kernel could have a stable(r) ABI.

2026-02-01T14:30:00+01:00

In this session we're going to take a look at new developments in the VFS layer and related areas.

2026-02-01T15:00:00+01:00

This live demo shows how to pick a real syzbot-reported bug and reproduce it locally in under five minutes using virtme-ng. No disk images, no complex QEMU setup: just build, reproduce, and verify the fix. Perfect for anyone who wants to turn kernel fuzzing reports into real patches. Important note: due to the talk's time constraints, I am going to use a pre-built upstream kernel containing a bug. However, the steps to rebuild an upstream kernel and use it in virtme-ng will be described.

Full Description: syzbot continually discovers kernel issues, but reproducing them can be slow or intimidating. In this lightning talk, we’ll use virtme-ng to rebuild a mainline kernel and instantly run a real syzbot reproducer inside an ephemeral VM. We’ll trigger the crash, inspect the backtrace, apply the upstream fix, and rerun the test to verify the resolution—all live. This workflow reduces setup time from hours to minutes and lowers the entry barrier for new contributors. Every attendee will leave knowing how to reproduce syzbot bugs safely and efficiently on their own system.

Live Experiments & Demonstrations:

  • Select an active syzbot issue (syzbot.appspot.com) and show its reproducer.
  • Build a mainline kernel and launch it via virtme-run --kdir . --repro repro.c.
  • Trigger the crash and display kernel backtrace.
  • Apply the upstream patch or manual fix.
  • Re-run the reproducer and verify crash disappearance.

Key Points:

  • Use virtme-ng for instant kernel test environments.
  • Run real syzbot reproducer without manual QEMU setup.
  • Observe, patch, and verify kernel bugs live.
  • Encourage new contributors to validate fuzzing results.
  • Demonstrate a fully reproducible workflow in < 5 minutes.

2026-02-01T15:20:00+01:00

This talk follows last year's presentation "Status and Desiderata for Syscall Tracing and Virtualization Support" and reports on progress and remaining gaps in Linux system call tracing.

The talk presents a set of Linux kernel patches, intended for upstream submission, that address the following limitations and aim to make system call tracing and virtualization more expressive, portable, and efficient.

Over the past year, support for PTRACE_SET_SYSCALL_INFO has been merged into the mainline kernel. While developing a portable version of VUOS across multiple architectures, several limitations of the current tracing interfaces became evident. In particular, skipping a system call by setting its number to -1 is insufficient, as it does not allow the tracer to control the return value or errno, nor to adjust the program counter. As a consequence, the current VUOS proof-of-concept replaces skipped system calls with getpid and fixes up the return value at PTRACE_SYSCALL_INFO_EXIT, doubling the number of context switches and incurring a measurable performance cost. Updating the program counter currently requires non-portable, architecture-specific code using PTRACE_POKEUSER or PTRACE_SETREGSET.
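For illustration, here is a rough, x86-64-only sketch of that workaround (error handling omitted): the tracer rewrites the syscall number at the entry stop and patches the return value at the exit stop, paying two stops per skipped call.

    #include <sys/ptrace.h>
    #include <sys/syscall.h>
    #include <sys/types.h>
    #include <sys/user.h>

    /* At the syscall-entry stop: turn the call into a harmless getpid. */
    static void skip_syscall_entry(pid_t pid)
    {
            struct user_regs_struct regs;

            ptrace(PTRACE_GETREGS, pid, 0, &regs);
            regs.orig_rax = SYS_getpid;
            ptrace(PTRACE_SETREGS, pid, 0, &regs);
    }

    /* At the syscall-exit stop: inject the virtualized result. */
    static void fixup_syscall_exit(pid_t pid, long emulated_retval)
    {
            struct user_regs_struct regs;

            ptrace(PTRACE_GETREGS, pid, 0, &regs);
            regs.rax = emulated_retval;   /* negative errno to signal failure */
            ptrace(PTRACE_SETREGS, pid, 0, &regs);
    }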

Additional issues arise with seccomp_unotify. Tracing all system calls is difficult because file descriptors must be transferred from the traced task to the tracer; common techniques based on UNIX domain sockets and ancillary messages require sendmsg and recvmsg themselves to be excluded from tracing. Furthermore, there is currently no support for virtualizing the F_DUPFD command of fcntl, nor for allowing a tracer to atomically close a file descriptor in the traced process.

2026-02-01T15:40:00+01:00

Power saving has always been a major preoccupation in embedded systems, which by definition may have energy constraints. As embedded systems become increasingly pervasive, from IoT devices to industrial controllers, power efficiency is more critical than ever. This talk is aimed at developers, system integrators, and Linux enthusiasts. Whether you're optimizing a battery-powered board or a power-sensitive industrial board, you'll walk away with practical insights and actionable tools. This talk will explore how to reduce the electrical consumption of an embedded Linux system by leveraging software techniques such as kernel low-power states (Suspend-To-RAM, Suspend-To-Disk) and device management. We'll cover how to disable unused peripherals and how to scale CPU and DDR frequencies.
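For instance, the kernel low-power states mentioned above are reachable through the standard sysfs interface; a minimal sketch (needs root, and the system resumes on the next wake-up source):

    #include <stdio.h>

    int main(void)
    {
            FILE *f = fopen("/sys/power/state", "w");

            if (!f) {
                    perror("/sys/power/state");
                    return 1;
            }
            fputs("mem", f);  /* Suspend-To-RAM; "disk" = Suspend-To-Disk */
            return fclose(f) ? 1 : 0;
    }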

2026-02-01T16:00:00+01:00

Upstreaming kernel support traditionally happens only after silicon becomes available, but this approach often delays software enablement and ecosystem readiness. For the first time in the RISC-V world, we are tackling the challenge of pre-silicon kernel upstreaming: enabling Linux kernel features ahead of actual chip availability.

In this session, we will share the methodology, toolchains, and collaborative workflows that make this possible, including the use of simulation platforms, pre-silicon verification environments, and CI/CD integration for early kernel testing. Attendees will learn how these efforts accelerate software-hardware co-design, reduce bring-up cycles, and ensure that by the time silicon arrives, the kernel is already upstream-ready. This pioneering approach not only shortens time-to-market but also sets a new model for open source hardware-software collaboration in the RISC-V ecosystem.

Key Takeaways:

  • Why pre-silicon kernel upstreaming is a game-changer for RISC-V.
  • The tools and processes used to validate and upstream before silicon.
  • Lessons learned and best practices for collaborating with the open source community.

2026-02-01T16:20:00+01:00

Currently, the only way to attach a piece of information to a network packet (sk_buff) that will travel with it through the Linux network stack is the 32-bit mark field.

Once set, the mark can be read in firewall rules, used to drive routing, and accessed by BPF programs, among other uses. This versatility leads to fierce competition over the mark's bits, and at just 32 bits wide, it often ends up limiting practical applications.

That is why in 2024 we embarked on a journey to enable users to attach hundreds of bytes of metadata to network packets - a concept which we call "rich packet metadata" - which unblocks new and exciting applications such as:

  • Tracing packets through layers of the network stack, even when crossing the kernel-user space barrier.
  • Metadata-based packet redirection, routing, and socket steering with early packet classification in XDP.
  • Extraction of information from encapsulation headers and passing it to user space, or vice versa.
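As a taste of the building blocks involved (a standard XDP metadata pattern, not the project's actual code), a BPF program can already reserve a small metadata area in front of the packet with the bpf_xdp_adjust_meta() helper and leave a value there for later consumers:

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    SEC("xdp")
    int stash_meta(struct xdp_md *ctx)
    {
            __u32 *meta;

            /* Grow the metadata area by 4 bytes, in front of the packet. */
            if (bpf_xdp_adjust_meta(ctx, -(int)sizeof(*meta)))
                    return XDP_PASS;  /* driver lacks metadata support */

            meta = (void *)(long)ctx->data_meta;
            if ((void *)(meta + 1) > (void *)(long)ctx->data)
                    return XDP_PASS;  /* verifier-mandated bounds check */

            *meta = 42;  /* hypothetical classification ID */
            return XDP_PASS;
    }

    char _license[] SEC("license") = "GPL";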

While we have made significant progress since the project started, our journey ([1], [2]) to let BPF programs and user-space apps attach rich metadata to packets is far from over. In this talk, we'll share what's been done, what's next, what we've learned, and where the dragons we've yet to slay still lurk.

Part I: Upstream Progress and Roadmap

We'll cover:

  • Why we shifted from the old "skb traits" idea [3] to reusing existing skb metadata.
  • How bpf_dynptr came to the rescue, and why skb->data_meta still haunts us.
  • The blockers that keep metadata from traveling cleanly through the Rx path and how we plan to fix them.
  • Our roadmap for making metadata work on the Tx path.
  • Ideas for producing and consuming metadata directly in the network stack.

Part II: Lessons from Production

Since our last update [2], we've built several features with packet metadata in Cloudflare's production environment. We'll share hard-earned lessons, including:

  • Managing metadata contents and optimizing metadata area size.
  • Using a TLV structure to encode metadata, and how we shuffle it between packet data, metadata area, and maps.
  • Real-world challenges of reading and writing metadata efficiently.
  • Passing metadata from packet to socket layers, with full access for TCP via socket options—and our creative hacks for UDP.

Finally, we'll discuss where things still hurt:

  • Testing headaches and why BPF_PROG_RUN needs love.
  • What an ideal user API would look like for us.

If you're curious about where packet metadata is headed, or want to help shape the future, this session is for you.

[1] https://lpc.events/event/18/contributions/1935/
[2] https://www.netdevconf.info/0x19/sessions/talk/traits-rich-packet-metadata.html
[3] https://lore.kernel.org/all/20250422-afabre-traits-010-rfc2-v2-0-92bcc6b146c9@arthurfabre.com/

2026-02-01T16:40:00+01:00

Cluster orchestrators such as Kubernetes rely on an accurate model of the resources available on each worker node in a cluster and on the resources a given job requires, using this information to place the job onto a suitable worker node. If either is inaccurate, the orchestrator will make poor job placement decisions, resulting in poor performance.

I observe that, for workloads making heavy use of Linux's group scheduling (cgroups), which include common serverless workloads, kernel scheduling overheads can become significant enough to make the orchestrator's model of worker-node resources inaccurate. In practice this effect is mitigated by over-provisioning the cluster.

I propose and evaluate an enhancement to the Linux Completely Fair Scheduler (CFS) that mitigates these effects. By prioritising task completion over strict fairness, the enhanced scheduler is able to drain contended CPU run queues more rapidly and reduce time lost to context switching. Experimental results show that this approach can deliver equivalent performance using at least 10% fewer worker nodes, significantly improving cluster efficiency.