OpenCloud has the design goal of not using a relational database. This requires deeper integration with the underlying storage system, i.e., through extensive use of extended file attributes. Since features like file revisions, trash, and shares are indispensable nowadays, OpenCloud builds these advanced features efficiently on top of the storage aspects natively supported by the SDS.
In this talk we will give an overview of the storage aspects that are relevant from OpenCloud's perspective, the integrations we currently support, as well as ongoing research topics.
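To give a flavour of the kind of integration meant here, the following minimal Python sketch shows how metadata could be attached directly to a file via extended attributes on Linux; the attribute names and values are illustrative only and are not OpenCloud's actual on-disk schema.

```python
import os

path = "report.pdf"
open(path, "a").close()  # make sure the file exists

# Attach metadata directly to the file via extended attributes in the
# user.* namespace; the attribute names below are made up for illustration.
os.setxattr(path, b"user.demo.etag", b"5d41402abc4b2a76b9719d911017c592")
os.setxattr(path, b"user.demo.share", b'{"grantee": "alice", "perm": "read"}')

# Read the metadata back without any external database.
for name in os.listxattr(path):
    if name.startswith("user.demo."):
        print(name, os.getxattr(path, name).decode())
```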
Ceph storage: Enterprise meets Community. Our traditional Ceph storage roadmap session starts with everything that is happening in the upstream project this year and what we have planned for the future, and closes with the state of what is backed by vendor-supported products. A 360-degree look at the state of Ceph integration with OpenStack and what is planned going forward in the broader storage space, in particular with regard to features relevant to container workloads.
Architectural familiarity with Ceph is required. This session contains zero vendor pitches, and it is a caffeinated tour of what the Ceph community is working on at the feature level. Hang on to your hats, and bring questions!
Garage is versatile object storage software focused on decentralized and geo-distributed deployments. It has been developed under the AGPL for more than 5 years and is now reaching maturity.
This talk will cover the development and new features of the 2.x releases since the last FOSDEM talk (2024), best practices for administrators, the available UIs, and a small tutorial on how to migrate from MinIO.
The CernVM File System (CVMFS) is a scalable, high-performance distributed filesystem developed at CERN to efficiently deliver software and static data across global computing infrastructures, primarily designed for high-energy physics (HEP). For the Large Hadron Collider (LHC) alone, CVMFS serves around 4 billion files (~2 PB of data). CVMFS uses a content-addressable storage model, where files are stored under their cryptographic hashes, ensuring integrity and enabling deduplication. It follows a multi-caching architecture where the data are published to a single source of truth (Stratum 0), mirrored by a network of distributed servers (Stratum 1), and propagated to the clients via forward proxies. This multi-layered caching makes CVMFS a cost-effective alternative to traditional file systems, offering clients reliable access to versioned read-only datasets with low overhead. In this talk we will focus on how CVMFS interoperates with the widely adopted S3 storage interface, providing a conventional POSIX filesystem view of the objects and using the available metadata for efficient exploitation of the medium. We will also highlight the benefits of using CVMFS with containerized workflows and demonstrate tools developed to facilitate data publishing.
Homepage: https://cernvm.web.cern.ch/fs/
Documentation: https://cvmfs.readthedocs.io/
Development: https://github.com/cvmfs/cvmfs/
Forum: https://cernvm-forum.cern.ch/
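As a rough illustration of the content-addressable model described above (not CVMFS's actual object layout), the sketch below stores blobs under their cryptographic hash, so identical content deduplicates automatically and integrity can be verified on read; the store directory and layout are made up for the example.

```python
import hashlib
import os

STORE = "cas-store"  # illustrative local directory, not CVMFS's real layout
os.makedirs(STORE, exist_ok=True)

def publish(data: bytes) -> str:
    """Store a blob under its content hash; identical blobs share one object."""
    digest = hashlib.sha1(data).hexdigest()
    path = os.path.join(STORE, digest[:2], digest[2:])
    if not os.path.exists(path):               # deduplication: write once
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
    return digest

def fetch(digest: str) -> bytes:
    """Read a blob back and verify its integrity against the hash."""
    path = os.path.join(STORE, digest[:2], digest[2:])
    with open(path, "rb") as f:
        data = f.read()
    assert hashlib.sha1(data).hexdigest() == digest, "corrupted object"
    return data

h = publish(b"#!/bin/sh\necho hello\n")
print(h, fetch(h))
```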
With cost and performance requirements becoming more and more relevant in today’s storage products, technologies that leverage algorithm-driven improvements are getting a lot of attention. Erasure coding is the most prominent example and by now a well-established standard for reducing on-disk space requirements in storage. It is built upon mathematical techniques. In my talk I want to explain and explore these techniques, and thereby the mathematical reasoning underlying these algorithms, in a way that does not require a background in mathematics (or at least only an insignificant amount). I am not a software engineer myself, just an interested mathematics student who aims to introduce anyone who is curious, but not too fond of maths, to the underlying theory of erasure coding.
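As a small taste of the underlying idea (a deliberately tiny example, not a production Reed-Solomon code), the sketch below splits data into two shards plus one XOR parity shard and reconstructs a lost shard from the remaining two.

```python
# Minimal (2+1) erasure code: two data shards plus one XOR parity shard.
# Any single missing shard can be rebuilt from the other two.

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

data = b"erasure coding demo!"               # even length keeps the split simple
half = len(data) // 2
shards = [data[:half], data[half:]]          # k = 2 data shards
parity = xor_bytes(shards[0], shards[1])     # m = 1 parity shard

# Simulate losing shard 0: recover it as parity XOR shard 1.
recovered = xor_bytes(parity, shards[1])
assert recovered == shards[0]
print(b"".join([recovered, shards[1]]))      # original data restored
```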
Have you ever found your CephFS setup mysteriously broken and had no clue how it got there? Maybe someone ran a CLI command in haste, or a misstep happened weeks ago. We have suspicions, but can’t really recall what might've splintered the system. That changes now.
In this talk, we introduce a robust command history logging mechanism for CephFS: a persistent log of CephFS commands and standalone tool invocations, backed by LibCephSQLite. Think of it as “shell history,” but purpose-built for Ceph with time ranges, filters, and structured metadata. Every ceph fs subvolume rm, every ceph config set, every mischievous --force — now recorded, timestamped, and queryable.
Want to know what was run last Tuesday at 3 AM? Or who triggered that well-intentioned-but-catastrophic disaster recovery script? Or just list the last 100 commands before things exploded? It’s all there. This helps debug incidents faster, provides a clear audit trail, and opens the door to proactive traceability. So, when things go sideways around CephFS and no one's sure why — this history has your back.
This is CephFS-first but not CephFS-only. The path to full cluster command traceability starts here.
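The schema and queries below are a purely hypothetical sketch of what such a history could look like, using plain SQLite for illustration; the actual feature stores its log via LibCephSQLite inside the cluster and defines its own table layout and tooling. The point is only the "shell history with time ranges and filters" idea.

```python
import sqlite3

# Hypothetical schema for illustration only.
db = sqlite3.connect("cmd_history.db")
db.execute("""
    CREATE TABLE IF NOT EXISTS command_history (
        ts      TEXT NOT NULL,    -- ISO-8601 timestamp
        entity  TEXT NOT NULL,    -- who issued the command
        command TEXT NOT NULL,    -- full command line
        status  INTEGER           -- return code
    )
""")
db.execute(
    "INSERT INTO command_history VALUES (?, ?, ?, ?)",
    ("2026-01-27T03:00:12Z", "client.admin",
     "ceph fs subvolume rm cephfs sv1 --force", 0),
)

# "What was run last Tuesday at 3 AM?" style query: time range plus filter.
rows = db.execute(
    """SELECT ts, entity, command FROM command_history
       WHERE ts BETWEEN ? AND ? AND command LIKE ?
       ORDER BY ts DESC LIMIT 100""",
    ("2026-01-27T03:00:00Z", "2026-01-27T04:00:00Z", "%subvolume rm%"),
).fetchall()
for ts, entity, command in rows:
    print(ts, entity, command)
```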
Starting with the Tentacle release, Ceph introduces mgmt-gateway: a modular, nginx-based service that provides a secure, highly available entry point to the entire management and monitoring stack. This talk will cover its architecture and deployment, how it centralizes access to the dashboard and observability tools, and how OIDC-based Single Sign-On streamlines authentication. We’ll also show how mgmt-gateway enhances security and access control while delivering full HA for Prometheus, Grafana, Alertmanager, and the dashboard, resulting in a more resilient and user-friendly experience for Ceph administrators.
The network is one of the bottlenecks influencing Ceph performance. The Ceph cluster network in particular requires high throughput and low latency to speed up I/O operations. Performance can be increased by employing Shared Memory Communications (SMC).
SMC is a separate, fast communication channel implemented in server hardware. While SMC-R (Shared Memory Communications over Remote Direct Memory Access) uses Ethernet, SMC-D (Shared Memory Communications - Direct Memory Access) employs hardware shared memory. The SMC-D channel offloads traffic from Ethernet and increases the Ceph cluster network throughput, resulting in higher Ceph I/O rates.
We explain the SMC-D stack in detail (the user space, the kernel space, and the firmware level), its implementation in Ceph, and the integration tests. We then detail the performance analysis, namely the Ceph setup used to generate a high load on the cluster network, the client load, and the performance analysis of the whole stack. The performance analysis protocol and the results will be presented in tables and graphs, and the advantages and disadvantages of the SMC-D channel will be discussed as well. The implementation of SMC-D in Ceph demonstrates a significant I/O throughput increase.
Umbrella ("U") is planned as the next major release for the Ceph Distributed Storage System open-source project. Ceph File System development in Umbrella is aimed at addressing various pain points around the file system disaster recovery process, performance metrics, MDS tuning, user data protection and backups. Many of these themes were also discussed in the Cephalocon 2024 and various user/dev meetings.
This talk details improvements in each of those areas with a specific focus on ease of use and automation. Many noteworthy features have been introduced thereby improving the user experience across the board. Umbrella release aims to provide Ceph File System users and administrators a better and smoother experience.
The CERN Tape Archive (CTA) is the open source solution developed at CERN to store more than 1 Exabyte of data from CERN’s experimental programmes. CTA interfaces with two disk systems widely used by the High-Energy Physics (HEP) community, EOS and dCache. However, until now there has been no integration with systems used outside of HEP.
Looking at current industry standards, the leading interface for file and object storage is S3, which includes cold storage extensions for data archival. The CTA team are investigating whether CTA can be fronted by an S3 API. During this talk, we’ll review a proof-of-concept implementation, and look at alternative solutions to explore along with their respective trade-offs.
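To make the "cold storage extensions" concrete, the sketch below shows roughly what a client-side archival workflow looks like against an S3-compatible endpoint, using boto3 purely as an illustration; the endpoint, bucket, and credentials are placeholders, and whether and how CTA would expose these calls is exactly what the proof of concept explores.

```python
import boto3

# Placeholder endpoint and credentials for any S3-compatible service.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.example.org",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Upload an object into an archival (cold) storage class.
s3.put_object(
    Bucket="experiment-data",
    Key="run-2025/raw.dat",
    Body=b"...detector payload...",
    StorageClass="GLACIER",
)

# Later: ask for the object to be staged back from cold storage ...
s3.restore_object(
    Bucket="experiment-data",
    Key="run-2025/raw.dat",
    RestoreRequest={"Days": 7, "GlacierJobParameters": {"Tier": "Bulk"}},
)

# ... and poll the restore status before downloading.
head = s3.head_object(Bucket="experiment-data", Key="run-2025/raw.dat")
print(head.get("Restore"))  # e.g. 'ongoing-request="true"' while staging
```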
Concurrent storage access via standard network protocols such as SMB and NFS has become a common feature of many proprietary storage products. Samba, the leading open‑source SMB implementation, has long supported a limited set of multiprotocol scenarios by leveraging kernel interfaces and by allowing aspects of multiprotocol access to be implemented in the filesystem. Over time, several storage vendors have exploited these capabilities while using their own proprietary filesystems.
In this talk we will present our plan for a fully open‑source multiprotocol stack built on CephFS, Samba, and NFS‑Ganesha. First, we will describe the testing infrastructure we are creating and the use‑cases we intend to support in the initial release. We will then outline our approach to exclusive file locking and to a unified access‑control model.
This talk introduces an advanced storage acceleration strategy for I/O-intensive container workloads. In environments like CI/CD pipelines or database applications, performance is often constrained by storage latency. Our plan addresses this by implementing a transparent data caching layer that uses high-speed local storage to hold frequently accessed data, significantly reducing retrieval times and load on the primary storage system.
With a core focus on disaster recovery and fast StatefulSet failover, the primary cloud storage volume is intentionally left pristine and unmodified, containing solely user data. All cache intelligence is kept local to the node. This design is critical for operational robustness, as it ensures the data can be restored to a consistent point in time, a fundamental requirement for reliable disaster recovery. It also allows the volume to be safely attached to any node for rapid failover, maximizing both performance and data safety.
project: https://github.com/kubernetes-sigs/alibaba-cloud-csi-driver
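As a very rough illustration of the read-caching idea (not the actual CSI driver code; the class and method names are made up), the sketch below serves repeated reads from fast local storage while the backing volume is never touched by cache metadata, so it can be re-attached elsewhere at any time.

```python
import os

class ReadThroughCache:
    """Illustrative read-through block cache: hot blocks are copied to fast
    local storage, while the backing (cloud) volume is never modified by
    the cache itself."""

    def __init__(self, backing_path: str, cache_dir: str, block_size: int = 4096):
        self.backing_path = backing_path
        self.cache_dir = cache_dir
        self.block_size = block_size
        os.makedirs(cache_dir, exist_ok=True)

    def read_block(self, index: int) -> bytes:
        cached = os.path.join(self.cache_dir, f"block-{index}")
        if os.path.exists(cached):                 # cache hit: local fast path
            with open(cached, "rb") as f:
                return f.read()
        with open(self.backing_path, "rb") as f:   # cache miss: read backing volume
            f.seek(index * self.block_size)
            data = f.read(self.block_size)
        with open(cached, "wb") as f:              # populate the local cache
            f.write(data)
        return data
```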
For high-performance proxy services, moving data is the primary bottleneck. Whether it is an NFS-Ganesha server or a FUSE-based Ceph client, the application burns CPU cycles copying payloads between kernel and user space just to route traffic. While splice() exists, it imposes a rigid pipe-based architecture that is difficult to integrate into modern asynchronous event loops.
We propose a pure software zero-copy design that works with standard network stacks. In this model, a specialized kernel socket aggregates incoming network packets into a scatter-gather list. Instead of copying this data to the application, the kernel notifies userspace—potentially via io_uring—that a new data segment is ready and provides an opaque handle.
The application sees the headers to make logic decisions but acts only as a traffic controller for the payload. It uses the handle to forward the data to an egress socket or a driver like FUSE without ever touching the actual bytes. This talk will outline the design of this buffer-handling mechanism and demonstrate how it allows complex proxies like Ganesha and storage clients like Ceph to achieve true zero-copy throughput on standard hardware.
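Everything below is a hypothetical pseudo-API, sketched in Python only to show the control flow the design implies: the proxy inspects headers, holds an opaque kernel-side handle to the payload, and forwards that handle without ever mapping or copying the bytes. None of these names exist today.

```python
# Entirely hypothetical API: kernel_queue, Segment, and forward() stand in
# for the io_uring-style completion queue, the (headers, opaque-handle)
# notification, and the splice-like forward operation in the proposed design.

from dataclasses import dataclass

@dataclass
class Segment:
    headers: bytes   # small header region, visible to the application
    handle: int      # opaque kernel token for the payload; bytes never mapped

def choose_egress(headers: bytes) -> int:
    """Routing decision made from headers only (e.g. an NFS op or FUSE request)."""
    return 7 if headers.startswith(b"WRITE") else 8   # made-up file descriptors

def proxy_loop(kernel_queue) -> None:
    for seg in kernel_queue:                 # completion events, e.g. via io_uring
        egress_fd = choose_egress(seg.headers)
        # Forward by handle: the kernel moves the payload pages to the egress
        # socket or FUSE device; userspace never copies or even sees them.
        kernel_queue.forward(seg.handle, egress_fd)
```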
Modern S3 workloads generate massive amounts of duplicate data—from backup chains to model checkpoints—quietly consuming petabytes. Ceph’s new S3 data deduplication feature addresses this by identifying identical content through chunking and cryptographic hashing, storing it only once, and tracking references with a lightweight dedup index.
This talk explains how dedup works inside Ceph RGW: how chunks are created, how refcounts stay consistent under parallel writes, versioning, and deletes, and how the system avoids corruption using atomic metadata updates and safe garbage collection. We’ll also share early performance insights from large-scale tests and show how dedup can significantly reduce capacity, I/O, and network overhead—without requiring any changes to S3 applications.
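The sketch below is a simplified illustration of the chunk-hash-reference idea, using fixed-size chunks and an in-memory index; RGW's real implementation has its own chunking, metadata handling, and garbage-collection machinery.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024          # illustrative fixed-size chunking

chunk_store = {}   # fingerprint -> chunk bytes (stored once)
refcounts = {}     # fingerprint -> number of objects referencing the chunk

def put_object(data: bytes) -> list[str]:
    """Split an object into chunks, store each unique chunk once,
    and return the list of fingerprints that reconstructs the object."""
    manifest = []
    for off in range(0, len(data), CHUNK_SIZE):
        chunk = data[off:off + CHUNK_SIZE]
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in chunk_store:              # new content: store it once
            chunk_store[fp] = chunk
        refcounts[fp] = refcounts.get(fp, 0) + 1
        manifest.append(fp)
    return manifest

def delete_object(manifest: list[str]) -> None:
    """Drop references; a chunk is garbage-collected only at refcount zero."""
    for fp in manifest:
        refcounts[fp] -= 1
        if refcounts[fp] == 0:
            del refcounts[fp]
            del chunk_store[fp]

m1 = put_object(b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE)
m2 = put_object(b"A" * CHUNK_SIZE)     # duplicate chunk is stored only once
assert len(chunk_store) == 2
delete_object(m2)
assert len(chunk_store) == 2           # still referenced by the first object
```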
If you're interested in building efficient, scalable, open-source object storage, this session shows how Ceph makes S3 smarter with zero duplicates.