Virtually Attend FOSDEM 2026

Open Research Track

2026-02-01T09:00:00+01:00

The OpenFlexure Microscope is an open-source, laboratory-grade robotic microscope used by a diverse community including academic researchers, engineers, educators, pathologists and hobbyists (https://openflexure.org/, https://openflexure.discourse.group/). Users from over 60 countries have developed and used the device for everything from exploring garden wildlife to training medical students to diagnose cancer. Joe presents his experience as an academic member of the OpenFlexure development team over the last eight years. While his work focuses on the medical applications of the Microscope, research is planned and prioritised to benefit all members of the community.

Development of the OpenFlexure software has enabled smart microscopy on the OpenFlexure Microscope, with automated sample identification, smart path planning and image processing, bringing novel research techniques such as digital pathology into environments which traditionally lack the infrastructure to support them (https://gitlab.com/openflexure/openflexure-microscope-server, https://gitlab.com/openflexure/openflexure-microscope). The research builds on FOSS software and libraries, including Arduino and OpenCV, and extends open science by improving access to essential hardware. This is reflected in the range of OpenFlexure publications from outside the core development team, including peer-reviewed articles in the fields of engineering, machine learning, medicine and social science.
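
As an illustration of the kind of image-processing step mentioned above, the short sketch below computes a variance-of-Laplacian focus score with OpenCV, a common building block for autofocus and tile selection in automated microscopy. It is a hypothetical example, not the OpenFlexure implementation; the file name and thresholding idea are assumptions.

```python
import cv2

# Illustrative only: a variance-of-Laplacian focus score, often used to judge
# whether a microscope field of view is in focus. Not OpenFlexure's code.
image = cv2.imread("field_of_view.png", cv2.IMREAD_GRAYSCALE)
focus_score = cv2.Laplacian(image, cv2.CV_64F).var()

# A scan controller might re-image or skip tiles whose score falls below a
# calibrated threshold before stitching them into a larger mosaic.
print(f"focus score: {focus_score:.1f}")
```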

2026-02-01T09:30:00+01:00

At the current rate of digitization, it is estimated that it would take hundreds of years to fully digitize the natural science collections of Europe. In the face of the biodiversity crisis, we urgently need to scale up digitization to equip researchers with the tools to tackle this challenge.

The Distributed System of Scientific Collections, DiSSCo, is a fully open source European infrastructure that is bringing together over 300 institutions into a unified, digital natural science collection. DiSSCo harmonizes data into one data model and enables sharing human expertise and machine services across institutions.

Through annotating specimen records on the platform, experts from around the world can contribute to curation and enhancement of data. Most crucially, taxonomists, whose expertise is highly specialized and sought after, can easily share their knowledge and improve specimen data across institutions.

Leveraging a shared data model, machine agents can further enrich specimen data through linking to other infrastructures, georeferencing, and even label transcription. Instead of being confined to a single institution, services adapted for DiSSCo can be applied to any specimen in Europe, breaking institutional silos and furthering collaboration.

These efforts culminate in a digital extended specimen, which acts as a “digital twin” to the physical object, with links to publications, genetic sequences, and other related information.

This presentation gives an overview of the progress of the DiSSCo infrastructure, collaboration with researchers and collection managers, and the future of DiSSCo’s development.

https://disscover.dissco.eu/ https://github.com/diSSCo

2026-02-01T10:00:00+01:00

The exponential growth of scientific literature—doubling roughly every nine years—has made it increasingly difficult for researchers and decision-makers to locate, assess, and synthesize the evidence needed for sound policy and practice. Systematic maps and systematic reviews offer robust, unbiased ways to answer “what works?” but today they depend on manual search and screening workflows that are slow, costly, and vulnerable to human error. The result is a bottleneck: high-quality, up-to-date evidence syntheses are often too labor-intensive to produce at the pace conservation challenges demand.

This talk and demo present an open, community-driven approach to easing that bottleneck using human-in-the-loop machine learning and transparent evidence-management tooling. In 2018, DataKind and the Science for Nature and People Partnership built two free, open-access, web-based workflows for computer-assisted paper screening and evidence management, integrated into a single collaborative application (colandrapp.com). The platform combines active-learning prioritization, reproducible labeling, and interactive visualization to help teams rapidly identify relevant studies from tens of thousands of documents, extract key metadata, and generate portable, shareable review outputs. All components are designed to support open research practices: auditable decision trails, exportable datasets, and interoperability with downstream synthesis and visualization tools. Now, in 2026, we are releasing a significant update to Colandr that ensures the tool remains functional and sustainable. Colandr is supported by a global community of researchers and volunteers (colandrcommunity.com), and this session will highlight open-source solutions that have been built on top of the Colandr stack, alongside the Colandr product updates.
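
To make the active-learning prioritization concrete, here is a minimal, hypothetical sketch of the general technique (it is not Colandr's code): a lightweight classifier is fit on the labels collected so far, and the unlabelled abstracts the model is least certain about are surfaced to reviewers first. All texts, labels, and scores are illustrative.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical labelled and unlabelled abstracts (1 = include, 0 = exclude).
labelled_texts = ["effect of marine reserves on fish biomass", "unrelated economics paper"]
labels = [1, 0]
unlabelled_texts = ["community-managed reserves and reef fish", "survey of solar panel pricing"]

vectorizer = TfidfVectorizer()
X_labelled = vectorizer.fit_transform(labelled_texts)
X_unlabelled = vectorizer.transform(unlabelled_texts)

model = LogisticRegression().fit(X_labelled, labels)

# Uncertainty sampling: 1.0 means the model is maximally unsure.
uncertainty = 1 - np.abs(model.predict_proba(X_unlabelled)[:, 1] - 0.5) * 2

# Show the reviewer the abstracts the model is least sure about first.
for idx in np.argsort(-uncertainty):
    print(unlabelled_texts[idx], round(float(uncertainty[idx]), 2))
```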

We aim to engage the FOSDEM community around a concrete open research challenge: building trustworthy, extensible tools that keep evidence synthesis fast, reproducible, and accessible. Participants will leave with a clear view of the platform’s capabilities, the design decisions behind it, and a set of well-scoped technical and research directions where open-source contributors can meaningfully push the state of practice forward.

2026-02-01T10:30:00+01:00

AI has become an integral part of modern research, offering tremendous opportunities, but also raising important questions for the Open Science community.

With the emergence of the Open Source AI Definition (OSAID) and its emphasis on the four freedoms, the “freedom to study” stands out as a cornerstone for achieving true reproducibility. You can read the OSAID definition here: https://opensource.org/ai/open-source-ai-definition.

This talk will explore how researchers can design, implement, and sustain reproducible AI practices within their work, especially in Low and Middle Income Countries (LMICs), where infrastructure and culture around reproducibility are still developing. Drawing from practical examples and community experiences, I’ll outline actionable steps for embedding openness and reproducibility in AI workflows. These approaches are adaptable across contexts and can help build a more transparent, collaborative, and trustworthy global AI ecosystem.

My perspective is shaped by my work as an Open Source Manager and Project Coordinator with Data Science Without Borders, and as a contributor to The Turing Way, where I advocate for open, inclusive, and reproducible research practices in data science and AI.

2026-02-01T11:00:00+01:00

vLLM (https://github.com/vllm-project/vllm) has rapidly become a community-standard open-source engine for LLM inference, backed by a large and growing contributor base and widely adopted for production serving. This talk offers a practical blueprint for scaling inference in vLLM using two complementary techniques, quantization (https://github.com/vllm-project/llm-compressor) and speculative decoding (https://github.com/vllm-project/speculators). Drawing on extensive evaluations across language and vision-language models, we examine the real accuracy–performance trade-offs of each method and, crucially, how they interact in end-to-end deployments. We highlight configurations that substantially cut memory footprint while preserving model quality, and show when these speedups translate best to low-latency versus high-throughput serving. Attendees will leave with data-backed guidance, deployment-ready settings, and a clear roadmap for leveraging quantization and speculative decoding to accelerate vLLM inference in real-world pipelines.
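
For orientation, the sketch below shows the shape of an offline vLLM workflow with a quantized checkpoint. The model identifier is a placeholder, and the speculative-decoding options referenced in the comments are version dependent, so treat this as an assumption-laden illustration rather than the deployment-ready settings from the talk.

```python
from vllm import LLM, SamplingParams

# Placeholder model id, e.g. a checkpoint produced with llm-compressor.
# vLLM normally auto-detects the quantization scheme from the checkpoint;
# it can also be forced via the `quantization` argument (supported values
# vary by release). Speculative decoding is enabled through the engine's
# speculative-decoding options, whose exact names depend on the vLLM version.
llm = LLM(model="your-org/your-quantized-model")

params = SamplingParams(temperature=0.0, max_tokens=128)
outputs = llm.generate(
    ["Summarize the benefits of quantization in one sentence."], params
)
print(outputs[0].outputs[0].text)
```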

2026-02-01T11:30:00+01:00

Quantum computing creates new opportunities, but building and operating a quantum cloud service remains a complex challenge, often relying on proprietary, black-box solutions. To bridge this gap, we introduce OQTOPUS (Open Quantum Toolchain for OPerators and USers) [1], a comprehensive open-source software stack designed to build and manage full-scale quantum computing systems. OQTOPUS provides a complete cloud architecture for quantum computers, covering three critical layers:

  • Frontend layer: web-based interfaces and SDKs that allow users to easily design and submit quantum circuits.

  • Cloud layer: a scalable management system for users, jobs, and devices, designed to be deployable on public clouds (see oqtopus-cloud [2]).

  • Backend layer: the core execution engine that handles circuit transcoding, error mitigation, and low-level device control, utilizing modular tools such as OQTOPUS Engine [3] and Tranqu [4].

Developed in collaboration with The University of Osaka, Fujitsu Limited, Systems Engineering Consultants Co., Ltd. (SEC), and TIS Inc., OQTOPUS is already powering operational superconducting quantum computers. This talk will detail the modular architecture of OQTOPUS, demonstrating how developers and researchers can use it to construct their own quantum cloud platforms, customize compilation strategies, and experiment with hybrid quantum-classical workflows. Join us to learn how OQTOPUS is democratizing access to the deepest layers of quantum infrastructure.

Project Links:

  • [1] OQTOPUS Organization: https://github.com/oqtopus-team
  • [2] Cloud Layer: https://github.com/oqtopus-team/oqtopus-cloud
  • [3] OQTOPUS Engine: https://github.com/oqtopus-team/oqtopus-engine
  • [4] Tranqu: https://github.com/oqtopus-team/tranqu

2026-02-01T11:45:00+01:00

NoiseModelling is an open-source platform for simulating environmental noise propagation and generating regulatory-compliant noise maps at urban and regional scales. Led since 2008 by the Joint Research Unit in Environmental Acoustics at Gustave Eiffel University, it provides researchers and practitioners with reproducible, transparent, and scalable modelling capabilities for environmental acoustics.

As the modelling core of the Noise-Planet framework, NoiseModelling simulates noise propagation from road traffic, railways, and industrial sources using the standardized CNOSSOS-EU method for emission and propagation. It operates as a Java library or through a user-friendly web interface, tightly integrated with the spatial databases H2GIS and PostGIS to handle large-scale urban datasets efficiently.

The broader Noise-Planet ecosystem complements NoiseModelling's simulation capabilities with participatory noise measurement through the NoiseCapture mobile application. After more than three years of operation, the platform has collected data from over 100,000 downloads and 74,000 contributors worldwide, enabling citizens and researchers to create high-resolution, crowdsourced noise maps that respect privacy while contributing to scientific research. This integrated approach bridges computational modelling with real-world measurements, promoting open science principles through open-source code, open data, and collaborative research.

https://noise-planet.org/

https://noisemodelling.readthedocs.io/en/latest/

2026-02-01T12:15:00+01:00

Abstract

Circular Economy research is stalled by a simple problem: we don't have open data on what products are actually made of. Manufacturers keep Bills of Materials (BOMs) proprietary, and existing databases are expensive silos.

This talk introduces RELab, an open-source (AGPLv3) infrastructure designed to reverse-engineer this data through community crowdsourcing. We will dive into the technical architecture—a FastAPI/SQLModel backend and Expo/React Native mobile app—and discuss the challenges of implementing FAIR data principles in a "wild" contribution environment.
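
As a rough, hypothetical sketch of how a FastAPI/Pydantic stack can represent nested product data, consider the example below. This is not RELab's actual schema; all field and endpoint names are assumptions.

```python
from typing import Optional

from fastapi import FastAPI
from pydantic import BaseModel

# Hypothetical, simplified schema for a disassembled product.
class Component(BaseModel):
    name: str
    material: str
    mass_g: float

class Product(BaseModel):
    name: str
    brand: Optional[str] = None
    components: list[Component] = []

app = FastAPI()
products: list[Product] = []  # in-memory stand-in for the PostgreSQL layer

@app.post("/products", response_model=Product)
def create_product(product: Product) -> Product:
    # Nested components are validated automatically by Pydantic.
    products.append(product)
    return product
```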

Description

To model material flows effectively, researchers need granular data on product weights, materials, and disassembly steps. Currently, this data is either locked in PDFs or proprietary databases.

RELab (Reverse Engineering Lab) is an attempt to build a prototype of a "Wikipedia for Products." It allows researchers and citizens to disassemble products, digitize them, and contribute to an open data commons.

In this lightning talk, I will cover:

  • The Architecture: How we built a scalable API using FastAPI, Pydantic, and PostgreSQL to handle complex, nested product data structures.

  • The Client: A cross-platform mobile app (built with Expo) that enables "in-the-field" data collection at recycling centers and repair cafes.

  • The Hardware: A quick look at our Raspberry Pi integration for standardized, reproducible product photography.

  • The Challenge: How to build a system that is technically interoperable with rigid industrial ecology tools (like openLCA or Brightway) while remaining accessible and engaging for non-technical citizen scientists.

We are looking for feedback on our strategy to gamify data collection and engage a broader community of contributors outside of academia.

Other details

  • Submission License: AGPLv3 (GNU Affero General Public License v3.0)
  • Speaker: Simon van Lierde
  • Contact: s.n.van.lierde@cml.leidenuniv.nl
  • Biography: Simon van Lierde is a PhD candidate and research software engineer at Leiden University (CML). He maintains RELab, an AGPLv3 platform for crowdsourcing product composition data. His work focuses on building open infrastructure for the Circular Economy using modern stacks (FastAPI, React Native) and bridging the gap between industrial ecology and the open-source community.

Links to the project:

2026-02-01T13:00:00+01:00

OpenParlData.ch provides free access to harmonized data from Swiss parliaments. We currently offer data on political actors, parliamentary proceedings, decrees, consultations, votes, and more from 74 parliaments. Researchers (e.g., political scientists and linguists), as well as journalists and civil society organizations, can use our API to create their own analyses, visualisations, and tools, thereby promoting transparency, participation, and innovation in Swiss politics.

We import data (mostly from websites and some APIs), then clean, harmonize, and publish it openly. The data infrastructure is open source and currently in beta. In addition to the API, we are developing standards that enable parliaments and governments to publish uniform open data. Over the next year, we will address how to run the data infrastructure efficiently and in a financially sustainable way, so that it still provides crucial data openly in three years' time, and how to enable other actors to publish interoperable, high-quality data. We look forward to sharing what we have learnt and hearing your feedback!

2026-02-01T13:15:00+01:00

Xan is a command-line tool designed to manipulate CSV files directly from the comfort of the terminal.

Originally developed within a sociology research lab to perform common operations on very large datasets collected from the web (exploration, sorting, computing frequency tables, joins, aggregations, etc.), it has become a go-to solution for its users for many more use cases, including lexicometric analysis, plotting histograms, time series and heatmaps, and even generating network graphs. And while the tool was initially created to deal with very large CSV files, it is now also used to process small files and other file formats. The tool has thus become part of its users' daily data-manipulation practice; they see it as an opportunity to never leave their shells, without having to rely on GUIs or notebooks.

This presentation, given by a research engineer after two years of regular use, examines the reasons for this appropriation, which relates both to the constraints of research in the Humanities and Social Sciences and to the interface design choices that make xan effective.

2026-02-01T13:45:00+01:00

Open research requires skills that Free/Libre and Open Source Software (FLOSS) developers have been cultivating for decades and that have made them successful in building their communities and business models. Discussing in public, creating inclusive communities, developing governance models suitable for community-driven projects, and securing funding are all skills FLOSS developers require to sustain their software projects.

These skills are equally needed in open research, when it is not merely understood as open access but is conceived as an effort to create communities of practice that overcome geographical, disciplinary, and social boundaries. Developing these skills in open research, however, is still a work in progress. For instance, peer review, a mainstay of research, usually continues to take place behind closed doors and involves a limited number of actors rather than the entire community, with attendant risks of fraud and knowledge gatekeeping. Similarly, open research networks struggle to be as inclusive as FLOSS projects: participation is traditionally determined by institutional affiliations and funding, and citizen science contributions are often neglected because they do not fit into the scheme of traditional scholarly knowledge. Robust governance models and long-term funding strategies are also lacking in open research in many cases.

The talk is a personal reflection based on my experience working in research data management and engaging in my free time with open-source projects and volunteer-based initiatives to promote coding literacy. At a time when AI-generated code is marketed as the only possible future of software, including research software, my reflection focuses instead on the human skills and shared values underpinning software development in FLOSS communities, why I consider them a precious asset, and why I hope they will continue to exist and transfer fully into open research.

2026-02-01T14:15:00+01:00

In 2012, a small group looking at challenges related to the development and maintenance of research software realized that there was no community identity (e.g., common title, career path, professional association) for the people involved, so they started a process to define and create these. Today, 13+ years later, there are research software engineer (RSE) and engineering (RSEng) groups at more than 100 universities, and RSE societies and associations in more than 10 countries (e.g., UK, US, Germany, Belgium), with over 10,000 members and annual physical and virtual conferences, including a first global research software conference coming in 2026. This talk will briefly discuss the movement that created this, then will focus on the experience of the University of Illinois Urbana-Champaign, where there is now a group of 45 RSEs in the National Center for Supercomputing Applications (NCSA), and many more across the university. RSEs at NCSA bring skills and expertise including full-stack development, UI/UX design, GIS, AI, MLOps, DevOps, and data science and engineering, with projects such as Clowder, IN-CORE, Illinois Chat, and DeCODER, across multiple scholarly and industrial domains. Beyond technical advancement, the group has been developing and enhancing mentoring for RSEs and RSE managers. The talk will discuss how this group was developed, the challenges it overcame, and the challenges that remain.

2026-02-01T14:45:00+01:00

So, you want to create an open-source research software package — and not just for yourself or your group. You’d like people around the world to use it, and even contribute to it. How do you persuade them it’s worth their time?

Open-source projects rise and fall on trust. You may hope to build trust on technical merits: your algorithm is novel; your implementation fast; your tests thorough. All great, but not enough. Many technically excellent projects never break through because they neglect the social foundations of trust, which are laid long before a project matures.

And that's good news: you don’t need to be a top-tier programmer to build a successful open-source tool. Normal researchers do this all the time. What matters most is how you run the project, not how fancy the code is.

This talk distils lessons from years of building and maintaining scientific Python tools used by researchers worldwide. I’ll outline the practices that signal reliability and sustainability across a project’s lifecycle: defining and communicating your mission from the start; making a reasonable first release and following it up with consistency; and using open communication channels to embody your values and model healthy norms.

Throughout the talk, I’ll draw on examples from movement — a Python package I develop — and other tools built by the Research Software Engineering team I’m part of. That said, the lessons should be applicable to any free open-source project that aspires to attract and sustain a healthy community.

Takeaway: If you behave like a trustworthy project from the beginning, people will treat you like one, and help the project grow into what it promises to be.

2026-02-01T15:15:00+01:00

Jupyter Book is a core tool for sharing computational science, powering more than 14,000 open, online textbooks, knowledge bases, lectures, courses and community sites. It allows researchers and educators to create books and knowledge bases that are reusable, reproducible, and interactive.

Over the past two years, we have rebuilt Jupyter Book from the ground up, focusing on allowing authors to produce machine-readable, semantically structured content that can be flexibly deployed, reused, and cross-referenced in unprecedented ways. We achieved this by adopting, stewarding, and developing the MyST Markdown Document Engine (mystmd.org), a more flexible and extensible engine that integrates with Jupyter for interactive computation. Jupyter Book 2 represents a major leap forward in how we share and distribute computational content on the web.

In this talk, we cover the key ideas driving Jupyter Book 2 and MyST, and showcase real-world examples such as The Turing Way and Project Pythia. We'll demonstrate major new functionality with live demos and give the audience practical tips for getting started with the new Jupyter Book 2 stack.

2026-02-01T15:45:00+01:00

A presentation of a new tool for visualising groups of Wikipedia articles, analysing and monitoring them, supporting the work of volunteers, researchers, and institutions, and creating knowledge landscapes.

The prototype focuses on Wikipedia articles related to climate change and sustainability, aiming to assess current coverage of these topics and test interventions. However, the tool developed can be applied to any topic, starting from Wikidata and Wikipedia categories.

This free and open software tool is developed in the framework of the international research project “Visual Analytics for Sustainability and Climate Change: Assessing online open content and supporting community engagement. The case of Wikipedia” (2025-2029), led by the University of Applied Sciences and Arts of Southern Switzerland (SUPSI), in collaboration with Wiki Education Foundation, Wikimedistas de Uruguay, Wiki in Africa and Open Climate Campaign, with the endorsement of Wikimedia Italia, the support of the SNSF (10.003.183) and the engagement of many Wikipedia and Wikidata volunteers.

The presentation is an invitation to contribute to the design of the tool and its tests.

  • Research project: https://meta.wikimedia.org/wiki/Visualizing_sustainability_and_climate_change_on_Wikipedia
  • Co-design of the tool: https://meta.wikimedia.org/wiki/Visual_Analytics_for_Sustainability_and_Climate_Change/Tool/Co-design_activities
  • Prototype: https://giovannipro.github.io/wikipedia-climate-change/?lang=en
  • The visualisations will be integrated into the dashboard Visualizing Impact by Wiki Education.

2026-02-01T16:00:00+01:00

How do you work with toxic data? In our project we work with DNS query streams, which contain a lot of data that may expose individual users and their browsing behaviour.

This talk covers how we built a large-scale statistics platform that preserves user privacy while still surfacing important observations. We cover which algorithms and methods we use to gather the data in a cloud platform and run advanced analytics without touching individual user data, and we share how to go from big data sets to small, aggregated, and minimised sets.
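
As a toy illustration of that big-to-small step (an assumption about the general technique, not the project's actual pipeline), the sketch below aggregates raw per-user queries into per-domain counts and drops anything below a minimum threshold before publication.

```python
from collections import Counter

# Hypothetical publication threshold; a real deployment would choose a much
# higher value based on the privacy guarantees required.
MIN_COUNT = 2

raw_queries = [
    ("user-a", "example.org"),
    ("user-b", "example.org"),
    ("user-c", "rare-domain.test"),
]

# Aggregate immediately: only per-domain counts are kept, never user ids.
domain_counts = Counter(domain for _user, domain in raw_queries)

# Drop anything below the threshold so rare, potentially identifying
# queries never leave the pipeline.
published = {d: c for d, c in domain_counts.items() if c >= MIN_COUNT}
print(published)  # {'example.org': 2}
```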

We believe the approach of "small data" is applicable to any field where you want to use and share sensitive data. We also invite the audience to audit our work and help build a privacy-first internet statistics platform as one good example.

2026-02-01T16:30:00+01:00

The “Gambit” project for computation in game theory has been through multiple phases of development, dating back to the 1980s. Game theory as a field and methodology emerged from economics, but increasingly has applications in cybersecurity, multi-agent systems research and AI. Gambit is used across these fields both for teaching and as a suite of software tools for scientific computing. Recent Gambit development has been carried out at The Alan Turing Institute and has involved a modernisation of the PyGambit Python API, with a particular focus on improving the user experience, including clear user tutorials and documentation. This in turn has helped to guide the prioritisation of features in recent package releases.

This talk will introduce some fundamental concepts in game theory using PyGambit, explaining how the package can be used to create and visualise non-cooperative games, and compute their Nash equilibria (where game players have no incentive to deviate their strategies). The talk will also highlight how PyGambit fits into the broader open-source scientific computing ecosystem for research on games via interoperability with the OpenSpiel framework, which is used for reinforcement learning.