Bioinformatics & Computational Biology Track

Workflows made easy: the nf-core community

2026-01-31T15:05:00+01:00

Nextflow is a workflow manager that enables scalable and reproducible workflows. Nextflow is complemented by the nf-core community effort that aims at developing and supporting a curated collection of Nextflow pipelines, developed according to a well-defined standard, and their components. Since its inception, nf-core has set rigorous standards for documentation, testing, versioning and packaging of workflows, ensuring that pipelines can be "run anywhere" with confidence.

In order to help adhere to the standards, nf-core comes along with nf-core/tools, an open-source toolkit designed to support the Nextflow pipeline ecosystem. These include tools for the creation, testing, and sharing of Nextflow workflows and components. The nf-core tooling is central to all nf-core pipelines, but it can also be used to develop pipelines outside the nf-core community.

The pipelines and the tooling are actively maintained by the nf-core contributors and by the nf-core infrastructure team (supported by the CRG, SciLifeLabs, QBIC, and Seqera). This infrastructure provides everything: from pipeline templates to management of nf-core components, ensuring consistency and high quality across projects.

In this talk, we’ll give a short introduction to nf-core and how nf-core/tools supports both pipeline developers and end users, helping the community build reliable and reusable workflows.

Building Open Research Infrastructure: Connecting the Lab Bench to Computational Analysis with RSpace & Galaxy

2026-01-31T15:30:00+01:00

Modern research workflows are often fragmented, requiring scientists to navigate a complex path from the lab bench to computational analysis. The journey typically involves documenting experiments in an electronic lab notebook and then manually transferring data to a separate computational platform for analysis. This process creates inefficiencies, introduces errors, and complicates provenance tracking. To address this challenge, we have developed a tight, two-way integration between two open-source solutions: RSpace, a research data management platform and ELN, and Galaxy, a web-based platform for accessible, reproducible computational analysis. By connecting two open-source platforms, we're building truly open research infrastructure that institutions can adapt to their specific needs while maintaining full control over their research data.

The integration's foundational step makes RSpace a native repository within Galaxy, enabling researchers to browse their RSpace Gallery and import data directly into Galaxy histories. This connection is bidirectional; not only can data be pulled into Galaxy but also selected outputs or even entire histories can be exported back to RSpace. This creates a seamless FAIR data flow that preserves the critical link between experimental results and their computational context.

Building on this foundation, the integration has been further extended to allow researchers to initiate analysis directly from RSpace. By selecting data attached to a document and clicking a Galaxy icon, users upload it into a fresh, systematically-annotated Galaxy history that traces the data to its experimental source. This allows to document field work, launch a complex analysis, monitor its progress, and import the results, all while maintaining a clear and auditable link between the initial data and documentation and the outputs of the final computational analysis.

This partnership between two open-source platforms represents a significant stride towards more open, integrated, cohesive research infrastructure that institutions can build upon, reducing friction so scientists can focus on discovery rather than data logistics. Future developments will focus on improving the native repository integration, automated reporting of results back to RSpace, enhanced RO-Crate support for standardized metadata exchange, and improved templating in RSpace for sharing and reusing specific workflow configurations.

https://galaxyproject.org/
https://www.researchspace.com/
https://galaxyproject.org/news/2025-02-27-rspace-talk/
https://galaxyproject.org/news/2025-06-23-rspace-integration/
https://www.researchspace.com/blog/rspace-galaxy-filesource-integration
https://www.researchspace.com/blog/rspace-adds-galaxy-integration
https://documentation.researchspace.com/article/zzsl46jo5y-galaxy

Building Everything with Nothing – Harnessing Nix for Bioinformatics

2026-01-31T15:45:00+01:00

I will share how adopting Nix transformed my bioinformatics practice, turning fragile, environment‑dependent pipelines into reliable, reproducible workflows. I will walk the audience through the practical challenges of traditional Docker‑centric setups, introduce the core concepts of Nix and its package collection (nixpkgs), and explain how tools such as rix and rixpress or bionix simplify data analysis workflows. Attendees will leave with concrete strategies for managing development environments, rapid prototyping, and generating Docker images directly from Nix expressions—complete with tips, tricks, and curated resources to lower the barrier to adoption. Whether you’re unfamiliar with Nix or have found it intimidating, this session aims to inspire a shift toward reproducible, maintainable bioinformatics pipelines.

nf-core proteinfold: a community-driven open source pipeline for deep learning based protein structure prediction methods

2026-01-31T16:05:00+01:00

The release of AlphaFold2 paved the way for a new generation of prediction tools for studying unknown proteomes. These tools enable highly accurate protein structure predictions by leveraging advances in deep learning. However, their implementation can pose technical challenges for users, who must navigate a complex landscape of dependencies and large reference databases. Providing the community with a standardized workflow framework to run these tools could ease adoption.

Thanks to its adherence to nf-core guidelines, the nf-core/proteinfold pipeline simplifies the application of state-of-the-art protein structure modeling techniques by taking advantage of the optimized execution Nextflow’s capabilities on both cloud providers and HPC infrastructures. The pipeline integrates several popular methods, namely AlphaFold 2 and 3, Boltz 1 and 2, ColabFold, ESMFold, HelixFold, RosettaFoldAA, and RosettaFold2NA. Following structure prediction, nf-core/proteinfold generates an interactive report that allows users to explore and compare predicted models together with standardized confidence metrics, harmonized across methods for consistent interpretation. The workflow also integrates Foldseek-based structural search, enabling the identification of known protein structures similar to the predicted models.

The pipeline is developed through an international collaboration that includes Australian BioCommons, the Centre for Genomic Regulation, Pompeu Fabra University, and the European Bioinformatics Institute, and it already serves as a central resource for structure prediction at several of these organisations and others. This broad adoption demonstrates how nf-core/proteinfold, through its open-source and community-driven development model, is lowering the barrier to using deep learning based approaches for protein structure prediction in everyday research.

Interestingly, nf core proteinfold represents a new generation of Nextflow workflows designed to place multiple alternative methods for the same task within one coherent framework. This design makes it possible to compare the different procedures, providing a basis for developing combined approaches that may mature into meta-methods.

More info

nf-core project

nf-core/proteinfold pipeline

nf-core/proteinfold GitHub repository

Join nf-core

My bluesky

ProtVista: Open-Source Protein Feature Visualisation with reusable Web Components

2026-01-31T16:20:00+01:00

ProtVista is an open-source protein feature visualisation tool used by UniProt, the high-quality, comprehensive, and freely accessible resource of protein sequence and functional information. It is built upon the suite of modular standard and reusable web components called Nightingale, a collaborative open-source library. It enables integration of protein sequence features, variants, and structural data in a unified viewer. These components are shared across resources, for example Nightingale components also power feature visualisations in InterPro or PDBe, and the turnkey ProtVista library is used by Open Targets or Pharos.

ProtVista is undergoing major technical upgrades, to expand its reach, cover broader use cases, and also be able to handle ever-growing quantities of data. We are transitioning from SVG graphics to Canvas/WebGL rendering to improve performance for large datasets and on low-spec devices. We are refactoring the tool’s core to allow custom data inputs via a configurable API, letting developers plug in their own protein annotation data sources. Additionally, a new track configuration UI will let end-users toggle and rearrange feature tracks for a more flexible, tailored view. This talk will introduce ProtVista’s open-source design based on standards and demonstrate how these upcoming enhancements make it easier and faster to build interactive protein feature visualisations.

Relevant links: - ProtVista codebase: https://github.com/ebi-webcomponents/protvista-uniprot - Nightingale codebase: https://github.com/ebi-webcomponents/nightingale - Publication “Nightingale: web components for protein feature visualization”, 2023 https://academic.oup.com/bioinformaticsadvances/article/3/1/vbad064/7178007 - Publication “ProtVista: visualization of protein sequence annotations”, 2017 https://academic.oup.com/bioinformatics/article/33/13/2040/3063132

Helping to Mend the Disconnect Between Biological Research and Medicine: A tale of two -- different -- kinds of graphs

2026-01-31T16:35:00+01:00

As our tools evolve from scripts and pipelines to intelligent, context-aware systems, the interfaces we use to interact with data are being reimagined.

This talk will explore how accelerated and integrated compute is reshaping the landscape of biobank-scale datasets, weaving together genomics, imaging, and phenotypic data with and feeding validatable models. Expect a whirlwind tour through: · Ultra-fast sequence alignment and real-time discretization · Estimating cis/trans effects on variant penetrance via haploblock architecture · Biobank scale data federation · Knowledge graphs as dynamic memory systems (GNNs - LLM co-embedding)

We'll close by tackling the unglamorous but essential bits: validation, contextualization, and the digital hygiene required to keep model-generated data from becoming biomedical junk DNA. Think of it as a roadmap toward smarter, faster, and more trustworthy data-driven healthcare.

Gen: Git for genomes

2026-01-31T16:50:00+01:00

Advances in DNA sequencing and synthesis have made reading and writing genetic code faster and cheaper than ever. Yet most labs run experiments at the same scale they did a decade ago, not because the biology is limiting, but because the software hasn't caught up.

The conventional digital representation of a genome is a string of nucleotides. This works well enough for simple projects, but the model breaks down as complexity grows. Sequences aren't constant: they evolve, mutate, and are iterated on. Unlike software, there's no instant feedback loop to tell you if an edit worked; wetlab experiments take time. You gain some of that time back by working with multiple sequences in parallel. But keeping track of thousands of sequences and coordinate frames is tricky at best when a researcher is working solo, and far harder when collaborating with other people or agents on the same genetic codebase.

Gen is a version control system built specifically for biological sequences (http://github.com/genhub-bio/gen). It models genomic data as a graph rather than flat text, preserving the full structure of variation, editing history, and experimental lineage. On top of this, projects are organized into repositories with branching, diffing, and merging, just like git. Git was first released 20 years ago and transformed how software teams collaborate on shared codebases. Gen brings that same workflow to biology.

This talk will introduce Gen's design philosophy and walk through a real-world use case. Gen is open source under the Apache 2.0 license, implemented in Rust with a terminal interface and Python bindings, and designed to integrate with existing bioinformatics pipelines.

dingo: a Python package for metabolic flux sampling

2026-01-31T17:10:00+01:00

dingo is a Python package that brings advanced scientific-computing techniques into the hands of developers and researchers. It focuses on modelling metabolic networks — complex systems describing how cells process nutrients and energy — by simulating the full range of possible biochemical flux states. Historically, exploring these possibilities in large-scale networks has been computationally prohibitive. dingo introduces state-of-the-art Monte Carlo sampling algorithms that dramatically speed up these simulations, enabling the analysis of very large models such as Recon3D on a regular personal computer in under a day. With its easy-to-use Python interface and integration within the broader scientific Python ecosystem (e.g. NumPy, Matplotlib), dingo lowers the barrier to entry for studying complex biological systems. This talk will walk the audience through the computational challenges of metabolic modelling, show how dingo leverages Python and efficient sampling to overcome them, and highlight how Python developers and computational biologists alike can contribute to or extend this open-source project. Whether you’re interested in open-source scientific software, computational biology, or high-performance Monte Carlo methods in Python, this talk aims to inspire and provide actionable insight into using and contributing to dingo.

Avoid information leakage pitfalls while doing AI in bioinformatics

2026-01-31T17:25:00+01:00

AI is gaining importance in bioinformatics with new methods and tools popping every day. While applications of AI in bioinformatics inherited a lot of technological solutions from other AI-driven fields, such as image recognition or natural language processing, this particular domain has its own challenges. An alarming example is a study showing that most AI models for detecting COVID from radiographs do not rely on medically relevant pathological signals, but rather in shortcuts such as text tokens on the images (DeGrave et al., Nat Mach Intell, 2021, doi: 10.1038/s42256-021-00338-7), stressing the importance of the data, on which the AI models were trained. Equally special is the data used for training biological language models: first, it is not that large compared to natural languages (e.g. one of the most successful protein language models ESM-2 has been trained on only 250M sequences), and second, it is highly structured by evolution and natural selection, and thus has a relatively low intrinsic dimension.

In my talk, I will speak about consequences of this underlying structure of the data for performance of models that are trained with it -- spoiler alert! it is terribly overestimated. The reason for this is information or data leakage: the model remembers irrelevant features highly correlated with the target variable and does not learn any biologically meaningful properties that can be transferred to out-of-distribution data. I will present our own check list (see our paper Bernett et al., Nat Methods, 2024, doi: 10.1038/s41592-024-02362-y) and a solution (https://github.com/kalininalab/DataSAIL, Joeres et al., Nat Comm, 2025, doi: 10.1038/s41467-025-58606-8) for avoiding the information leakage pitfall. I will discuss examples and applications from protein function prediction and drug discovery.

Movement: a Python toolbox for analysing motion tracking data

2026-01-31T17:40:00+01:00

The study of animal behaviour has been transformed by the increasing use of machine learning-based tools, such as DeepLabCut and SLEAP, which can track the positions of animals and their body parts from video footage. However, there is currently no user-friendly, general-purpose solution for processing and analysing the motion tracks generated by these tools. To address this gap, we are developing movement, an open-source Python package that provides a unified interface for analysing motion tracking data from multiple formats. Initially, movement prioritised implementing methods for data cleaning and kinematic analysis. We are now focusing on expanding its data visualization capabilities and on developing metrics to analyze how animals interact with each other and with their environment. Future plans include adding modules for specialised applications such as pupillometry and collective behaviour, as well as supporting integration with neurophysiological data analysis tools. Importantly, movement is designed to cater to researchers with varying levels of coding expertise and computational resources, featuring an intuitive graphical user interface. Furthermore, the project is committed to transparency, with dedicated engineers collaborating with a global community of contributors to ensure its long-term sustainability. We invite feedback from the community to help shape movement's future as a comprehensive toolbox for analysing animal behaviour. For more information, please visit movement.neuroinformatics.dev.

EDEN: A modular platform for neural simulator research

2026-01-31T17:55:00+01:00

The electrochemical-level simulation of neurons brings together many different challenges in the realms of biophysical modelling, numerical analysis, HPC, neuromorphic hardware and software design. To approach these challenges, we recently developed a modular platform, EDEN (https://eden-simulator.org). EDEN offers both a pip installable simulation package for neuroscientists, and a modular construction kit for neuro-simulator programmers to rapidly develop and evaluate new computational methods. It leverages the community standard NeuroML (https://neuroml.org) to integrate with the existing open-source stack of modelling and analysis tools, and minimise the barrier to entry for technical innovations in neural simulation.

Further reading: - the 2022 paper for the high-level design - the 2025 paper for the plug-in architecture

Debian Med beyond COVID-19: how a Debian Blend gained momentum

2026-01-31T18:10:00+01:00

Back in 2020, the COVID-19 pandemic unexpectedly gave the Debian Med project a strong boost. New contributors joined, collaboration intensified, and Debian’s role in supporting biomedical research and infrastructure became more visible.

Almost five years later, Debian Med continues to benefit from this momentum. The project still shows higher activity levels than before the pandemic, with lasting improvements in package quality, continuous integration coverage, and cooperation with other Debian teams.

This talk will present how the Debian Med team has evolved since the pandemic, which effects have lasted, and where new challenges have emerged as both the world — and Debian — have settled into a new normal.

You can learn more about Debian Med at https://www.debian.org/devel/debian-med/

Datavzrd: Rapid programming- and maintenance-free interactive visualization and communication of tabular data

2026-01-31T18:30:00+01:00

Tabular data, often scattered across multiple tables, is the primary output of data analyses in virtually all scientific fields. Exchange and communication of tabular data is therefore a central challenge. With Datavzrd, we present a tool for creating portable, visually rich, interactive reports from tabular data in any kind of scientific discipline. Datavzrd unifies the strengths of currently common generic approaches for interactive visualization like R Shiny with the portability, ease of use and sustainability of plain spreadsheets. The generated reports do not require the maintenance of a web server nor the installation of specialized software for viewing and can simply be attached to emails, shared via cloud services, or serve as manuscript supplements. They can be specified without requiring imperative programming, thereby enabling rapid development and offering accessibility for non-computational scientists, unlocking the look and feel of dedicated manually crafted web applications without the maintenance and development burden. Datavzrd reports scale from small tables to thousands or millions of rows and offer the ability to link multiple related tables, allowing to jump between corresponding rows or hierarchically explore growing levels of detail. We will demonstrate Datavzrd on real-world bioinformatics examples from tools such as Orthanq and Varlociraptor, highlighting how it can turn complex analytical outputs into interactive, shareable reports.

Software: https://github.com/datavzrd/datavzrd General Website: https://datavzrd.github.io

Lightning Talks

2026-01-31T18:40:00+01:00

We wanted to showcase a lot of different contributions and the beautiful heterogeneity of bioinformatics ending with a lighting talk session! Here's the list of the 3' presentations:

Guixifying workflow management system: past, present, maybe future? by Simon Tournier
VTX, High Performance Visualization of Molecular Structure and Trajectories by valentin
Multimodal Tumor Evolution Analysis: Interactive 4D CT and Time-Aligned Clinical Data in a Hospital Web Platform by Fabian Fulga
DNA storage and open-source projects by Babar Khan
From Binary to Granular: Automating Multi-Threshold Survival Analysis with OptSurvCutR by Payton Yau

Guixifying workflow management system: past, present, maybe future? Bioinformatics and Computational Biology face a twofold challenge. On one hand, the number of steps required to process the amounts of data is becoming larger and larger. And each step implies software involving more and more dependencies. On the other hand, Reproducible Research requires the ability to deeply verify and scrutinize all the processes. And Open Science asks about the ability to reuse, modify or extend.

Workflow might be transparent and reproducible if and only if it’s built on the top of package managers that allow, with the passing of time, to finely control both the set of dependencies and the ability to scrutinize or adapt.

The first story is Guix Workflow Language (GWL): a promise that has not reached its potential. The second story is Concise Common Workflow Language (CCWL): compiling Guile/Scheme workflow descriptions to CWL inputs. The third story is Ravanan: a CWL implementation powered by Guix – a transparent and reproducible package manager.

This talk is a threefold short story that makes one: long-term, transparent and reproducible workflow needs first package managers.

VTX, High Performance Visualization of Molecular Structure and Trajectories VTX is a molecular visualization software capable to handle most molecular structures and dynamics trajectories file formats. It features a real-time high-performance molecular graphics engine, based on modern OpenGL, optimized for the visualization of massive molecular systems and molecular dynamics trajectories. VTX includes multiple interactive camera and user interaction features, notably free-fly navigation and a fully modular graphical user interface designed for increased usability. It allows the production of high-resolution images for presentations and posters with custom background. VTX design is focused on performance and usability for research, teaching, and educative purposes. Please visit our website at https://vtx.drugdesign.fr/ and/or our github at https://github.com/VTX-Molecular-Visualization for more.

Multimodal Tumor Evolution Analysis: Interactive 4D CT and Time-Aligned Clinical Data in a Hospital Web Platform Modern oncology practice relies on understanding how tumors evolve across multiple imaging studies and how these changes correlate with clinical events. This talk presents a hospital-oriented web platform for multimodal tumor evolution analysis, integrating interactive 4D CT visualization with time-aligned clinical data, including PDF clinical documents, lab results and treatment milestones.

The system combines a Node.js front end with a Flask-based visualization backend that handles CT preprocessing, metadata extraction, and generation of time-synchronized 4D volumes. Clinicians can navigate volumetric CT scans across multiple time points, compare tumor morphology longitudinally, and immediately access the corresponding clinical context within the same interface. The platform displays radiology reports, pathology documents, and other PDF-based data side-by-side with imaging, creating a unified temporal view of patient evolution.

We describe the architecture, including the ingestion pipeline for DICOM and document data, the design of the multimodal synchronization layer, rendering strategies for large 4D CT volumes, and the integration of document viewers and time-series dashboards.

Web platform: https://github.com/owtlaw6/Licenta Flask App (CT Scan related scripts): https://github.com/fabi200123/4D_CT_Scan

DNA storage and open-source projects The magnetic recording field goes back to the pioneering work of Oberlin Smith, who conceptualized a magnetic recording apparatus in 1878. Fast forward, in 1947, engineers invented the first high-speed, cathode ray tube based fully electronic memory. In 1950, engineers developed magnetic drum memory. In 1951, the first tape storage device was invented. By 1953, engineers had developed magnetic core memory. The first hard disk drive RAMAC was developed in 1957. Since then, HDDs have dominated the storage for several decades and continue to do so because of its low cost-per-gigabyte and low bit-error-rate. Based on some estimates, in 2023, approximately 330 million terabytes of data were created each day. By 2024, HDDs dominated over half of the world’s data storage. As of 2025, approximately 0.4 zettabytes of new data are being generated each day, which equals about 402.74 million terabytes. What does it indicate? Data is growing and there is a need of solutions in term of longevity, low power consumption, and high capacity. Deoxyribonucleic acid (DNA) based storage is being considered as one of the solutions. This talk is about current status of DNA storage and open-source projects that exist in this domain so far.

From Binary to Granular: Automating Multi-Threshold Survival Analysis with OptSurvCutR In risk modelling, categorising continuous variables—such as biomarker levels or credit scores—is essential for creating distinct risk groups. While existing tools can optimize a single threshold (creating "High" vs "Low" groups), they lack a systematic framework for identifying multiple cut-points. This limitation forces analysts to rely on simple binary splits, which often mask the actual shape of the data. This approach fails to detect complex biological realities, such as U-shaped risk profiles or multi-step risk stratification involving 3, 4, or even 5+ distinct groups.

In this lightning talk, I will introduce OptSurvCutR, an R package designed to bridge this gap using a reproducible workflow. Currently under peer review at rOpenSci, the package automates the search for optimal thresholds in time-to-event data.

I will demonstrate how the package: - Goes Beyond Binary Splits: Unlike standard tools restricted to a single cut-off, OptSurvCutR uses systematic searches to identify multiple thresholds, automatically defining granular risk strata (e.g., Low, Moderate, High, Severe).

Prevents False Positives: It integrates statistical corrections (MSRS) to ensure that the differences between these multiple curves are real, not just random chance.
Quantifies Uncertainty: It uses bootstrap validation to measure the stability of the thresholds, ensuring that your multi-level risk model is robust.

Project Links: - Source Code (GitHub): https://github.com/paytonyau/OptSurvCutR - rOpenSci Review Process: https://github.com/ropensci/software-review/issues/731 - Preprint: https://doi.org/10.1101/2025.10.08.681246