Virtually Attend FOSDEM 2026

One GPU, Many Models: What Works and What Segfaults

2026-01-31T13:55:00+01:00 for 00:20

Serving multiple models on a single GPU sounds great until something segfaults.

Two approaches dominate for parallel inference on NVIDIA GPUs: MIG (Multi-Instance GPU, hardware partitioning) and MPS (Multi-Process Service, software sharing). Both promise efficient GPU sharing.
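To make the difference concrete, here is a minimal sketch of how a launcher might point worker processes at a MIG instance or at an MPS share. It is not taken from the talk. `CUDA_VISIBLE_DEVICES` and `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` are real CUDA environment variables, but the helper names `mig_env` and `mps_env` and the UUID string are hypothetical. The sketch also assumes MIG instances have already been created (their UUIDs are listed by `nvidia-smi -L`) and, for MPS, that the MPS control daemon is already running.

```python
import os


def mig_env(mig_uuid, base=None):
    """Return an environment that pins a worker to one MIG instance.

    mig_uuid is the "MIG-..." identifier reported by `nvidia-smi -L`.
    """
    env = dict(os.environ if base is None else base)
    # The CUDA runtime will only see this one hardware partition.
    env["CUDA_VISIBLE_DEVICES"] = mig_uuid
    return env


def mps_env(thread_percentage, base=None):
    """Return an environment that caps a worker's share of SMs under MPS.

    This limits compute share only; it is not a hardware partition.
    """
    if not 0 < thread_percentage <= 100:
        raise ValueError("thread_percentage must be in (0, 100]")
    env = dict(os.environ if base is None else base)
    env["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = str(thread_percentage)
    return env
```

A launcher could then pass the result to `subprocess.Popen(cmd, env=mig_env("MIG-<uuid>"))`, with one process per model. With MIG, each process gets its own slice of memory and compute; with MPS, processes share one GPU context pool and only the compute share is capped here.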

I tested both strategies by running video generation workloads in parallel.

This talk digs into what actually happened: where things worked, where memory isolation fell apart, which configs crashed, and what survives under load.

By the end, you'll know:

  1. How to use spare GPU capacity.
  2. How to set up MIG and MPS.
  3. Which memory issues, crashes, and failures to expect.
  4. Workload-specific configs.
