Serving AI models on a single GPU for multi-tenant workloads sounds challenging until you partition the GPU correctly.
This talk is a deep technical exploration of running AI inference workloads on modern GPUs using Multi-Instance GPU (MIG) isolation.
We'll explore:
Whether you're building a multi-tenant inference platform, optimizing GPU utilization for your team, or exploring how to serve AI models cost-effectively, this talk provides practical, ready-to-use configurations for your AI workloads.