OneAI is an open-source framework that provides the foundation required to manage AI model artifacts and inference workloads across OpenNebula-based cloud infrastructures. In this talk, we present how OneAI discovers, imports, and deploys models directly from Hugging Face Hub into OpenNebula clusters, turning complex AI lifecycle operations into a streamlined, infrastructure-native workflow.

OneAI is built around three core components: (1) the HFHUB Marketplace, a lightweight catalog that indexes Hugging Face models as metadata, deferring artifact materialization until deployment to dramatically reduce storage overhead; (2) the SharedFS Datastore, a specialized OpenNebula datastore that treats directories as images, enabling efficient model storage on high-performance shared filesystems; and (3) the AI Service REST API, an orchestration layer that provisions model deployments, supervises the vLLM inference engine, and exposes OpenAI-compatible endpoints. We will provide an overview of the architecture behind these components and show how they work together to form a clean, reproducible pipeline, from model discovery to fully deployed inference services.

OneAI offers a fully open-source alternative to proprietary inference platforms. By building directly on OpenNebula's capabilities, it reduces total cost of ownership (TCO) by exploiting existing HPC storage, supports secure multi-tenancy, and enables scalable, production-ready inference deployments.
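Because the AI Service REST API exposes OpenAI-compatible endpoints, any standard OpenAI client or plain HTTP request can talk to a deployed model. A minimal sketch of building such a request (the host name and model identifier below are illustrative placeholders, not part of OneAI):

```python
import json
from urllib.request import Request

# Hypothetical base URL of a OneAI-deployed inference service; in practice
# the official openai SDK could be pointed at it via its base_url option.
BASE_URL = "http://oneai.example.com/v1"

def chat_request(model: str, prompt: str) -> Request:
    """Build a standard OpenAI-style chat-completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# The model name is a placeholder for whatever was imported from Hugging Face Hub.
req = chat_request("meta-llama/Llama-3.1-8B-Instruct", "Hello!")
print(req.full_url)
```

Because the wire format matches the OpenAI Chat Completions API, existing tooling built for proprietary endpoints works against a OneAI deployment without modification.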