Virtually Attend FOSDEM 2026

Data Lakes for AI: Open Table Formats as the Foundation

2026-01-31T15:25:00+01:00 for 00:05

In the era of big data and artificial intelligence, organizations increasingly rely on data lakehouses to store, process, and analyze vast amounts of structured and unstructured data. The path from raw data to a production-ready AI model is complex and often bottlenecked by data inconsistencies, schema drift, and a lack of data versioning. In the spirit of open source AI hacking, the focus must shift to making the underlying data infrastructure as reliable and reproducible as the models themselves. This presentation addresses the critical role of Open Table Formats (OTFs), specifically Apache Iceberg, Delta Lake, and Apache Hudi, in transforming unstructured data lakes into reliable, queryable data lakehouses. OTFs provide database-like capabilities, including ACID transactions, schema evolution, and time travel, directly on low-cost object storage. The talk will provide a practical overview of integrating open-source OTFs with popular AI/ML frameworks, empowering "AI Plumbers" to build robust, governed, and highly performant data foundations for their next generation of open-source models.
