Learned sparse retrieval models such as SPLADE, uniCOIL, and other transformer-based sparse encoders have become popular for delivering neural-level relevance while preserving the efficiency of inverted indexes. But these models also produce indexes with statistical properties radically different from classic BM25: longer queries, compressed vocabularies, and posting lists with unusual score distributions. As a result, traditional dynamic pruning algorithms such as WAND and Block-Max WAND, whose skip decisions were tuned to BM25-like score distributions, lose much of their effectiveness on these indexes.
This talk presents Block-Max Pruning (BMP) from a systems and Rust-engineering perspective. We will walk through how BMP restructures query processing by partitioning document space into small, contiguous blocks and maintaining lightweight, on-the-fly score upper bounds that guide safe or approximate early termination.
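To make the mechanics concrete before the session, here is a minimal Rust sketch of the block-level skipping idea, assuming a quantized impact index with precomputed per-term, per-block maxima; the names, types, and fixed block size are illustrative, not BMP's actual code:

```rust
/// Illustrative sketch of block-level upper-bound pruning (not BMP itself).
const BLOCK_SIZE: usize = 64; // BMP uses small, contiguous blocks

struct Index {
    num_docs: usize,
    /// block_max[term][block]: max impact `term` contributes inside `block`.
    block_max: Vec<Vec<f32>>,
    /// postings[term]: (doc_id, impact) pairs, sorted by doc_id.
    postings: Vec<Vec<(usize, f32)>>,
}

impl Index {
    /// Return the best document score above `threshold`, skipping every
    /// block whose summed upper bound cannot beat the current best.
    fn best_score(&self, query: &[usize], threshold: f32) -> f32 {
        let num_blocks = (self.num_docs + BLOCK_SIZE - 1) / BLOCK_SIZE;
        let mut best = threshold;
        for block in 0..num_blocks {
            // Cheap upper bound: sum of per-term block maxima.
            let upper: f32 = query.iter().map(|&t| self.block_max[t][block]).sum();
            if upper <= best {
                continue; // safe skip: nothing in this block can win
            }
            // Fully score only the documents inside surviving blocks.
            let lo = block * BLOCK_SIZE;
            let hi = ((block + 1) * BLOCK_SIZE).min(self.num_docs);
            let mut scores = vec![0.0f32; hi - lo];
            for &t in query {
                for &(doc, impact) in &self.postings[t] {
                    if (lo..hi).contains(&doc) {
                        scores[doc - lo] += impact;
                    }
                }
            }
            best = scores.into_iter().fold(best, f32::max);
        }
        best
    }
}
```

The real implementation is considerably more refined (block ordering by upper bound, cache- and SIMD-friendly layouts, and an approximation knob for unsafe pruning), all of which the talk covers in depth.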
The talk is aimed at developers building retrieval engines, Rust-based data systems, or ML-powered search pipelines who want to push sparse retrieval performance further. Attendees will leave with a clear understanding of how BMP works, why learned sparse models require new pruning strategies, and how to integrate these ideas into modern, high-performance Rust codebases.
Code and resources:
- BMP GitHub repository: https://github.com/pisa-engine/BMP/
- Paper (SIGIR 2024): https://www.antoniomallia.it/uploads/SIGIR24.pdf
What are multi-vector embeddings? How do they differ from regular embeddings? And how can you build an AI-powered OCR system in under 5 minutes without paying a fortune for infrastructure? If you're curious about the answers, join me! I'll break down ColBERT embeddings, explore how MUVERA compression is revolutionizing the way we work with multi-vectors, and show you how to leverage it all to build an AI-powered OCR system on resource-constrained devices such as the Raspberry Pi.
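For a taste of the first question: ColBERT represents a text as one embedding per token and scores with late interaction (MaxSim), summing, over each query token, its best match among the document tokens. Here is a tiny self-contained Rust sketch (toy 2-dimensional vectors, purely illustrative); MUVERA's trick, covered in the talk, is to approximate this multi-vector score with a single fixed-dimensional vector per document so that ordinary single-vector ANN indexes apply:

```rust
/// ColBERT-style MaxSim: score(Q, D) = sum over query tokens q of
/// max over document tokens d of dot(q, d).
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn max_sim(query_tokens: &[Vec<f32>], doc_tokens: &[Vec<f32>]) -> f32 {
    query_tokens
        .iter()
        .map(|q| {
            doc_tokens
                .iter()
                .map(|d| dot(q, d))
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .sum()
}

fn main() {
    // Two query token embeddings, three document token embeddings (toy data).
    let query = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let doc = vec![vec![0.9, 0.1], vec![0.2, 0.8], vec![0.5, 0.5]];
    println!("MaxSim score: {}", max_sim(&query, &doc)); // 0.9 + 0.8 = 1.7
}
```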
Code and resources:
- Weaviate DB: https://github.com/weaviate/weaviate
- Multi-vector vision embeddings demo: https://github.com/antas-marcin/weaviate-multi-vector-example
Traditional QA pipelines—even those using baseline RAG—struggle with complex reasoning tasks such as multi-hop inference, contradiction detection, entity linking, temporal consistency, and large-scale cross-document understanding. These limitations become critical in domains like investigative journalism, scientific research, and legal analysis, where answers depend on relationships spread across many documents rather than isolated text chunks.
This talk will demonstrate how open-source knowledge-graph–based approaches can overcome these challenges by enabling structured retrieval, multi-hop reasoning, richer context assembly, and corpus-level summarization. We will explore several open-source frameworks used today to build graph-enhanced RAG systems and compare them across practical criteria: extraction quality, response latency, hardware requirements, maintenance complexity, and suitability for different problem types.
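To see why graph structure helps, consider the sketch below: a dependency-free Rust toy (the entities, relations, and question are invented for illustration) that assembles multi-hop context by walking typed edges outward from the entities mentioned in a question, which is precisely the cross-document evidence that chunk-based retrieval misses:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Adjacency list of (relation, target) edges, keyed by source entity.
type KnowledgeGraph = HashMap<&'static str, Vec<(&'static str, &'static str)>>;

/// Collect all triples reachable within `max_hops` of the seed entities.
/// In a graph-RAG pipeline, these triples become the LLM's context.
fn multi_hop_context(
    graph: &KnowledgeGraph,
    seeds: &[&'static str],
    max_hops: usize,
) -> Vec<String> {
    let mut triples = Vec::new();
    let mut seen: HashSet<&str> = seeds.iter().copied().collect();
    let mut frontier: VecDeque<(&str, usize)> =
        seeds.iter().map(|&s| (s, 0)).collect();

    while let Some((entity, depth)) = frontier.pop_front() {
        if depth == max_hops {
            continue;
        }
        for &(relation, target) in graph.get(entity).into_iter().flatten() {
            triples.push(format!("{entity} --{relation}--> {target}"));
            if seen.insert(target) {
                frontier.push_back((target, depth + 1));
            }
        }
    }
    triples
}

fn main() {
    // Toy graph: facts spread across "documents" that chunking would separate.
    let mut g = KnowledgeGraph::new();
    g.insert("Acme Corp", vec![("acquired", "BetaSoft")]);
    g.insert("BetaSoft", vec![("founded_by", "J. Doe")]);

    // "Who founded the company Acme Corp acquired?" needs two hops.
    for triple in multi_hop_context(&g, &["Acme Corp"], 2) {
        println!("{triple}");
    }
}
```

The frameworks compared in the talk automate the hard parts this toy skips: extracting the graph from raw text, ranking which subgraph to retrieve, and summarizing it into a prompt.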
Attendees will leave with a clear, practical understanding of how to select and apply graph-based RAG techniques to extract deeper insight from large unstructured datasets.
Frameworks we're going to consider:
- MS GraphRAG (MIT license): https://github.com/microsoft/graphrag
- LlamaIndex KG (MIT license): https://github.com/run-llama/llama_index
- KAG/OpenSPG (Apache-2.0 license): https://github.com/OpenSPG/KAG
Search in Elasticsearch keeps evolving, from traditional BM25 keyword retrieval to multi-stage pipelines that combine lexical, vector, and language-model-driven intelligence. In this talk, we’ll explore how Elasticsearch APIs enable developers to build hybrid search systems that mix classical scoring, dense vector search, and semantic reranking in a single coherent workflow.
We’ll use ES|QL, Elasticsearch’s new query language, and show how constructs like FORK, FUSE, RERANK, COMPLETION, and full-text functions let you express multi-stage pipelines in a single query.
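As a flavor of what such a pipeline looks like, here is a sketch that sends a hybrid ES|QL query to the _query endpoint from Rust. The index and field names are placeholders, and FORK/FUSE are in technical preview at the time of writing, so check the documentation for the exact syntax in your version:

```rust
use reqwest::blocking::Client;
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Two FORK branches retrieve candidates independently; FUSE merges
    // their rankings (reciprocal-rank style) into one result list.
    let esql = r#"
        FROM articles METADATA _id, _index, _score
        | FORK ( WHERE match(title, "vector search") | SORT _score DESC | LIMIT 10 )
               ( WHERE match(body,  "vector search") | SORT _score DESC | LIMIT 10 )
        | FUSE
        | LIMIT 10
    "#;

    // ES|QL queries are POSTed to the _query endpoint as a JSON body.
    let response = Client::new()
        .post("http://localhost:9200/_query")
        .json(&json!({ "query": esql }))
        .send()?
        .text()?;
    println!("{response}");
    Ok(())
}
```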
We’ll discuss where ML models and LLMs fit into the retrieval stack, from embedding generation to on-the-fly augmentation and semantic rerankers.
Finally, we’ll look at future directions for search.
If you want a practical and forward-looking view of how search is evolving in Elasticsearch—and how to put multi-stage retrieval to work—this session is for you.
Meilisearch (https://www.meilisearch.com/) is a popular open-source search engine written in Rust that boasts more than 50k stars on GitHub, focusing on performance and ease of use. Meilisearch is designed to be available worldwide, which requires supporting multiple languages through word tokenization. But how difficult is it to segment and normalize words? And how different can this process be depending on the language?
Meilisearch's core maintainers share how they handled language support, the difficulties they faced, and the solutions they found.
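Meilisearch's tokenizer is published as the open-source charabia crate, so you can try the segment-then-normalize pipeline the talk dissects before the session. A minimal sketch (the sample sentence is illustrative; note how the accented "Thé" normalizes to "the"):

```rust
use charabia::Tokenize;

fn main() {
    // Latin-script input with accents and punctuation; other scripts
    // (CJK, Thai, ...) go through entirely different segmenters.
    let text = "Thé quick (\"brown\") fox can't jump 32.3 feet, right?";

    // `tokenize()` segments the input and normalizes each token
    // (lowercasing, de-accenting, unicode normalization).
    for token in text.tokenize() {
        if token.is_word() {
            println!("{}", token.lemma());
        }
    }
}
```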
The OpenSearch v3 major release, introduced in the past year, represents a significant leap forward in open-source search technology, delivering breakthrough innovations across neural search, AI-driven search experiences, and performance optimization. This talk explores the major features that define the 3.x releases and their impact on modern search applications.
We'll dive into differentiating capabilities like scalable Neural Sparse ANN Search using the SEISMIC algorithm, and the new Search Relevance Workbench for metric-driven relevance evaluation. Discover how system-generated Search Pipelines eliminate configuration overhead by automatically building Vector Search pipelines at query runtime, and how the new UI editor simplifies AI Search workflow setup.
The release brings industry-standard search features, including MMR, ColBERT’s late interaction, RRF, radial search, and one of the most popular pre-trained sparse encoder models on Hugging Face, positioning OpenSearch alongside leading search platforms. Performance innovations deliver dramatic improvements: Memory-Optimized and Disk-based Vector Search with efficient FAISS execution, star-tree indexes for multi-field aggregations, 2x storage savings through derived source, and reader/writer separation for independent index/search scaling and resiliency. Real-time data processing enables continuous query execution for streaming results and the ability to build vector indices remotely using GPUs, while Query Insights helps monitor a cluster’s search query performance.
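As one concrete example from that feature list, reciprocal rank fusion (RRF) merges rankings from multiple retrievers using only each document's rank. A minimal, self-contained Rust sketch of the idea (k = 60 is the conventional constant; the document IDs are invented):

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: each ranked list contributes 1 / (k + rank)
/// for every document it returns; documents are sorted by summed score.
fn rrf(ranked_lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for list in ranked_lists {
        for (rank, doc) in list.iter().enumerate() {
            // Ranks are 1-based in the RRF formula.
            *scores.entry(*doc).or_insert(0.0) += 1.0 / (k + (rank as f64 + 1.0));
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(doc, score)| (doc.to_string(), score))
        .collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}

fn main() {
    // Toy example: a lexical (BM25) ranking and a vector (k-NN) ranking.
    let bm25 = vec!["doc_a", "doc_b", "doc_c"];
    let knn = vec!["doc_b", "doc_d", "doc_a"];
    for (doc, score) in rrf(&[bm25, knn], 60.0) {
        println!("{doc}: {score:.5}");
    }
}
```

Because RRF never looks at raw scores, it sidesteps the score-calibration problem of mixing BM25 and vector similarities, which is why it is a default choice for hybrid search.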
Finally, we'll showcase Agentic Search capabilities: from Natural Language Agent Search to native AI agents with persistent memory, workflow UI editors that let non-technical users set up AI Search flows, and OpenSearch MCP integration with Claude Code, Gemini CLI, and other AI assistants to interact with OpenSearch.
This is your opportunity to hear from the OpenSearch maintainers and ambassadors about the latest and greatest in the project. Attendees will leave understanding how OpenSearch v3 addresses the full spectrum of modern search challenges: Neural and Vector Search, Search quality measurement, performance at scale, and the future of AI-powered Search experiences.