Go's pprof tells you where your CPU time is spent, but not why the CPU is slow. Is it cache misses? Branch mispredictions? These hardware-level performance characteristics are invisible to pprof but critical for optimisation.
perf-go bridges this gap by leveraging Linux's perf tool and CPU Performance Monitoring Units (PMUs) to expose hardware performance counters for Go programs. It translates perf's low-level observations into pprof's familiar format, giving Go developers hardware insights without leaving their existing workflow.
In this talk, we'll: - Demonstrate the limitations of pprof for understanding performance bottlenecks - Show how perf-go exposes CPU cache behaviour, branch prediction, and memory access patterns - Walk through real benchmarks where we identify and fix cache-line contention issues - Explore how hardware counters can guide improvements that pprof alone wouldn't reveal
Go developers who want to optimise performance-critical code and understand the "why" behind their bottlenecks. Basic familiarity with profiling concepts helpful but not required.