Over the last 10 years there has been a hardware race to build the best application-specific integrated circuit (ASIC) for both machine learning training and inference, i.e. AI accelerators. What started with vision processing units (VPUs) moved through tensor processing units (TPUs), and now we are dealing with neural processing units (NPUs). What's next?
This talk will take a systematic look at the different hardware platforms for AI acceleration, with a focus on the software stacks that support them on Linux. We'll examine how individual vendors approached their ASICs from the kernel side, and how they exposed the acceleration functionality to user-space.
Is it all proprietary or is there liberté? We'll find out together!