The rapid evolution of deep learning has started an AI arms race. Last year, venture capitalists poured more than $1.5 billion into semiconductor start-ups and there are now some 45 companies designing chips[1] purpose-built for artificial intelligence tasks including Google with its Tensor Processing Unit (TPU). After quietly testing its "early access" system for nearly a year, one of these startups, Wave Computing, is close to announcing its first commercial product. And it is promising that a novel approach will deliver some big gains in terms of both performance and ease of use for training neural networks.
"A bunch of companies will have TPU knock-offs, but that's not what we do--this was a multi-year, multi millions of dollars effort to develop a completely new architecture," CEO Derek Meyer said in an interview. "Some of the results are just truly amazing."
With the exception of Google's TPUs, the vast majority of training is currently done on standard Xeon servers using Nvidia GPUs for acceleration. Wave's dataflow architecture is different. The Dataflow Processing Unit (DPU) does not need a host CPU and consists of thousands of tiny, self-timed processing elements designed for the 8-bit integer operations commonly used in neural networks.
Last week, the company announced that it will be using 64-bit MIPS cores in future designs, but this really for housekeeping chores. The first-generation Wave board already uses an Andes N9 32-bit microcontroller for these tasks, so MIPS64 will be an upgrade that will give the system agent the same 64-bit address space as the DPU as well as support for multi-threading so tasks can run on their own logical processors. (Meyer and others on the management team previously worked at MIPS, and Wave is backed in part by Tallwood, the same venture capital firm that