2084: AI Hardware - what will the hardware for AI look like in the future?
A few notes on OpenAI's future plans, Cerebras, and smaller startups working on AI hardware.
Two years ago, Microsoft invested $1 billion into OpenAI and built OpenAI's first supercomputer - a behemoth of a system offering 285,000 CPU cores and 10,000 GPUs. It is quite possibly this supercomputer that trained GPT-3, ChatGPT, and DALL-E 2 to the scale they are at today.
But these models are massive, and if the rumors are anything to go by, future models will be even bigger. Take GPT-3: 175 billion parameters at a standard 16-bit float encoding is about 350 gigabytes - just to perform inference, the model has to be spread across a lot of GPUs or a few absolutely massive ones. It's actually rather absurd that they let the public use that much compute. Now the rumor mill says that GPT-4 will have 100 trillion parameters. At that point you're talking about a model that's 200 TB. Even if it's quite sparse, you'll still need even bigger machines to handle the training and inference.
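The arithmetic here is just parameters times bytes per parameter. A back-of-the-envelope sketch (using the parameter counts and the 16-bit assumption from the paragraph above; real deployments also need memory for activations and, during training, optimizer state, which this ignores):

```python
# Back-of-the-envelope model memory: parameters x bytes per parameter.
# 16-bit floats = 2 bytes per parameter, as assumed above.

def model_size_bytes(num_params: float, bytes_per_param: int = 2) -> float:
    return num_params * bytes_per_param

gpt3_params = 175e9            # 175 billion
rumored_gpt4_params = 100e12   # 100 trillion (rumor)

print(f"GPT-3 weights:   ~{model_size_bytes(gpt3_params) / 1e9:.0f} GB")
print(f"Rumored GPT-4:   ~{model_size_bytes(rumored_gpt4_params) / 1e12:.0f} TB")
# -> GPT-3 weights:   ~350 GB
# -> Rumored GPT-4:   ~200 TB
```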
Enter Nvidia. Last month they announced a partnership with Microsoft to build a brand new, even bigger AI supercomputer. The new machine will use large numbers of Nvidia's new AI-focused H100 GPUs, cementing Nvidia's reputation as the company for machine learning.
Now, H100 GPUs are insane. If the marketing copy is at all to be believed, they'll cut the training times of massively large models dramatically. Each GPU offers 900 GB/s of GPU-to-GPU transfer bandwidth, can perform up to 2,000 trillion floating point operations per second, and carries up to 80 GB of memory. At those specs, you'd only need about 2,500 of these GPUs just to hold the rumored 200 TB of GPT-4 weights in memory, and given the size of Microsoft and Nvidia, and the success OpenAI has had so far, you can be sure the pocketbooks will stay wide open. This is why I'm quite bullish on GPT-4 - while the number sounds massive, it's entirely possible for Nvidia to produce 10k or even 100k GPUs for a supercomputer, at which point it becomes entirely within the realm of human possibility. If it's true that GPT-4 will be as good as most humans, then it'll be absolutely absurd.
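That 2,500 figure follows directly from the 80 GB per H100 and the 200 TB estimate above. A rough sketch (this only counts the memory needed to hold the weights; optimizer state, activations, and communication overhead push the real number for training considerably higher):

```python
import math

# How many GPUs are needed just to fit a model's weights in aggregate GPU memory?
# Uses the 80 GB-per-H100 figure and the 200 TB rumored GPT-4 size from above.

def gpus_to_fit(model_bytes: float, gpu_mem_bytes: float = 80e9) -> int:
    return math.ceil(model_bytes / gpu_mem_bytes)

print(gpus_to_fit(200e12))  # -> 2500
```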
Not to be outdone, Cerebras - the other company besides Nvidia doing large-scale AI compute - announced its own supercomputer, Andromeda, which offers 1 exaflop of AI compute, which is also absurd. Intel, AMD, Google with its TPUs, and even IBM have all released chips recently to speed up AI inference. It's getting interesting just watching how much money is being poured into this.
There are also companies like Tenstorrent, which - while it can't be confirmed, since its CTO is an advisor to Midjourney - probably powers Midjourney. Then there are smaller startups like Axelera AI, which reports achieving 39.3 tera operations per second (TOPS) at an efficiency of 14.1 TOPS/W at INT8 precision in less than 9 square millimeters, which is absurd - that's more than fast enough to run inference on most reasonably sized models. Currently, at a glance (I might be wrong here), there don't appear to be any big companies doing what our lab is doing, namely sparse neural network ASIC accelerators, but there is a company called Neural Magic, which also exploits sparsity to speed up CPU inference of models.
As an aside, sparse neural networks are networks from which most of the weights have been pruned away, leaving a significantly reduced parameter count and therefore much less work per inference - but that only translates into speed if the hardware actually supports sparse workloads.
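To make the aside concrete, here's a minimal sketch using NumPy and SciPy, assuming plain magnitude pruning (just one common approach among many): zero out the smallest weights, then store what's left in a sparse format so that only the surviving weights cost anything at inference time.

```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024)).astype(np.float32)  # dense layer weights
x = rng.standard_normal(1024).astype(np.float32)          # input activations

# Magnitude pruning: drop the 90% of weights with the smallest absolute value.
threshold = np.quantile(np.abs(W), 0.9)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

W_sparse = csr_matrix(W_pruned)  # compressed storage: only ~10% of weights remain
y = W_sparse @ x                 # sparse matvec touches only the nonzero weights

print(f"nonzeros kept: {W_sparse.nnz / W.size:.0%}")  # -> ~10%
```

On a CPU or GPU with no sparsity support, the zeros still get multiplied like everything else, which is exactly why the hardware question matters.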
Now, speaking from the research side, given the exponential growth of AI papers recently, there's been an explosion of papers on neural network accelerators. For sparse neural network accelerators in particular (which, as mentioned, are our lab's focus), there are new architectures like Cnvlutin, SCNN, and Cambricon-S, which all report large speedups over standard GPU baselines on common ML benchmarks. These are custom chips, not based on GPUs, built specifically for machine learning workloads. It'll take a few years for them to move through development and deployment, but once they do, it'll be another step in the development of AI, since these chips are, by design, significantly better at AI evaluation - after all, GPUs were designed for games first and only later jerry-rigged into AI research.
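The core trick these chips build into silicon is skipping work on zero operands. Here is a toy software analogy of that principle; it is only an illustration, not a model of how Cnvlutin, SCNN, or Cambricon-S are actually implemented:

```python
import numpy as np

def matvec_count_macs(W: np.ndarray, x: np.ndarray, skip_zeros: bool):
    """Naive matrix-vector product that counts multiply-accumulates (MACs)."""
    y = np.zeros(W.shape[0], dtype=W.dtype)
    macs = 0
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            if skip_zeros and W[i, j] == 0.0:
                continue  # a sparse accelerator never issues this MAC at all
            y[i] += W[i, j] * x[j]
            macs += 1
    return y, macs

rng = np.random.default_rng(1)
W = rng.standard_normal((64, 64)) * (rng.random((64, 64)) > 0.9)  # ~90% zeros
x = rng.standard_normal(64)

_, dense_macs = matvec_count_macs(W, x, skip_zeros=False)
_, sparse_macs = matvec_count_macs(W, x, skip_zeros=True)
print(dense_macs, sparse_macs)  # the zero-skipping pass does roughly 10x fewer MACs
```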
These developments, plus the massive amount of VC funding going into AI - $48.2 billion in 2022 so far, not counting corporate spending - mean that AI's acceleration is probably not slowing down anytime soon. In 2084, there might be an AI chip in every device, and we'll talk about TPUs or AIPUs the way we talk about GPUs today - a must for any PC bigger than a toaster.