Recently, Intel announced a new type of transistor architecture: RibbonFET. It rearranges the transistor's gate and channel so that the channel can be sized more continuously, allowing the designer to choose the size configuration that optimizes power and efficiency.
It essentially consists of stacked “ribbons” of silicon that pass through, and are completely surrounded by, the gate. Since each ribbon is planar, its width can be sized continuously, in contrast to the current industry standard, FinFET, whose sizing is quantized, while offering similar performance characteristics.
FinFET is the architecture where a “fin” of channel sticks up out of the surface and is wrapped by the gate. This carries charge more efficiently, but has the disadvantage that you can't continuously size the fin - you can only add more fins - which costs efficiency.
But this is just the start of their stacking ambitions. They've also demonstrated 3D stacking of the transistors themselves: an NFET and a PFET stacked on top of each other to form an inverter in half the area, with electrical characteristics comparable to a normal inverter. This could mean the predicted end of Moore's law is disproved yet again - a doubling or more in density as more transistors are fit onto a single die - and so the relentless march of compute will not stop. It's exciting.
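To see why stacking an NFET on top of a PFET gives you an inverter in half the footprint, here's a minimal logic-level sketch (a toy model that ignores all device physics): the PFET pulls the output up when the input is low, the NFET pulls it down when the input is high, and stacking only changes where the two devices sit, not this logic.

```python
# Toy logic-level model of a CMOS inverter (not a device-level simulation):
# the PFET conducts when its gate is low, pulling the output up to VDD;
# the NFET conducts when its gate is high, pulling the output down to GND.

def pfet_conducts(gate: int) -> bool:
    return gate == 0          # PFET is "on" for a low gate voltage

def nfet_conducts(gate: int) -> bool:
    return gate == 1          # NFET is "on" for a high gate voltage

def inverter(a: int) -> int:
    if pfet_conducts(a):      # pull-up network drives the output to 1
        return 1
    if nfet_conducts(a):      # pull-down network drives the output to 0
        return 0
    raise ValueError("input must be 0 or 1")

assert inverter(0) == 1 and inverter(1) == 0
```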
There's also work on entirely different types of transistors. IBM has recently been working on modifying “phase change materials” to mimic synapse-like electrical activity, in support of neuromorphic (brain-like) algorithms that detect objects using less area and less power. They've used the short-term volatility and long-term non-volatility of the phase-change state to mimic the short-term electrical volatility and long-term plasticity of synapses on practical tasks. It promises to perform AI-like tasks in less space and with less power, making AI more and more available to everyone.
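As a rough illustration of that two-timescale idea (a toy model I'm sketching here, not IBM's actual device physics), you can picture such a synapse as a fast-decaying volatile term plus a slowly accumulating non-volatile conductance:

```python
import math

class ToySynapse:
    """Toy two-timescale synapse: a volatile term that decays quickly
    (short-term dynamics) plus a non-volatile term that accumulates
    slowly (long-term plasticity). Purely illustrative."""

    def __init__(self, tau_volatile=5.0, learn_rate=0.01):
        self.volatile = 0.0        # short-term state, decays between pulses
        self.nonvolatile = 0.0     # long-term state, persists
        self.tau = tau_volatile
        self.lr = learn_rate

    def pulse(self, strength=1.0):
        # each input pulse bumps the volatile state and nudges the
        # non-volatile conductance a little
        self.volatile += strength
        self.nonvolatile += self.lr * strength

    def decay(self, dt=1.0):
        # the volatile component relaxes back toward zero
        self.volatile *= math.exp(-dt / self.tau)

    def weight(self):
        return self.volatile + self.nonvolatile

syn = ToySynapse()
for _ in range(10):
    syn.pulse()
    syn.decay()
print(round(syn.weight(), 3))   # mostly volatile right after activity
for _ in range(50):
    syn.decay()
print(round(syn.weight(), 3))   # volatile part gone, the plastic change remains
```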
Along with these stacked transistors, all kinds of companies are developing newer and newer neural network accelerators and other new chip architectures. IBM, not content with developing 2nm transistors and the neuromorphic devices above, has released its own AI chip for making neural network inference and training faster. It's quite similar to Google's old TPU in its matrix multiply units, but uses interesting bit formats to take advantage of AI's need for only “approximate” correctness. That's an interesting topic in and of itself: it's not widely known that you can generally reduce the precision of neural network parameters from 32-bit floats to 8-bit ints without losing much accuracy, for a 4-fold decrease in space - an indication of both the fundamental robustness and the overparameterization of most AI models.
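Here's a minimal sketch of that reduction, assuming simple symmetric per-tensor post-training quantization (not any particular chip's number format):

```python
import numpy as np

# Symmetric post-training quantization of a weight matrix:
# map float32 weights into int8 with a single per-tensor scale.
rng = np.random.default_rng(0)
w = rng.normal(0, 0.1, size=(512, 512)).astype(np.float32)

scale = np.abs(w).max() / 127.0               # one scale for the whole tensor
w_int8 = np.round(w / scale).astype(np.int8)  # 1 byte per weight instead of 4
w_dequant = w_int8.astype(np.float32) * scale

print(w.nbytes / w_int8.nbytes)               # 4.0 -> the 4-fold space saving
print(np.abs(w - w_dequant).max())            # small worst-case rounding error
```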
Beyond the TPU and similar matrix-multiply chips, there are also various accelerators for speeding up specific neural network operations. Using tile-flexible and other paradigms, researchers have designed accelerators that can speedily run sparse models, which can be produced by methods like Learning Rate Rewinding. These sparse models promise a 10x speedup over dense models, in less space to boot - the dream is to one day run GPT-3 or Stable Diffusion on a $20 chip, a dream which might become a reality soon (I'm hoping to figure that out soon). There's even work on Fourier-transform-based versions of convolutional neural networks (since, of course, convolution in time is multiplication in frequency) for even larger speedups.
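Here's a quick numerical check of that convolution-theorem identity (just NumPy, nothing tied to any specific accelerator):

```python
import numpy as np

# Convolution in time equals elementwise multiplication in frequency:
# pad both signals to the full output length, FFT, multiply, inverse FFT.
rng = np.random.default_rng(1)
x = rng.normal(size=256)          # signal
k = rng.normal(size=31)           # convolution kernel

direct = np.convolve(x, k)        # O(n*m) direct convolution

n = len(x) + len(k) - 1           # full linear-convolution length
via_fft = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(k, n), n)

print(np.allclose(direct, via_fft))   # True, up to floating point error
```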
In addition, new chip architecture paradigms are being developed. My lab, the AVLSI lab at Yale, specializes in asynchronous architecture, a style of chip design with no clock for synchronization: only local handshaking between cells (a cell being a combination of gates). Since each component can then run as fast as it can, and there are no global timing effects in the absence of a clock, it promises higher speed, better efficiency, and easier debugging. It can also support ultra-low-power applications, since there's no power-hungry clock to drive. The only issue is that the paradigm isn't supported by most chip design tools, so the lab has had to develop a whole suite of tools, ACT, to support it. But it's a growing field.
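As a rough sketch of the local-handshaking idea (a toy software simulation, not how ACT actually describes circuits), here's a four-phase request/acknowledge transfer between a sender and a receiver, where each side only waits on the other's control wire rather than on a global clock:

```python
# Toy simulation of a four-phase request/acknowledge handshake:
# the sender raises req when data is ready, the receiver latches the
# data and raises ack, the sender drops req, the receiver drops ack.

def four_phase_transfer(values):
    req, ack = 0, 0
    data_wire = None
    received = []
    for v in values:
        # sender: put data on the wire and raise request
        data_wire, req = v, 1
        # receiver: sees req high, latches the data, raises acknowledge
        if req == 1 and ack == 0:
            received.append(data_wire)
            ack = 1
        # sender: sees ack high, drops request
        if ack == 1:
            req = 0
        # receiver: sees req low, drops acknowledge -- cycle complete
        if req == 0:
            ack = 0
    return received

assert four_phase_transfer([3, 1, 4, 1, 5]) == [3, 1, 4, 1, 5]
```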
So, climbing out of the weeds of scientific description, what all this points toward is a new and exciting world of ever smaller, ever more efficient, ever cheaper chips for running massively complex programs. The announced end of Moore's law is premature, and in the future we could have more and more compute everywhere. In 2084, the AI models that saturate the world around us won't run on a massive supercomputer like Skynet; they'll run in small, compact devices, embedded in chips in every device and computer - small chips able to run massive models with minimal power consumption. It'll be a decentralized revolution to a large degree. There will still be supercomputers of highly advanced AI chips running massive models, but on a day-to-day basis you'll interact with the smaller models running on small, cheap chips.