Cerebras Reveals 2.6 Trillion-Transistor Wafer-Scale Processor With 850,000 Cores
by Rustam Iqbal
Companies like Cerebras have made headlines in the last year for their use of wafer-scale manufacturing. TSMC wants to expand this area of its business and plans to develop its InFO_SoW (Integrated Fan-Out System on Wafer) technology to build future supercomputer-class AI processors.
Cerebras has already contracted TSMC to build its wafer-scale processors. TSMC also has an eye on the broader market and claims that wafer-scale processing will prove appealing to customers other than Cerebras. The company has announced that it will develop 16 nm technology for these chips.
A few years ago, we began covering researchers who were exploring an old manufacturing concept first discussed in the 1980s: wafer-scale processing (WSP). The idea behind WSP is simple: rather than dicing a wafer into individual chips and then packaging those chips for resale, build a single core or a collection of cores using most, if not all, of the wafer for one chip.
The "old" Cerebras Wafer Scale Engine (CWSE) was a 16 nm wafer with 400,000 AI cores, 1.2 trillion transistors, 18 GB of onboard memory, 9 PB/s of total memory bandwidth, and 100 Pb/s of total fabric bandwidth. The latest version of the CWSE is reportedly even larger.
That is a lot of transistors. And a lot of cores. And probably a lot of power consumption, although data from other companies indicates that the shift from 16nmFF to 7 nm allows for some substantial power savings. Cerebras will present additional details about its next-generation wafer-scale engine at Hot Chips today, and the design has been one of the more interesting peripheral plays to come out of the AI market so far.
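To put the generational jump in perspective, here is a rough back-of-the-envelope comparison using only the figures quoted in this article (the new chip's numbers come from the headline). The dictionary and function names are illustrative, and the per-core average is a crude ratio of total transistors to cores, not a statement about how the silicon is actually divided between logic, memory, and fabric.

```python
# Figures as quoted in this article; the "new" values come from the headline.
OLD_WSE = {"cores": 400_000, "transistors": 1_200_000_000_000}
NEW_WSE = {"cores": 850_000, "transistors": 2_600_000_000_000}


def growth(old, new):
    """Return the new/old ratio for every spec the two chips share."""
    return {key: new[key] / old[key] for key in old}


def transistors_per_core(chip):
    """Naive average: total transistors divided by core count.

    This lumps memory and interconnect transistors in with the cores,
    so treat it as an order-of-magnitude figure only.
    """
    return chip["transistors"] / chip["cores"]


if __name__ == "__main__":
    ratios = growth(OLD_WSE, NEW_WSE)
    print(f"core count growth:       {ratios['cores']:.3f}x")
    print(f"transistor count growth: {ratios['transistors']:.3f}x")
    print(f"old transistors/core:    {transistors_per_core(OLD_WSE):,.0f}")
    print(f"new transistors/core:    {transistors_per_core(NEW_WSE):,.0f}")
```

On these numbers, both the core count and the transistor count slightly more than double, while the naive transistors-per-core average stays at roughly 3 million.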
Wafer-scale processing takes a macroscopic approach to a nanoscopic problem. Metaphorically, both chiplets and wafer-scale designs attempt to resolve the packaging and power-efficiency problems of modern computing. Chiplets focus on optimizing die area and breaking up a processor into functional blocks, each built on the process node that makes the most sense for it.
Chiplets reject the long-standing Moore's law assumption that more integration is always better and concentrate on integrating components only where doing so still makes sense. Wafer-scale processing also deals with integration, but viewed from the perspective of the system as a whole, it bypasses a lot of integration as well, in favor of a fundamentally different relationship between functional blocks.
A CPU or GPU is typically a much smaller piece of silicon (limited by the foundry's maximum reticle size, if nothing else) bonded to a package mounted on a PCB. A single motherboard mounting 4-10 accelerator boards is an integrated system, and one with plenty of inefficiencies once you start considering wire delay and overall power consumption. Wafer-scale design pushes the limits of commodity manufacturing by stitching cores together across the wafer, working around constraints that would usually keep a wafer-scale processor from being feasible in the first place. In both cases, a new approach to integration yields substantial improvements where the same old way of doing things has started failing us.
There is one major difference between chiplets and the WSE: chiplets are already available in consumer products from AMD, while wafer-scale engines draw ~15 kW of power and are available only for personal installation in your evil lair or moon base.