In the world of semiconductor manufacturing, there has long been a prevailing belief that larger chips inevitably lead to lower yields. However, Cerebras Systems has defied this conventional wisdom by successfully creating and commercializing a chip that is a staggering 50 times larger than the largest existing computer chips while maintaining comparable yields. This achievement has naturally prompted a frequently asked question: how does Cerebras manage to attain a usable yield with its wafer-scale processor?
The answer lies in rethinking the traditional relationship between chip size and fault tolerance. In this article, we will delve into a detailed comparison of manufacturing yields between the Cerebras Wafer Scale Engine (WSE) and a chip the size of the Nvidia H100, both of which are manufactured at the advanced 5nm process node. By examining the interplay between defect rates, core size, and fault tolerance, we will explore how Cerebras achieves wafer-scale integration with yields that are equal to or better than those of reticle-limited GPUs.
### Understanding Yield in Semiconductor Manufacturing
Every manufacturing process has its share of defects, and the production of computer chips is no exception. Larger chips are more susceptible to defects, and as the size of the chips increases, yields tend to decrease exponentially due to the expanding die area. While larger chips often offer faster performance, early microprocessors were intentionally designed to be modest in size to ensure acceptable manufacturing yields and profit margins.
This paradigm began to shift in the early 2000s. As transistor budgets exceeded 100 million, manufacturers started building processors with multiple independent cores per chip. These cores were identical and independent, allowing chip designers to incorporate core-level fault tolerance. This means that if one core had a defect, the remaining cores could still function properly. For instance, in 2006, Intel introduced the Intel Core Duo, a chip with two CPU cores. If one core was found to be faulty, it was simply disabled, and the product was marketed as an Intel Core Solo. In subsequent years, companies like Nvidia and AMD embraced this concept of core-level redundancy.
In today’s high-performance processors, fault tolerance is a standard feature, allowing manufacturers to sell chips even with some disabled cores. AMD and Intel CPUs typically offer a flagship version with all cores enabled and a lower-end version with a portion of the cores disabled. Nvidia’s data center GPUs are significantly larger than CPU dies, and even its flagship models have some cores disabled to maintain yield.
Take the Nvidia H100, for example, which is an enormous GPU with a die size of 814mm². Traditionally, such a large chip would be difficult to yield economically. However, its cores, known as Streaming Multiprocessors (SMs), are fault tolerant. This means that a manufacturing defect does not render the entire product unusable. The H100 physically contains 144 SMs, but the commercialized product only has 132 active SMs. This configuration allows the chip to withstand numerous defects across 12 SMs and still be sold as a flagship product.
### Defect Tolerance: The Key to High Yields
Historically, chip size directly determined chip yields. In modern times, however, yield depends on both chip size and defect tolerance. Chips as large as 800mm² were once considered commercially unviable due to yield challenges. Yet, with defect-tolerant design approaches, these have become mainstream products.
The level of defect tolerance can be gauged by the amount of chip area that becomes unusable when a defect occurs. For multi-core chips, smaller cores mean greater defect tolerance. If individual cores are small enough, it is feasible to construct a very large chip.
### Cores of the Wafer Scale Engine
At Cerebras, before embarking on the construction of a wafer-scale chip, the company first designed a remarkably small core. Each AI core in the Wafer Scale Engine (WSE) 3 is approximately 0.05mm², which is about 1% the size of an H100 SM core. Both core designs incorporate fault tolerance. This means that a defect in a WSE core would disable only 0.05mm² of silicon, whereas the same defect in an H100 would disable approximately 6mm². This makes the wafer-scale engine roughly 100 times more fault tolerant than a GPU when considering the silicon area impacted by each defect.
### The Advanced Routing Architecture
However, having small cores alone does not ensure success. Cerebras has developed a sophisticated routing architecture that allows for dynamic reconfiguration of connections between cores. When a defect is detected, the system can automatically reroute around it using redundant communication pathways. This preserves the chip’s overall computational capabilities by leveraging nearby cores.
This routing system works in tandem with a small reserve of spare cores that can be used to replace defective units. Unlike previous approaches that required substantial redundancy overhead, Cerebras’s architecture achieves high yield with minimal spare capacity through intelligent routing.
### A Wafer-Scale Walkthrough
To better understand defect tolerance at the chip level, let’s compare how a traditional GPU and a wafer-scale chip would yield using TSMC’s 5nm 300mm wafer.
Consider a GPU similar to the H100: it measures 814mm², has 144 fault-tolerant cores, and a single 300mm wafer yields 72 full-die chips. On the other hand, the Cerebras Wafer Scale Engine 3 is a single, massive square measuring 46,225mm², with 970,000 fault-tolerant cores. One wafer yields one chip.
According to TSMC’s process, the defect density is approximately 0.001 defects per mm². The total die area of 72 GPU dies is 58,608mm². Applying this defect density, this area would experience 59 defects. For simplicity, let’s assume each defect affects a separate core. At 6.2mm² per core, this translates to 361mm² of die space lost due to defects.
In the case of the Cerebras Wafer Scale Engine, the effective die size is smaller at 46,225mm². Applying the same defect rate, the WSE-3 would encounter 46 defects. Given that each core is 0.05mm², this results in a total loss of 2.2mm² to defects.
When measuring the total area lost, the GPU loses 164 times more silicon area than the Wafer Scale Engine on a comparable basis at the same manufacturing node and defect rate.
While this gives a high-level overview, it simplifies a few details. Not all areas of the chip are occupied by compute cores. Components like caches, memory controllers, and on-chip fabric consume a significant portion of die size, potentially up to 50%. However, these components can also be designed to be fault tolerant in their own right. An H100 SM is likely smaller than 6.2mm² due to these components, though not by an order of magnitude. In practice, even fault-tolerant chips will not yield close to 100%. Despite these caveats, the general rule that smaller cores provide greater fault tolerance still holds.
### Cerebras: Leading the Way in Wafer-Scale Computing
When comparing Nvidia’s data center GPUs with the WSE-3, Cerebras has designed its product to be fault-tolerant, disabling a portion of its cores to manage yield. Thanks to the tiny size of the cores, the number of cores in the WSE-3 is extraordinarily large—970,000 physical cores, with 900,000 active in the current shipping product. This provides exceptional, fine-grained defect tolerance. Despite creating the world’s largest chip, Cerebras enables 93% of its silicon area, which is higher than the leading GPU on the market today.
In conclusion, Cerebras has solved the wafer-scale manufacturing challenge by designing a small, fault-tolerant core alongside a fault-tolerant on-chip fabric. While the total chip area increased by approximately 50 times compared to conventional GPUs, the individual core size was reduced by about 100 times. As a result, defects pose far less of a threat to the WSE than to conventional multi-core processors. The third-generation WSE engine achieves 93% silicon utilization, the highest among leading AI accelerators, proving that wafer-scale computing is not only possible but commercially viable at scale.
For more Information, Refer to this article.