[Image: A top-down view of the Cerebras Wafer-Scale Engine 3, a dinner-plate-sized chip next to a standard Nvidia H100 for scale.]

Cerebras Files for IPO: The $20B OpenAI Bet on Wafer-Scale Compute

Cerebras Systems just filed its S-1, and the numbers are a gut punch to the “Nvidia monopoly” narrative. With a $10B+ OpenAI contract and a surprise flip to GAAP profitability, wafer-scale compute is no longer a science project—it’s a production reality.

After a year of regulatory delays and a withdrawn filing in 2025, Cerebras has returned to the public markets with a registration statement that fundamentally changes how we view the AI hardware landscape. The company, which builds the dinner-plate-sized Wafer-Scale Engine (WSE), is no longer just a niche player for national labs. According to the new S-1 filing, Cerebras brought in $510 million in revenue in 2025, netting a GAAP profit of $87.9 million. This is a staggering reversal from its 2024 net loss of $481 million and places it in a rare class of profitable AI startups heading into an IPO.

The OpenAI Whale and the $20B Bet

The headline shocker in the filing is the disclosure of a massive, multi-year compute partnership with OpenAI. While rumors of Sam Altman seeking chip alternatives have circulated for years, the scale of this deal is unprecedented. The agreement reportedly involves the deployment of 750 megawatts of Cerebras wafer-scale systems to serve OpenAI customers, a deal valued at over $10 billion through 2030, with some sources suggesting the total value could reach $20 billion or more.

Crucially, the deal includes warrants that could give OpenAI up to a 10% equity stake in Cerebras if cumulative spending reaches $30 billion. This isn’t just a vendor relationship; it’s a strategic hedge. OpenAI is effectively building a parallel infrastructure to Nvidia, specifically optimized for low-latency inference. As Sachin Katti of OpenAI noted, the goal is to enable “faster responses, more natural interactions, and a stronger foundation to scale real-time AI.”

Technical Deep Dive: WSE-3 vs. Blackwell B200

For practitioners, the core question is whether the Wafer-Scale Engine 3 (WSE-3) actually beats Nvidia’s Blackwell in the wild. The architectural philosophies couldn’t be more different. Nvidia is the king of “scale-out”—connecting thousands of small GPUs via NVLink. Cerebras is the champion of “scale-up”—putting an entire data center’s worth of compute onto a single piece of silicon to eliminate the “tax” of moving data between chips.
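That "tax" is easy to underestimate. When a model is tensor-parallel across eight GPUs, every transformer layer performs collective operations over NVLink for every generated token; on a single wafer, the equivalent traffic never leaves the silicon. Here is a minimal back-of-the-envelope sketch, with the layer count, collective count, and per-collective latency all stated as assumptions rather than measured figures:

```python
# Illustrative estimate of the inter-chip "tax" for tensor-parallel decode.
# With a model sharded across 8 GPUs, each transformer layer needs roughly
# two all-reduces of the hidden state per generated token; on a single
# wafer that traffic stays on-chip. All numbers are assumptions, not specs.

LAYERS = 80                    # e.g. a Llama-3.1-70B-class model
ALLREDUCES_PER_LAYER = 2       # one after attention, one after the MLP
LATENCY_PER_ALLREDUCE_US = 8   # assumed small-message NVLink collective latency

per_token_ms = LAYERS * ALLREDUCES_PER_LAYER * LATENCY_PER_ALLREDUCE_US / 1000
print(f"communication overhead: ~{per_token_ms:.1f} ms per token")
print(f"ceiling from comms alone: ~{1000 / per_token_ms:,.0f} tokens/s")
```

Even at single-digit microseconds per collective, the overhead compounds into a hard latency floor that no amount of extra GPU FLOPS can buy back.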

Specification             Cerebras WSE-3 (Single Wafer)   NVIDIA DGX B200 (8x GPUs)   Advantage
Transistors               4 Trillion                      ~1.6 Trillion               Cerebras (2.5x)
AI Cores                  900,000                         ~160,000                    Cerebras
Peak AI Compute (FP16)    125 PetaFLOPS                   36 PetaFLOPS                Cerebras (3.5x)
On-Chip Memory (SRAM)     44 GB                           ~400 MB (L2 Cache)          Cerebras (100x+)
Memory Bandwidth          21,000 TB/s                     64 TB/s                     Cerebras (328x)
System Memory             1.2 PB (MemoryX)                1.5 TB HBM3e                Cerebras (Capacity)

According to Cerebras’s own benchmarks, the WSE-3 delivers up to 15x faster inference for large language models like Llama 3.1 compared to GPU-based systems. This speed comes from the 44GB of on-chip SRAM, which allows the entire model state to reside on-wafer, bypassing the HBM bottlenecks that plague Nvidia’s architecture during the “decode” phase of inference.
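A quick roofline estimate shows why on-chip memory dominates the decode phase. At batch size 1, generating a token means streaming every weight through the compute units once, so throughput is capped at roughly memory bandwidth divided by model size. A sketch using the bandwidth figures from the table above (model size and precision are illustrative assumptions):

```python
# Rough decode-phase roofline: tokens/sec <= memory bandwidth / bytes per token.
# At batch size 1, each token reads every weight once, so bytes per token
# ~= parameter count * bytes per parameter. Figures below are illustrative.

def decode_ceiling_tokens_per_s(params_billions: float,
                                bytes_per_param: float,
                                bandwidth_tb_per_s: float) -> float:
    """Upper bound on single-stream decode throughput, ignoring KV-cache traffic."""
    bytes_per_token = params_billions * 1e9 * bytes_per_param
    return bandwidth_tb_per_s * 1e12 / bytes_per_token

# A Llama-3.1-70B-class model at FP16 (2 bytes per parameter):
for name, bw in [("DGX B200 (64 TB/s HBM)", 64), ("WSE-3 (21,000 TB/s SRAM)", 21_000)]:
    print(f"{name}: ~{decode_ceiling_tokens_per_s(70, 2, bw):,.0f} tokens/s ceiling")
```

Batching, quantization, and KV-cache traffic all move the absolute numbers, but the ratio between the two ceilings tracks the bandwidth ratio, which is why the gap is most visible in latency-sensitive, small-batch serving.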

The “Inference Disaggregation” Play with AWS

Beyond the OpenAI deal, Cerebras is making a sophisticated move into the cloud via Amazon Web Services (AWS). The partnership introduces a concept called “inference disaggregation.”

In a standard inference request, there are two phases (the sketch after the list makes the asymmetry concrete):

  1. Prefill: Processing the input prompt (compute-bound).
  2. Decode: Generating the output tokens one by one (memory-bandwidth bound).
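The parentheticals are the whole story, and a little arithmetic makes them concrete: prefill reuses every fetched weight across all prompt tokens in one large matmul, while decode fetches the same weights to process a single token. The token count and byte width below are illustrative assumptions:

```python
# Arithmetic intensity (FLOPs per byte of weight traffic) for each phase.
# A matmul does ~2 FLOPs per parameter per token it processes, so intensity
# scales with the number of tokens sharing one pass over the weights.

def intensity_flops_per_byte(tokens_in_flight: int, bytes_per_param: float = 2.0) -> float:
    return 2.0 * tokens_in_flight / bytes_per_param

print(f"prefill, 4,096-token prompt: {intensity_flops_per_byte(4096):,.0f} FLOPs/byte")
print(f"decode, one token at a time: {intensity_flops_per_byte(1):,.0f} FLOPs/byte")
# A modern GPU's compute-to-bandwidth ratio is on the order of hundreds of
# FLOPs/byte, so prefill saturates the ALUs while decode leaves them starved
# for weights.
```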

AWS and Cerebras are deploying a hybrid solution where AWS Trainium chips handle the prefill, and Cerebras CS-3 systems handle the high-speed decode. This stack, accessible via Amazon Bedrock, aims to provide the fastest inference in the cloud. By splitting the workload, AWS can leverage its existing infrastructure for the heavy lifting of prompt processing while using Cerebras as a “turbocharger” for the generation phase. This is a direct shot at Nvidia’s NIM (Nvidia Inference Microservices) ecosystem.
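Neither company has published the wire protocol, so treat the following as a purely hypothetical sketch of what a disaggregated serving loop could look like: the prefill pool returns the KV cache (the attention state computed over the prompt), which is handed to the decode pool for token generation. The PrefillBackend and DecodeBackend interfaces are invented for illustration and are not an actual AWS or Cerebras API:

```python
# Hypothetical control loop for disaggregated inference. Real deployments
# (e.g. the Bedrock integration described above) hide this behind one API;
# every name here is illustrative, not a real AWS or Cerebras interface.
from dataclasses import dataclass
from typing import Protocol

@dataclass
class KVCache:
    """Opaque attention state: produced by prefill, consumed by decode."""
    blob: bytes

class PrefillBackend(Protocol):      # e.g. a Trainium pool (compute-bound)
    def prefill(self, prompt: str) -> KVCache: ...

class DecodeBackend(Protocol):       # e.g. a Cerebras CS-3 pool (bandwidth-bound)
    def decode_step(self, kv: KVCache) -> tuple[str, KVCache]: ...

def generate(prompt: str, prefill: PrefillBackend, decode: DecodeBackend,
             max_tokens: int = 256, stop: str = "</s>") -> str:
    kv = prefill.prefill(prompt)       # phase 1: one matmul-heavy pass
    pieces: list[str] = []
    for _ in range(max_tokens):        # phase 2: token-by-token generation
        token, kv = decode.decode_step(kv)
        if token == stop:
            break
        pieces.append(token)
    return "".join(pieces)
```

The hard engineering problem hiding in this sketch is the KV-cache handoff: for a 70B-class model with a long prompt, that state can run to gigabytes, so the link between the two pools has to be fast enough not to erase the latency win.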

The “UAE Shell Game” and Customer Concentration

Despite the bullish financials, skeptics are pointing to a potential “shell game” regarding customer concentration. In 2024, Cerebras was heavily criticized for relying on G42 (an Abu Dhabi-based firm) for 87% of its revenue. The new filing shows G42’s share dropped to 24% in 2025. However, a new customer—the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)—now accounts for 62% of revenue.

Critics on Hacker News and X argue that since both G42 and MBZUAI are Abu Dhabi-based entities with overlapping leadership, the "diversification" is largely cosmetic. The company still effectively relies on the UAE for roughly 86% of its revenue (24% from G42 plus 62% from MBZUAI). This concentration was the primary reason for the CFIUS national security review that stalled the IPO in 2024. While Cerebras reportedly obtained clearance in early 2025, the geopolitical risk remains a significant overhang for institutional investors.

What This Means for the Market

The Cerebras IPO (ticker: CBRS) is the first real test of whether the market believes in specialized AI silicon over general-purpose GPUs. If Cerebras can maintain its profitability while scaling the OpenAI deployment, it proves that the “Nvidia tax” is optional for high-scale inference providers.

For engineers, the takeaway is clear: the era of "just throw more H100s at it" is ending. We are entering a multi-polar hardware environment where the choice of chip depends entirely on the workload. If you need massive throughput for training a foundation model from scratch, Nvidia's GB200 NVL72 racks remain the gold standard. But if you are building real-time agents, coding assistants, or voice interfaces where latency is the product, wafer-scale compute now sets the performance ceiling.

Takeaways

  • Profitability is the new signal: Cerebras’s $87.9M GAAP profit proves that AI hardware startups can actually make money, provided they have a “whale” customer and a differentiated architecture.
  • OpenAI is diversifying: The $10B-$20B deal suggests OpenAI is no longer willing to be 100% dependent on Nvidia’s roadmap or pricing.
  • Inference is the battleground: The AWS partnership highlights a shift toward “disaggregated inference,” where different chips handle different parts of the LLM lifecycle.
  • Geopolitical risk is baked in: The heavy reliance on UAE-based revenue (MBZUAI/G42) remains the biggest threat to the stock’s long-term stability.
  • Wafer-scale is real: 15x inference speedups aren’t just marketing; they are the result of 44GB of on-chip SRAM and 21,000 TB/s of bandwidth that Nvidia simply cannot match with a discrete GPU architecture.

Cerebras is expected to price its IPO in mid-May 2026. Whether it becomes the "AMD of AI" or remains a high-performance boutique will depend on how fast it can convert the OpenAI contract into actual deployed silicon.
