One thousand tokens per second. That’s roughly 750 words of code generated every second — faster than you can read it.
OpenAI just released GPT-5.3-Codex-Spark, a smaller, speed-optimized coding model running on Cerebras’ Wafer Scale Engine 3 — a chip the size of a dinner plate. It’s the first product from OpenAI’s $10 billion Cerebras partnership, and it signals a fundamental shift in how AI coding tools will work.
TL;DR
- GPT-5.3-Codex-Spark delivers 1,000+ tokens/sec on Cerebras hardware — 6x faster than OpenAI’s fastest Nvidia-based models
- Same accuracy, fraction of the time: Matches GPT-5.3-Codex on SWE-Bench Pro in 2-3 minutes vs. 15-17 minutes
- Available now for ChatGPT Pro users in the Codex app, CLI, and VS Code extension
- Strategic play: OpenAI is systematically reducing Nvidia dependence (AMD deal, Amazon deal, custom chips, now Cerebras)
What Just Happened
OpenAI released Codex-Spark on February 12, 2026 — the first model built specifically for Cerebras’ wafer-scale inference hardware. While OpenAI’s previous Nvidia-based models top out around 147-167 tokens/sec (per Artificial Analysis benchmarks), Codex-Spark blows past that at 1,000+ tokens/sec.
This didn’t happen by accident. OpenAI rewrote key parts of its inference stack to hit these numbers:
- Per-roundtrip overhead: Down 80%
- Per-token overhead: Down 30%
- Time-to-first-token: Cut in half
These infrastructure improvements will roll out to all OpenAI models soon — meaning everything gets faster, not just Spark.
The Speed vs. Accuracy Tradeoff
Codex-Spark isn’t the smartest model — it’s the fastest useful one. Here’s the benchmark reality:
SWE-Bench Pro (agent-based coding tasks):
- Codex-Spark: Similar accuracy to GPT-5.3-Codex, completed in 2-3 minutes
- GPT-5.3-Codex: Same accuracy, takes 15-17 minutes
Terminal-Bench 2.0:
- GPT-5.3-Codex: 77.3% accuracy (the smart one)
- Codex-Spark: 58.4% accuracy (the fast one)
- GPT-5.1-Codex-mini: 46.1% accuracy (last gen)
The design philosophy is deliberate: Spark makes minimal, targeted changes by default. It won’t run tests unless asked. It’s built for pair programming, not autonomous multi-hour coding sessions.
Why This Matters
1. Speed Changes the Workflow, Not Just the Wait Time
At 1,000 tokens/sec, you stop thinking of AI as a tool you query and start treating it as a collaborator you interrupt. The mental model shifts from “submit prompt → wait → review” to real-time conversation. That’s a fundamentally different way to code.
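Some back-of-envelope math makes the difference concrete. The ~10 tokens per line of code is a rough assumption (not a measured figure), and the 150 tok/s baseline approximates the Nvidia-class throughput mentioned above:

```python
# Back-of-envelope: time to stream a 300-line file at various speeds.
# Assumes ~10 tokens per line of code (a rough heuristic, not a measured figure).

def seconds_to_generate(lines: int, tokens_per_sec: float,
                        tokens_per_line: float = 10.0) -> float:
    """Wall-clock seconds to stream `lines` of code at a given throughput."""
    return lines * tokens_per_line / tokens_per_sec

for speed in (150, 1000):  # Nvidia-class throughput vs. Codex-Spark
    print(f"{speed} tok/s: {seconds_to_generate(300, speed):.0f}s")
```

Twenty seconds is a wait you context-switch away from; three seconds is a conversational pause.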
2. The Nvidia Diversification Play Is Real
This is the latest move in OpenAI’s systematic Nvidia decoupling:
- October 2025: Multi-year AMD chip deal
- November 2025: $38 billion Amazon cloud agreement
- January 2026: $10 billion Cerebras partnership
- Ongoing: Custom TSMC chip design
For developers, this means: AI inference will get cheaper and faster as competition among chip providers heats up.
3. Two-Mode Coding Is Coming
OpenAI confirmed they’re building toward two complementary modes: real-time collaboration (Spark) and extended autonomous reasoning (Codex). The plan is to merge them — fast interactive loops with background sub-agents handling complex tasks in parallel.
This mirrors how Claude Code already works with its “background agent” approach, and it’s where all coding assistants are headed.
OpenAI's Codex-Spark trades roughly 19 points of Terminal-Bench accuracy for 5-8x faster task completion. Best for interactive coding, rapid prototyping, and real-time pair programming. Use full Codex for complex autonomous tasks.
What This Means for You
If You’re a Developer
Codex-Spark is purpose-built for your workflow. At this speed, it’s viable for:
- Live code reviews — paste a function, get feedback before your coffee cools
- Rapid prototyping — iterate on UI components in real-time
- Interactive debugging — describe the bug, get targeted fixes instantly
The catch: it’s ChatGPT Pro only ($200/mo) with separate rate limits. If you’re already on Pro for Codex, you get Spark at no extra cost. If not, the ROI question is whether saving 12-15 minutes per complex coding task justifies the subscription.
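As a rough sketch of that ROI question (the $100/hr rate is an illustrative assumption; the $200/mo and 12-15 minutes saved are the figures cited above):

```python
# Illustrative break-even math only; the hourly rate is an assumption,
# while $200/mo and 12-15 minutes saved per task come from the article.

def tasks_to_break_even(monthly_cost: float, hourly_rate: float,
                        minutes_saved_per_task: float) -> float:
    """Tasks per month at which time saved pays for the subscription."""
    value_per_task = hourly_rate * (minutes_saved_per_task / 60)
    return monthly_cost / value_per_task

# A $100/hr developer saving ~13 minutes per task breaks even
# at roughly nine complex tasks per month.
print(round(tasks_to_break_even(200, 100, 13), 1))
```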
If You’re a Business Decision-Maker
Three things to watch:
- Developer productivity is about to jump again. Speed improvements compound — faster iteration means more experiments, faster shipping.
- Costs are coming down. Cerebras inference, AMD competition, and custom chips all push prices lower.
- The coding agent market is consolidating. OpenAI, Anthropic, and Google are all building full coding agent ecosystems. Pick one and go deep rather than spreading across three.
What to Do Next
- If you have ChatGPT Pro: Try Codex-Spark in VS Code today. Use it for interactive tasks; keep full Codex for complex refactors. The speed difference will change how you work.
- If you’re evaluating coding tools: Compare Spark’s real-time speed against Claude Code and Cursor. Speed isn’t everything — code quality and context understanding matter too. Run the same task on each and time it.
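A minimal way to run that comparison is a stopwatch around each tool's CLI invocation. The commands in the usage sketch are placeholders, not documented interfaces; substitute each tool's real invocation:

```python
# A minimal stopwatch harness for comparing coding tools on the same task.
import subprocess
import time

def time_command(cmd: list[str]) -> float:
    """Run a command and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=False)
    return time.perf_counter() - start

# Example usage (hypothetical commands; replace with each tool's real CLI):
# for name, cmd in {"spark": [...], "claude": [...], "cursor": [...]}.items():
#     print(name, round(time_command(cmd), 1), "s")
```

Run the identical prompt through each tool a few times and compare medians, since single runs are noisy.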
- If you’re planning infrastructure: Watch the Cerebras partnership closely. If OpenAI rolls these speed improvements out to all models (they said they will), expect API latency to drop significantly in Q2 2026. Plan your integrations accordingly.
This article was published on February 17, 2026. OpenAI’s Codex-Spark is in Research Preview and rate limits may change. We’ll update this article as general availability expands.



