
OpenAI Codex Spark: 1,000 Tokens/Sec on Cerebras

One thousand tokens per second. That’s roughly 750 words of code generated every second — faster than you can read it.
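That headline figure is easy to sanity-check. A quick sketch, using the common heuristic of roughly 0.75 English words per token (an approximation for prose; code tokenizes differently, so treat it as ballpark):

```python
# Back-of-envelope: convert tokens/sec to words/sec.
# Assumes ~0.75 words per token, a common English-text heuristic --
# an approximation, not an OpenAI-published figure.
WORDS_PER_TOKEN = 0.75

def words_per_second(tokens_per_second: float) -> float:
    return tokens_per_second * WORDS_PER_TOKEN

print(words_per_second(1000))  # -> 750.0
```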

OpenAI just released GPT-5.3-Codex-Spark, a smaller, speed-optimized coding model running on Cerebras’ Wafer-Scale Engine 3 — a chip the size of a dinner plate. It’s the first product from OpenAI’s $10 billion Cerebras partnership, and it signals a fundamental shift in how AI coding tools will work.

TL;DR

  • GPT-5.3-Codex-Spark delivers 1,000+ tokens/sec on Cerebras hardware — 6x faster than OpenAI’s fastest Nvidia-based models
  • Same accuracy, fraction of the time: Matches GPT-5.3-Codex on SWE-Bench Pro in 2-3 minutes vs. 15-17 minutes
  • Available now for ChatGPT Pro users in the Codex app, CLI, and VS Code extension
  • Strategic play: OpenAI is systematically reducing Nvidia dependence (AMD deal, Amazon deal, custom chips, now Cerebras)

What Just Happened

OpenAI released Codex-Spark on February 12, 2026 — the first model built specifically for Cerebras’ wafer-scale inference hardware. While OpenAI’s previous Nvidia-based models top out around 147-167 tokens/sec (per Artificial Analysis benchmarks), Codex-Spark blows past that at 1,000+ tokens/sec.

This didn’t happen by accident. OpenAI rewrote key parts of its inference stack to hit these numbers:

  • Per-roundtrip overhead: Down 80%
  • Per-token overhead: Down 30%
  • Time-to-first-token: Cut in half
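Taken together, these cuts compound. A toy model of a single agent roundtrip makes the effect visible; the baseline numbers below are illustrative assumptions, not OpenAI’s published figures:

```python
# Toy latency model for one agent roundtrip:
#   total = time_to_first_token + roundtrip_overhead + n_tokens / throughput
# Baseline values are hypothetical; the per-token overhead cut is folded
# into the higher throughput figure for simplicity.

def roundtrip_seconds(ttft, overhead, n_tokens, tokens_per_sec):
    return ttft + overhead + n_tokens / tokens_per_sec

# Hypothetical Nvidia-era baseline: 1.0s TTFT, 0.5s roundtrip overhead,
# 167 tokens/sec, generating 2,000 tokens.
before = roundtrip_seconds(1.0, 0.5, 2000, 167)

# Apply the stated improvements: TTFT halved, roundtrip overhead down 80%,
# Cerebras throughput of 1,000 tokens/sec.
after = roundtrip_seconds(0.5, 0.1, 2000, 1000)

print(f"{before:.1f}s -> {after:.1f}s per roundtrip")  # 13.5s -> 2.6s
```

Under these assumed inputs, the generation term dominates — which is why raw throughput, not just overhead trimming, is the headline.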

These infrastructure improvements will roll out to all OpenAI models soon — meaning everything gets faster, not just Spark.

Speed Comparison (tokens/sec)

  • Codex-Spark (Cerebras): 1,000+ tokens/sec
  • o3-mini (Nvidia): 167 tokens/sec
  • GPT-4o (Nvidia): 147 tokens/sec
  • GPT-4o mini (Nvidia): 52 tokens/sec

The Speed vs. Accuracy Tradeoff

Codex-Spark isn’t the smartest model — it’s the fastest useful one. Here’s the benchmark reality:

SWE-Bench Pro (agent-based coding tasks):

  • Codex-Spark: Similar accuracy to GPT-5.3-Codex, completed in 2-3 minutes
  • GPT-5.3-Codex: Same accuracy, takes 15-17 minutes

Terminal-Bench 2.0:

  • GPT-5.3-Codex: 77.3% accuracy (the smart one)
  • Codex-Spark: 58.4% accuracy (the fast one)
  • GPT-5.1-Codex-mini: 46.1% accuracy (last gen)

The design philosophy is deliberate: Spark makes minimal, targeted changes by default. It won’t run tests unless asked. It’s built for pair programming, not autonomous multi-hour coding sessions.

Why This Matters

1. Speed Changes the Workflow, Not Just the Wait Time

At 1,000 tokens/sec, you stop thinking of AI as a tool you query and start treating it as a collaborator you interrupt. The mental model shifts from “submit prompt → wait → review” to real-time conversation. That’s a fundamentally different way to code.

2. The Nvidia Diversification Play Is Real

This is the latest move in OpenAI’s systematic Nvidia decoupling:

  • October 2025: Multi-year AMD chip deal
  • November 2025: $38 billion Amazon cloud agreement
  • January 2026: $10 billion Cerebras partnership
  • Ongoing: Custom TSMC chip design

For developers, this means: AI inference will get cheaper and faster as competition among chip providers heats up.

3. Two-Mode Coding Is Coming

OpenAI confirmed they’re building toward two complementary modes: real-time collaboration (Spark) and extended autonomous reasoning (Codex). The plan is to merge them — fast interactive loops with background sub-agents handling complex tasks in parallel.

This mirrors how Claude Code already works with its “background agent” approach, and it’s where all coding assistants are headed.

The Verdict
Codex-Spark

OpenAI’s Codex-Spark trades roughly 19 points of Terminal-Bench accuracy for 5-8x speed. Best for interactive coding, rapid prototyping, and real-time pair programming. Use full Codex for complex autonomous tasks.

  • Speed: 10/10
  • Accuracy: 7/10
  • Availability: 5/10

What This Means for You

If You’re a Developer

Codex-Spark is purpose-built for your workflow. At this speed, it’s viable for:

  • Live code reviews — paste a function, get feedback before your coffee cools
  • Rapid prototyping — iterate on UI components in real-time
  • Interactive debugging — describe the bug, get targeted fixes instantly

The catch: it’s ChatGPT Pro only ($200/mo) with separate rate limits. If you’re already on Pro for Codex, Spark comes at no extra cost. If not, the ROI question is simple: does saving 12-15 minutes per complex coding task justify $200 a month?
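You can put rough numbers on that question. A sketch with illustrative inputs — the minutes saved comes from the benchmark gap above, while the hourly rate and task volume are hypothetical values you’d replace with your own:

```python
# Back-of-envelope break-even for the $200/mo Pro tier.
# Inputs below are illustrative assumptions, not OpenAI figures.
SUBSCRIPTION_USD = 200
MINUTES_SAVED_PER_TASK = 13   # midpoint of the 12-15 minute range above
HOURLY_RATE_USD = 80          # hypothetical developer cost

def breakeven_tasks_per_month() -> float:
    """How many complex tasks per month make the subscription pay for itself."""
    value_per_task = (MINUTES_SAVED_PER_TASK / 60) * HOURLY_RATE_USD
    return SUBSCRIPTION_USD / value_per_task

print(round(breakeven_tasks_per_month(), 1))  # -> 11.5
```

Under these assumptions, about a dozen complex tasks a month covers the subscription — a bar most full-time developers clear in a week.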

If You’re a Business Decision-Maker

Three things to watch:

  1. Developer productivity is about to jump again. Speed improvements compound — faster iteration means more experiments, faster shipping.
  2. Cost pressure is coming down. Cerebras inference, AMD competition, and custom chips all push prices lower.
  3. The coding agent market is consolidating. OpenAI, Anthropic, and Google are all building full coding agent ecosystems. Pick one and go deep rather than spreading across three.

What to Do Next

  1. If you have ChatGPT Pro: Try Codex-Spark in VS Code today. Use it for interactive tasks; keep full Codex for complex refactors. The speed difference will change how you work.

  2. If you’re evaluating coding tools: Compare Spark’s real-time speed against Claude Code and Cursor. Speed isn’t everything — code quality and context understanding matter too. Run the same task on each and time it.

  3. If you’re planning infrastructure: Watch the Cerebras partnership closely. If OpenAI rolls these speed improvements to all models (they said they will), expect API latency to drop significantly in Q2 2026. Plan your integrations accordingly.
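If you want the tool comparison in step 2 to be more than a gut feel, a minimal timing harness does the job. A sketch, assuming each tool exposes a CLI you can invoke with the same prompt; the command below is a stand-in, not any tool’s real invocation:

```python
# Minimal harness for timing the same coding task across CLI tools.
# Substitute each assistant's actual CLI command for the placeholder below.
import subprocess
import sys
import time

def time_command(cmd: list[str]) -> float:
    """Run a command and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start

# Stand-in command; replace with e.g. each tool's CLI plus your test prompt.
elapsed = time_command([sys.executable, "-c", "print('done')"])
print(f"{elapsed:.2f}s")
```

Run the same prompt through each tool a few times and compare medians — and remember to review the output quality, not just the stopwatch.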


This article was published on February 17, 2026. OpenAI’s Codex-Spark is in Research Preview and rate limits may change. We’ll update this article as general availability expands.