OpenAI has officially released GPT-5.5 (codenamed “Spud”), a model they are framing not just as a text generator, but as a frontier engine for agentic workflows. While the model is rolling out immediately to ChatGPT Plus, Pro, and Enterprise users, the standard API release is notably delayed, forcing developers to look toward the newly open-sourced Codex CLI for programmatic access.
The Concrete News: Intelligence Over Speed
GPT-5.5 arrives just a month after GPT-5.4, signaling an aggressive release cadence from OpenAI. According to OpenAI’s official announcement, the model matches the per-token latency of 5.4 but delivers a significant leap in “conceptual clarity.” The core value proposition here isn’t raw speed; it’s efficiency. Early testers and OpenAI’s own data suggest the model requires approximately 40% fewer output tokens to complete complex coding tasks because it “thinks” more before it speaks.
Key performance metrics from the release include:
- SWE-Bench Pro: Anthropic’s Claude Mythos Preview still leads the benchmark with a score of 0.778, but GPT-5.5 is closing the gap on real-world software engineering tasks.
- Vulnerability Detection: Security firm XBOW reports that GPT-5.5 reduced their vulnerability miss rate to just 10%, compared to 40% for GPT-5 and 18% for Claude Opus 4.6.
- Context Window: Maintains a 1M token flat-rate window, avoiding the 2x price surcharges Anthropic often applies for prompts over 200K tokens.
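The flat-rate point is easiest to see in numbers. Below is a minimal sketch contrasting flat pricing with an Anthropic-style long-context surcharge, using the article’s $5.00/1M input rate; the assumption that the 2x surcharge applies to the whole prompt (rather than only the tokens past 200K) is mine, not something the article specifies.

```python
def flat_cost(tokens: int, rate_per_m: float) -> float:
    """Flat-rate pricing: every token billed at the same rate."""
    return tokens / 1e6 * rate_per_m

def tiered_cost(tokens: int, rate_per_m: float,
                threshold: int = 200_000, surcharge: float = 2.0) -> float:
    """Long-context surcharge pricing (assumed rule: the whole prompt
    is billed at 2x once it crosses the 200K-token threshold)."""
    multiplier = surcharge if tokens > threshold else 1.0
    return tokens / 1e6 * rate_per_m * multiplier

# A 500K-token prompt at $5.00 per 1M input tokens:
print(flat_cost(500_000, 5.00))    # 2.5
print(tiered_cost(500_000, 5.00))  # 5.0
```

Under 200K tokens the two schemes are identical; past the threshold, the flat-rate window is half the price for the same prompt.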
The “Backdoor” API and Codex CLI
One of the most interesting aspects of this launch is the “semi-official” support for the Codex API. Because the standard API is still behind a “coming soon” gate, developers are reaching GPT-5.5 through the /backend-api/codex/responses endpoint. Previously considered a grey-area “backdoor,” this route is now sanctioned: OpenAI’s Romain Huet confirmed on X that it is an officially supported way for ChatGPT Plus/Pro subscribers to use their models in external tools such as JetBrains, Xcode, and the open-source OpenClaw.
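For illustration only, here is roughly what a raw call against that endpoint might look like. Only the endpoint path comes from the article; the host, request fields, model id, and token handling below are all assumptions modeled loosely on OpenAI’s public Responses API, not a documented contract (in practice the Codex CLI manages this for you after sign-in).

```python
import json
import os
import urllib.request

# Endpoint path from the article; the host, payload shape, and model id
# are assumptions modeled on OpenAI's public Responses API, not a
# documented contract for this route.
URL = "https://chatgpt.com/backend-api/codex/responses"

def build_request(prompt: str, token: str) -> urllib.request.Request:
    body = json.dumps({
        "model": "gpt-5.5",  # hypothetical model id
        "input": prompt,
    }).encode("utf-8")
    return urllib.request.Request(
        URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

# The bearer token would be whatever credential the Codex CLI stores
# after `codex` sign-in; CODEX_TOKEN is a placeholder, not a real var.
req = build_request("Explain this stack trace",
                    os.environ.get("CODEX_TOKEN", "sk-placeholder"))
print(req.full_url)
```

Calling `urllib.request.urlopen(req)` would actually send the request; the sketch stops short of that since the auth flow is undocumented.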
To facilitate this, OpenAI has open-sourced the Codex CLI, a Rust-based terminal agent. This move follows the hiring of Peter Steinberger, creator of OpenClaw, to lead OpenAI’s personal agent efforts.
How to try it (The Simon Willison Method)
If you want to bypass the ChatGPT UI and use GPT-5.5 in your terminal today, you can use the llm-openai-via-codex plugin developed by Simon Willison.
- Install the Codex CLI:
npm install -g @openai/codex
codex # Follow the prompts to sign in with your ChatGPT Plus account
- Install the LLM tool and plugin:
uv tool install llm
llm install llm-openai-via-codex
- Run a prompt:
llm -m openai-codex/gpt-5.5 "Refactor this React component to use the new useActionState hook"
Pricing and Economics
When the API does land, it will be expensive. GPT-5.5 is priced at $5.00 per 1M input tokens and $30.00 per 1M output tokens. This is exactly double the cost of GPT-5.4 ($2.50/$15.00).
| Model | Input (1M) | Output (1M) | Context |
|---|---|---|---|
| GPT-5.5 | $5.00 | $30.00 | 1M |
| Claude 4.7 Opus | $5.00 | $25.00 | 1M |
| GPT-5.4 | $2.50 | $15.00 | 1.05M |
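Plugging the table’s rates into a quick per-request calculator makes the gap concrete. The 50K-in / 8K-out request size is illustrative, not from the article:

```python
# Rates from the pricing table above, in $ per 1M tokens (input, output).
PRICES = {
    "GPT-5.5": (5.00, 30.00),
    "Claude 4.7 Opus": (5.00, 25.00),
    "GPT-5.4": (2.50, 15.00),
}

def request_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    rate_in, rate_out = PRICES[model]
    return tokens_in / 1e6 * rate_in + tokens_out / 1e6 * rate_out

# Example request: 50K-token prompt, 8K-token completion.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 50_000, 8_000):.3f}")
```

At these rates a single such request costs $0.490 on GPT-5.5 versus $0.245 on GPT-5.4, before accounting for any token-efficiency gains.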
NVIDIA is already touting the efficiency of running these models on their GB200 NVL72 systems, claiming a 35x lower cost per million tokens compared to previous generations, which may eventually lead to price cuts as infrastructure scales.
Competitive Landscape
The release is a direct shot at Anthropic’s Claude 4.7 and the Mythos cybersecurity preview. While Anthropic has been criticized for “rug-pulling” developers by blocking third-party harnesses like OpenClaw from using web subscriptions, OpenAI is doing the opposite—inviting developers to use their $20/month subscription as a programmatic engine via the Codex CLI.
Takeaways
- Agentic over Chat: GPT-5.5 is optimized for multi-step planning and tool use, not just answering questions.
- Subscription Arbitrage: Using the Codex CLI allows you to run GPT-5.5 for the cost of a ChatGPT Plus sub, avoiding the $30/1M output token API sting for now.
- Security Gains: The model shows a massive leap in finding vulnerabilities, making it a viable tool for offensive and defensive security teams.
- Efficiency Matters: The 40% reduction in tokens for the same task means that even if the per-token price is higher, the per-task cost might be closer to parity with older models.
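To make that last bullet concrete: at 2x the output price but ~40% fewer tokens, GPT-5.5’s per-task output cost works out to about 1.2x GPT-5.4’s, closer to parity but not all the way there. A back-of-envelope sketch (the 100K-token task size is illustrative):

```python
# Article figures: output priced at $30/1M (GPT-5.5) vs $15/1M (GPT-5.4),
# with GPT-5.5 claimed to need ~40% fewer output tokens per task.
baseline_tokens = 100_000  # illustrative GPT-5.4 output for one big task

gpt54_cost = baseline_tokens / 1e6 * 15.00
gpt55_cost = (baseline_tokens * 0.60) / 1e6 * 30.00  # 40% fewer tokens

print(round(gpt54_cost, 2))               # 1.5
print(round(gpt55_cost, 2))               # 1.8
print(round(gpt55_cost / gpt54_cost, 2))  # 1.2 -> ~20% pricier per task
```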