OpenAI has officially moved beyond the chat box with the launch of Agent Mode, a general-purpose AI agent integrated directly into ChatGPT. This isn’t just another model update; it is a unified execution environment that combines the web-browsing muscle of Operator with the analytical depth of Deep Research to perform multi-step digital labor on your behalf.
Announced on July 17, 2025, the tool is rolling out to Plus, Pro, and Team users, with Enterprise and Education access following shortly. By toggling a switch or typing /agent, users can delegate end-to-end workflows—like planning a dinner party, conducting competitive market research, or building a 10-slide PowerPoint deck—while the agent operates a cloud-hosted virtual computer to get the job done.
The Architecture: CUA and Virtual Computers
Under the hood, Agent Mode is powered by a new Computer-Using Agent (CUA) architecture. Unlike traditional LLMs that interact with the world via text APIs, this model “sees” a computer screen through a stream of screenshots and interacts using a virtual mouse and keyboard.
According to OpenAI’s system card, the system utilizes four primary tools to execute tasks:
- Visual Browser: A remote environment where the agent can click, scroll, and log into websites.
- Code Interpreter: A sandboxed Python environment for data crunching.
- Terminal: A command-line interface with limited network access for file manipulation and system tasks.
- Connectors: Direct API integrations for apps like Gmail, Google Drive, and GitHub.
This combination allows the agent to bridge the gap between research and action. For example, it can research three competitors on the open web, pull your internal sales data from a connected Google Sheet, and then use the Terminal to generate a formatted PowerPoint file summarizing the findings.
Performance Benchmarks
OpenAI is positioning this as their most capable reasoning model to date, specifically optimized for long-horizon tasks. The model behind Agent Mode reportedly scores 41.6% on Humanity’s Last Exam (pass@1), which is roughly double the performance of the o3 and o4-mini models on the same difficult cross-subject test Source: TechCrunch.
Other notable scores include:
- 68.9% on BrowseComp: A benchmark for complex web navigation.
- 27.4% on FrontierMath: Testing high-level mathematical reasoning.
- SpreadsheetBench: Outperforming Copilot in Excel by more than 2x in automated data manipulation.
Pricing and Usage Limits
While the feature is included in existing subscriptions, it is not “unlimited.” OpenAI has implemented a monthly quota system based on “agent requests.” Notably, only the initial prompt that starts a task counts toward your limit; the dozens of intermediate steps the agent takes to complete the job are free.
| Plan | Monthly Agent Limit | Price |
|---|---|---|
| Plus | 40 messages | $20/mo |
| Pro | 400 messages | $200/mo |
| Team | 40 messages / user | $25-30/user/mo |
| Enterprise | 40 messages / user | Custom |
Tasks typically take between 5 and 30 minutes to complete. Users can monitor the agent’s progress in real-time, interrupt it to provide new instructions, or set tasks to run on a recurring schedule (daily, weekly, or monthly) via the new Schedules dashboard.
Competitive Landscape: OpenAI vs. Anthropic vs. Google
The “Agent Wars” are now centered on how these models interact with the OS.
- Anthropic (Claude Computer Use): Focuses on a developer-first approach, providing an API that allows engineers to run the agent in their own Docker containers. It is highly flexible but requires more setup.
- Google (Project Jarvis/Mariner): Deeply integrated into the Chrome browser and Google Workspace. It excels at tasks within the Google ecosystem but is more restricted to the browser DOM.
- OpenAI (Agent Mode): A vertically integrated consumer product. It provides the virtual machine, the browser, and the terminal out-of-the-box, making it the most accessible for non-technical power users.
Community Sentiment and Risks
Early feedback from practitioners on Reddit and Hacker News suggests a mix of awe and pragmatism. While the “virtual computer” capability is praised for handling tedious research, many users note that the agent can still suffer from “context rot” during very long sessions, occasionally losing the thread of complex instructions.
Security remains the primary concern. OpenAI has introduced “Watch Mode” for high-stakes actions like sending emails or making payments, requiring a manual user click before the agent can proceed. There is also a single-toggle privacy setting to wipe all browsing data from the agent’s session once a task is complete.
Takeaways for Builders
- Offload the “Boring” Research: Agent Mode is best used for tasks that require visiting 10+ websites to aggregate data into a single document.
- Mind the Quota: With only 40 uses on the Plus plan, save Agent Mode for multi-step workflows rather than simple queries that GPT-4o can handle.
- Monitor the Loop: Because the agent can pause for clarification, it’s more of a “centaur” workflow than a “set and forget” tool for now.
- Enterprise Readiness: The inclusion of a Terminal and GitHub connectors suggests OpenAI is moving aggressively into the DevOps and Data Science automation space.