OpenAI Unveils o3 and o4-mini: A New Era of AI Reasoning, Coding, and Multimodal Intelligence

3 minute read

OpenAI has officially launched its new o3 and o4-mini models, marking a major leap forward in AI reasoning, tool use, and software engineering. These models go beyond text generation, introducing agent-like capabilities and advanced multimodal reasoning—paving the way for the next generation of AI-powered productivity tools.

🚀 What Makes o3 and o4-mini Different?

“These are the first models where top scientists tell us they produce legitimately good and useful novel ideas.” — Greg Brockman, OpenAI

✅ AI as a Problem Solver

OpenAI’s o3 isn’t just a model; it’s an AI system. It can chain hundreds of tool calls to solve complex tasks, work inside live codebases, and pull real-world data from the web. OpenAI even reports that o3 High assisted in proving a new theorem in condensed matter physics.


🔧 Tool Use: Not Just Talk

o3 and o4-mini are trained to use tools inside their chain of thought (a minimal API sketch follows the list). These tools include:

  • Python code execution for logic, math, and data visualization.
  • Web browsing to retrieve up-to-date information.
  • Image manipulation (e.g., cropping, transforming, interpreting visuals).
  • Memory and context awareness, especially in ChatGPT.
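
For developers, here is a minimal sketch of what wiring a tool into these models can look like through the OpenAI Python SDK’s function-calling interface. The `run_python` tool and its schema are hypothetical stand-ins; in ChatGPT, the Python, browsing, and image tools are built in.

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical user-defined tool; in ChatGPT the equivalent tools are built in.
tools = [{
    "type": "function",
    "function": {
        "name": "run_python",
        "description": "Execute a Python snippet and return its stdout.",
        "parameters": {
            "type": "object",
            "properties": {"code": {"type": "string"}},
            "required": ["code"],
        },
    },
}]

response = client.chat.completions.create(
    model="o4-mini",
    messages=[{"role": "user", "content": "Compute the first 10 Fibonacci numbers."}],
    tools=tools,
)

# The model may answer directly or emit tool calls for your code to execute
# and feed back in a follow-up message.
print(response.choices[0].message.tool_calls)
```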

💡 Real-World Demos

🧪 Science + Reasoning

In one demo, a researcher asked o4-mini to finish an old physics project by analyzing a poster, extracting and extrapolating graph data, and comparing the result with modern literature. The model:

  • Identified that the final result was missing from the poster.
  • Analyzed the graph.
  • Pulled external research from the web.
  • Produced an estimate close to the value reported in a peer-reviewed paper.

🐠 Music Meets Marine Biology

Another demo blended scuba diving and music, two personal interests of the user. o4-mini surfaced research showing that playing audio recordings of healthy coral reefs underwater can accelerate coral regrowth. It:

  • Retrieved research.
  • Explained the process.
  • Generated a blog post with citations and data visualizations.

🧠 Autonomous Coding: SWE-bench + Real Projects

Using the container tool and o3 High, OpenAI demonstrated how the model:

  • Debugs a Python symbolic math package (SymPy).
  • Navigates the repo with ls, cat, and grep, and traces method dispatch through Python’s method resolution order (MRO; see the sketch after this list).
  • Patches the bug and verifies the fix via unit tests.
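
For readers unfamiliar with MRO, here is a small, self-contained illustration of what the model was inspecting. The class names are hypothetical stand-ins, not the actual SymPy classes from the demo.

```python
# Python's method resolution order (MRO) decides which parent's method wins
# in a multiple-inheritance hierarchy -- essential when tracing a dispatch bug.
class Base:
    def simplify(self):
        return "Base.simplify"

class Mixin:
    def simplify(self):
        return "Mixin.simplify"

class Expr(Mixin, Base):  # hypothetical stand-in, not a real SymPy class
    pass

print([cls.__name__ for cls in Expr.__mro__])  # ['Expr', 'Mixin', 'Base', 'object']
print(Expr().simplify())                       # 'Mixin.simplify' -- Mixin comes first
```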

📊 On SWE-bench (software-engineering benchmark), a representative task involved:

  • 22 interactions
  • 16,000 tokens
  • 100+ container steps in some tasks

📷 Multimodal Reasoning — A Major Leap

o3 and o4-mini can now:

  • Interpret complex or low-quality images.
  • Perform image analysis + Python reasoning in one step.
  • Manipulate visuals directly in the chain of thought.

This unlocks serious potential for technical diagrams, charts, UI mocks, and scientific illustrations.
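
As a rough sketch of what a single multimodal request looks like over the OpenAI Python SDK (the chart URL below is a placeholder):

```python
from openai import OpenAI

client = OpenAI()

# One request carries both the image and the question; the model reasons
# over the visual (and, in ChatGPT, can crop or zoom it) in its chain of thought.
response = client.chat.completions.create(
    model="o3",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "What trend does this chart show? Is the y-axis linear or log?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/chart.png"}},  # placeholder
        ],
    }],
)
print(response.choices[0].message.content)
```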


📉 Benchmark Results

Math & Reasoning

  • o4-mini hits ~99% accuracy on AIME (competition math) problems.
  • o3 High reaches 83%+ on GPQA (PhD-level science questions).
  • Its Codeforces rating surpasses 2700 (roughly top 200 among competitive programmers).

Multimodal Tasks

Performance spikes across:

  • MMMU
  • MathVista
  • ChartQA
  • V-Star

Inference Efficiency

  • o4-mini offers better performance per dollar than o3.
  • o4-mini is now multimodal by default.

💻 Introducing Codex CLI

The surprise reveal? Codex CLI—a command-line tool that lets o3/o4-mini interact directly with your computer.

  • Runs in safe suggest mode or full auto mode (sandboxed).
  • Can read files, edit code, generate apps, and run functions.
  • Fully open source at github.com/openai/codex.
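
For scripted use, here is a hedged sketch of invoking Codex CLI from Python. The `--approval-mode` flag follows the launch README and may change; check `codex --help` for the current interface.

```python
import subprocess

# Invoke Codex CLI in its safe "suggest" mode on the current repo.
# Flag name per the launch README -- treat it as an assumption, not gospel.
result = subprocess.run(
    ["codex", "--approval-mode", "suggest", "explain what this repo does"],
    capture_output=True,
    text=True,
)
print(result.stdout)
```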

💸 OpenAI is also launching a $1M open-source fund to support developers building with Codex CLI.


📦 Availability

Starting now:

  • Pro, Plus, and Team users get o3, o4-mini, and o4-mini High.
  • Enterprise and Edu rollout starts next week.
  • API support for tool use will launch in the coming weeks.
  • o3 Pro coming soon to replace o1 Pro.

🧬 Final Thoughts

OpenAI’s new o3 and o4-mini models are more than incremental successors; they represent a shift in how models reason, act, and assist.

With:

  • Tool integration
  • Scientific aptitude
  • Multimodal understanding
  • Developer tooling (Codex CLI)

…these models are stepping closer to true AI agents that help us think, build, and solve.

🎉 The future of AI isn’t just about language. It’s about action.


Try it out today and let us know what you build.