OpenAI and Anthropic Ship Frontier Models Hours Apart — Both Claim SOTA, Both Let Agents Build Software Autonomously
GPT-5.3-Codex and Claude Opus 4.6 dropped on the same day with dueling benchmark crowns. But the real story is what happened when each lab handed its model an open-ended engineering task and walked away.
In a remarkable collision of release schedules, OpenAI and Anthropic both shipped major new models on Thursday — GPT-5.3-Codex and Claude Opus 4.6 — each claiming state-of-the-art results on different benchmarks and each accompanied by demonstrations of autonomous software engineering that would have been science fiction two years ago.
OpenAI's GPT-5.3-Codex arrives with a 57% score on SWE-Bench Pro and 76% on Terminal-Bench, as announced by @OpenAIDevs. According to @thsottiaux, the model is 60–70% faster than its predecessor GPT-5.2-Codex while simultaneously pushing performance higher — a combination OpenAI has not previously achieved. The model is also described as OpenAI's "strongest computer-use model yet," per a separate @OpenAIDevs post, suggesting it can navigate GUIs, terminals, and browser-based workflows with improved reliability.
Get our free daily newsletter
Get this article free — plus the lead story every day — delivered to your inbox.
Want every article and the full archive? Upgrade anytime.
No spam. Unsubscribe anytime.