OpenAI o3 & o4-Mini, Gemini Live and Anthropic Research: What the new kids on the AI block promise

By Vishal Mathur
17 Apr

These aren’t just new models arriving with the expected claims of being better than anything that preceded them. “The smartest models we have released to date,” is OpenAI’s pitch, one that leaves little room for ambiguity. The company has a point, since the o3 and o4-Mini can handle everything from coding and maths to visual perception. Beyond raw capability, though, there is a definite push towards building a complete ecosystem centred on utility and intuitiveness. Codex CLI, a lightweight coding agent that draws on the coding abilities of the o3 and o4-Mini, is an example of that intuitiveness.

Sam Altman, CEO of OpenAI, describes an approach to building reasoning models that can access and use every tool available within ChatGPT, depending on the query. This includes web search, Python (a capable, general-purpose programming language), image analysis, image generation, as well as interpreting files a user shares. “The ability of the new models to effectively use tools together has somehow really surprised me. Intellectually I knew this was going to happen but it hits different to see it,” wrote Altman in a post on X. In almost all the benchmark results OpenAI has shared, the o3 and o4-Mini score higher than their predecessor reasoning models, the o1 and o3-mini.

Response from the industry has been positive, but the competitive landscape pits these models against very capable rivals.

“The o3 launching now has scored over 87.5% on ARC-AGI. Human performance is at 85%,” says Yana Welinder, CEO of Kraftful, a company that builds copilots for businesses and teams. ARC-AGI, which Welinder references, is a benchmark that assesses how efficiently an AI can learn and generalise from minimal information, reflecting a fundamental characteristic of human intelligence.


Bindu Reddy, CEO of Abacus AI, a company that makes a ‘super assistant’, believes the o4-Mini may be “the real story” owing to better benchmark results than Google Gemini 2.5 and lower costs for developers, but warns that the “o3 is pretty smart but is dangerously expensive”. In a post on X, she writes, “GPT 4.1 may have been OpenAI’s biggest win this week.”

The o3 and o4-Mini are reasoning models, trained for structured thinking, problem solving and handling multi-step queries. Generative AI models, which most consumers would have used regularly, are primed for content generation, conversation and simpler searches or queries. The fact that these models are trained to reason allows for a more ‘agentic’ ChatGPT, and means this is the closest a consumer-facing AI product has come to the AI agents that enterprises are increasingly deploying.

xAI too is adding a canvas-like feature called Studio to Grok, for creating and editing documents as well as basic applications. Grok 3, released earlier this year and a significant improvement over its predecessors, “can now generate documents, code, reports, and browser games,” the company says. For now, Grok Studio is available to both free and paid subscribers.

They aren’t the only ones to create a canvas-esque workspace for writing projects and tinkering with code. OpenAI added Canvas to ChatGPT late last year, following the lead of Anthropic’s Claude and its coding smarts.

“Ask Claude to pull together meeting notes from last week, identify action items from follow-up email threads, and search relevant documents for additional context. Claude brings these insights directly to you, eliminating hours of manual work,” says Anthropic. As part of the Research envelope, Claude operates agentically. That means conducting multiple searches that build on each other, while determining exactly what to investigate next.

Anthropic’s models underpin the new Code capabilities of visual communication suite Canva too. “We very much build our own models, but these models leverage some of the world’s best open-source models to essentially give it context and information. When it comes to Canva Code, this is in partnership with Anthropic, something we’ve been very excited about,” Cliff Obrecht, co-founder and chief operating officer of Canva, tells HT.

Claude, as part of the broadening Research capabilities, is finding deeper integration within Google’s popular Workspace apps: Gmail, Calendar, and Docs. The idea is to bring together information from a user’s work and the web. “Claude understands your context and can pull information from exactly where you need it,” the company says in a statement.

Google’s Gemini Live, a generative AI app for smartphones that gains context from a user’s immediate surroundings, including viewing the world through the phone’s camera, is also adding a screen-sharing option. It will be available for free, which means users don’t need to pay for the ₹1,950 per month Gemini Advanced subscription, and will roll out to all Android phones in the coming weeks.

“Gemini will provide real time feedback based on the new skill you’re learning or task you’re completing. You can interrupt Gemini at any point, pause or stop sharing, and dynamically switch between sharing your front camera, rear camera, or screen,” Google explains, in a statement.

In the world of Windows PCs, Microsoft is adding Copilot Vision to the Edge web browser. Mustafa Suleyman, CEO of Microsoft AI, says, “It can literally see what you see on screen. It’ll think out loud with you when you’re browsing online. No more over-explaining, copy-pasting, or struggling to put something into words.” Microsoft has kept this as an ‘opt-in’ for now, and a broader feature set requires the Copilot Pro subscription. That means parting with ₹2,000 per month.

Are we at, or near, a genius level with AI? The answer may be more difficult than imagined.

