Microsoft MAI: 7 New AI Models, the OpenAI Break, Explained

Picture a contact centre in Parramatta on a Tuesday morning. A caller is mid-sentence in Greek, an agent is typing notes in English, and somewhere in the background a transcription model is doing the heavy lifting. As of last week, that model is one Microsoft built and trained itself, on the company's own silicon.

On 2 June 2026, at Build, Microsoft's AI chief Mustafa Suleyman rolled out not one but seven in-house models under the new MAI banner [1]. It was the most concrete answer yet to a question the industry has been quietly asking for two years: what happens to Microsoft when the company that made the models it depends on becomes its biggest competitor?

This is the plain-English version of what was announced, why the timing matters, and what it could mean for an Australian business running on Copilot, an enterprise team on Azure, or a curious reader wondering whether the AI tools in their browser are about to get cheaper, faster, and a bit less American.

The seven models, in plain English

Suleyman called the collection a family, and like most families the members are doing quite different jobs. There are three to know about up front, and a fourth that matters most to anyone who writes code.

MAI-Image-2.5 and its lighter Flash sibling climbed to number two on the public image leaderboard at launch, beating a model called Nano Banana 2 on image editing [1]. The interesting part is not the leaderboard. It is that the model is already in PowerPoint and OneDrive, which means the next time a Sydney marketing team asks Copilot to "make a slide that does not look like a stock photo", the picture is coming from a Microsoft model, not OpenAI.

MAI-Transcribe-1.5 is, by Microsoft's own claim, the best transcription model available across 43 languages, and is up to five times faster than the closest competitor [1]. It plugs into Copilot, Teams, GitHub and the Dynamics 365 Contact Centre, which is the kind of plumbing that tends to be invisible until you realise your call recording just got a lot more accurate in Mandarin.

MAI-Voice-2 comes in 15 languages, with a Flash variant aimed at ultra-low-latency voice agents [1]. MAI-Thinking-1 is a 35-billion-active-parameter mixture-of-experts model with a 256,000-token context window. On the AIME 25 maths benchmark it scored 97%, and on SWE Bench Pro, a software-engineering test, it scored 53% and was preferred over Anthropic's Sonnet 4.6 in head-to-head comparisons run on Surge [1].

The one with the loudest implications for developers is MAI-Code-1-Flash, a five-billion-parameter coding model scoring 51% on SWE Bench Pro, shipping as the default in VS Code, and co-designed with Microsoft's Maia 200 chip, which added another 1.4x performance-per-watt gain on top of the 30% improvement Satya Nadella had already announced [1]. There are two more members of the family rounding out the seven, all of them sharing what Microsoft describes as a "clean, commercially licenced data lineage" with no distillation from third-party models [2]. In a market where training data provenance is now a legal exposure, that single sentence is doing a lot of work.

The quiet break from OpenAI

For the past three years, every product roadmap slide at Microsoft has, somewhere near the bottom, contained the words "powered by OpenAI". That dependency was the backdrop for everything Copilot, Bing and Azure OpenAI Service were built on.

Suleyman's keynote did not say "we are leaving OpenAI". It did not need to. By launching a full in-house family, and by making three of those models available on third-party inference platforms like OpenRouter, Fireworks and Baseten, Microsoft has done something more useful for a journalist: it has moved from one supplier to two [2].

That is the kind of sentence that sounds boring and is, in fact, the entire story. A company that depends on a single supplier for its most strategic input is in a weak negotiating position. A company with its own production line, even at smaller scale, is not. The McKinsey example Suleyman cited in his keynote makes the point in dollars: a McKinsey-tuned MAI model beat GPT-5.5 on win rate at ten times lower cost [1]. How long before that lands in a quote to your finance team?

There is also a sovereignty angle for anyone outside the United States. Models that Microsoft owns outright, runs on its own chips, and licences with clean data lineage can be sold into European, Japanese and Australian regulated industries that are increasingly nervous about where their training data came from. That is not altruism. It is a market expansion play dressed up as a values statement.

Tuning the weights, not just the prompt

Most enterprise AI use today looks like this. A company takes a foundation model, writes a clever prompt, sometimes tacks on a retrieval system that pulls facts from its own documents, and hopes for the best. The model itself never changes.

Microsoft's release is the first in which developers can tune the model weights themselves, not just write prompts or use retrieval-augmented generation [2]. That sounds like a small technical shift. It is not. Tuning the weights means the model can learn the rhythm of a particular company's workflows, the way it phrases things, the kinds of mistakes it should avoid. A law firm can produce a model that drafts in its house style. A logistics company can produce a model that already knows the lading codes. A contact centre can produce a transcription model that has heard its own agents' accents a thousand times.

The McKinsey case is the proof point. A McKinsey-tuned MAI model beat OpenAI's flagship GPT-5.5 on win rate, and at a tenth of the cost [1]. The internal Excel-tuning example Suleyman used lands the same way: a model trained on the company's spreadsheet patterns performed comparably to GPT-5.4 at up to ten times lower cost [1]. The cost gap is not a marketing number. It is the reason these models will end up in places the more expensive ones could never reach.

The catch, for now, is that weight tuning is not a button a non-technical buyer can press. It is closer to running a small training job, with all the data preparation, evaluation, and governance that implies. Expect the big consultancies to have a field year.

What changes for the rest of us

If you have ever used Copilot in Word, Teams or Outlook, the honest answer is: not much, and quite a lot, depending on the day.

In the next twelve months, expect the image and voice features inside Microsoft 365 to feel noticeably sharper. The PowerPoint image generation is already MAI-Image-2.5, and the OneDrive photo features are next. MAI-Voice-2 is positioned for voice agents and is available in 15 languages [1]. If you dictate into a phone or a Teams call, the words will land faster and in more languages.

In Azure, the change is more structural. Microsoft Foundry, the platform that lets developers pick and combine models, now has Microsoft's own family sitting next to OpenAI, Anthropic and the open-source players, and three of the MAI models are also on OpenRouter, Fireworks and Baseten [2]. A buyer can run the same workload on multiple vendors without rewriting the application, which is the technical precondition for any real price competition.

For Australian businesses specifically, the price-to-performance ratio is the headline. The same McKinsey-style tuning exercise that works for an American consultancy works for a Sydney one, with the same ten-times-cost saving in the right conditions.

There are also competitive effects to watch. OpenAI still produces the frontier models, and Microsoft still resells them. The interesting question for the rest of 2026 and into 2027 is whether OpenAI's commercial terms get more generous, or less, now that Microsoft has a credible alternative to bring into a negotiation. The Build keynote also put Nadella and Nvidia's Jensen Huang on stage together to underline the Microsoft-Nvidia hardware-software co-design [3], a reminder that the AI race is also a silicon race, and that Microsoft's plan to own both layers is further along than most observers had realised.

The honest caveats

A few things the keynote did not say are worth saying here.

First, the benchmarks Microsoft cited are internal or third-party-leaderboard numbers, and benchmark performance is not the same as real-world performance on your data. Treat the 97% on AIME 25 and the 53% on SWE Bench Pro as evidence the models are strong, not as a guarantee they will solve your specific problem.

Second, the cost claims are at inference time, and they assume weight tuning has already happened. The upfront cost of preparing a clean dataset, hiring the people who can run the tuning, and setting up evaluation is non-trivial, and it falls on the buyer.

Third, the seven models cover a specific set of tasks: image, voice, transcription, thinking, and coding. There is no general-purpose flagship chat model in this release. For the everyday chat-style Copilot experience, OpenAI is still in the building.

And fourth, the timing. Microsoft launched these models at its own developer conference, in front of an audience that wanted to be impressed. Read the keynote transcript, read the technical blog, and wait for independent benchmarks before signing a multi-year commitment.

Microsoft MAI: What 7 New In-House AI Models Mean for the Rest of Us