This week looked, at first glance, like another week of model stories.
DeepMind published a paper about the path from AGI to ASI. Anthropic cut off customer access to its newest models after a government directive. Anthropic also published a large Claude Code usage study. Microsoft and Google pushed enterprise agent platforms deeper into production workflows. Sensor Tower and Ramp showed the assistant market splitting across consumer reach, business spend, and monetization.
Those are all model-adjacent stories. But the pattern underneath is not “the models got smarter.” The more useful read is that the control surface moved.
The important questions are now: who can access the model, what systems it can touch, how its work is reviewed, what context it is allowed to use, how its costs are metered, and where users encounter it. The model still matters. But the thing around the model is becoming the product.
DeepMind’s “From AGI to ASI” paper is a clean starting point because it is not mainly a launch announcement. It is a map of possible transitions after human-level AGI. The paper names four pathways from AGI to artificial superintelligence: scaling, paradigm shifts, recursive improvement, and emergence from large-scale multi-agent collectives. That last category is the tell. The problem is not just whether one model crosses one line. It is what happens when many capable systems interact, specialize, coordinate, fail, and change the pace of work around them.
The same week, Google DeepMind and partners announced up to $10 million in multi-agent safety funding for work on sandboxes, agent networks, agent infrastructure, and oversight. The call asks for ways to study virtual marketplaces, simulated ecosystems, multi-organization workflows, identity, reputation, commitment protocols, and monitoring of deployed agent populations. That is not a model-card problem. It is a systems problem.
A single model can be evaluated in isolation. A population of agents needs a world around it: testbeds, identities, permissions, incentives, audits, and ways to stop or contain failure. The safety surface moves from “is this answer allowed?” to “what behavior emerges when many answer-making systems are embedded in workflows?”
The Fable/Mythos story made the access layer impossible to ignore. Anthropic says the US government issued an export-control directive requiring suspension of access to Fable 5 and Mythos 5 by foreign nationals, including foreign-national Anthropic employees. Anthropic says the practical effect was that it had to disable both models for all customers to ensure compliance. The government letter itself has not been made public in the sources I read, and Anthropic says the letter did not give specific details of the national security concern. Anthropic’s stated understanding is that the concern involved a narrow, non-universal jailbreak connected to cybersecurity work.
That is now a dispute over technical facts, legal authority, and deployment process. But for customers, the immediate lesson is simpler: model access can disappear for reasons outside the API call. The White House executive order from earlier in June set up a voluntary framework for “covered frontier models,” a classified benchmarking process, government access before release, trusted partners, and an AI cybersecurity clearinghouse. Anthropic’s statement says this later action did not fit the transparent, fair, technical process it thinks should exist. Whichever side is right, the dispute points to the same new reality. Frontier model access is becoming a governance and reliability layer.
That changes enterprise planning. A company can no longer treat the model as a stable utility with only price, latency, and benchmark risk. It has to ask who controls access, which employees or jurisdictions are covered, whether a model can be recalled, how quickly a fallback can be substituted, and what work stops if access changes overnight.
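The fallback question can be made concrete. Below is a minimal sketch, assuming a hypothetical access plan where each workflow lists models in preference order and each model carries an availability flag and a set of regions it may serve. The model names, the `ModelRoute` type, and the region field are illustrative assumptions, not any vendor's API. The useful output is less the routing itself than the list of workflows that stop when access changes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ModelRoute:
    """One entry in a hypothetical model-access plan."""
    name: str
    available: bool = True
    # Jurisdictions this route may serve (hypothetical policy field).
    allowed_regions: set[str] = field(default_factory=set)


def pick_model(routes: list[ModelRoute], region: str) -> str | None:
    """Return the first route that is both available and permitted for the region."""
    for route in routes:
        if route.available and region in route.allowed_regions:
            return route.name
    return None  # no fallback left: the workflow stops


def stopped_workflows(workflows: dict[str, list[ModelRoute]], region: str) -> list[str]:
    """List workflows with no usable model for a region after an access change."""
    return sorted(w for w, routes in workflows.items() if pick_model(routes, region) is None)


primary = ModelRoute("frontier-model-a", allowed_regions={"US", "EU"})  # hypothetical
backup = ModelRoute("fallback-model-b", allowed_regions={"US", "EU"})   # hypothetical
plan = {"code-review": [primary, backup], "triage": [primary]}

primary.available = False  # access recalled overnight
print(pick_model(plan["code-review"], "EU"))  # fallback-model-b
print(stopped_workflows(plan, "EU"))          # ['triage']
```

The "triage" workflow has no second entry, so the recall stops it outright. That is exactly the kind of dependency this planning exercise is meant to surface before access changes, not after.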
The coding-agent story made the review layer just as visible. Anthropic’s Claude Code study analyzed about 400,000 interactive sessions from about 235,000 people between October 2025 and April 2026. In the typical session, Anthropic says people make about 70% of the planning decisions while Claude makes about 80% of the execution decisions. The user decides what to build. The agent decides much of how to build it.
That division is easy to misread. It does not mean expertise goes away. Anthropic’s data points the other direction. The more task-specific expertise a user brings, the more work Claude does per instruction and the more likely the session is to succeed. Sessions rated intermediate or higher reach verified success roughly 28% to 33% of the time, compared with 15% for novice-rated sessions. When sessions hit trouble, expert-rated sessions recover more often. The bottleneck shifts from syntax to steering.
That is why the review question is changing. “Did the tests pass?” is still necessary. But it is no longer enough. A coding agent can make a shallow abstraction feel shippable faster. The better question is whether the diff improves the next operator’s control surface: clearer seams, fewer surprise dependencies, cheaper rollback, and an easier next change.
DeepMind’s AI Control Roadmap made the same point from the security side. DeepMind says its internal agent-security framework goes beyond model alignment by adding system-level security: sandboxing, endpoint security, prompt-injection resistance, monitoring, prevention, and response. The post says DeepMind treats internal agents as potential insider threats and maps defenses to model capabilities, including the ability to evade detection and the ability to cause harm. It also says DeepMind has analyzed a million coding-agent trajectories to inform monitoring.
This is what mature agent deployment looks like. Not “trust the model.” Not “ban the model.” Controlled access. Measured permissions. Supervisors. Logs. Escalation. Human review at the points where the cost of a wrong action is high.
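Those controls compose into a small decision rule. This is a hedged sketch, not DeepMind's framework or any product's policy engine. It assumes a hypothetical per-action risk score and a single review threshold, and it shows how a permission set, an escalation rule, and a decision log fit together.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Action:
    name: str
    risk: int  # hypothetical 0-10 estimate of how costly a wrong action would be


@dataclass
class Gate:
    allowed: set[str]      # measured permissions: the only actions this agent may attempt
    review_threshold: int  # at or above this risk, a human must approve
    log: list[str] = field(default_factory=list)  # every decision is recorded

    def decide(self, action: Action, human_approves: bool = False) -> str:
        if action.name not in self.allowed:
            outcome = "denied"  # outside the permission set: never runs
        elif action.risk >= self.review_threshold:
            # High-cost action: runs only with explicit human sign-off.
            outcome = "approved" if human_approves else "escalated"
        else:
            outcome = "auto"  # low-cost action: runs without review
        self.log.append(f"{action.name}:{outcome}")
        return outcome


gate = Gate(allowed={"read_file", "run_tests", "deploy"}, review_threshold=7)
gate.decide(Action("run_tests", risk=2))                    # "auto"
gate.decide(Action("deploy", risk=9))                       # "escalated"
gate.decide(Action("deploy", risk=9), human_approves=True)  # "approved"
gate.decide(Action("delete_repo", risk=10))                 # "denied"
```

Making "escalated" a distinct outcome rather than a silent denial means a human sees the high-cost requests instead of the agent quietly routing around them.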
Microsoft and Google turned the same pattern into enterprise product strategy.
Microsoft says Copilot Cowork is generally available worldwide for Microsoft 365 Copilot customers and is designed for complex, long-running, multi-tool tasks that return a completed result rather than a draft. It runs inside the Microsoft 365 trust boundary, uses Work IQ for business context, supports model choice, and is billed by usage. The price of a Cowork task is calculated from model use, context retrieval, tool calls, and runtime. Admins get spending limits, budgets, alerts, and reporting.
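Run-based billing is easier to reason about with numbers attached. Here is a minimal sketch of a meter with the four cost drivers Microsoft lists (model use, context retrieval, tool calls, and runtime) and an admin budget with a hard limit and an alert threshold. The rates, the `RunUsage` and `Budget` names, and the 80% alert default are invented for illustration. They are not Microsoft's pricing or admin API.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical per-unit rates in dollars. These are NOT Microsoft's prices;
# they only illustrate a run-based meter with four cost drivers.
RATES = {
    "model_tokens_k": 0.01,       # per 1,000 model tokens
    "context_retrievals": 0.002,  # per retrieval from business context
    "tool_calls": 0.005,          # per tool invocation
    "runtime_minutes": 0.02,      # per minute the run stays alive
}


@dataclass
class RunUsage:
    model_tokens_k: float
    context_retrievals: int
    tool_calls: int
    runtime_minutes: float


def run_cost(usage: RunUsage) -> float:
    """Price one run as the sum of each metered quantity times its rate."""
    return round(sum(getattr(usage, key) * rate for key, rate in RATES.items()), 4)


@dataclass
class Budget:
    limit: float                 # hard spending limit set by an admin
    alert_fraction: float = 0.8  # alert once spend crosses this share of the limit
    spent: float = 0.0

    def charge(self, cost: float) -> str:
        if self.spent + cost > self.limit:
            return "blocked"  # would exceed the limit; nothing is charged
        self.spent += cost
        return "alert" if self.spent >= self.alert_fraction * self.limit else "ok"


usage = RunUsage(model_tokens_k=200, context_retrievals=50, tool_calls=40, runtime_minutes=30)
cost = run_cost(usage)  # 2.9 under these made-up rates
budget = Budget(limit=5.0)
print(budget.charge(cost))  # ok
print(budget.charge(cost))  # blocked
print(budget.charge(1.5))   # alert
```

Under this kind of meter, the same request can cost very different amounts depending on how many tools, retrievals, and minutes the run consumes. That is why the budget and alert controls sit next to the billing model.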
Work IQ is the more important object. Microsoft describes it as an intelligence layer that continuously processes email, calendar, meetings, chats, files, people, collaboration patterns, and line-of-business systems into a semantic understanding of how work gets done. Its APIs expose chat, context, tools, and workspaces. The workspace part matters: long-running agents need a place to keep files, memory, progress, and intermediate outputs inside the tenant boundary.
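The workspace idea can be sketched as a store that checks the caller's tenant on every access and records progress, so an interrupted run can resume. This is a hypothetical illustration of the concept only. It is not the Work IQ API, and the class, method names, and tenant strings are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Hypothetical tenant-scoped state for one long-running agent run."""
    tenant_id: str
    files: dict[str, str] = field(default_factory=dict)       # intermediate outputs
    completed_steps: list[str] = field(default_factory=list)  # progress so far

    def _check(self, caller_tenant: str) -> None:
        # Enforce the tenant boundary on every read and write.
        if caller_tenant != self.tenant_id:
            raise PermissionError("caller is outside this workspace's tenant")

    def checkpoint(self, caller_tenant: str, step: str, name: str, output: str) -> None:
        """Persist a step's output so the run can resume after an interruption."""
        self._check(caller_tenant)
        self.files[name] = output
        self.completed_steps.append(step)

    def remaining(self, caller_tenant: str, plan: list[str]) -> list[str]:
        """Return the planned steps that have not been checkpointed yet."""
        self._check(caller_tenant)
        return [step for step in plan if step not in self.completed_steps]


ws = Workspace(tenant_id="contoso")
plan = ["gather", "draft", "review"]
ws.checkpoint("contoso", "gather", "notes.md", "summary of source threads")
print(ws.remaining("contoso", plan))  # ['draft', 'review']
```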
That is the enterprise agent as a governed work loop. Context comes from the organization. Tools are shaped for agents. State lives in a workspace. Billing attaches to the run. Compliance machinery wraps the artifacts. A human may still make the important call, but the system around that call is now productized.
Google’s HSBC announcement rhymes. Google Cloud and HSBC announced a multi-year partnership using Gemini models, Gemini Enterprise Agent Platform, Google Cloud, and Google DeepMind engineering. The announced focus areas are wealth-management support, financial-crime risk management, and frontline or relationship-manager service. The release says the partnership is expected to enable more than 200 AI use cases over two years, while HSBC prioritizes initiatives it estimates could return more than $100 million in direct revenue or efficiency improvements. Google’s London Summit post frames the broader shift as moving from chatbots and media experiments to production execution.
Put Microsoft and Google together and the story is not “which chatbot is better?” It is who owns the enterprise context plane, permission plane, tool plane, workspace plane, approval flow, compliance layer, and meter. Models can be swapped behind that surface. The system of action is stickier.
The assistant-market data showed the same split on the consumer and business sides. Sensor Tower says ChatGPT is still the category-defining assistant and reached one billion monthly active mobile users in May 2026. It also says ChatGPT’s True Audience share fell below 50% for the first time in March as Gemini and Claude gained traction, and that AI assistants are becoming a discovery and advertising layer. TechCrunch, citing Sensor Tower, reported ChatGPT at 46.4% share by the end of May, Gemini at 27.7%, and Claude at 10.3%, and said that by May an average of 17% of daily ChatGPT users were being served ads.
Ramp’s business-spend data tells a different scoreboard. Ramp says its AI Index now focuses less on whether companies use AI and more on adoption intensity. In its June update, Anthropic rose to 41% of businesses on Ramp while OpenAI was at 39.5%. Ramp also says the top 1% of AI-spending firms spend about $7,450 per employee per month, while the median firm spends $11.38 per employee per month, and that advanced users rely on multiple AI vendors.
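Ramp’s two per-employee figures are easier to compare as a ratio. The arithmetic below uses only the numbers above and assumes both are monthly figures measured the same way. The annualized line further assumes spend stays flat across the year.

```latex
\frac{\$7{,}450}{\$11.38} \approx 654.7 \approx 655\times
\qquad
\$7{,}450 \times 12 = \$89{,}400
\quad\text{vs.}\quad
\$11.38 \times 12 = \$136.56 \ \text{per employee per year}
```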
Consumer reach, business adoption, spend intensity, model switching, and ads are now separate races. A company can lead one and trail another. ChatGPT can still be the mass-market habit while Anthropic shows up strongly in one business-spend dataset and Gemini gains from distribution through Google’s ecosystem. The market is no longer one leaderboard.
The through-line is not that models stopped mattering. They matter enormously. None of these surfaces work if the underlying systems are weak. But the week’s strongest signal was that model quality is becoming one input into a larger operating system.
In security, the surface is access control and monitoring. In coding, it is specification and review. In enterprise software, it is context, tools, state, approvals, compliance, and billing. In consumer assistants, it is distribution, switching, subscriptions, and ads. In policy, it is who gets access, under what process, and with what fallback when access changes.
That means the right question for any AI story is changing. “How smart is the model?” is still worth asking, but it is no longer enough. Ask what the model can touch. Ask who can stop it. Ask where the state lives. Ask how the work is reviewed. Ask who pays when the run gets long. Ask whether the user is choosing a model, a workflow, a platform, or a distribution channel.
This is where the frontier moved this week: from answers to surfaces of control. The winners will not only be the labs with the best next model. They will be the companies, institutions, and users that make capable systems governable enough to trust, flexible enough to swap, and legible enough to correct.
Source graph: Where the control surface moved — Sources