Microsoft 365 ships agent inventory, not observability
The new Microsoft 365 Admin Center (MAC) agent registry inventories Copilot agents the way an IT asset database inventories laptops. It tells you the laptop exists, who owns it, and what software is installed. It does not tell you what the laptop did at 3am on Tuesday.
Every governance deck for Copilot Studio in 2026 talks about policy. Almost none of them talk about the row count in the agent inventory table, which is the thing your auditor will actually point at. MAC now ships that table, an approval queue, a billing surface tied to Azure, and a set of three levers that decide who can create, consume, and share agents inside your tenant. That part of the story is real and unusually well executed.
For a tenant with one or two Copilot agents, the observability gap is acceptable. For a fleet, it is the entire job.
The cliff is at observability. The people running real agent fleets, not the people demoing them at Ignite, are spending their nights at the bottom of it.
In this article
- What Microsoft actually shipped
- Why this works for the obvious problems
- Where the model breaks
- The lifecycle gap most posts miss
- What to do Monday morning if you run an agent fleet
- Open questions
What Microsoft actually shipped
The governance surface in MAC is three levers, one registry, one billing plane, one approval queue, and a $15/user/month SKU. Every piece lands somewhere concrete in the UI today. The Microsoft Power Platform community webinar of 2026-04-22, "Govern your Copilot agents in the new Admin Center" (recording), walks three product managers through the surface live: one on agent risk, one on MAC lifecycle, one on cost management. They are not promising a vision; they demo it. The line item that frames the rest of the economics is upfront: the Microsoft Security Blog confirms Agent 365 is $15 per user per month standalone, or bundled with E7. Tenants on E3 without the Copilot stack get a thinner surface. Any business case starts here, not at the registry tour.
Lever one is who. Agent extensibility is a single control on Microsoft 365 Admin Center > Copilot > Settings that governs creation, consumption, and sharing rights together. Default is all Copilot-licensed users. You can scope down to security groups or named individuals before any agent gets built. Lever two is what. You can restrict allowable agent types to any combination of Microsoft-certified public agents, multi-tenant ISV agents, and internal line-of-business agents. Lever three is how far. Users not on the approved sharing list see the "Share with anyone in the org" option grayed out; "some users" and "only me" remain their only sharing options. That grayed-out checkbox is the whole admin queue's reason to exist.
The registry itself lives at MAC > Agents > All Agents. Microsoft's own Learn docs confirm four agent classes (Microsoft-built, external partner, published by your org, shared by creator) and three counters that hit your eye on load: total agents, agents without owners, and unmanaged agents. The unmanaged-agent count is the most interesting one. It is a first-class signal that something inside your tenant is talking to your data without being inside Agent 365's protection envelope at all. Click any row and you see the agent's description, knowledge sources, Entra-granted permissions, and the APIs it calls.
The billing plane is Azure, not a separate metering service. You create a billing policy by linking an Azure subscription and resource group (you need Owner or Contributor on both), bind it to user groups, set an optional cap, and choose budget-alert thresholds at, say, 50% of the cap. The economic logic baked into the UI is sharper than it looks: 2,000 Copilot credits in 30 days approximates the $30 unlimited Copilot license. When a user crosses that threshold, the admin gets a recommendation to convert them from variable PAYG to a fixed license. Pin limit is three agents per user. Prepaid capacity packs (25,000 credits each) drain first, then PAYG picks up, unless you explicitly disable the fallback. Per-tenant cap on credit policies is 10. All numeric limits in this section are from Microsoft Learn unless otherwise noted.
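The drain order and the license break-even described above fit in a few lines. This is a minimal sketch of the arithmetic only, not of any Microsoft API: the function names and the `Charge` record are hypothetical, and the constants are the figures quoted in this section.

```python
from dataclasses import dataclass

PACK_CREDITS = 25_000      # credits per prepaid capacity pack (figure quoted above)
LICENSE_BREAKEVEN = 2_000  # credits per 30 days that approximate the $30 license


@dataclass
class Charge:
    from_packs: int  # credits drawn from prepaid capacity
    from_payg: int   # credits billed pay-as-you-go
    unserved: int    # credits refused because the PAYG fallback is disabled


def drain(credits_used: int, packs: int, payg_fallback: bool = True) -> Charge:
    """Prepaid capacity drains first; PAYG only picks up the overflow."""
    prepaid = packs * PACK_CREDITS
    from_packs = min(credits_used, prepaid)
    overflow = credits_used - from_packs
    if payg_fallback:
        return Charge(from_packs, overflow, 0)
    return Charge(from_packs, 0, overflow)


def should_convert_to_license(user_credits_30d: int) -> bool:
    """Mirror the admin recommendation: past the break-even, a fixed license is cheaper."""
    return user_credits_30d > LICENSE_BREAKEVEN
```

With one pack and 30,000 credits consumed, `drain(30_000, packs=1)` puts 25,000 on prepaid capacity and 5,000 on PAYG; with the fallback disabled, the same 5,000 go unserved instead of billed.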
The approval queue closes the loop. Makers in Copilot Studio submit a "publish to tenant" request instead of sharing directly. Admins review pending requests in MAC: agent function, data and tools, security posture, required Entra permissions. On approval, the admin picks the recipient users or groups and can pre-install the agent so it appears in users' Copilot left nav without manual acquisition. Microsoft's Copilot Studio April 2026 update made this the documented default, not maker-direct sharing.
That is what shipped. Read closely and you will notice the surface is internally consistent: identity, lifecycle, distribution, and billing are bound to the same tenant, the same Entra principals, and the same Azure subscription. Third-party tools cannot replicate that binding because they do not own the runtime.
Why this works for the obvious problems
The Entra-grounded permission model kills the broad-sharing data-leak panic. A shared agent cannot reach what its caller cannot reach, with one exception: whatever Entra explicitly granted to the agent identity. The agent runs with the caller's permissions plus those explicit grants, and nothing else. This is the most important governance fact in the surface, and it is the one most often misread on the way in.
💡 Agents inherit the caller's Entra permissions plus whatever the agent identity was explicitly granted. Broad sharing stops being a data leak the moment your ACLs are correct, and starts being one the moment they aren't.
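Stated as set arithmetic, the rule is a union, and the part worth reviewing is the difference. The sketch below is only a model of that reasoning; the function names and resource paths are hypothetical and do not correspond to any Entra API.

```python
def effective_access(caller_acl: set[str], agent_grants: set[str]) -> set[str]:
    """Everything a request can reach when this caller invokes this agent."""
    return caller_acl | agent_grants


def amplification(caller_acl: set[str], agent_grants: set[str]) -> set[str]:
    """Resources the caller reaches only because the agent identity was granted them."""
    return agent_grants - caller_acl


# Hypothetical resource names, for illustration only.
caller = {"sites/hr-policies", "sites/team-wiki"}
agent = {"sites/team-wiki", "sites/finance-archive"}
```

Here `amplification(caller, agent)` returns `{"sites/finance-archive"}`: the one resource the agent adds on top of what the caller could already see. An empty amplification set is the case where sharing the agent broadly changes nothing about who can read what.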
Entra Agent ID went GA on 2026-05-01. It replaces the single-app-registration pattern with a three-tier hierarchy: Agent Blueprint as the template, Blueprint Principal as the per-tenant identity, Agent Instance as the runtime. Copilot Studio auto-creates an Entra agent identity per new agent when the environment feature is enabled. Permissions on that identity are immutable once provisioned. Petri's coverage of identity as the new control plane reads it correctly: identity moved from a security layer to the control plane for agents. The MAC registry is downstream of that move.
The approval queue closes the second risk: tenant-wide publication without review. The default world without it, per Microsoft's own (vendor-self-reported) security blog: 80% of Fortune 500 run active AI agents built with low-code tools, and 29% of employees have used unsanctioned agents for work. Treat both numbers as Microsoft-attributed direction-of-travel, not independently verified. "Share with anyone in the org" as a default is a one-click data spill regardless of the exact prevalence. Routing publication through an admin queue, with Entra permission inspection at review time, is the smallest intervention that converts a continuous risk into a discrete one.
Billing closes the third: uncapped consumption. Capacity packs as a prepaid buffer, PAYG as a metered fallback, budget alerts at 50% of cap. The April 2026 roadmap (agent-level billing restrictions, one-click license-flip, and standalone capacity-pack policies with hard cutoffs and no PAYG fallback) extends the surface from per-tenant to per-agent and from soft caps to hard caps.
For a tenant going from zero Copilot agents to twenty, this surface is enough. The control plane is identity-bound, the cost plane is metered, the publication plane is gated. You can pass an audit with what is in MAC today.
Where the model breaks
The registry tells you an agent exists. It does not tell you what the agent did. Once you cross from "we have agents" to "agents are doing work that matters," the visibility ceiling drops fast, and almost none of the public material is honest about where it sits.
Start with the observability gap. The risk taxonomy in MAC is concrete (shadow agent, no owner, excessive permissions, security misconfiguration, prompt injection, sensitive data access, conditional access violation, pending approval, operational exceptions, compliance gap) and it fuses signals from Defender, Entra, and Purview. Latency on the Risk column is up to one hour, and only high-severity events roll up. Blocked or filtered interactions surface as generic ContentFiltered telemetry. The admin sees that something was blocked, not why, not what was attempted, not what data nearly leaked. There is no per-request prompt log, no per-decision diff, no reasoning trace. Gartner's Dennis Xu half-jokes about restricting Copilot on Friday afternoons because users will not validate outputs at week's end. That joke only lands because the system itself cannot validate them either.
The Entra-permission grounding is correct and load-bearing, and it is also limited by what Entra knows. If your SharePoint estate is already over-shared (2toLead's 2026 governance roundup puts that at 15% or more of business-critical files), the agent will faithfully respect a permission set that is itself the problem. The registry visualizes the agents. It does nothing to the underlying ACLs that are the actual leak source. The IBM Cost of Data Breach 2025 figure (an extra $670,000 per breach in environments with high shadow AI) is what the broken-ACL world looks like with agents pointed at it.
The next break is shadow agents the registry cannot see. This is not the same as the Microsoft "29% unsanctioned agents" stat — that number covers any unsanctioned use, including ChatGPT-in-a-browser for a work task, most of which never becomes a persistent agent. The narrower category is what the registry can structurally never enumerate: anything built on a non-Agent-365 surface, anything wired through a generic Azure OpenAI endpoint with a service principal, anything routed via API key to an external LLM. The two populations overlap; they are not the same. The MAC unmanaged-agent counter measures the subset Microsoft can detect inside its own runtime. Everything outside that runtime is invisible by category, not by oversight.
Stacked against pure AI gateways the picture inverts. Cloudflare AI Gateway, Portkey, and LiteLLM operate at the request level. They log prompts, route by model, redact PII, detect jailbreaks, enforce per-key budgets, and produce audit trails of what was sent and what came back. None of them know what a SharePoint permission is. The trade is symmetric and worth naming:
| Capability | MAC / Agent 365 | AI Gateway (Portkey, Cloudflare, LiteLLM) | AWS Bedrock Guardrails |
|---|---|---|---|
| Tenant-wide agent inventory | Yes, four classes, shadow detection | No, per-app proxy only | No |
| Per-request prompt/response log | No | Yes | Partial, with audit |
| Entra-grounded user-permission inheritance | Yes | No | No |
| Content moderation / jailbreak detection | Weak (ContentFiltered) | Strong | Strong (AWS-reported 88% block rate) |
| Cost caps with admin alerts | Yes, 50% threshold, per-policy budgets | Per-key budgets | Yes |
| Lifecycle (owner reassignment, soft delete) | Partial, soft delete still roadmap | No | No |
Microsoft owns the identity-and-lifecycle column. Gateways own the per-request-forensics column. Bedrock owns the content-safety column. The serious agent operator runs all three and treats them as orthogonal.
The lifecycle gap most posts miss
Hard delete in MAC removes the agent and its files. It does not unwind anything the agent wrote outside the tenant. This is the governance break enterprise audit teams actually ask about, and it sits one layer below where most launch posts stop.
An agent that wrote tickets to Jira, rows to an external CRM, or files into its own data container leaves all of that behind when the registry deletes the agent identity. The next quarter's audit is reconstructed against a runtime that no longer exists. The admin can show the agent was deprovisioned; they cannot show what it did before deprovisioning, or which downstream rows now have an orphaned author. Cross-tenant federation makes it worse: an ISV-built agent reaching from a partner tenant into yours leaves side effects in your systems, audit trail in theirs, and lifecycle ownership ambiguous between the two. Soft delete is on the roadmap. Side-effect rollback is not, and structurally probably cannot be — the agent has no transactional view of every external system it touched.
The pragmatic move is to assume every agent that writes externally is a small ETL job in disguise. Log its outbound writes the way you would log a pipeline's, outside MAC, before you give it write scopes.
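One way to do that is to wrap every external write in a function that appends a ledger line whether the write succeeds or fails. This is a minimal sketch under stated assumptions: `logged_write`, the ledger fields, and the identifiers in the demo are all hypothetical, and a real deployment would write to an append-only store outside the tenant's agent runtime.

```python
import io
import json
from datetime import datetime, timezone
from typing import Callable, TextIO


def logged_write(ledger: TextIO, agent_id: str, system: str,
                 record_id: str, write: Callable[[], object]) -> object:
    """Run an external write and append one JSON line describing it to the ledger."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent_id": agent_id,
        "system": system,        # e.g. "jira" or "crm"
        "record_id": record_id,  # the downstream row or ticket the agent touched
        "status": "ok",
    }
    try:
        return write()
    except Exception as exc:
        entry["status"] = f"failed: {exc}"
        raise
    finally:
        # Runs on success and on failure, so the ledger never misses an attempt.
        ledger.write(json.dumps(entry) + "\n")


# Demo with an in-memory ledger; in practice this would be an append-only file or table.
demo_ledger = io.StringIO()
logged_write(demo_ledger, "agent-42", "jira", "OPS-1", lambda: "created")
```

Because the ledger is keyed by agent ID and downstream record ID, it still answers "which rows did this agent author?" after the agent itself has been hard-deleted.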
What to do Monday morning if you run an agent fleet
Treat MAC as the inventory layer and assume the request-forensics layer is your problem. The trap is reading the launch posts and concluding governance is solved. It is not. The wiring that produces a real audit trail is still on you. The order below assumes a tenant with Copilot licenses already deployed and at least one Copilot Studio agent live.
1. Audit the registry first
Sign in as an AI Admin or Global Admin to https://admin.microsoft.com, go to Copilot > Agents > All Agents. Read the three counters at the top of the page out loud. If "Unmanaged agents" or "Agents without owners" is non-zero, that is your day-one backlog. Export the full list to Excel; per Microsoft Learn, the Excel export covers up to one minute of registry data per run, so do it once cleanly and version the file.
Expected output: an exported file with one row per agent, the four-class taxonomy, owner Entra principal, knowledge sources, granted permissions, and risk signals. Anything labeled Unmanaged or No owner gets a Jira-equivalent ticket the same day.
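Turning the export into the day-one backlog can be scripted. The sketch below assumes the export has been saved as CSV, and the column names (`Agent name`, `Owner`, `Management state`) are hypothetical placeholders; check them against the header row of your actual export before using it.

```python
import csv
import io

# Hypothetical sample in the assumed column layout.
SAMPLE = """Agent name,Owner,Management state
HR Helper,ana@contoso.com,Managed
Invoice Bot,,Managed
Side Project,li@contoso.com,Unmanaged
"""


def day_one_backlog(csv_text: str) -> list[tuple[str, str]]:
    """Return (agent name, reason) for every row that needs a same-day ticket."""
    tickets = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["Agent name"]
        if not row["Owner"].strip():
            tickets.append((name, "no owner"))
        if row["Management state"].strip().lower() == "unmanaged":
            tickets.append((name, "unmanaged"))
    return tickets
```

On the sample, `day_one_backlog(SAMPLE)` flags Invoice Bot for having no owner and Side Project for being unmanaged, and leaves HR Helper alone.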
2. Lock down the three levers before opening adoption
In MAC > Copilot > Settings, scope agent extensibility to a pilot security group rather than all Copilot-licensed users. Restrict allowable agent types to Microsoft-certified plus internal LOB only; defer multi-tenant ISV agents until you have an inspection process. Turn off org-wide sharing for everyone outside the pilot group.
# List only Unmanaged agents: the subset MAC flags as outside the protection envelope.
# Diff this output across runs; growth in the Unmanaged set is your real signal.
# -G with --data-urlencode encodes the spaces and quotes in the OData filter,
# which curl would otherwise reject or send malformed.
curl -sS -G "https://graph.microsoft.com/beta/copilot/admin/packages" \
  -H "Authorization: Bearer $TOKEN" \
  --data-urlencode "\$filter=class eq 'Unmanaged'" \
  | jq '[.value[] | {id, displayName, owner: .owner.userPrincipalName, riskFlags}]'
Expected output: a JSON array containing only Unmanaged packages, one object per agent, with owner UPN and risk flags. If the list is empty but the MAC UI shows agents under "Unmanaged," the token is missing the AgentRegistry.Read.All scope or the AI Admin role. Persist the file dated; week-over-week diff is the operational signal, not the absolute count.
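The week-over-week diff itself is a set comparison on agent IDs. A minimal sketch, assuming each dated snapshot is a JSON array of objects that carry an `id` field, as in the output described above; the function names are hypothetical.

```python
import json


def unmanaged_ids(snapshot_json: str) -> set[str]:
    """Extract the agent IDs from one dated snapshot."""
    return {item["id"] for item in json.loads(snapshot_json)}


def diff_snapshots(previous: str, current: str) -> dict[str, list[str]]:
    """Report which unmanaged agents appeared or disappeared between two runs."""
    before, after = unmanaged_ids(previous), unmanaged_ids(current)
    return {
        "added": sorted(after - before),
        "removed": sorted(before - after),
    }
```

A non-empty `added` list is the signal to act on. A shrinking `removed` list with a flat total means agents are being replaced as fast as they are cleaned up, which the absolute count alone would hide.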
3. Wire the approval queue to a real review
The queue is only as good as the reviewer. Define a written checklist before the first request lands: what data does the agent touch, which Entra permissions does it request beyond the caller's, what external APIs does it call, who owns it after launch. Treat the approval step as a code review, not a thumbs-up. 2toLead's 2026 governance research reports 73% of regulated-industry orgs surveyed cite governance gaps as a primary reason for paused Copilot rollouts; the four questions above are the ones their reviewers consistently could not answer for individual agents.
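The four questions can be enforced mechanically before a human reviewer spends time on the request. This is a sketch of one possible intake check, not a feature of the MAC queue; the field names are hypothetical, and an explicit answer such as "none" counts as answered while a blank does not.

```python
# The four review questions from the checklist, as hypothetical field names.
CHECKLIST = (
    "data_touched",
    "extra_entra_permissions",
    "external_apis",
    "owner_after_launch",
)


def unanswered(request: dict[str, str]) -> list[str]:
    """Return the checklist questions the maker left missing or blank."""
    return [q for q in CHECKLIST if not request.get(q, "").strip()]


def ready_for_review(request: dict[str, str]) -> bool:
    """A request reaches a human reviewer only when every question has an answer."""
    return not unanswered(request)
```

A request that omits `owner_after_launch` bounces back to the maker with that one field named, which keeps the reviewer's time for the judgment calls rather than the missing paperwork.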
4. Stand up request-level logging outside MAC
This is the step the platform will not do for you. Route Copilot Studio agents that call external APIs through an AI gateway (Portkey, LiteLLM self-hosted, or an Application Gateway with logging) for the per-prompt audit trail. For agents that stay inside the Microsoft graph, the closest available primitive is Purview audit logs plus Defender for Cloud Apps activity. Neither shows the prompt body, but together they reconstruct the request envelope. Plan for a future where prompt-level logging is your responsibility, not Microsoft's.
# Example Portkey config for a Copilot Studio outbound tool
gateway:
  audit:
    log_requests: true
    log_responses: true
    redact_pii: true
  budget:
    monthly_usd_cap: 500
    alert_at_pct: 50
  routing:
    primary: azure-openai-gpt-4o
    fallback: claude-sonnet-4-7
Expected output: every outbound LLM call is recorded with prompt, response, latency, cost, and PII flags. You can answer "what did agent X say to user Y at 14:32?" without filing a Microsoft support ticket.
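Answering that question is then a filter over the gateway's log. The sketch below assumes the log is exported as JSON lines with `ts` (ISO 8601), `agent_id`, `user`, `prompt`, and `response` fields; that layout is an assumption for illustration, so map the names to whatever your gateway actually emits.

```python
import json
from typing import Iterable


def calls_in_minute(log_lines: Iterable[str], agent_id: str,
                    user: str, minute: str) -> list[dict]:
    """Return every logged call by one agent for one user within one minute.

    `minute` is an ISO 8601 prefix such as "2026-05-12T14:32".
    """
    hits = []
    for line in log_lines:
        record = json.loads(line)
        if (record["agent_id"] == agent_id
                and record["user"] == user
                and record["ts"].startswith(minute)):
            hits.append(record)
    return hits
```

The prefix match only works if every timestamp is written in the same timezone and format, so normalizing to UTC at ingestion is worth doing before the first audit request arrives.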
5. Bind the billing plane before turning on PAYG
Create a billing policy in MAC > Copilot > Billing, linked to an Azure subscription and resource group where you have Contributor or Owner. Set a monthly cap. Set the email alert threshold at 50% of the cap, not 80%. Enable capacity packs as the primary consumption source if you bought any (25,000 credits per pack per month), with PAYG as fallback. When April 2026's standalone credit policies are available across your tenant, switch the policies that should not exceed prepaid into hard-cap mode. The 10-policy-per-tenant cap is the binding constraint on how granular your group structure can be.
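Before clicking through the UI, it can help to check the planned policy set against the two constraints in this step: at most 10 policies per tenant, and an alert threshold no later than 50% of the cap. This is a planning sketch only; the dictionary shape and function names are hypothetical and nothing here talks to MAC or Azure.

```python
MAX_POLICIES_PER_TENANT = 10  # per-tenant cap on credit policies quoted in this article
RECOMMENDED_ALERT_PCT = 50    # alert at 50% of the cap, not 80%


def validate_plan(policies: list[dict]) -> list[str]:
    """Return human-readable problems with a planned set of billing policies."""
    problems = []
    if len(policies) > MAX_POLICIES_PER_TENANT:
        problems.append(
            f"{len(policies)} policies planned; the tenant cap is {MAX_POLICIES_PER_TENANT}"
        )
    for policy in policies:
        if policy["alert_pct"] > RECOMMENDED_ALERT_PCT:
            problems.append(
                f"{policy['name']}: alert at {policy['alert_pct']}% fires later than "
                f"the recommended {RECOMMENDED_ALERT_PCT}%"
            )
    return problems


def alert_due(spend: float, cap: float, alert_pct: int = RECOMMENDED_ALERT_PCT) -> bool:
    """True once spend has reached the alert threshold for this cap."""
    return spend >= cap * alert_pct / 100
```

An empty list from `validate_plan` means the group structure fits inside the cap. If it does not, merge groups in the plan rather than discovering the limit halfway through creating policies.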
6. Plan for shadow agents you will never see
The MAC unmanaged-agent count is a floor on a category the registry can structurally see; everything in the broader unsanctioned-use population (browser ChatGPT, BYO API keys, non-Microsoft platforms) sits outside that boundary entirely. Add a Conditional Access policy that requires named service principals for any Azure OpenAI deployment in your tenant, and route DNS for known third-party LLM endpoints through a logging proxy. This is not a Microsoft feature gap; it is a category boundary. The registry can only see what it owns.
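Once DNS or proxy logs are flowing, a first pass at the invisible population is a count of which clients resolve known LLM API hosts. The sketch assumes a whitespace-separated log with the client address first and the queried hostname second, which is a hypothetical format; the host list is a short starting set to extend, not a complete inventory.

```python
from collections import Counter
from typing import Iterable

# A starting set of public LLM API hostnames; extend it for your environment.
KNOWN_LLM_HOSTS = {
    "api.openai.com",
    "api.anthropic.com",
    "generativelanguage.googleapis.com",
}


def llm_lookups(log_lines: Iterable[str]) -> Counter:
    """Count (client, host) pairs for lookups that hit a known LLM endpoint."""
    counts: Counter = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        client = parts[0]
        host = parts[1].rstrip(".").lower()  # tolerate trailing dots and mixed case
        if host in KNOWN_LLM_HOSTS:
            counts[(client, host)] += 1
    return counts
```

A client that shows up here without a matching entry in the MAC registry is a candidate shadow agent, or simply a person with a browser tab. Frequency and time of day usually separate the two.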
7. Re-run steps 1 to 6 every 30 days
Forrester's enterprise Copilot adoption data puts most enterprise tenants 12 to 18 months away from scaled production deployment — i.e., the gap between "Copilot licenses purchased" and "agents doing meaningful work under defensible governance." Forrester's framing is sharper than the timeline: governance determines who scales and who stalls. The registry view drifts. Ownership changes. New agent classes ship. Treat governance as a recurring batch job, not a launch task. The teams that stall in weeks 6 to 12 are the teams that did this once. Once is a launch, not a program.
Open questions
Three uncertainties sit right under the surface of what shipped, and the honest read is that none of them will resolve from public material before the end of 2026.
First, prompt-body logging. Microsoft has not said whether the per-request prompt and response payload will ever land inside MAC or Purview as a first-class log type. The current ContentFiltered telemetry suggests the architectural choice was deliberate. If it stays that way, the AI-gateway-as-second-layer pattern in step 4 becomes permanent, not transitional. The counter-bet is that regulatory pressure (EU AI Act high-risk classification, U.S. state-level audit mandates) forces the surface into the product by mid-2027. Either outcome rewrites the buy-vs-wire decision.
Second, the Entra Agent ID three-tier hierarchy under cross-tenant federation. Blueprint, Principal, and Instance are clean inside one tenant. The moment an ISV-built agent reaches from a partner tenant into yours, the identity model has to answer whose Conditional Access policies apply, whose audit log catches the event, and whose lifecycle owns the side effects. The docs cover the single-tenant case in detail. The federated case is absent. Anyone running a serious ISV ecosystem will hit this before Microsoft documents it.
Third, the Bedrock and Google Cloud agent registry sync that Microsoft announced as public preview. If it ships at GA quality, MAC becomes the de-facto cross-cloud agent inventory, and the comparison table in this post needs a rewrite. If it ships thin (read-only, no lifecycle actions, no risk fusion), the table holds. The lifecycle actions roadmap (start, stop, delete on third-party agents) is the tell. Watch that, not the marketing.
Takeaways
- The MAC governance surface is real, internally consistent, and unusually well bound to identity and billing for a Microsoft-shipped feature this young.
- Entra-grounded permissions resolve the broad-sharing data-leak panic; agents inherit caller rights plus only what Entra explicitly granted to the agent identity.
- The registry inventories agents; it does not log what they did. Per-request forensics are still your job.
- Hard delete removes the agent, not its side effects. Treat any externally-writing agent as an ETL job and log its outbound writes outside MAC.
- The unmanaged-agent counter is a floor on the subset Microsoft can detect; the broader unsanctioned-use population is structurally outside the registry's view.
- For real fleets, run MAC for lifecycle, an AI gateway for per-request logs, and content guardrails as a third layer; treat them as orthogonal.
If you are running a Copilot agent fleet that has crossed the "we have a few" line into "we have a registry that scrolls," the operational questions get sharp fast. Where are your prompt logs landing? Which of your existing SharePoint permissions are about to be amplified by an approved agent next quarter? Someone is on call when the unmanaged-agent counter ticks up at 2am, and they need a runbook. I am working through the same questions with teams shipping into Microsoft tenants right now — get in touch if you want to compare notes, especially with counter-evidence on the observability gap. The fastest way to update my thinking is a concrete example of an audit your team passed using only MAC, with numbers on what your reviewers asked for and what the registry surfaced in response.