The model is not what fails first. The surface is. Auth, tenant grounding, latency, and license metering all ride the integration pattern you picked before you ever opened the prompt editor.

Key takeaways

Surface choice (iframe, self-hosted Web Chat, BYOUI, S2S middleware, Direct Line, Teams) locks auth mode, grounding, latency budget, and license meter before you write a single prompt.
The iframe embed only renders for No-auth agents and forfeits Tenant Graph Grounding, so it is a demo target, not a production one.
Tenant Graph Grounding is a property of the client transport, not the agent. Pick Direct Line and you have paid Copilot Studio licensing for a feature you can no longer reach.
The 100-second turn ceiling is the only constraint that will not negotiate. Every middleware hop, OBO exchange, and payload scan pays interest against it.
For internal employees on M365 Copilot licenses, Teams is both the cheapest surface and the most capable one; deviate only with a written reason.

Copilot Studio: most agent problems are integration problems

The pattern is familiar. A team ships a Copilot Studio agent into an iframe on the intranet, sees the magic-code login card, watches SharePoint grounding go silent, and concludes the agent is "not smart enough." They tune the prompt. They swap models. Nothing improves. Six weeks later someone re-reads the docs and notices the iframe snippet only renders when authentication is set to No authentication. The "agent quality" problem was an integration problem the whole time. I have watched this exact arc three times in the last quarter, with three different teams, on three different agents. The model was fine. The surface was wrong.

This piece is about the surface decision: iframe, self-hosted Web Chat, BYOUI, server-to-server middleware, Direct Line, native mobile, Teams, Power Pages. Eight first-class targets in the official channel taxonomy, each carrying its own auth model, grounding posture, latency budget, and license meter. Pick one and the next four decisions are already made for you.

Here's my thought: the underlying LLM is the cheapest, most fungible part of a Copilot Studio agent. The surface is the part that doesn't iterate. Whichever embed pattern you pick locks four downstream things: which auth modes you can actually offer, whether Tenant Graph Grounding works at all, your tail-latency floor against a hard 100-second turn ceiling, and which Copilot Credit meter you burn. Pull on any of these later and you don't tune, you rebuild. Most "this agent is bad" tickets are surface tickets in a costume.

A counterargument runs through this whole space: the surface is just UI, pick the easiest thing, iterate. I will resolve it in the body. Short version: surface choice is also auth choice, grounding choice, and governance choice, and those three don't iterate cheaply once the agent is live and a security team has signed off on a wire diagram.

Why the surface decision dominates

The pick order matters. Surface, then auth mode, then grounding, then latency budget, then license meter. Reverse it and you rebuild twice.

The channels guidance lists eight publish targets. The interesting thing is what each one quietly takes off the table. The iframe embed snippet in Copilot Studio disappears the moment you turn on "Authenticate with Microsoft" or "Authenticate manually" inside Publish to web channel. That is not a UI bug. That is the documentation telling you, in writing, that the easy embed is the demo embed.

The same pattern shows up on grounding. Tenant Graph Grounding, the feature that lets the agent ground answers in the signed-in user's SharePoint and Microsoft Graph connectors, is unlocked only when the client uses the M365 Agents SDK client. Direct Line cannot do it. The Microsoft-hosted iframe cannot do it. So the moment you pick a Direct-Line-based surface, you have paid full Copilot Studio licensing for a feature you can no longer reach from your UI.

Latency works the same way. Every Copilot Studio agent runs against a 100-second turn ceiling, documented in Modify a flow to use with an agent and surfaced as a flowactiontimedout error code. Agent flow express mode is Microsoft's explicit response to teams running out of budget. Add an OBO hop, a payload scanner, and a slower model backend, and you can spend that entire budget inside a single turn.

And licensing. As of 2025-09-01 the meter is Copilot Credits, not messages, per the Copilot Studio billing doc. Agents authored inside Copilot Studio for Teams do not consume credits. Consumption by M365 Copilot-licensed users does not bill against standalone-authored agents either. The cheapest production surface in the entire catalog is also the most powerful one. We will come back to that.

Not "pick a surface and iterate." Surface IS auth IS grounding IS budget IS price.

The auth surface is the agent

You don't have an auth strategy. You have a surface, and your surface has an auth strategy. The two cannot be decoupled.

Consider the iframe. The Publish-to-web-channel doc only shows the embed snippet when auth is No authentication. That is not because Microsoft forgot to implement the other case. It is because the combination of an iframe origin (copilotstudio.microsoft.com) and a host page on your own origin puts the auth flow into a third-party-cookie context that Safari ITP and Brave block by default, and that Chrome already blocks in Incognito mode. The Microsoft identity platform's third-party cookie guidance describes exactly what breaks. Silent token acquisition stops. The frame tries to pop a window. Conditional Access tries to evaluate the embedded app. The user gets a login card in a separate tab, and you get a support ticket titled "agent is broken."

Move to self-hosted Web Chat with the open-source botframework-webchat package and the auth story changes. Your origin owns the cookies. You broker the token server-side using the Direct Line authentication endpoint. SSO works when the user is already signed into Entra. The magic-code experience that ranks as the single most-disliked Copilot Studio UX issue in the field disappears. The minimum viable wiring is small enough to fit in one block:

import { createDirectLine, renderWebChat } from 'botframework-webchat';

// Server-mint a short-lived token from the Direct Line secret.
// Never ship the secret to the browser.
const response = await fetch('/api/copilot/token', {
  method: 'POST',
  credentials: 'include', // send the portal session cookie to your own origin
});
if (!response.ok) {
  throw new Error(`Token endpoint returned ${response.status}`);
}
const { token } = await response.json();

// currentUser is whatever your host page already knows about the signed-in user.
renderWebChat({
  directLine: createDirectLine({ token }),
  styleOptions: { /* your brand */ },
  userID: currentUser.entraOid,
  username: currentUser.displayName,
}, document.getElementById('chat'));
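
The server half of that exchange, the part behind the hypothetical /api/copilot/token route, can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes the global Direct Line 3.0 token-generation endpoint (regional or Copilot Studio-specific token endpoints may differ), and the names HttpClient, buildTokenRequest, and mintDirectLineToken are illustrative, not part of any SDK.

```typescript
// Minimal shape of the fetch-like function this sketch needs, so it can be
// exercised with a stub. In production you would pass the platform's fetch.
type HttpResponse = { ok: boolean; status: number; json(): Promise<unknown> };
type HttpClient = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<HttpResponse>;

// Direct Line 3.0 token-generation endpoint (global; ASSUMED to apply to
// your deployment, so verify against your channel configuration).
const DIRECT_LINE_TOKEN_URL =
  "https://directline.botframework.com/v3/directline/tokens/generate";

// Builds the request that trades the long-lived secret for a short-lived token.
function buildTokenRequest(secret: string) {
  if (secret.trim() === "") throw new Error("Direct Line secret is empty");
  return {
    url: DIRECT_LINE_TOKEN_URL,
    init: { method: "POST", headers: { Authorization: `Bearer ${secret}` } },
  };
}

// Calls the endpoint and returns only the token, which is all the browser needs.
function mintDirectLineToken(
  secret: string,
  http: HttpClient,
): Promise<{ token: string }> {
  const { url, init } = buildTokenRequest(secret);
  return http(url, init)
    .then((res) => {
      if (!res.ok) throw new Error(`Token request failed: ${res.status}`);
      return res.json();
    })
    .then((body) => {
      const token = (body as { token?: unknown }).token;
      if (typeof token !== "string") throw new Error("No token in response");
      return { token };
    });
}
```

The secret stays in server configuration, and the route handler returns only the token field to the page. Injecting the HTTP client keeps the function testable without a network call.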

Now go BYOUI with the M365 Agents SDK client and the Entra story collapses to one app registration shared across the agent and the client. The Custom Engine team's "You Probably Don't Need Manual Auth" post is the clearest piece of writing on why this matters: silent SSO, no magic codes, streaming on for Copilot Studio agents, Tenant Graph Grounding on. The price is that "Authenticate with Microsoft" becomes mandatory. Service-principal tokens are off the table. Anonymous B2C traffic is off the table. You cannot use the same BYOUI for an internal HR agent and a public consumer portal.

This is the load-bearing point. The same React chat panel, wired to two different transports, gives you two different governance postures. It is the same code, the same prompt, the same agent definition in Copilot Studio. The auth-and-grounding surface is which transport you wired up underneath the React tree. That is not "just UI."

💡 Tenant Graph Grounding is not a property of your agent. It is a property of which client you used to reach it. Pick Direct Line and you have paid Copilot Studio licensing for a feature you can no longer access.

Tenant grounding is a property of the client, not the agent

This is where the counterargument breaks. "Pick whatever surface is easiest, iterate later" assumes the surface is an interchangeable wrapper around the same agent capability. It is not. Two surfaces wired to the same Copilot Studio agent give two different answer qualities, because grounding lives at the transport layer.

When we embed the agent through the M365 Agents SDK client, or consume it inside Teams, M365 Copilot, or Power Pages with Entra auth, the agent grounds in the signed-in user's SharePoint and Microsoft Graph data, scoped by their existing M365 role-based access. The M365 Copilot architecture overview describes how the user identity carries through to retrieval. If you embed the same agent through Direct Line, including the default Web Chat iframe and stock botframework-webchat, that grounding is gone. The agent will fall back to its public knowledge sources and any explicit data sources you wired into the agent itself, but it will not consult the user's SharePoint.

Why this matters in practice: the typical reason an enterprise pays for Copilot Studio in the first place is the SharePoint and Graph grounding story. We wrote the check expecting "ask the agent about the Q3 OKR doc and it will read it." Wire that agent into a Direct Line iframe on a portal page and the demo fails on the first question. The model is fine. The prompt is fine. The agent is reaching the right tools. The user just is not authenticated in a way that lets the retrieval layer see them.

The counterargument also assumes we can swap surfaces cheaply once we see the problem. We cannot. The auth wiring includes an Entra app registration, an OAuth connection in Copilot Studio, redirect URIs that need to match across environments, and, if you are using OBO, a custom-connector OBO flow with delegated permissions and Key Vault secrets. Every one of those is a ticket. Every one of those will involve at least one person who is not in your team. I have watched a "let's switch from iframe to SDK client" project take six weeks for what looked like a one-day refactor, because the Entra app registration was owned by a different org and the redirect-URI change touched Conditional Access.

This is also why I keep coming back to the line from "your agent's identity is a Postgres role." The agent's effective permissions are not declared in the agent. They are inherited from the identity the surface presents. Change the surface, change the identity, change the permission set, change the answer.

Latency, timeouts, and the 100-second wall

Every additional hop pays interest against a hard ceiling. The 100-second turn limit does not negotiate, and the optimization stories are real but small.

The Optimize agents to minimize latency guide cites concrete numbers worth pinning. Splitting one giant knowledge source into focused sources with metadata filters dropped retrieval from 8 seconds to under 2. Cutting a 6,000-token system prompt to 1,200 tokens shaved 3 seconds off a turn. A manufacturing customer split 30,000 manuals across 12 product-specific knowledge sources and cut retrieval latency roughly in half. These are useful gains. They are also, end-to-end, on the order of 5 to 10 seconds. They do not buy you a free OBO middleware hop.

Compare hop counts, roughly. The iframe is browser to Microsoft-hosted Web Chat to agent: two hops. BYOUI through the M365 Agents SDK client is browser to SDK to agent: two hops, with streaming. Server-to-server middleware with PII redaction is browser to your API to OBO exchange to Direct Line or SDK to agent: four hops minimum, more if you are scanning payloads in both directions. Direct Line for non-UI server callers has its own tax: there is no end-of-response signal, so a server caller has to poll the get activities endpoint and decide for itself when the agent is done responding, as the Direct Line API 3.0 reference makes painfully clear. Community reports also put Direct Line throttling at around 2 requests per second, beyond which 429s start arriving with Retry-After.
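
The two Direct Line taxes above, deciding when a turn is over and backing off on 429s, can be sketched as a pair of small helpers for a polling loop. This is one possible heuristic under stated assumptions, not a documented protocol: the quiet-window rule, the backoff constants, and the names retryDelayMs, countAgentMessages, and turnLooksComplete are all illustrative.

```typescript
// Subset of a Direct Line activity that this sketch inspects.
type DlActivity = { type: string; from: { id: string } };

// Converts a Retry-After header (delta-seconds or HTTP-date) to milliseconds.
// Falls back to capped exponential backoff when the header is absent or
// unparseable. The 500 ms base and 30 s cap are ASSUMED values.
function retryDelayMs(
  retryAfter: string | null,
  attempt: number,
  nowMs: number,
): number {
  if (retryAfter !== null) {
    const trimmed = retryAfter.trim();
    if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
    const dateMs = Date.parse(trimmed);
    if (!isNaN(dateMs)) return Math.max(0, dateMs - nowMs);
  }
  return Math.min(30000, 500 * Math.pow(2, attempt));
}

// Counts message activities that did not come from the calling user.
function countAgentMessages(activities: DlActivity[], userId: string): number {
  return activities.filter(
    (a) => a.type === "message" && a.from.id !== userId,
  ).length;
}

// Heuristic end-of-response check: the agent has said something, and no new
// activity has arrived for quietMs. Too small a window truncates multi-message
// answers; too large a window spends the turn budget on waiting.
function turnLooksComplete(
  agentMessages: number,
  lastActivityMs: number,
  nowMs: number,
  quietMs: number,
): boolean {
  return agentMessages > 0 && nowMs - lastActivityMs >= quietMs;
}
```

A caller would poll the activities endpoint with the last watermark, feed new activities through countAgentMessages, and stop once turnLooksComplete returns true or the overall deadline expires. The quiet window is itself latency, so it belongs in the hop budget.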

A practical sketch. Internal customer-support agent. Authenticated employees on the corporate network. SharePoint grounding required. 30-second average retrieval, 10-second average reasoning, 3 seconds of UI render. That is 43 seconds median, with a fat tail. Now wrap it in a middleware that runs an LLM-based PII classifier on every inbound and outbound message. The classifier averages 4 seconds per pass, so 8 seconds per turn. Add a 1-second OBO exchange on cold turns and the median moves to 52 seconds. Suddenly the p95 is at 70 seconds and your p99 is hitting the wall. The model did not change. You added two hops.

A reasonable rule of thumb for Copilot Studio: keep the full turn, middleware included, under 90 seconds at p95 against the 100-second hard ceiling, which leaves 10 seconds of slack for the agent's own variance. If the agent uses connected models with slower response profiles, headroom shrinks further: a participant in a recent Microsoft session on Copilot Studio integration reported that Anthropic-backed agents hit the ceiling more often, and the presenter did not dispute it. Autonomous agents face the same ceiling as conversational ones, contrary to common assumption.
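
The budget arithmetic from the practical sketch can be written down as a tiny calculator. This is a minimal sketch: it assumes the PII classifier runs once inbound and once outbound (my reading of the scenario), and the names TurnStage, totalSeconds, and headroom are illustrative. It sums typical-case stage times only, so it says nothing about p95 or p99, which have to be measured.

```typescript
// One contributor to a turn's wall-clock time, in seconds.
// `passes` covers stages that run more than once per turn.
type TurnStage = { label: string; seconds: number; passes?: number };

const TURN_CEILING_S = 100; // hard per-turn limit cited in the text
const P95_TARGET_S = 90; // rule-of-thumb target from the text

// Sums typical-case stage times for one turn.
function totalSeconds(stages: TurnStage[]): number {
  return stages.reduce((sum, s) => sum + s.seconds * (s.passes ?? 1), 0);
}

// Reports how much room is left against the target and the hard ceiling.
function headroom(stages: TurnStage[]) {
  const total = totalSeconds(stages);
  return {
    total,
    toTarget: P95_TARGET_S - total,
    toCeiling: TURN_CEILING_S - total,
  };
}

// The scenario from the text: retrieval, reasoning, and UI render.
const baselineStages: TurnStage[] = [
  { label: "retrieval", seconds: 30 },
  { label: "reasoning", seconds: 10 },
  { label: "UI render", seconds: 3 },
];

// The same turn wrapped in redaction middleware with an OBO exchange.
const middlewareStages: TurnStage[] = baselineStages.concat([
  { label: "PII classifier (inbound + outbound)", seconds: 4, passes: 2 },
  { label: "OBO exchange (cold turn)", seconds: 1 },
]);

const baselineBudget = headroom(baselineStages); // total 43
const middlewareBudget = headroom(middlewareStages); // total 52
```

The typical case moves from 43 to 52 seconds, which still looks comfortable. The point of the scenario is that the tail, not this sum, is what reaches the ceiling, so a calculator like this is a floor for planning rather than a pass condition.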

This is also why I argued in "temperature zero will not save you" that the pass condition for an agent has to be a distribution, not a single number. Tail latency is the part that breaks contracts, and tails are inherently distributional.

A worked decision: a regulated internal HR agent

Concrete case. A financial-services org wants an internal HR agent on a SharePoint portal. Employees only, all on Entra. Needs to ground in SharePoint policy docs. The legal team requires every inbound message to be scanned for PII before it reaches the LLM, and every outbound message to be scanned again before it reaches the user. Logs to a SIEM. SOC-2 in scope.

Walk the surfaces:

| Surface | Auth | Grounding | Middleware | Verdict |
|---|---|---|---|---|
| Iframe embed | No-auth only | None | Impossible | Out. Fails SOC-2. |
| Self-hosted Web Chat + Direct Line | Entra SSO | No Tenant Graph | Possible at your origin | Possible, loses grounding. |
| BYOUI + M365 Agents SDK client | Entra SSO | Yes | Possible at your origin | Strong candidate. |
| BYOUI + Direct Line | Entra SSO + SP | No Tenant Graph | Yes | Loses grounding. |
| Server-to-server middleware (S2S) | Entra app identity | Via SDK path only | Designed for this | Best fit for the redaction requirement. |
| Teams + M365 Copilot | Zero-config SSO | Yes, free | No middleware seam | Cheapest, but no PII redaction. |
| Power Pages | All modes + web roles | Yes for Entra users | Limited | Wrong audience: employees, not portal users. |

The choice that survives the table is S2S with the upcoming M365 Agents SDK client app-only auth described in the same Microsoft session referenced above. The agent gets shared with an Entra app registration that holds the CopilotStudio.Copilot.Invoke application permission, an admin consents, and your middleware authenticates via OAuth2 client credentials. Users authenticate to your portal in Entra. Your middleware sees their identity, runs the PII redaction in both directions, and presents an app identity to Copilot Studio. Conditional Access still applies to the app identity. Sign-in monitoring still works. You can IP-allowlist the middleware so only your server can invoke the agent.
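
The client-credentials step in that flow is ordinary OAuth2 against the Microsoft identity platform token endpoint, and the request can be sketched as below. Treat this as a sketch under loud assumptions: the scope value is my guess at the resource behind the CopilotStudio.Copilot.Invoke permission, the app-only flow itself is described above as upcoming, and buildAppTokenRequest and formEncode are illustrative helpers, not SDK calls. In production a maintained library such as MSAL would normally handle this exchange, including caching.

```typescript
// ASSUMPTION: default scope of the Power Platform API. Verify the resource
// and permission name against current docs before relying on it.
const ASSUMED_SCOPE = "https://api.powerplatform.com/.default";

type AppCredentials = {
  tenantId: string;
  clientId: string;
  clientSecret: string; // load from a vault, never from source control
};

// Encodes fields as application/x-www-form-urlencoded.
function formEncode(fields: Record<string, string>): string {
  return Object.keys(fields)
    .map((k) => `${encodeURIComponent(k)}=${encodeURIComponent(fields[k])}`)
    .join("&");
}

// Builds the OAuth2 client-credentials token request for the v2.0 endpoint.
function buildAppTokenRequest(
  creds: AppCredentials,
  scope: string = ASSUMED_SCOPE,
) {
  const tenant = encodeURIComponent(creds.tenantId);
  return {
    url: `https://login.microsoftonline.com/${tenant}/oauth2/v2.0/token`,
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: formEncode({
      grant_type: "client_credentials",
      client_id: creds.clientId,
      client_secret: creds.clientSecret,
      scope,
    }),
  };
}
```

The middleware would send this request, cache the returned access token until shortly before expiry, and attach it to calls toward the agent. The user's own identity never leaves your middleware in this design, which is exactly why user-scoped grounding is unavailable on this path.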

The one thing this architecture cannot do today is Tenant Graph Grounding scoped to the calling user, because the SDK client is running with an app identity instead of a delegated user token. For an HR agent grounded in tenant-wide policy docs that everyone is allowed to read, this is fine: the agent reads from a known SharePoint site, not from the calling user's personal Graph. For an agent that needs to ground in the calling user's mailbox or OneDrive, S2S with app identity is not the right tool. That case forces a hard tradeoff: either we accept that personal-graph grounding is incompatible with strict middleware redaction and pick one, or we move redaction inside the agent itself behind a dedicated guardrail model and document a compensating control for the SOC-2 reviewer. There is no clean third option today.

If the same org instead asked for a public customer-facing agent on a marketing portal, every part of this analysis flips. Anonymous traffic kills the M365 Agents SDK client. PII redaction stays. The answer becomes Direct Line behind your own middleware, accepting the loss of Tenant Graph Grounding because there is no signed-in user to ground for. Same agent, different surface, different architecture.
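
The walk through the table can also be expressed as a filter over a small capability matrix, which makes the elimination order explicit. This is a rough encoding under my own assumptions, not the author's method: the four-level seam scale, the ranking rule, and the names SurfaceRow and shortlist are illustrative, and the audience objection to Power Pages is not modeled.

```typescript
// How much room a surface leaves for your own middleware between user and agent.
type Seam = "none" | "limited" | "possible" | "designed";
// Whether Tenant Graph Grounding is reachable from the surface.
type Grounding = "no" | "sdk-path-only" | "yes";

type SurfaceRow = {
  name: string;
  entraAuth: boolean; // can employees authenticate through Entra?
  grounding: Grounding;
  seam: Seam;
};

const SEAM_RANK: Record<Seam, number> = {
  none: 0, limited: 1, possible: 2, designed: 3,
};
const GROUNDING_RANK: Record<Grounding, number> = {
  no: 0, "sdk-path-only": 1, yes: 2,
};

// Rows transcribed from the table; the seam levels are my interpretation.
const SURFACES: SurfaceRow[] = [
  { name: "Iframe embed", entraAuth: false, grounding: "no", seam: "none" },
  { name: "Self-hosted Web Chat + Direct Line", entraAuth: true, grounding: "no", seam: "possible" },
  { name: "BYOUI + M365 Agents SDK client", entraAuth: true, grounding: "yes", seam: "possible" },
  { name: "BYOUI + Direct Line", entraAuth: true, grounding: "no", seam: "possible" },
  { name: "Server-to-server middleware (S2S)", entraAuth: true, grounding: "sdk-path-only", seam: "designed" },
  { name: "Teams + M365 Copilot", entraAuth: true, grounding: "yes", seam: "none" },
  { name: "Power Pages", entraAuth: true, grounding: "yes", seam: "limited" },
];

// Keeps Entra-capable surfaces with at least the required seam, then orders
// by seam strength first and grounding second.
function shortlist(rows: SurfaceRow[], minSeam: Seam): SurfaceRow[] {
  return rows
    .filter((r) => r.entraAuth && SEAM_RANK[r.seam] >= SEAM_RANK[minSeam])
    .sort(
      (a, b) =>
        SEAM_RANK[b.seam] - SEAM_RANK[a.seam] ||
        GROUNDING_RANK[b.grounding] - GROUNDING_RANK[a.grounding],
    );
}

// HR case: bidirectional PII scanning is a hard requirement.
const hrShortlist = shortlist(SURFACES, "possible").map((r) => r.name);
```

With the scanning requirement set to "possible" or better, the iframe and Teams drop out first, S2S ranks at the top, and BYOUI on the SDK client comes second. Changing the requirements changes the shortlist without touching the agent.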

How to apply this

Five rules that hold across the surfaces I have used.

Decide the surface before you write the prompt. Auth mode, grounding, latency budget, license meter. Once those four are pinned, prompt iteration is the cheap part. Doing this in the wrong order is the most common failure mode.
Treat the iframe embed as a demo tool, not a production target. The combination of no-auth-only embed snippets, third-party cookie blocking, and missing Tenant Graph Grounding makes it structurally unfit for anything carrying tenant data. Use it for marketing chatbots, use it to demo the agent to a sponsor, do not use it for the actual rollout.
Reach for self-hosted Web Chat before you reach for BYOUI. The open-source botframework-webchat package absorbs about 80% of what most teams think they need a fully custom UI for. styleOptions covers the brand. Event listeners let you wire it into ServiceNow or SharePoint pages. BYOUI is the right answer when you genuinely need to surface tool-call internals or fold the agent into an existing chat framework. It is the wrong answer when you just want different colors.
For regulated middleware, design around the 100-second wall, not the model. Hop budget, OBO token lifetimes, polling latency on Direct Line, end-of-response detection. The model is the part that gets cheaper every six months. The wall does not.
For internal employees on M365 Copilot licenses, default to Teams unless you have a written reason not to. Zero-config SSO, Tenant Graph Grounding on by default, favorable Copilot Credit treatment, governance through the Admin Center. Most "we need a custom UI" requirements do not survive contact with a well-designed Teams app card, and the ones that do survive are usually marketing requirements, not engineering ones. I argued the broader version of this in "Microsoft 365 ships agent inventory not observability": the governance surface is real, the observability surface is missing, but the publishing surface is excellent.

Open questions

I expect parts of this to age badly. The places I am least sure:

The S2S app-identity flow described above is still rolling out. I have built against the existing Direct Line app-token flow, but the full M365 Agents SDK client variant with CopilotStudio.Copilot.Invoke as an application permission is upcoming, per the Microsoft session above. By the time you read this, the docs may exist, or the permission name may have changed, or the sharing experience may not yet let you select an app ID. Cross-check before you commit an architecture.

I have under-tested the Power Pages embed for mixed anonymous-plus-authenticated audiences. The Microsoft blog post on the integration claims Power Pages supports all Copilot Studio auth modes, including the no-auth one, which makes it the strongest candidate for portals that need both. I have not stress-tested whether web-role gating composes cleanly with Tenant Graph Grounding when the user is signed in via Entra. If you have, I want to hear about it.

I am uncertain how much longer Direct Line stays a first-class option. The trajectory of Microsoft's M365 Agents SDK suggests the SDK client is the long-term path for everything except service-principal and B2C cases. If Direct Line ends up deprecated within 18 months, every "S2S over Direct Line" pattern that exists today becomes migration debt. I would not bet against that outcome.

And I am still not happy with the observability story across these surfaces. The Optimize-agents doc gives you the levers. It does not give you per-turn traces with hop-level latency attribution. If you have built that, even badly, I would like to see the schema. The line I drew in "models depreciate, eval suites compound" applies here too: the suite that watches the agent is the durable asset. The surface is not.

If you are running into a version of this: I work with teams shipping production AI systems against problems like this, and surface-axis decisions are the single most common place I see them get stuck. Get in touch. If you have already shipped a Copilot Studio agent into one of these surfaces, I especially want to hear about it — the fastest way to update my thinking is a concrete counterexample with a wire diagram.