Dataverse is the agent backend most M365 shops already own

Before any jargon: the question I want to put on the table is where your AI agent reads its data from. That sounds like a developer detail. It is not. It is a procurement decision, an audit decision, and a security decision, and most companies make it once and live with it for years. If you are already on Microsoft 365, the answer is probably sitting in a tab you have never opened.

Key takeaways

  • For an M365 shop, the agent backend is a procurement question first, not a vector DB benchmark question.
  • Dataverse ships row-level and column-level security as a table primitive, no per-query tenant filter required.
  • Adding a Dataverse table to Copilot Studio Knowledge forces Entra authentication; "no auth" is unsupported by design.
  • Dataverse storage at roughly $40 per GB per month is a signal, not a flaw: put PDFs in SharePoint, typed rows here.
  • The thesis loses on unstructured corpora, high-QPS public search, and any shop not federated to Entra ID.

The boring question that decides the next two years

Every time I sit down with a small or mid-sized company starting on AI agents, the same conversation happens. Someone has read three Substack posts about vector databases. Someone else has watched a YouTube comparison of Pinecone and Qdrant. A third person has a price list from Weaviate Cloud. And then they ask me which one they should pick.

My answer almost always disappoints them. Not because the vector DB question is uninteresting, but because for an SMB already paying Microsoft for everything else, it is the wrong question. The right one is shorter: which data plane does the agent already have access to, and which auditor signs off on it?

The Microsoft Power Platform team made roughly the same case in a recent webinar called "When Copilot Studio meets Dataverse: Supercharged AI Knowledge" (video). I am not going to recap their session. The retrieval-quality angle is already covered in my earlier post on structured retrieval beating vector RAG, drawn from a sister upload of the same talk. This post takes a different angle: not which retrieval is more accurate, but which retrieval is already on your invoice and inside your auth boundary.

The two angles are complementary. Post 05 argues why structured retrieval wins on accuracy and governance. This post argues why, for SMBs on M365, you do not get to pick a different backend even if you wanted to, because the procurement gravity and the audit story have already chosen.

What "the backend you already own" actually means on the invoice

Let me state the procurement claim plainly. If your company runs on Microsoft 365 and has any combination of Power Apps Premium, Power Automate Premium, or Dynamics 365 seats, you have Dataverse capacity on your bill. You have probably never looked at it. Most operations leaders I talk to are mildly surprised when I open the Power Platform admin center on their tenant and show them the capacity tile.

Specifically, the default environment ships with 3 GB of Dataverse database capacity, 3 GB of file capacity, and 1 GB of log capacity per tenant. Power Apps Premium and Power Automate Premium added 20 GB of database storage per tenant in the December 2025 capacity increase. On April 15, 2026 the per-user accrual on relevant SKUs doubled from 250 MB to 500 MB per seat. Dynamics 365 Sales Premium tenants moved from 30 GB to 45 GB of database and 40 GB to 60 GB of file capacity in the same release (source).

There is a caveat worth saying out loud. A bare Microsoft 365 E3 or E5 license without any Power Platform Premium or Dynamics 365 seat gives you only Power Apps for Microsoft 365 with standard connectors. The Power Platform licensing FAQ is explicit: no per-user Dataverse database capacity is allocated by M365 E3 or E5 alone. The Dataverse entitlement starts at Power Platform Premium or any Dynamics 365 seat. If your shop is M365-only, the procurement question is whether one Power Apps Premium seat is cheaper than a Pinecone subscription. It almost always is.

The Monday move: open the Power Platform admin center, click the capacity tile, and write down two numbers. Database used, file used. If they are zero or near zero on a tenant with paid Power Platform seats, you have a backend you have been paying for and not using.

Row-level security is the part the auditor can read out loud

The more interesting argument is not the bill. It is the authorization boundary.

When you add a Dataverse table as a knowledge source in Copilot Studio, the official docs are explicit: the agent must be set to authenticate with Microsoft Entra ID. "No auth" and "manual auth" are listed as unsupported. That sounds like a restriction. It is actually the load-bearing feature.

Here is why. When an Entra-authenticated agent queries Dataverse on behalf of a signed-in user, the row-level security policy on the table is enforced inside Dataverse. Rows the caller is not entitled to see never enter the retrieval result set in the first place. The agent does not filter them out after the fact. They are not returned.

💡 Row-level security is a posture, not a query. Dataverse refuses to return the row. A vector DB returns it and asks you to remember to filter. Those are not the same audit story.

The vector DB world reaches similar outcomes with different scaffolding. Pinecone uses project-scoped API keys and namespace isolation, with up to 100,000 namespaces but only 20 indexes per standard plan; hard tenant isolation requires their BYOC mode running the cluster in your own cloud account. Qdrant ships first-class multi-tenancy through named collections with quota controls and tenant-level lifecycle APIs. Weaviate's multiTenancyConfig isolates each tenant in its own HNSW shard. Azure AI Search ships Entra-backed access with EU region support and hybrid BM25 plus vector plus semantic ranker in a single query, which is the closest analogue to Dataverse's identity story inside Azure.

All of them work. The substantive difference is the failure mode. In the vector DB options, tenant scoping is something the application has to get right: the developer has to pass the correct namespace, tenant, or filter on every query. Forget once, leak once. In Dataverse, the database refuses to return the row. The auditor reads one access control list, not two. For a 60-person company without a dedicated security engineer, "refuses to return" beats "remember to filter" every single day of the week.
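A toy model makes the two failure modes concrete. This is a sketch in plain Python, not the API of Dataverse or of any vector DB: the Row class, the app_filtered_search function, and the EnforcingStore class are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    tenant: str
    text: str


ROWS = [
    Row("acme", "Acme carrier contract"),
    Row("globex", "Globex carrier contract"),
]


def app_filtered_search(rows, tenant=None):
    """'Remember to filter': returns everything unless the caller scopes it."""
    if tenant is None:  # the forgotten-filter case
        return list(rows)
    return [r for r in rows if r.tenant == tenant]


class EnforcingStore:
    """'Refuses to return': every read requires a caller identity."""

    def __init__(self, rows):
        self._rows = list(rows)

    def search(self, caller_tenant):
        # Scoping happens inside the store, so callers cannot skip it.
        return [r for r in self._rows if r.tenant == caller_tenant]


leaked = app_filtered_search(ROWS)          # developer forgot the tenant argument
safe = EnforcingStore(ROWS).search("acme")  # identity is a required argument
print(len(leaked), len(safe))
```

The first call quietly returns both tenants' rows. The second has no code path that returns another tenant's row, which is the property the auditor cares about.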

The Monday move: ask your IT lead which surface a Data Protection Officer would have to read to confirm who can see which rows. If the answer is "two surfaces and a Python file," you have exactly the problem that database-enforced row-level security removes.

What Copilot Studio actually does with a Dataverse table

Three retrieval surfaces sit on top of the same Dataverse rows. The Microsoft Power Platform team walked through all three in their session. I am going to summarize how I describe them to clients, because the choice between them is where most agent projects either work or wander into the weeds.

The first surface is Knowledge. You add a table, you write column synonyms and a glossary, and the orchestrator generates structured queries from natural language. Adding a table to Knowledge also triggers Dataverse indexing under the covers immediately, which is what makes the second surface feasible.

The second surface is structured tooling, primarily the List Rows connector action. You describe the OData filter intent in plain language inside the tool input description, the orchestrator generates the syntax. This is the deterministic path that returns complete result sets and that, unlike Knowledge, supports anonymous public agents.

The third surface is the Dataverse MCP server, which entered public preview in October 2025 and reached GA in March 2026. It exposes Query, Knowledge-and-search, Upload, and Generate-with-grounding as MCP primitives reachable from Copilot Studio, VS Code Copilot, Claude desktop, and Claude Code. It is the autonomy lever: pay with control, gain with flexibility.

A minimal Dataverse knowledge configuration looks like this in practice:

agent:
  authentication: entra_id
  knowledge_sources:
    - type: dataverse_table
      table: facility
      synonyms:
        district:
          - "north"
          - "south"
          - "west"
          - "east"
          - "central"
        facility_type:
          - "community center"
          - "civic center"
          - "service point"
      glossary:
        district: "Geographic service zone, one of five fixed values."
        facility_type: "Category of the physical site providing services."
    - type: dataverse_table
      table: service_offering
      relationship:
        with: facility
        cardinality: many_to_many
tools:
  - name: emergency_facility_lookup
    action: dataverse.list_rows
    table: facility
    filter_hint: |
      Filter where district equals the user's mentioned direction.
      Filter where service_type contains "emergency" if the user
      mentions urgency, after-hours, or escalation.

Two patterns to notice. First, the tool is named by business function ("emergency_facility_lookup"), not by connector name ("dataverse_list_rows"). The orchestrator routes intent on the tool name, so a generic name degrades routing. Second, the filter hint is pseudo-code in natural language, not OData syntax. The orchestrator generates the OData on the fly. The maker no longer has to know whether the operator is eq or =.
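To show what the orchestrator is sparing the maker, here is a rough sketch of the OData that the filter hint above would translate to, built by hand. It assumes district and service_type are plain text columns, and the entity set name "facilities" and the organization URL are hypothetical; real custom tables usually carry a publisher prefix. The helper functions are mine, not part of any Microsoft SDK.

```python
from urllib.parse import quote


def odata_literal(value: str) -> str:
    """Quote a string for OData; embedded single quotes are doubled."""
    return "'" + value.replace("'", "''") + "'"


def build_filter(district: str, urgent: bool) -> str:
    """Mirror the filter hint: always match district, add urgency if asked."""
    clauses = [f"district eq {odata_literal(district)}"]
    if urgent:
        clauses.append(f"contains(service_type,{odata_literal('emergency')})")
    return " and ".join(clauses)


def list_rows_url(base_url: str, entity_set: str, filter_expr: str) -> str:
    """Assemble a Dataverse Web API query URL with a percent-encoded $filter."""
    return f"{base_url}/api/data/v9.2/{entity_set}?$filter={quote(filter_expr)}"


expr = build_filter("north", urgent=True)
url = list_rows_url("https://example.crm.dynamics.com", "facilities", expr)
print(expr)
print(url)
```

Even in this small case the maker would have to know the eq operator, the contains function, and the quote-doubling rule. The natural-language hint hides all three.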

The Monday move: pick one table you already have in Dataverse, name a tool after the question your users actually ask, and write the filter hint as a single English sentence. That is the entire onboarding ritual.

An 80-person Mittelstand scenario, Monday morning

Let me put a face on this. A medium-sized logistics firm in the Ruhrgebiet, 80 employees, runs on Microsoft 365 E3 with 25 Power Apps Premium seats for dispatchers and 15 Dynamics 365 Sales seats for the account managers. They have a Data Protection Officer but no full-time security engineer. The IT lead is a 50-percent role; the other 50 percent is operations.

They are about to procure a chatbot for internal use. The dispatchers want to ask, in plain language, things like "which carriers are cleared for hazardous goods to Austria this week" and "what is the on-time rate for FreightCo on the Munich corridor last month." The vendor pitch on the table is a Pinecone subscription plus a custom Python backend, billed at a four-figure monthly recurring number plus a fixed integration fee (illustrative; the exact figure is irrelevant to the argument).

I would block that procurement and run the following experiment first. Take the existing carrier-and-route table in Dynamics 365 and add it to Copilot Studio Knowledge with synonyms (carrier = haulier = Spediteur, on-time rate = OTR, hazardous = ADR). Expose one List Rows tool called "carrier_clearance_lookup" with a filter hint in natural language, and set the agent to Entra authentication. Cost on top of existing seats: zero. Time to a working POC: half a day.

The measurable outcome is not vibes. It is one number: percentage of dispatcher questions answered correctly on the first turn over a two-week pilot, measured against the existing manual workflow. If that number lands above 70 percent, the Pinecone procurement is dead, because nothing in it tracks the dispatchers' decisions more closely than the typed data they already trust. If it lands below 70 percent, the failure modes are diagnosable in the chain-of-thought activity log inside Copilot Studio, and the next step is either better synonyms, a Search Query unbound action for fuzzy matching, or a deliberate move to a hybrid retriever like Azure AI Search.
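The pilot metric is simple enough to keep in a spreadsheet, but a few lines of code make the decision rule explicit. This is a minimal sketch: the pilot log below is made-up data, and how a first answer gets judged "correct" is left to whoever runs the pilot.

```python
def first_turn_accuracy(results):
    """results: list of booleans, True when the first answer was judged correct."""
    if not results:
        raise ValueError("no pilot questions recorded")
    return sum(results) / len(results)


THRESHOLD = 0.70  # the 70 percent bar used in this post

# Made-up pilot log: 50 dispatcher questions, 38 answered correctly first time.
pilot = [True] * 38 + [False] * 12
score = first_turn_accuracy(pilot)
decision = "ship on Dataverse" if score > THRESHOLD else "diagnose failures first"
print(f"{score:.0%} -> {decision}")
```

Writing the threshold down before the pilot starts matters more than the code. It stops the number from being reinterpreted after the fact.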

The Data Protection Officer signs off on one Dataverse security role rather than reading both an Entra group structure and a Pinecone namespace policy. That is not a small saving. That is the audit story for the next three years.

The Monday move for this firm: the IT lead opens the Power Platform admin center, confirms the capacity tile shows headroom, and books 90 minutes with the operations lead to pick one table and one tool name. The vendor procurement sits in the inbox until the experiment has a measured number.

Where the thesis loses

I have shipped systems on the wrong side of this argument before, so I will name where the case breaks honestly.

It breaks on unstructured corpora at scale. If your problem is searching 50,000 PDFs of technical documentation with no useful schema, Dataverse is the wrong tool. The right move there is SharePoint as the grounding source, which I have written about in the SharePoint-as-grounded-RAG-layer post. SharePoint and Dataverse are complementary, not substitutes. Typed rows belong in Dataverse, document corpora belong in SharePoint, and Copilot Studio Knowledge Center reaches both.

It breaks on high-QPS public-facing semantic search. If you are running an e-commerce site that needs sub-100ms semantic search across a million product descriptions at thousands of queries per second, you want Azure AI Search or a dedicated vector DB. Dataverse was not built for that load shape.

It breaks if your shop is not federated to Entra ID. The whole authorization argument collapses without Entra identity passing through. If you are a non-M365 shop or your customer-facing identity provider is Okta or Auth0, the Entra-inherited row-level security story is not available to you, and you should pick a retriever whose identity story matches yours.

It breaks at the storage-cost ceiling. Dataverse database storage costs around $40 per GB per month, and that price is a deliberate signal. SharePoint and Azure Blob Storage are up to 200 times cheaper than Dataverse database storage for file data, as low as $0.20 per GB per month on add-on file storage. If your team is using Dataverse as a file dump, you are misusing the architecture. Move the files to SharePoint or Blob and keep typed business rows in Dataverse. The pricing is the design.
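The arithmetic behind that ratio, as a quick sketch. The two prices are the approximate figures quoted above, not a current price list, and the 50 GB archive is a hypothetical size.

```python
DB_USD_PER_GB_MONTH = 40.00   # approximate database price quoted above
FILE_USD_PER_GB_MONTH = 0.20  # low-end file storage price quoted above


def monthly_cost(gb: float, usd_per_gb: float) -> float:
    """Monthly storage cost in USD for a given volume and unit price."""
    return gb * usd_per_gb


pdf_archive_gb = 50  # hypothetical PDF archive
as_database = monthly_cost(pdf_archive_gb, DB_USD_PER_GB_MONTH)
as_files = monthly_cost(pdf_archive_gb, FILE_USD_PER_GB_MONTH)
print(f"database: ${as_database:,.2f}/month")
print(f"file:     ${as_files:,.2f}/month")
print(f"ratio:    {as_database / as_files:.0f}x")
```

The same 50 GB costs about as much per month in database storage as it would cost in file storage over many years, which is why the pricing works as an architectural hint.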

It breaks on autonomy boundaries. The Dataverse MCP server is treated as one connector for Data Loss Prevention purposes, with no per-table granularity in DLP policies. If you need to expose some tables to an agent but block others through DLP, MCP is too coarse. You will end up writing specialized tools per business function rather than using the MCP umbrella.

Naming these is not hedging. It is the part of the design conversation where you decide whether the SMB default applies to you, or whether your workload is genuinely off the curve.

What to do on Monday

If you are running an SMB inside M365 and considering an AI agent project, here is what I would do Monday morning, in order.

Open the Power Platform admin center, click capacity, and write down the database-used and file-used numbers. That is your starting baseline.

Ask the IT lead which Power Platform seats are already paid for. If the answer includes any Power Apps Premium, Power Automate Premium, or Dynamics 365 seat, you have a Dataverse entitlement. If the answer is bare M365 E3 or E5, price one Power Apps Premium seat against whatever vector DB is on the procurement list.

Pick one table that maps to a question your team asks daily. Not a strategic table, the daily question table. Add it to Copilot Studio Knowledge with five or ten synonyms and a one-paragraph glossary. Confirm the agent is set to Entra authentication. Test it with three real questions.

If the answers are right more than 70 percent of the time, that is your shipping baseline. If they are wrong more often than that, the chain-of-thought activity tab in Copilot Studio will tell you whether the problem is missing synonyms, missing filter hints, or a genuine need for fuzzy search via the unbound Search Query action.

Do not buy a vector DB on the strength of a benchmark before you have measured what the database you are already paying for can do.

I have run versions of this play with SMB clients on M365 and the consistent pattern is that the first POC takes half a day, the first measurable result lands in two weeks, and the procurement conversation that was going to take three months gets resolved in a single capacity-tile screenshot. If you are in the middle of a vector DB procurement and your team is already on M365, I would be happy to compare notes on what your capacity tile is telling you. The data plane decision is the kind of thing that benefits from a second set of eyes before the contract gets signed.