The "pgvector doesn't scale" meme was true in 2023, half-true in 2024, and is now mostly a function of not reading the 0.8.0 changelog. While the dedicated-vector-DB crowd kept rehearsing benchmarks from a two-year-old commit, Microsoft quietly shipped a first-party MCP server for Postgres inside Foundry, made pg_diskann generally available, and previewed Azure HorizonDB at Ignite 2025. None of those announcements got the keynote slot they deserved.

Here is the thesis I want you to argue with: Postgres has quietly become the agent data plane, and the interesting question for 2026 is not "which vector DB" but "what gets to live inside the same transaction as your agent's actions." Once you frame it that way, most of the polyglot stacks the AI infrastructure crowd has been selling for two years start to look like premature optimization.

Your agent's identity is a Postgres role

The shift nobody announced

The cleanest signal of where this is going showed up in a seven-minute demo at Build 2025. Abe Omorogbe, on the Azure PostgreSQL team, walked through Microsoft Foundry's native integration with managed Postgres in his session PostgreSQL Like a Pro: Build smart and secure agents. I went into it expecting a slick first-party SDK reveal. What I actually saw is a quieter and more important architectural decision, and the second time I watched the demo I started rewriting my recommendation for new agent projects.

What Foundry actually ships is an MCP server for Postgres, packaged as an Azure Container App, that an agent calls through a standardized tool surface. The agent gets schema introspection, parameterized queries, and write tools through the Model Context Protocol. That sounds mundane until you notice what it replaces. For the last eighteen months, the OSS community has been hand-rolling LangChain SQL toolkits, custom function-calling shims, and brittle "describe-your-schema-in-the-system-prompt" tricks. Microsoft just productized the pattern and called it done.

Think of MCP as ODBC for agents. The protocol surface lives next to the data, not next to the model. Every database vendor that wants to be agent-native in 2026 will ship one. The reason Postgres got there first on a hyperscaler is not technical superiority. It is that GPT-4 has read more Postgres than any other dialect, and Microsoft would rather you stay inside the same network boundary as the rest of your data than ship rows over the wire to a separate vector service.

Three other talks in the same "PostgreSQL Like a Pro" series, plus the HorizonDB Ignite reveal, make the bet explicit. Microsoft's 2025 Postgres roadmap names the goal: Postgres as the canonical data plane for agents on Azure, with DiskANN, Entra ID auth, the azure_ai extension, and HorizonDB stacked on top.

What pgvector quietly fixed

For two years the standard objection to running vector search inside Postgres was a benchmark that everyone repeated and nobody re-ran. The story went: pgvector falls over past five million rows, filtered search collapses recall, and index builds eat all your RAM. By late 2024 each of those critiques had a real fix landed in main.

The biggest one is iterative index scans, released in pgvector 0.8.0. Before 0.8, the post-filter failure mode looked like this: you ask the index for the top 10 nearest neighbors, then apply a WHERE tenant_id = $1 filter, and the result set shrinks to three rows because seven of the nearest neighbors belonged to other tenants. The fix is hnsw.iterative_scan and ivfflat.iterative_scan. The planner keeps fetching candidates until your filter is satisfied or a configurable cap is hit. The "we asked for 10, got 3" cliff is gone.

The other quiet upgrade is halfvec. Storing embeddings as 16-bit floats instead of 32-bit doubles the dimension cap on HNSW and IVFFlat indexes from 2,000 to 4,000 with effectively zero recall impact for OpenAI-class embeddings. If you have not migrated your 1,536-dim or 3,072-dim columns yet, you are paying double the storage and index size for no measurable accuracy gain.

The last critique worth retiring is the index-build memory story. Yes, building an HNSW index on 5M × 1536-dim vectors needs 8 to 16 GB of maintenance_work_mem or the build will run 10 to 50x slower on disk, a tradeoff Nile's pgvector deep-dive measured concretely. The honest answer is: set it correctly once, build once, and move on. The New Stack's methodology critique of pgvector benchmarks is the required reading here. Most of the "pgvector is slow" posts misconfigure maintenance_work_mem and ef_construction, then publish QPS numbers as if those are defaults. I have seen two consulting clients spend a quarter shopping for a vector DB to escape a problem that one Postgres config change would have fixed in an afternoon.

The contrarian summary: most of the strongest critique of pgvector was written against behavior that no longer exists. It is still worth reading. It is no longer worth quoting without a footnote.

One more thing worth retiring: the assumption that hybrid search forces you off Postgres. The sparsevec type plus a tsvector column gives you BM25 and dense-vector retrieval in the same query plan, with the same WHERE predicates, and the same RLS policies. Nile's pgvector myths post walks through the pattern in detail. The two-system tax most teams pay for "real" hybrid search is, in 2026, almost always optional.

DiskANN vs HNSW: the real tradeoff

Azure's headline number for pg_diskann is "up to 10x faster, 4x cheaper, 96x lower memory than pgvector HNSW." That is a marketing claim, and like most marketing claims it is technically true under one workload and misleading under most others. The actual tradeoff is more interesting than the slogan.

DiskANN is the same ANN algorithm that powers Bing and M365 Copilot vector search. It keeps a small in-memory cache and lets the bulk of the graph live on NVMe. HNSW, by contrast, expects to fit the whole graph in RAM. Once you accept that framing, the comparison is not "which one is faster" but "which one matches the cost shape of your workload."

| Axis | pgvector HNSW | pg_diskann (Azure) | |---|---|---| | Where the index lives | RAM | NVMe with small RAM cache | | Practitioner p95 at 99% recall | ~12ms (M=48, ef=96) | ~15ms | | Memory for 50M × 768-dim | ~55 GB RAM | ~16 GB RAM + 350 GB NVMe | | Build memory | 8-16 GB maintenance_work_mem | Similar, less sensitive | | Dimension cap | 2,000 (4,000 with halfvec) | 16,000 from v0.6 | | Availability | Upstream, every Postgres | Azure Flexible Server only | | Upstream target | Already there | Targeted for PG18 | | Best fit | < 20M vectors, latency-bound, RAM-rich | 20M+ vectors, cost-bound, NVMe-rich |

Read the numbers honestly. HNSW is faster on raw latency in the practitioner Medium benchmark cited in the wider research. DiskANN wins on memory cost. If you are running on a hyperscaler where RAM is the dominant line item on your invoice, the "96x lower memory" claim translates to real money. If you have a dataset that fits in RAM and you care about p95 first, HNSW is still the right default.

There is one more axis worth naming: filtered vector search. HorizonDB previewed predicate pushdown into the DiskANN index, which is the architectural move that finally closes the gap with Qdrant on filtered-query latency. Whether that ships in GA the way it demoed is the bet to watch in 2026.

Identity is the lever: agent equals Postgres role plus RLS

This is the part of the story that most agent tutorials skip, and it is the part that matters most for anyone running multi-tenant workloads. The Foundry demo shows the right pattern almost by accident. When Abel first asks the agent to list tables, it returns "no tables" because the MCP server's Postgres role has no SELECT grants yet. That is not a bug. It is the entire security model condensed into one error message.

Read that interaction again. The agent has a model, a tool, and a connection string. It still cannot see a single byte of data until a human grants the role the right to read a specific table. There is no separate agent-permissions service to misconfigure, no IAM policy living in a different cloud console, no JSON role definition drifting out of sync with what is actually in the database. The permission boundary and the data live in the same place, and you can audit it with one SELECT * FROM information_schema.role_table_grants.

💡 The Postgres role is the agent's identity. Stop bolting RBAC on top of the agent framework. Collapse it into the database role, and audit, revoke, and rotate live in pg_roles instead of in your agent platform's config UI.

The canonical multi-tenant pattern, codified in Supabase's RLS guide and reinforced by Supabase's agent best-practices post, looks like this:

-- 1. Every tenant-scoped table carries a tenant_id.
ALTER TABLE products ADD COLUMN tenant_id uuid NOT NULL;
ALTER TABLE product_embeddings ADD COLUMN tenant_id uuid NOT NULL;

-- 2. Enable RLS and write a policy keyed on a session GUC.
ALTER TABLE products ENABLE ROW LEVEL SECURITY;
ALTER TABLE product_embeddings ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON products
  USING (tenant_id = current_setting('app.tenant_id')::uuid);

CREATE POLICY tenant_isolation ON product_embeddings
  USING (tenant_id = current_setting('app.tenant_id')::uuid);

-- 3. Set the GUC at the start of every agent session.
SELECT set_config('app.tenant_id', $1, true);

-- 4. Connect through a least-privilege role, never the service role.
CREATE ROLE agent_runtime LOGIN PASSWORD :'pw';
GRANT SELECT, INSERT ON products, product_embeddings TO agent_runtime;
-- Crucially: do NOT grant BYPASSRLS.

-- 5. Vector indexes live on the same table the policy guards.
CREATE INDEX ON product_embeddings
  USING hnsw (embedding halfvec_cosine_ops)
  WITH (m = 16, ef_construction = 64);

That is the entire isolation story for a multi-tenant agent on Postgres. Five steps, one extension, zero new services.

Now the footgun. MCP servers that connect with service-role credentials silently bypass RLS. Most agent tutorials, including some official ones, hand the MCP server a superuser-equivalent connection string because it is the path of least resistance. The Supabase MCP docs warn about this. Most others do not. If your agent uses a service-role key to call your MCP server, your RLS policies are decorative. They will pass tests, fail audits, and leak data across tenants the first time an agent generates a query that omits the tenant_id predicate.

Two adjacent footguns worth knowing. First, SECURITY DEFINER functions bypass RLS unless they re-check the policy explicitly, so any stored procedure your agent calls becomes a potential cross-tenant leak. Second, PgBouncer in transaction-pool mode can leak the app.tenant_id GUC across sessions because the pooled connection keeps its session state between checkouts. The fix is either to use session-pool mode, or to enforce DISCARD ALL on every checkout, or to set the GUC with set_local inside a wrapping transaction so it dies with the commit. Pick one and write it down. Both of these have bitten teams I trust, and both happen silently. RLS does not throw an error when it is bypassed. It just returns rows it was not supposed to.

The fix is boring and load-bearing: create a dedicated, least-privilege role for the MCP server, never grant BYPASSRLS, and make the agent's session start with set_config('app.tenant_id', ...) as a non-negotiable prelude to any query.

When Postgres still loses

The honest version of this argument has limits, and skipping them turns a useful framing into a sales pitch. Here is where Postgres as the agent data plane is the wrong call.

➡️ Past roughly 100M vectors. Timescale's pgvectorscale benchmark gets to 50M Cohere vectors with 28x lower p95 and 16x higher QPS than Pinecone's s1 tier at 99% recall, but that is Timescale's number against Pinecone's slowest tier. Past 100M, dedicated vector stores designed for billions earn their keep, and the operational simplicity argument starts losing to the "I need this query in 8ms or my UX breaks" argument.

➡️ Extreme write throughput on vector columns. HNSW index inserts are expensive. If you are ingesting tens of thousands of new embeddings per second sustained, you will spend more on maintenance_work_mem and lock contention than you save on architectural simplicity. A purpose-built vector DB with async index updates will beat you on cost per write.

➡️ Multi-region active-active. Postgres logical replication has known sharp edges, and Aurora-style multi-master is not a free upgrade. If your agent needs sub-50ms latency from three continents simultaneously, the polyglot stack with a globally-distributed vector service plus a regional Postgres write-leader is still the safer bet. HorizonDB's sub-ms multi-zone commit narrows the gap within a region. Cross-region is still an open problem.

➡️ In-DB LLM calls as a hot-path dependency. Azure's azure_ai extension lets you call OpenAI from a SELECT statement. That is a great pattern for batch backfills of embeddings. It is a dangerous default for anything user-facing, because you have just coupled your database transactions to OpenAI's availability and tail latency. Use it for ETL. Do not use it inside a request path that has an SLO.

None of these limits change the thesis. They sharpen it. Postgres is the right default for the long tail of agent workloads. The exotic cases still need exotic infrastructure, and pretending otherwise is how a "simple stack" decision in week one becomes a six-month migration in year two. The honest test is: do you actually have the scale, the write rate, or the geography that breaks the one-DB story, or are you optimizing for a problem you do not yet have?

How I would build it Monday morning

If you handed me a greenfield agent project today and asked for the boring, defensible stack, this is what I would write down on the back of an envelope:

✅ Managed Postgres 17+ with pgvector 0.8 and halfvec columns. Pick pg_diskann if your data lives on Azure and your dataset is past 20M vectors; otherwise HNSW with m=16, ef_construction=64 is the sane default.

✅ A dedicated agent_runtime Postgres role per tenant, with explicit per-table GRANTs, no BYPASSRLS, and a connection-pool config that enforces RESET ALL on checkout.

✅ RLS policies on every tenant-scoped table, including the embedding table. Set app.tenant_id via set_config as the first statement of every agent session.

✅ MCP server in front of the database, running as a sidecar or container service, connected with the least-privilege role above. Treat the MCP server's connection string like a production secret, not a demo placeholder.

✅ Embedding generation in a background job, not in a request handler. Use azure_ai or a dedicated worker, your call. Keep the request path free of synchronous LLM dependencies.

✅ An audit log table written in the same transaction as the agent's action, with the tenant_id, agent id, tool name, and full query text. This is the single most valuable agent governance telemetry you will own, and it costs nothing to capture inside the ACID umbrella you already have.

❌ Do not ship vectors to a separate service before you have shipped the product. The polyglot stack is the second system you build once the first one tells you exactly where it hurts.

That stack will not win every benchmark. It will win most arguments about operational cost, and it will let you make every agent action atomic with the audit record that justifies it. The one-DB story is not romantic. It is just very hard to beat on the axes that matter once a system has users.

I have been running variations of this pattern across client work for the last six months and have opinions about what breaks in production that I would rather not write into a blog post. If you are designing the same thing, happy to compare notes and share what has and has not worked.

In this article