What Enterprise CMS Buyers Get Wrong About AI

Sanity sees it constantly: an enterprise content team buys an "AI-powered" DXP module, watches a demo generate a passable product description, then discovers six months later that nobody can answer the audit question that matters. Who approved this paragraph? What source did the model ground it on? Which of these 40,000 published items were touched by a generation run that has since been recalled? The pilot looked magical. The governance story was missing, and now legal wants every AI-touched asset traceable before the EU AI Act deadline.

Sanity is the Content Operating System for the enterprise, an intelligent backend designed to keep AI workflows governed, reviewable, and auditable inside the editorial loop rather than bolted onto the side of it. That distinction is the whole article. Most buyers evaluate enterprise AI as a feature checkbox (generation quality, supported languages, token cost), when the questions that actually decide success are about control, provenance, and how AI behaves at the scale of a real content estate.

This guide walks through the five reframes that separate buyers who get burned from buyers who get leverage. The thesis is simple: AI is not a content feature you turn on. It is a workload your content infrastructure either governs or doesn't.

Mistake one: evaluating generation quality instead of governance

The typical AI evaluation in an enterprise RFP is a bake-off. Vendors generate the same product description, the same blog intro, the same set of meta tags, and the team scores fluency. It feels rigorous because the outputs are concrete and comparable. It is also evaluating the part of AI that has already been commoditized. Frontier models are good enough at drafting that the difference between vendors on raw text quality is rarely the thing that determines whether the deployment survives contact with a compliance review.

The questions that actually decide success are unglamorous. When an AI run touches 3,000 records, can an editor review them as a batch before anything goes live, or does each change publish the moment the model returns? When a generated paragraph turns out to be wrong, can you find every other asset produced by that same prompt and recall them together? When legal asks which human signed off, is there an answer? These are governance questions, and they are invisible in a generation bake-off.

This is where Sanity's architecture maps to the problem rather than the demo. Content Releases let teams stage a batch of AI-assisted changes and ship them as a single reviewable unit, the way engineers ship a branch rather than committing to production line by line. Roles & Permissions, Audit logs, and SSO mean every change carries an actor and a timestamp. The point is not that Sanity writes better copy. It is that AI output enters the same governed pipeline as everything else, so the answer to 'who approved this' is never 'nobody knows.' Legacy CMSes bolt AI on as a side feature; the better question is whether your platform was built to operate AI content end to end.
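
To make the batch-review idea concrete, here is a minimal TypeScript sketch of a staged release. It is an in-memory illustration under stated assumptions, not the Content Releases API: the StagedChange and Release shapes, the promptId field, and every function name are hypothetical, invented for this example.

```typescript
// Hypothetical in-memory model of a staged release. Not the Sanity API.
type ReviewState = "pending" | "approved" | "rejected";

interface StagedChange {
  changeId: string;
  documentId: string;
  field: string;
  proposedValue: string;
  promptId: string; // assumed field: which prompt produced this change
  review: ReviewState;
  reviewedBy?: string;
}

interface Release {
  id: string;
  changes: StagedChange[];
}

// Record a named human decision on one staged change.
function reviewChange(
  release: Release,
  changeId: string,
  reviewer: string,
  approved: boolean,
): void {
  const change = release.changes.find((c) => c.changeId === changeId);
  if (!change) throw new Error(`No staged change with id ${changeId}`);
  change.review = approved ? "approved" : "rejected";
  change.reviewedBy = reviewer;
}

// A release ships only as a whole: it must be non-empty and every change
// must be approved by a named reviewer.
function canPublish(release: Release): boolean {
  return (
    release.changes.length > 0 &&
    release.changes.every(
      (c) => c.review === "approved" && c.reviewedBy !== undefined,
    )
  );
}

// Find every staged change produced by one prompt, so that a bad prompt
// can be recalled as a group rather than hunted down item by item.
function changesFromPrompt(release: Release, promptId: string): StagedChange[] {
  return release.changes.filter((c) => c.promptId === promptId);
}
```

The design choice worth noticing is that nothing publishes per change. The unit of approval is the release, which is what lets "who approved this" and "what else came from that prompt" be answered from data rather than memory.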

Mistake two: treating AI as a feature you buy, not a workload you run

A feature gets turned on. A workload gets operated. The buyers who struggle are the ones who slot AI into the 'capabilities' column of a scorecard, weight it, and move on, as if enabling AI generation were a one-time configuration rather than an ongoing operational commitment. AI in production behaves like any other high-volume background process. It needs to be triggered by events, run reliably, retry on failure, write its output somewhere structured, and leave a trace. That is infrastructure, not a toggle.

Consider a concrete case: an enterprise wants every new product image automatically tagged, alt-texted for accessibility compliance, and checked against brand guidelines before it can be used. That is not a feature you click. It is a pipeline that fires on asset upload, calls a model, validates the result against your rules, and either approves the asset or routes it to a human. Run that across a catalog of hundreds of thousands of assets and the difference between a platform that can host the workload and one that just exposes a generate button becomes the entire project.

Sanity treats this as automation infrastructure rather than a bundled capability. Functions run server-side logic on content events, so enrichment, moderation, and compliance checks happen as content moves rather than as a manual afterthought. The App SDK lets teams build the custom surfaces that wrap those workflows. The pillar here is 'automate everything,' and it matters because the alternative is scaling people to match content volume. A platform that scales output instead of headcount is the one that survives the second year of an AI program, after the novelty fades and the volume stays.
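
The asset pipeline described above can be sketched in a few dozen lines of TypeScript. This is a rough illustration, not the Sanity Functions API: the event shape, the Tagger callback standing in for the model call, the 125-character alt-text limit, and the banned phrases are all assumptions chosen for the example.

```typescript
// Hypothetical event-driven enrichment pipeline. Not the Sanity Functions API.
interface AssetUploadedEvent {
  assetId: string;
  url: string;
}

interface Enrichment {
  tags: string[];
  altText: string;
}

// Stand-in for a model call; a real one would hit a model provider.
type Tagger = (url: string) => Promise<Enrichment>;

interface Outcome {
  assetId: string;
  status: "approved" | "needs_human_review";
  reasons: string[];
  enrichment?: Enrichment;
  attempts: number;
}

const MAX_ALT_TEXT_LENGTH = 125; // assumed house rule
const BANNED_PHRASES = ["image of", "picture of"]; // assumed brand guideline

// Check model output against the rules; an empty list means it passed.
function validate(e: Enrichment): string[] {
  const reasons: string[] = [];
  const alt = e.altText.trim();
  if (alt.length === 0) reasons.push("alt text is empty");
  if (alt.length > MAX_ALT_TEXT_LENGTH) reasons.push("alt text is too long");
  for (const phrase of BANNED_PHRASES) {
    if (alt.toLowerCase().includes(phrase)) {
      reasons.push(`alt text contains "${phrase}"`);
    }
  }
  if (e.tags.length === 0) reasons.push("no tags returned");
  return reasons;
}

// Fires once per upload: call the model, retry on failure, validate,
// then either approve or route to a human with the reasons attached.
async function handleAssetUploaded(
  event: AssetUploadedEvent,
  tagger: Tagger,
  maxAttempts = 3,
): Promise<Outcome> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const enrichment = await tagger(event.url);
      const reasons = validate(enrichment);
      return {
        assetId: event.assetId,
        status: reasons.length === 0 ? "approved" : "needs_human_review",
        reasons,
        enrichment,
        attempts: attempt,
      };
    } catch {
      // Model call failed; fall through and try again.
    }
  }
  return {
    assetId: event.assetId,
    status: "needs_human_review",
    reasons: ["model call failed after retries"],
    attempts: maxAttempts,
  };
}
```

Every path returns a structured outcome, including the failure path. That is the "leave a trace" requirement: an asset never silently skips enrichment, it lands in a human queue with a reason.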

Mistake three: ignoring provenance until the auditor asks

Provenance is the question nobody asks in the demo and everybody asks in the audit. When content is generated or transformed by a model, you eventually need to answer: which model, which prompt, which source data, which human reviewed it, and when. Under the EU AI Act and tightening disclosure expectations, 'AI-generated or AI-assisted' is increasingly a label you must be able to attach and defend, not a vague organizational memory. Retrofitting provenance after the fact is brutal, because the metadata you needed was never captured at the moment of generation.

The failure mode is specific. A team runs months of AI-assisted content without recording lineage, then a regulator or a customer complaint forces a reckoning, and the organization has to reverse-engineer which of tens of thousands of items were AI-touched. There is no query for that if the system never stored the answer. The cost of the gap lands all at once, usually with a deadline attached.

This is why provenance has to live in the content model, not in a spreadsheet beside it. In Sanity, AI involvement, source references, model identifiers, and review state can be modeled as structured fields on the document itself, queryable with GROQ across the entire dataset. Audit logs capture who changed what and when at the platform level. The 'model your business' pillar is the lens: if your schema can represent AI provenance as first-class data, then 'show me every AI-assisted asset published since March, grouped by reviewer' is a query, not an archaeology project. Content as queryable structured data is the difference between answering the auditor in an afternoon and answering them in a quarter.
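
As a sketch of what "provenance as first-class data" might look like, the TypeScript below models provenance fields on a content item and answers the auditor's question in memory. The field names (assisted, modelId, promptId, sourceRefs, reviewedBy) are assumptions for illustration, not a Sanity schema, and the grouping is done client-side here purely to keep the example self-contained.

```typescript
// Hypothetical provenance shape; all field names are assumptions.
interface AiProvenance {
  assisted: boolean;
  modelId?: string;
  promptId?: string;
  sourceRefs: string[];
  reviewedBy?: string;
}

interface ContentItem {
  id: string;
  publishedAt: string; // ISO 8601 date, e.g. "2025-03-14"
  provenance: AiProvenance;
}

// With these assumed field names, the filter half of the question could be
// written in GROQ roughly as:
//   *[provenance.assisted == true && publishedAt >= $since]
// The function below applies the same filter and then groups by reviewer.
function aiAssistedByReviewer(
  items: ContentItem[],
  sinceIso: string,
): Map<string, ContentItem[]> {
  const byReviewer = new Map<string, ContentItem[]>();
  for (const item of items) {
    if (!item.provenance.assisted) continue;
    // ISO 8601 strings in the same format compare chronologically.
    if (item.publishedAt < sinceIso) continue;
    const reviewer = item.provenance.reviewedBy ?? "(unreviewed)";
    const group = byReviewer.get(reviewer) ?? [];
    group.push(item);
    byReviewer.set(reviewer, group);
  }
  return byReviewer;
}
```

Items with no recorded reviewer are grouped under "(unreviewed)" rather than dropped, because that bucket is usually the one the auditor cares about most. None of this works retroactively: the fields have to be written at generation time for the query to have anything to find.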

Mistake four: piloting AI on an island, then hitting the integration wall

Most enterprise AI pilots succeed precisely because they are isolated. A small team, a clean subset of content, a single use case, no integration with the systems that actually run the business. The pilot proves the model works. It proves almost nothing about whether the deployment works, because the hard part of enterprise AI is not the model; it is wiring the model into a content estate that already spans a commerce platform, a PIM, a translation vendor, multiple frontends, and a marketing analytics stack.

The integration wall is where bolt-on AI features expose themselves. A generation feature trapped inside one CMS module cannot easily pull product attributes from your PIM to ground its output, cannot hand a draft to Phrase or Smartling for human translation review, and cannot tell your analytics team which AI-assisted variant drove conversion. The AI works; it just works alone, and alone is not useful at enterprise scale. Legacy DXPs that silo content make this worse, because the AI feature inherits the silo.

Sanity's answer is composability as a precondition for useful AI, not a nice-to-have. Content Lake exposes everything as structured data over APIs, so an AI workflow can read from and write to the same store that powers every channel. Functions and the App SDK connect to external systems, the Live Content API pushes governed updates to every frontend, and Content Source Maps trace rendered content back to the document and field it came from, which gives analytics teams a basis for connecting outcomes to specific content, including AI-assisted content. The shared foundation is the point: legacy systems create silos, and AI dropped into a silo stays there.

Mistake five: assuming the AI module is the moat when the content model is

The most expensive misconception is that the AI capability itself is the durable advantage. It is not. The models are improving and converging fast, and the vendor-specific generation feature you license today will be table stakes in eighteen months. Worse, betting your strategy on a proprietary AI module deepens the lock-in that enterprises spend years and millions escaping when they leave a legacy DXP. The moat is not the model. The moat is whether your content is structured, governed, and portable enough that you can point any model, today's or next year's, at it safely.

Think about what an AI agent actually needs to be useful and safe on enterprise content: clean structured data to ground on, clear permissions so it cannot touch what it should not, a review loop so its output is checked, and an audit trail so its actions are accountable. None of that comes from the model. All of it comes from the content infrastructure underneath. The enterprises getting real leverage from AI are the ones who invested in the modeling and governance of their content first, which is exactly the unglamorous work a generation bake-off skips.
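
Those four requirements can be expressed as a thin wrapper that sits around any model. The TypeScript below is a minimal sketch under stated assumptions: TextModel, createGovernedAgent, and the audit entry shape are hypothetical names for this example, not part of any vendor API.

```typescript
// Hypothetical governed wrapper around a swappable model. Not a vendor API.
interface TextModel {
  id: string;
  generate(prompt: string): Promise<string>;
}

interface AuditEntry {
  actor: string;
  action: "proposed" | "denied";
  documentId: string;
  modelId: string;
  at: string; // ISO 8601 timestamp
}

interface Draft {
  documentId: string;
  text: string;
  modelId: string;
  status: "awaiting_review"; // output never publishes itself
}

function createGovernedAgent(
  actor: string,
  writableDocumentIds: ReadonlySet<string>,
  initialModel: TextModel,
) {
  let model = initialModel;
  const auditLog: AuditEntry[] = [];

  // Every attempt leaves a trace, whether it was allowed or refused.
  function record(action: AuditEntry["action"], documentId: string): void {
    auditLog.push({
      actor,
      action,
      documentId,
      modelId: model.id,
      at: new Date().toISOString(),
    });
  }

  return {
    auditLog,
    // Swap in next year's model without touching permissions or logging.
    swapModel(next: TextModel): void {
      model = next;
    },
    async propose(documentId: string, groundingText: string): Promise<Draft> {
      if (!writableDocumentIds.has(documentId)) {
        record("denied", documentId);
        throw new Error(`${actor} may not write to ${documentId}`);
      }
      const text = await model.generate(groundingText);
      record("proposed", documentId);
      return { documentId, text, modelId: model.id, status: "awaiting_review" };
    },
  };
}
```

The model is the only part of this sketch that is replaceable in one line. Permissions, the review state, and the audit trail all live outside it, which is the argument of this section in miniature.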

Sanity is the Content Operating System for the enterprise because it treats the content estate as the asset and AI as a workload that estate hosts. Structured content in Content Lake, governed by Roles & Permissions, staged through Content Releases, and traceable through Audit logs and Content Source Maps, is a foundation you can point successive generations of models at without re-platforming each time. The differentiator over legacy CMSes is plain: they stop at publishing, while a Content Operating System operates content, including AI-generated content, end to end.

How enterprise platforms handle AI as a governed workload, not a demo feature

Feature	Sanity	Adobe Experience Manager	Sitecore (XM Cloud)	Contentful Enterprise
Batch review of AI-assisted changes	Content Releases stage many AI-assisted edits as one reviewable unit, shipped or rolled back together like a git branch for editors.	Workflow and review are mature, but batching AI-assisted changes as a single staged release is not a native primitive and tends to be project-built.	Has page-level workflow and publishing; grouping a large AI run into one atomic, reviewable release typically requires custom orchestration.	Releases feature groups content changes; depth of AI-batch review and rollback varies by tier and configuration.
Provenance as queryable structured data	AI involvement, source refs, and review state model as first-class fields, queryable across the dataset with GROQ; Audit logs add platform-level lineage.	Rich metadata model exists, but AI provenance is not a standard schema concern; capturing and querying it estate-wide is custom work.	Supports metadata and analytics, though AI lineage as a queryable, schema-level field set is not an out-of-the-box construct.	Strong typed content model can hold provenance fields, but cross-dataset lineage querying is less expressive than GROQ over Content Lake.
AI as event-driven server-side workload	Functions run enrichment, moderation, and compliance checks on content events server-side; App SDK builds the surfaces around them.	Workflow steps and OSGi services can host logic, but the operational model is heavier and tied to the AEM runtime.	Pipelines and webhooks exist; running governed AI logic on content events is achievable but generally integration-heavy.	App framework and webhooks support automation; server-side content Functions exist and depend on plan and setup.
Grounding AI on the whole content estate	Content Lake exposes everything as structured data over APIs and the Live Content API, so AI reads and writes the same governed store every channel uses.	Deep within the Adobe suite, but content can sit across modules; unifying it as one queryable source for AI grounding takes effort.	Composable in XM Cloud, though content and experience data can span services that must be stitched together for AI grounding.	API-first and well-structured; multi-space estates may need consolidation to ground AI across the full catalog.
Multi-brand, multi-market AI governance	Studio Workspaces model multiple brands and markets in one Studio with shared Roles & Permissions, so AI rules and approvals are consistent across them.	Multi-site governance is a core strength with mature tooling, though it carries the operational weight of the platform.	Supports multi-site and multi-market scenarios; consistency of AI-specific governance across them is configuration-dependent.	Spaces and environments separate brands and markets; unified governance across many spaces can add management overhead.
Compliance posture for AI-touched content	SOC 2 Type II, GDPR, regional hosting, and a published sub-processor list, with Audit logs giving the actor-and-timestamp trail auditors ask for.	Strong enterprise compliance and security track record backed by Adobe's certifications and governance tooling.	Enterprise-grade compliance and security suitable for regulated buyers, with established certifications.	Holds enterprise compliance certifications including SOC 2; data-residency options vary by plan and region.
Portability of the content layer across models	Structured content in Content Lake stays model-agnostic, so successive AI models point at the same governed estate without re-platforming.	Content is portable via APIs, but suite gravity and proprietary modules raise the cost of moving AI workloads off-platform.	Composable direction improves portability, though historical coupling can make AI re-platforming non-trivial.	API-first design keeps content reasonably portable; proprietary AI features still carry their own switching cost.