How to Wire Your CMS Into a Customer Data Platform
Your personalization engine is only as good as the content it can reach, and most enterprises discover this the hard way. A marketer builds a segment in Segment, mParticle, or Adobe Real-Time CDP, targets it at a campaign, and then the offer copy, product data, or eligibility rule the CDP needs is stranded inside a CMS the CDP cannot query cleanly. The result is stale personalization, hard-coded content in the campaign tool, and a growing gap between what the profile knows and what the experience can show.
Sanity, the Content Operating System for the enterprise, exists to close that gap. It is the intelligent backend that keeps content structured, governed, and queryable so a CDP consumes it as data rather than scraping it out of rendered pages. Where legacy DXPs bury the model inside the platform, Content Lake decouples structure from storage: schema lives in code, content lives in the cloud, and every channel, including your CDP, reads the same source of truth.
This guide reframes the CMS-to-CDP problem as a content architecture decision, not a connector shopping trip. We will walk through modeling content the CDP can consume, emitting events on publish, governing identity-bearing flows, and honestly comparing the DXP-bundled CDPs against a composable approach.
Why CMS-to-CDP integrations rot, and where the real failure lives
The classic failure mode is not the connector. It is the shape of the content on either side of it. A CDP resolves anonymous and known profiles, computes segments, and orchestrates activation across email, ads, and onsite. To do any of that against your content, it needs discrete, queryable fields: an eligibility flag, a price, a launch date, an offer identifier, a market code. What most CMSes hand it instead is a blob of rendered HTML or a loosely typed entry where those facts are trapped inside body copy. So teams do the expensive thing. They re-key the content into the campaign tool, hard-code offers in the CDP's journey builder, or stand up a middleware service whose only job is to reshape CMS output into something a segment can match against.
That middleware becomes the rot. Every schema change breaks it. Every new market doubles it. Nobody owns it after the agency that built it rolls off. And because the content and the events that describe it live in two disconnected systems, freshness is never guaranteed: the CDP acts on a version of the truth that the CMS has already moved past.
The reframe for an enterprise buyer is to stop treating the CDP as something you integrate a CMS with, and start treating the CMS as the governed content source of truth that emits structured content and events the CDP simply subscribes to. When structure is a first-class property of the content rather than a post-processing chore, the connector gets boring, which is exactly what you want from infrastructure that runs your revenue. This maps directly to the "model your business" pillar: get the shape right at the source, and everything downstream stops fighting it.
Model the content so the CDP consumes it as data, not markup
The first discipline is modeling. A CDP cannot personalize against a paragraph; it personalizes against fields. So the content model has to expose the exact facts a segment or journey depends on as typed, addressable data: audience eligibility, market, channel, effective dates, offer codes, and the relationships between them. In a legacy DXP, that model is built and managed inside the platform's UI, which means the shape of your business is locked behind a vendor's authoring tool and hard to version, review, or evolve.
Sanity inverts that. Content Lake decouples structure from storage, so schema lives in code and content lives in the cloud. Your content model is source-controlled, reviewable in a pull request, and adapts to how your business actually works rather than forcing your teams to work the platform's way. When a market needs a new eligibility attribute or a campaign needs a new offer type, that is a schema change in code, shipped through your existing review process, not a ticket to a platform administrator.
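As a minimal sketch of what "schema lives in code" can look like, the block below declares a hypothetical `offer` document type as a plain object in the general shape Sanity schema definitions take. The type name, field names, and market codes are illustrative assumptions, and the small local types stand in for the ones a real Studio would import from the `sanity` package.

```typescript
// Minimal local types so the sketch is self-contained; a real Studio
// would take these shapes from the `sanity` package instead.
type FieldDef = {
  name: string;
  type: string;
  title?: string;
  options?: { list: string[] };
  of?: { type: string }[];
  to?: { type: string }[];
};

type DocumentDef = {
  name: string;
  type: "document";
  title: string;
  fields: FieldDef[];
};

// Hypothetical `offer` type: every fact a segment or journey matches on
// is its own typed field rather than a sentence inside body copy.
const offer: DocumentDef = {
  name: "offer",
  type: "document",
  title: "Offer",
  fields: [
    { name: "offerCode", type: "string", title: "Offer code" },
    { name: "market", type: "string", options: { list: ["us", "de", "jp"] } },
    { name: "channel", type: "string", options: { list: ["email", "web", "paid"] } },
    { name: "eligibleTiers", type: "array", of: [{ type: "string" }] },
    { name: "effectiveFrom", type: "datetime" },
    { name: "effectiveTo", type: "datetime" },
    { name: "product", type: "reference", to: [{ type: "product" }] },
  ],
};

// Helper used in review tooling or tests to list the addressable fields.
function fieldNames(def: DocumentDef): string[] {
  return def.fields.map((field) => field.name);
}
```

Because the definition is ordinary code, adding a new eligibility attribute is a one-line diff that goes through the same pull-request review as any other change.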
On the read side, GROQ lets the CDP or an activation service ask for precisely the fields it needs and nothing more. A single query can hard-filter on the predicates that must hold (category, price ceiling, market, availability), score what remains, and project down to a small ranked payload: `*[_type == "product" && category == $category && price < $maxPrice] | score(title match $searchTerms) | order(_score desc) [0...10] { _id, title, price }`. The `score()` stage is what populates `_score`, and `$searchTerms` is an illustrative parameter. No over-fetching, no scraping, no reshaping layer in between. For multi-brand and multi-market operations, Studio Workspaces let you model the entire estate in one Studio while still emitting per-market payloads the CDP can route. The content model becomes the contract, and the CDP consumes it as clean structured data.
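To make the read path concrete, here is a rough sketch of how an activation service might build a request URL for a parameterized GROQ query. It assumes the general shape of Sanity's HTTP query endpoint (query in a `query` parameter, each GROQ parameter passed as a `$`-prefixed, JSON-encoded value); the project ID, dataset, and API version shown are placeholders, and the endpoint shape should be checked against the current API documentation. The example query orders by price so the sketch does not depend on a scoring stage.

```typescript
type QueryParams = { [name: string]: string | number | boolean };

// Builds a GET URL for a GROQ query. The host and path layout are an
// assumption about Sanity's HTTP query API, not something this guide specifies.
function buildQueryUrl(
  projectId: string,
  dataset: string,
  apiVersion: string,
  query: string,
  params: QueryParams
): string {
  const base =
    "https://" + projectId + ".api.sanity.io/v" + apiVersion + "/data/query/" + dataset;
  const parts = ["query=" + encodeURIComponent(query)];
  Object.keys(params).forEach((name) => {
    // GROQ parameters travel as $name=<JSON value>, URL-encoded.
    const value = JSON.stringify(params[name]);
    parts.push(encodeURIComponent("$" + name) + "=" + encodeURIComponent(value));
  });
  return base + "?" + parts.join("&");
}

// Hard-filter first, then project only the fields the segment needs.
const productQuery =
  '*[_type == "product" && category == $category && price < $maxPrice]' +
  " | order(price asc) [0...10] { _id, title, price }";

// "abc123", "production", and "2021-10-21" are placeholder values.
const url = buildQueryUrl("abc123", "production", "2021-10-21", productQuery, {
  category: "shoes",
  maxPrice: 100,
});
```

Keeping the query text and its parameters separate means the CDP-side caller never concatenates user-derived values into the GROQ string itself.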
Emit events on publish so the CDP never acts on stale content
Personalization decays the moment content and profiles fall out of sync. The CDP builds an audience around an offer, the offer expires or the price changes, and the campaign keeps firing against the old truth because nothing told the CDP the content moved. The fix is event-driven propagation: the content backend emits a change event the instant content publishes, and the downstream index or activation layer updates from it.
This is precisely the class of work Content Lake is built to absorb. What every homegrown or bolt-on integration requires, and what Content Lake handles, is the content pipeline that keeps a downstream index fresh. When a product description updates, when a price changes, when an article publishes, when a record is deleted, the downstream system has to know. Building that yourself (incremental indexing, re-embedding on change, deletion handling, eventual-consistency reasoning, and backfill for schema changes) is a real project and a class of bug all its own. When it is wired into the content backend, freshness stops being a permanent line item on your roadmap.
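To show why a homegrown version of this is a real project, the sketch below implements only two of those concerns, deletion handling and out-of-order delivery, for a toy in-memory index. The event shape and the `applyChange` helper are hypothetical and do not describe how Content Lake works internally.

```typescript
type ChangeEvent =
  | { kind: "upsert"; id: string; updatedAt: string; fields: { [key: string]: unknown } }
  | { kind: "delete"; id: string; updatedAt: string };

type IndexEntry = {
  updatedAt: string;
  deleted: boolean; // tombstone, so a late upsert cannot resurrect a deleted record
  fields: { [key: string]: unknown };
};

type Index = { [id: string]: IndexEntry };

// Applies one change event. Returns true if the index changed, false if the
// event was stale (an equal or newer version has already been applied).
function applyChange(index: Index, event: ChangeEvent): boolean {
  const current = index[event.id];
  // ISO-8601 UTC timestamps in a uniform format compare correctly as strings.
  if (current !== undefined && current.updatedAt >= event.updatedAt) {
    return false;
  }
  if (event.kind === "delete") {
    index[event.id] = { updatedAt: event.updatedAt, deleted: true, fields: {} };
  } else {
    index[event.id] = { updatedAt: event.updatedAt, deleted: false, fields: event.fields };
  }
  return true;
}
```

Even this small version needs tombstones and a version comparison to stay correct when events arrive late or twice; re-embedding and schema backfill would add further moving parts.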
In practice, you drive the wiring with Functions and webhooks. On publish, a Function can shape a GROQ-projected payload and push it to the CDP's ingestion endpoint, or emit an event that your activation service consumes. Because the payload is projected from the model rather than scraped, it carries exactly the fields the segment needs. Note the honest boundary: Sanity does not ship a prebuilt connector to a specific CDP, and it does not publish latency or throughput SLAs for these events, so treat this as event-driven automation that emits on publish rather than as a guaranteed real-time bus. That maps to the "automate everything" pillar: the propagation runs itself instead of running your team.
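A rough sketch of the publish-side handler might look like the following. It is runtime-agnostic on purpose: it does not use any Sanity Functions or webhook API signature, the CDP ingestion URL and event shape are hypothetical, and the HTTP transport is injected so no particular client is assumed. The only Sanity-specific details relied on are the standard `_id`, `_rev`, and `_updatedAt` document attributes and the `drafts.` ID prefix for unpublished drafts.

```typescript
// Hypothetical `offer` document as it might arrive in a publish notification.
type OfferDoc = {
  _id: string;
  _rev: string;
  _updatedAt: string;
  offerCode: string;
  market: string;
  eligibleTiers: string[];
  effectiveTo: string;
  body?: string; // rich copy, deliberately not forwarded to the CDP
};

// Hypothetical event shape; a real CDP defines its own ingestion schema.
type CdpEvent = {
  type: "content.published";
  contentId: string;
  revision: string;
  occurredAt: string;
  properties: {
    offerCode: string;
    market: string;
    eligibleTiers: string[];
    effectiveTo: string;
  };
};

// Injected transport, e.g. a thin wrapper around whatever HTTP client you use.
type Send = (url: string, body: string) => Promise<void>;

const CDP_INGEST_URL = "https://cdp.example.com/ingest"; // placeholder endpoint

// Projects the document down to the fields a segment matches on.
// Returns null for drafts so only published content propagates.
function toCdpEvent(doc: OfferDoc): CdpEvent | null {
  if (doc._id.indexOf("drafts.") === 0) return null;
  return {
    type: "content.published",
    contentId: doc._id,
    revision: doc._rev,
    occurredAt: doc._updatedAt,
    properties: {
      offerCode: doc.offerCode,
      market: doc.market,
      eligibleTiers: doc.eligibleTiers,
      effectiveTo: doc.effectiveTo,
    },
  };
}

// Returns true if an event was sent, false if the document was skipped.
async function pushOnPublish(doc: OfferDoc, send: Send): Promise<boolean> {
  const event = toCdpEvent(doc);
  if (event === null) return false;
  await send(CDP_INGEST_URL, JSON.stringify(event));
  return true;
}
```

Carrying the revision and timestamp in the event lets the receiving side discard duplicates or late arrivals, which matters given that delivery timing is not guaranteed.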
Govern identity-bearing flows without building a separate security discipline
The moment personalized content carries identity (entitlement, tier, region, consent status, PII-adjacent fields), governance stops being a nice-to-have. A CDP acting on the wrong content for the wrong profile is not a bad experience; in a regulated market it is a compliance incident. Enterprise buyers have to answer who could read that field, who changed it, and under whose authority the change propagated.
The pattern that scales is to keep identity in the identity context rather than laundering it through a system account. A user's session token flows from the user, through your app, into the tool or API layer, and onward to your backend, and it never leaves the user's identity context. Reads and writes happen under the user's permissions, the action is logged against the user, and you inherit your existing security model: the same row-level permissions, the same rate limits, and the same regulatory boundaries. You do not build a separate discipline for content-driven personalization; you make sure the token flows. That is the enterprise governance frame for wiring identity-bearing content into a CDP.
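One way to picture "make sure the token flows" is a small forwarding helper in the tool or API layer that passes the caller's own bearer token to the backend and refuses to proceed without one. This is a sketch under assumptions: the request shapes and the `forwardWithUserToken` name are hypothetical, and real code would also validate the token.

```typescript
type IncomingRequest = { headers: { [name: string]: string | undefined } };
type OutgoingRequest = { url: string; headers: { [name: string]: string } };

// Builds the downstream request under the user's identity. There is
// intentionally no fallback to a shared service-account credential.
function forwardWithUserToken(incoming: IncomingRequest, backendUrl: string): OutgoingRequest {
  const auth = incoming.headers["authorization"];
  if (auth === undefined || auth.indexOf("Bearer ") !== 0) {
    throw new Error("No user token on request; refusing to use a system account");
  }
  return { url: backendUrl, headers: { authorization: auth } };
}
```

Failing closed here is the design choice that keeps permissions, rate limits, and audit entries attached to the actual user rather than to a system account.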
Sanity supplies the primitives this depends on. Roles & Permissions, SSO, and Audit logs give you who-can-see and who-did-what across the content estate. When a content change must be coordinated with a downstream audience or campaign, Content Releases let teams stage and preview batches of changes with drafts, scheduling, history, permission gating, and audit trails. This is the same governance you already use for the website, so a CMS change and its CDP activation ship as one reviewed unit rather than two racing systems. And be clear about the division of labor: identity resolution (stitching anonymous and known profiles) is the CDP's job. Sanity's job is to supply governed content and structured events the CDP resolves against those profiles.
Make or buy: why owning the content-to-CDP layer beats renting it
Every DXP-bundled CDP is a rent decision. You get identity resolution, journey building, and suite-native reporting out of the box, and in return the content model, the integration surface, and the data all live inside one vendor's platform on that vendor's terms. For some organizations the convenience is worth it. For many, the trade shows up later as cost, lock-in, and an inability to get their own data where they need it.
Sanity learned this on its own product. When it could not get data about logged-in users, such as their tier and whether they were above quota, into a third-party support agent, it built its own agent that could. The lesson generalizes: the systems that touch your customer data and your revenue are often the ones you most want to own and operate. Enterprise leaders say this plainly. "$200,000 dollars going out the door does not make me feel comfortable for something that we could ultimately kind of build and own and operate for way less over time," said Walter Colindres of Jack in the Box. "So yeah, we would set up our own. We would just choose our own model," said JP Malone of ADT.
The make-or-buy line does not mean building a CDP from scratch. Segment, RudderStack, and mParticle are excellent at profile resolution and activation, and you should let them do that. The line is about the content source of truth. Owning that as schema-as-code on Content Lake, with a customizable Studio, Functions and webhooks for connectivity, and GROQ for precise reads, means the layer that feeds every channel, including the CDP, adapts to you rather than the reverse. Legacy CMSes stop at publishing; the goal here is a shared foundation that operates content end to end, so you scale output instead of scaling the headcount that maintains glue code.
A reference architecture: source of truth, projection, propagation, activation
Put the pieces together and the architecture is legible enough to survive an RFP and an audit. Four layers, cleanly separated.
Source of truth. Content and its structure live in Content Lake, modeled as code, versioned in your repository, and authored in Sanity Studio. Studio Workspaces carry multi-brand and multi-market variation without forking the model. This is the single place a fact about an offer, a price, or an eligibility rule exists.
Projection. When a downstream system needs content, it reads through GROQ, hard-filtering on the predicates that must hold and projecting down to exactly the fields required. A blended `score()` pipeline can even rank results by relevance, boosting a keyword `match()` and blending `text::semanticSimilarity()` when the activation needs the best-fitting content rather than a flat list. The read is fresh by default because it hits the live content store, not a copy.
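As an illustration of the keyword half of such a pipeline, the sketch below assembles a GROQ query that hard-filters, scores with a boosted `match`, orders by `_score`, and projects a small payload. The `offer` type, its field names, and the `$market` and `$terms` parameters are hypothetical, and the semantic-similarity term is left out because this guide does not give its exact signature.

```typescript
// Builds a ranked-offers query. `limit` caps the payload the activation layer receives.
function rankedOffersQuery(limit: number): string {
  if (limit < 1 || Math.floor(limit) !== limit) {
    throw new Error("limit must be a positive integer");
  }
  return [
    // 1. Hard filter: predicates that must hold for every result.
    '*[_type == "offer" && market == $market && active == true]',
    // 2. Score: a title hit counts three times as much as a description hit.
    "| score(boost(title match $terms, 3), description match $terms)",
    // 3. Rank and slice (the three-dot range excludes the upper bound).
    "| order(_score desc) [0..." + limit + "]",
    // 4. Project only what the activation needs.
    "{ _id, title, offerCode, _score }",
  ].join(" ");
}

const topFive = rankedOffersQuery(5);
```

Keeping the filter ahead of the scoring stage means relevance only ever reorders content that is already eligible; it never lets an ineligible offer through.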
Propagation. On publish, Functions and webhooks emit a projected payload or a change event to the CDP's ingestion endpoint. Content Releases coordinate batches so a market launch and its audience go live together, under one reviewed, audited change.
Activation. The CDP does what only a CDP does: resolves identity, computes segments, and orchestrates across email, web, and paid. It consumes your content as governed structured data and events, never as scraped markup.
The governance thread runs through all four layers. Roles & Permissions, SSO, and Audit logs govern who authors and who ships, and Sanity's compliance posture (SOC 2 Type II, GDPR, regional hosting and data residency, and a published sub-processor list) gives procurement the paperwork it needs. Update the content once, and web, apps, and the CDP stay in sync. That is the "power anything" pillar in practice: one governed source, every downstream channel.
Feeding a CDP: composable content source of truth vs DXP-bundled CDPs
| Feature | Sanity | Adobe AEM + Real-Time CDP | Sitecore XM Cloud + CDP/Personalize | Optimizely (DXP + ODP) |
|---|---|---|---|---|
| Content model ownership | Schema-as-code on Content Lake, versioned in your repo and reviewed in a pull request, so the model adapts to your business. | Model built and managed inside the Adobe platform UI; powerful but tied to the suite and harder to source-control. | Templates and data defined in-platform; strong but adapting them to fast-moving teams takes major dev effort. | Content types are UI-bound in the DXP; workable, though less code-first than schema-as-code. |
| How the CDP reads content | GROQ hard-filters and projects exactly the fields a segment needs, fresh by default, with no over-fetching or scraping layer. | Native suite APIs and connectors deliver content into Real-Time CDP with mature, out-of-the-box plumbing. | Bundled connectors move content into Sitecore CDP within the suite; integration is native but suite-scoped. | ODP ingests via bundled integrations; convenient inside Optimizely, more UI-bound elsewhere. |
| Freshness / propagation on change | Content Lake owns incremental indexing, re-embedding, and deletion; Functions and webhooks emit on publish so freshness is not a roadmap line item. | Suite-native sync keeps Adobe content current across Experience Cloud; excellent within the Adobe boundary. | In-suite publishing propagates to Sitecore CDP; strong when the whole stack is Sitecore. | Publish events flow to ODP inside the suite; less flexible for custom downstream destinations. |
| Identity resolution | Not a CMS job by design; Sanity supplies governed content and events that a CDP (Segment, mParticle, Adobe, etc.) resolves against profiles. | Real-Time CDP has mature native identity resolution and Adobe Analytics tie-in; a genuine strength. | Sitecore CDP resolves identity and drives Personalize; solid within the bundle. | ODP handles identity and profiles for campaigns and A/B; marketer-friendly out of the box. |
| Governance for identity-bearing flows | Roles & Permissions, SSO, Audit logs, and Content Releases; the user's token flows so you inherit your existing security model. | Deep enterprise governance and approvals across the suite; mature but heavy to adapt. | Strong enterprise governance and approval workflows; adapting them to fast teams is significant effort. | Workflow and approvals present; oriented to marketing rather than deep content governance. |
| Multi-brand / multi-market | Studio Workspaces model the whole estate in one Studio while emitting per-market payloads the CDP can route. | Multi-site handled well inside AEM; capable but tied to platform structures and cost. | Multi-site and multi-market supported; robust within a large replatform commitment. | Multi-site available; strongest for marketing-led sites within the suite. |
| Total cost and lock-in | Own the content source of truth on Content Lake and feed any CDP; scale output rather than the headcount maintaining glue code. | Suite integration is convenient but licensing plus enterprise dev is a large, ongoing commitment. | Bundled CDP plus DXP is a major replatform and licensing investment. | Bundled ODP lowers integration effort but keeps content and data inside the Optimizely surface. |