Sanity Embeddings: Vector Search for CMS

Most enterprise CMS platforms treat AI as an add-on module. Sanity treats it as core infrastructure.
The difference matters for B2B SaaS companies evaluating headless CMS platforms. Gartner predicts that 40% of enterprise applications will feature task-specific AI agents by the end of 2026, up from less than 5% in 2025. That trajectory requires content infrastructure capable of supporting semantic search, AI chatbots and personalization at scale.
Sanity now offers native dataset embeddings, built directly into its Content Lake and queryable through GROQ. For enterprise technology evaluators and website managers, this represents a significant architectural decision point: build vector search yourself with external infrastructure, or use Sanity's managed, fully integrated embeddings solution.
We design and develop high-performance websites on Sanity built for scale, speed, and collaboration.

How Sanity Embeddings Work
Vector embeddings convert text content into numerical representations that capture semantic meaning. Instead of matching exact keywords, vector search understands that "authentication process" and "login workflow" refer to the same concept.
Sanity's embeddings feature is enabled at the dataset level and operates natively within GROQ, requiring no external vector database or separate API. The process works as follows:
- Enable embeddings on a dataset. When creating a new dataset, pass the --embeddings flag via the CLI. Embeddings are available on all Sanity plans for new datasets; enabling embeddings on existing datasets is currently limited to Enterprise plans.
- Scope what gets embedded with a projection. By default, Sanity embeds the full document. For most production datasets, a targeted projection is recommended, scoping embeddings to only the fields users actually search against. This improves result relevance, speeds up initial generation and reduces recomputation overhead on document updates.
- Query with semantic similarity in GROQ. Once embeddings are ready, use the text::semanticSimilarity() function inside a score() expression to rank results by semantic relevance. Results include a _score field for ranking and an _embeddings field surfacing the specific text fragments that drove the match.
- Combine semantic and keyword search. Hybrid queries pair text::semanticSimilarity() with GROQ's match operator, so results can surface both conceptually related content and documents containing exact keywords. Documents matching on both score highest.
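Put together, the steps above can be sketched in GROQ. Treat this as an illustrative sketch: the article names text::semanticSimilarity(), score(), boost() and match, but the exact argument shape of text::semanticSimilarity() is an assumption, so check Sanity's GROQ reference before relying on it. $searchQuery is a hypothetical query parameter.

```groq
// Semantic ranking: score documents by similarity to the user's query.
// NOTE: the argument shape of text::semanticSimilarity() is assumed here.
*[_type == "article"]
  | score(text::semanticSimilarity($searchQuery))
  | order(_score desc)[0...10]{ title, _score }

// Hybrid ranking: combine semantic similarity with exact keyword matches.
// Documents matching on both signals score highest; boost() weights the
// keyword match relative to the semantic score.
*[_type == "article"]
  | score(
      text::semanticSimilarity($searchQuery),
      boost(title match $searchQuery, 2)
    )
  | order(_score desc)[0...10]{ title, _score }
```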
The embedding model is managed by Sanity and updated automatically; your dataset recomputes when the model changes. Embedding updates after document mutations are asynchronous and typically reflect within one minute.
Optimization tip: Scope your projection to fields with genuine search value. Avoid high-frequency fields that update often but carry no semantic signal, since each change triggers a recomputation cycle.
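As an illustration, a scoped projection might keep only the fields users actually search against. pt::text() is GROQ's standard helper for flattening Portable Text to a plain string; the exact mechanism for attaching this projection to a dataset's embeddings configuration is an assumption here, so consult Sanity's embeddings documentation for the current setup.

```groq
// Hypothetical embeddings projection: embed only semantically meaningful
// fields, skipping high-churn metadata such as counters or timestamps.
{
  title,
  "summary": pt::text(excerpt),
  "body": pt::text(content)
}
```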
Seven Production-Ready Use Cases
Vector search capabilities map to specific B2B content challenges. Each use case addresses problems that traditional keyword search cannot solve.
Semantic Search for Documentation
Developers searching for "how to authenticate API requests" can retrieve documentation written with different terminology, such as "authorization headers" or "token-based access." Semantic search enables meaning-based content discovery beyond keyword matching, which is critical for enterprise documentation sites, knowledge bases and product information repositories, where technical users need to find complex information without knowing the exact terminology.
With Sanity's native embeddings, this capability lives entirely within your GROQ queries: filter to the relevant document types, apply text::semanticSimilarity() inside score(), and return ranked results without routing through an external pipeline.
AI Chatbot Integration with RAG
Enterprise sales and support chatbots built with Retrieval-Augmented Generation (RAG) can surface content based on customer role, industry vertical or account type while maintaining compliance requirements. Using Sanity as the source of truth for all chatbot content provides governance capabilities including version control, approval workflows, audit trails and content lifecycle management. These capabilities matter for B2B companies with regulatory obligations.
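To illustrate the retrieval-augmented step, here is a minimal, hypothetical Python sketch that takes documents already retrieved from Sanity (for example, the top results of a semantic GROQ query) and assembles them into a grounded prompt for a language model. The document shape, field names and character budget are all assumptions for illustration, not Sanity APIs.

```python
def build_rag_prompt(question, documents, max_chars=2000):
    """Assemble a grounded prompt from retrieved CMS documents.

    `documents` is a list of dicts with `title` and `body` keys,
    e.g. the top-ranked results of a semantic GROQ query.
    """
    context_parts = []
    used = 0
    for doc in documents:
        snippet = f"## {doc['title']}\n{doc['body']}"
        # Stop adding context once the character budget is exhausted.
        if used + len(snippet) > max_chars:
            break
        context_parts.append(snippet)
        used += len(snippet)
    context = "\n\n".join(context_parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Because the prompt is built only from governed CMS content, the version control and approval workflows described above apply to everything the chatbot can say.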
Automated Content Recommendations
Vector search enables related content suggestions by computing semantic similarity, with no manual tagging required. Sanity's embeddings place content in a shared semantic space where conceptually similar items sit close together, so relevant materials surface without editors defining and maintaining relationships by hand. For enterprise websites with extensive resource libraries, this eliminates significant editorial workload across hundreds or thousands of assets.
Personalized Content Delivery
Embeddings enable advanced personalization by matching user intent and context against content vectors in real time. This approach generalizes to new content and contexts automatically, unlike rule-based personalization that requires explicit logic for every scenario.
Account-based marketing (ABM) particularly benefits from this capability, where content delivery must adapt to company size, industry vertical, technology stack and buyer role without programming rules for every permutation.
Developer Portal Enhancement
Developer documentation requires specialized search that understands code concepts and technical terminology. Hybrid GROQ queries combining keyword matching with semantic similarity allow developers to find code examples and integration guides based on conceptual understanding rather than string matching.
Content Gap Analysis
Beyond retrieval, embeddings enable analytical use cases. Content teams can identify semantic gaps in coverage, detect duplicate content through clustering and analyze distribution across topic areas, providing objective data for content modeling audits and strategy refinement.
Multilingual Discovery
Modern embedding models map multiple languages into a shared semantic space, enabling queries in one language to return relevant content in other languages without translation at query time.
Implementation Considerations
Enterprise implementations require careful architectural planning regardless of platform choice.
Performance Requirements
Enterprise vector search implementations should target specific latency budgets so content discovery and AI-powered retrieval stay responsive. A common target is p95 query response time under 100ms, leaving headroom in the end-to-end budget for network transfer and rendering. Because Sanity's embeddings live natively in the Content Lake, there is no cross-service latency from routing queries to an external vector database.
Note that on datasets with embeddings enabled, write speeds may be slower depending on system load, and Sanity may apply rate limits to manage resource usage.
Security Requirements
Enterprise security best practices require AES-256 encryption at rest, TLS 1.3 in transit, integration with enterprise key management, role-based access control for search operations and row-level security that filters results based on user attributes. Compliance requirements include data residency controls, audit logging and hard deletion capabilities under GDPR Article 17.
Hybrid Search Strategy
Production implementations should combine vector similarity with traditional keyword search. Pairing text::semanticSimilarity() with keyword match expressions in a single score() call typically improves relevance, with documents matching on both signals scoring highest. Result fusion approaches such as weighted scoring (for example, 0.7 × vector score + 0.3 × keyword score) can be tuned to your specific retrieval requirements.
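The weighted-fusion idea can be sketched as a small post-processing step over already-retrieved results. This is a generic illustration, not a Sanity API: it assumes both scores have been normalized to [0, 1], and the 0.7/0.3 weights are the illustrative defaults from the example above.

```python
def fuse_scores(vector_score, keyword_score, w_vector=0.7, w_keyword=0.3):
    """Weighted fusion of normalized vector and keyword scores."""
    return w_vector * vector_score + w_keyword * keyword_score


def rank_hybrid(results, **weights):
    """Sort results (dicts with `vector` and `keyword` scores) by fused score."""
    return sorted(
        results,
        key=lambda r: fuse_scores(r["vector"], r["keyword"], **weights),
        reverse=True,
    )
```

Tuning the weights shifts the balance: raising w_keyword favors documents containing exact terminology, while raising w_vector favors conceptually related content.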
The Strategic Decision
Semantic search is becoming table stakes for enterprise CMS platforms. Sanity's native dataset embeddings provide a meaningful head start for teams prioritizing semantic search, RAG-powered chatbots and automated content recommendations, without the overhead of managing external vector databases or separate embedding pipelines.
For B2B SaaS companies evaluating headless CMS platforms, the decision framework maps to specific priorities:
- Semantic search priority: Sanity provides native, managed embeddings within GROQ, requiring no external infrastructure.
- AI content operations and governance: Contentful offers mature capabilities with composable flexibility.
- Self-hosting and data sovereignty: Strapi provides open-source infrastructure with custom implementation options.
- Full DXP with maximum governance: Contentstack delivers enterprise-grade content management.
Sanity's unified architecture treats AI as core infrastructure rather than optional add-ons, combining AI Assist (automated translation, SEO optimization and metadata generation) with native dataset embeddings (semantic search, hybrid retrieval and automated content recommendations). The right CMS choice depends on your specific technical requirements: whether native vector search matters, your preferred pricing model and your governance and compliance obligations.
What This Means for Your CMS Strategy
Enterprise content platforms face a clear inflection point. As AI capabilities become central to content discovery and personalization, the infrastructure decisions you make today determine your operational flexibility for years ahead. Sanity's native embeddings, queryable directly in GROQ without external dependencies, provide a more integrated and lower-overhead path to semantic search than building on separate vector infrastructure.
The competitive landscape will shift as other platforms add similar capabilities. The question is whether waiting makes sense for your roadmap. For B2B SaaS companies already planning AI-powered content experiences, building on infrastructure designed for these use cases reduces technical debt and time to value.
Your CMS architecture should enable marketing team autonomy while supporting the AI capabilities your product and growth teams need. That alignment requires intentional platform selection, not incremental bolt-ons.
Ready to evaluate headless CMS platforms for your AI-ready content infrastructure? Talk to Webstacks about implementing a composable architecture that supports semantic search, personalization and scalable content operations.



