schema.org JSON-LD for AI assistants in 2026
Last updated: May 25, 2026
TL;DR
AI assistants turn prose into citable facts through structure. The five JSON-LD types that matter in 2026 are Organization + WebSite (sitewide identity), Article/BlogPosting (content), FAQPage (Q&A), Product/Offer (commerce), and Speakable (the snippets you want quoted). Serve them in a single inline <script type="application/ld+json"> per page, keep them honest, and validate.
Why structure beats prose for AI assistants
A language model can read prose, but it cites with confidence only when it can attribute a specific fact to a specific entity. JSON-LD is the cheapest way to give it that attribution: a single inline script tells the assistant who you are, what this page is, who wrote it, when, and which sentence is the answer. Pages without structured data still get crawled, but they end up summarised generically — “a site about X” — instead of quoted by name. The same discipline also unlocks Google's rich results, AI Overviews, and Bing/Copilot answer cards, so the ROI compounds.
The five JSON-LD types that actually move the needle
You don't need the full 800-type schema.org tree. In 2026, these five cover 95% of what AI assistants and search rich results actually consume:
1. Organization + WebSite (sitewide)
These belong in your global layout, served on every page. Organization tells assistants who you are (legal name, logo, social profiles via sameAs); WebSite tells them this is the canonical site for that entity and exposes a site search. Together they fix the single most common failure mode: the model treating your brand as ambiguous because nothing on the page declares an entity. Include a sameAs array pointing at your verified profiles (LinkedIn, GitHub, Crunchbase, Wikidata if you have it); that is the cheapest entity-disambiguation signal you can ship.
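As a rough illustration of the sitewide pair, the sketch below builds both nodes in one @graph and links them by @id. The origin, names, profile URLs and search path are all placeholders, and the SearchAction shape is one common way to expose site search rather than the only one.

```python
import json

SITE = "https://example.com"  # hypothetical origin, replace with your own


def sitewide_graph(site: str) -> dict:
    """Build one Organization and one WebSite node that reference each other by @id."""
    org_id = f"{site}/#organization"
    return {
        "@context": "https://schema.org",
        "@graph": [
            {
                "@type": "Organization",
                "@id": org_id,
                "name": "Example Co",            # placeholder brand name
                "legalName": "Example Co Ltd",   # placeholder legal name
                "url": site,
                "logo": f"{site}/logo.png",
                # Placeholder profile URLs; list only profiles you actually control.
                "sameAs": [
                    "https://www.linkedin.com/company/example-co",
                    "https://github.com/example-co",
                ],
            },
            {
                "@type": "WebSite",
                "@id": f"{site}/#website",
                "url": site,
                "name": "Example Co",
                "publisher": {"@id": org_id},  # reference the node above, do not redeclare it
                "potentialAction": {
                    "@type": "SearchAction",
                    # The doubled braces emit a literal {search_term_string} placeholder.
                    "target": f"{site}/search?q={{search_term_string}}",
                    "query-input": "required name=search_term_string",
                },
            },
        ],
    }


graph = sitewide_graph(SITE)
print(json.dumps(graph, indent=2))
```

Keeping a stable @id on the Organization is what lets per-page blocks point back at it later instead of repeating the name and logo.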
2. Article / BlogPosting (per content page)
For anything article-shaped (blog posts, guides, news), add Article or BlogPosting with headline, datePublished, dateModified, author (preferably a Person with a url), publisher (your Organization), and image. Freshness matters more for AI citations than for classic SEO: assistants visibly prefer recently dated pages on time-sensitive topics, and an honest dateModified keeps you in that pool. Pair with a BreadcrumbList so the page's position in your site hierarchy is unambiguous.
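A minimal sketch of a per-page block with those fields might look like this. The author, URLs, publication date and the Organization @id are invented for illustration; the only guard is a sanity check that dateModified does not precede datePublished.

```python
import json
from datetime import date

# Hypothetical @id of the sitewide Organization node.
ORG_ID = "https://example.com/#organization"


def blog_posting(headline: str, url: str, image: str, author_name: str,
                 author_url: str, published: date, modified: date) -> dict:
    """Build a BlogPosting node; dates are emitted as ISO 8601 strings."""
    if modified < published:
        raise ValueError("dateModified cannot precede datePublished")
    return {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": headline,
        "mainEntityOfPage": url,
        "image": image,
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
        "author": {"@type": "Person", "name": author_name, "url": author_url},
        "publisher": {"@id": ORG_ID},  # points at the sitewide Organization
    }


post = blog_posting(
    headline="schema.org JSON-LD for AI assistants in 2026",
    url="https://example.com/blog/json-ld",                # placeholder URL
    image="https://example.com/img/json-ld.png",           # placeholder image
    author_name="Jane Doe",                                # placeholder author
    author_url="https://example.com/authors/jane-doe",
    published=date(2026, 5, 1),                            # placeholder date
    modified=date(2026, 5, 25),
)
print(json.dumps(post, indent=2))
```

In a real pipeline the modified date would come from the content store's last substantive edit, not from the deploy time.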
3. FAQPage (Q&A blocks)
If your page contains genuine question-and-answer pairs, mark them up with FAQPage → mainEntity → Question / acceptedAnswer. Two rules: only mark up Q&A that is actually visible to the user (Google penalises hidden-FAQ schema, and assistants treat the visible text as authoritative anyway), and write the answers as complete sentences — “Yes” is useless when lifted into a chat reply without the question.
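The FAQPage shape can be generated from the same question and answer pairs that render on the page, which keeps markup and visible text in sync. In this sketch the four-word minimum is an arbitrary stand-in for "a complete sentence", not a rule from any specification.

```python
import json

MIN_ANSWER_WORDS = 4  # arbitrary illustrative threshold, tune to taste


def faq_page(pairs: list[tuple[str, str]]) -> dict:
    """Build a FAQPage node from (question, answer) pairs shown on the page."""
    for question, answer in pairs:
        if len(answer.split()) < MIN_ANSWER_WORDS:
            raise ValueError(f"Answer to {question!r} is too short to stand alone")
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


faq = faq_page([
    (
        "JSON-LD vs Microdata vs RDFa: which one?",
        "JSON-LD. Microdata and RDFa still work but offer no advantage.",
    ),
])
print(json.dumps(faq, indent=2))
```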
4. Product / Offer (commerce)
For ecommerce, Product with name, image, description, brand, sku, and a nested offers object (price, priceCurrency, availability) is the minimum that gets you into AI shopping surfaces. Add aggregateRating only if it is honest and dynamic — fake or stale ratings are the fastest way to get downranked across both Google and Perplexity. We dig into the commerce angle on the AI Shopping page.
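One way to assemble that minimum is sketched below. The product data is made up, and the two availability values are the schema.org InStock and OutOfStock enumeration URLs; a real catalogue would likely need more states.

```python
import json
from decimal import Decimal


def product(name: str, image: str, description: str, brand: str, sku: str,
            price: Decimal, currency: str, in_stock: bool) -> dict:
    """Build a Product node with a single nested Offer."""
    availability = "InStock" if in_stock else "OutOfStock"
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "image": image,
        "description": description,
        "brand": {"@type": "Brand", "name": brand},
        "sku": sku,
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",       # fixed two decimals, no currency symbol
            "priceCurrency": currency,     # ISO 4217 code such as "EUR"
            "availability": f"https://schema.org/{availability}",
        },
    }


# All values below are placeholders for illustration.
mug = product(
    name="Example Mug",
    image="https://example.com/img/mug.jpg",
    description="A 350 ml ceramic mug.",
    brand="Example Co",
    sku="MUG-350",
    price=Decimal("19.9"),
    currency="EUR",
    in_stock=True,
)
print(json.dumps(mug, indent=2))
```

aggregateRating is left out on purpose here; it would only be added from live review data.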
5. Speakable (the snippets you want quoted)
Often overlooked: speakable (a property on Article / WebPage) lets you nominate the sections you want assistants and voice surfaces to lift verbatim, by CSS selector. Point it at your TL;DR block and your most concrete H2 sections. It is a hint, not a guarantee — but on a page with thousands of words, hinting beats hoping.
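A small sketch of attaching speakable to an existing page node follows. The selectors #tldr and #where-to-put-it are hypothetical ids; they would need to match elements that really exist in the rendered HTML.

```python
import json


def add_speakable(page: dict, css_selectors: list[str]) -> dict:
    """Return a copy of a page node with a SpeakableSpecification attached."""
    marked = dict(page)  # shallow copy so the caller's dict is left untouched
    marked["speakable"] = {
        "@type": "SpeakableSpecification",
        "cssSelector": css_selectors,
    }
    return marked


article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "schema.org JSON-LD for AI assistants in 2026",
}
# Hypothetical element ids for the TL;DR block and one concrete section.
marked_article = add_speakable(article, ["#tldr", "#where-to-put-it"])
print(json.dumps(marked_article, indent=2))
```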
Where to put it and how to serve it
Inline <script type="application/ld+json"> in the document <head> (or near the start of <body>) is the only format you should ship. Microdata and RDFa still parse, but assistants and Google's rich-result tester both prefer JSON-LD, and one inline blob is easier to diff in code review. Critically, the script must be in the server-rendered HTML: if a framework injects it after hydration, AI crawlers (which mostly don't run JavaScript) will never see it. We unpacked the JS-rendering trap in why AI assistants ignore your site.
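On the server side, emitting the tag can be as simple as the sketch below. It assumes the page is assembled as a string in a template or view; the function name is illustrative. The one non-obvious step is escaping "<" so that a value containing "</script>" cannot close the tag early.

```python
import json


def jsonld_script_tag(data: dict) -> str:
    """Serialise a JSON-LD dict into an inline script tag for server-rendered HTML."""
    payload = json.dumps(data, ensure_ascii=False, separators=(",", ":"))
    # "\u003c" is a valid JSON escape for "<"; parsers decode it back,
    # but the HTML tokenizer no longer sees a closing tag inside the payload.
    payload = payload.replace("<", "\\u003c")
    return f'<script type="application/ld+json">{payload}</script>'


tag = jsonld_script_tag({
    "@context": "https://schema.org",
    "@type": "WebPage",
    "name": "Tricky </script> title",  # deliberately hostile value
})
print(tag)
```

The returned string goes into the head of the HTML response itself, so it is present before any client-side code runs.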
Common mistakes our auditor flags
- JSON-LD only in the client bundle. Renders fine in DevTools, invisible to GPTBot. Always server-render.
- Dishonest dateModified. Touching the date without changing the content is a known anti-pattern; assistants and Google both notice over time.
- FAQ schema on hidden content. Mark up only Q&A that is visible on page render.
- Multiple conflicting Organization blocks. One canonical Organization sitewide; per-page blocks should reference it via @id, not redeclare a different name.
- Missing sameAs. Without it, the model has no signal that your brand is the same entity as the one with the LinkedIn page it already trusts.
- Invalid JSON. A single trailing comma silently breaks the whole block. Validate every change.
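Three of the mistakes above (invalid JSON, conflicting Organization names, missing sameAs) can be caught mechanically. The following is a rough lint sketch over the raw text of a page's JSON-LD blocks, not a description of how any particular auditor works; the function name and messages are illustrative.

```python
import json


def lint_blocks(raw_blocks: list[str]) -> list[str]:
    """Return human-readable problems found in a page's raw JSON-LD blocks."""
    problems = []
    org_names = set()
    for index, raw in enumerate(raw_blocks):
        try:
            data = json.loads(raw)  # strict parser: trailing commas are rejected
        except json.JSONDecodeError as exc:
            problems.append(f"block {index}: invalid JSON ({exc.msg})")
            continue
        # Accept a single node, a top-level array, or an @graph container.
        if isinstance(data, dict):
            nodes = data.get("@graph", [data])
        elif isinstance(data, list):
            nodes = data
        else:
            nodes = []
        for node in nodes:
            if not isinstance(node, dict) or node.get("@type") != "Organization":
                continue
            if "name" in node:
                org_names.add(node["name"])
                if "sameAs" not in node:
                    problems.append(f"block {index}: Organization without sameAs")
    if len(org_names) > 1:
        problems.append(f"conflicting Organization names: {sorted(org_names)}")
    return problems


blocks = [
    '{"@type": "Organization", "name": "Example Co", "sameAs": ["https://github.com/example-co"]}',
    '{"@type": "Organization", "name": "Example Company"}',
    '{"@type": "FAQPage", "mainEntity": [],}',  # trailing comma
]
for problem in lint_blocks(blocks):
    print(problem)
```

Bare references such as {"@id": "..."} carry no @type or name, so they pass through without being flagged.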
Verify it
Three free tools: Google's Rich Results Test (catches Google-specific gotchas), the schema.org validator (catches spec violations), and our own per-agent audit. The audit is the one most people miss — it fetches the page as GPTBot, ClaudeBot, PerplexityBot and Google-Extended, then checks whether the JSON-LD actually arrived in each bot's view (so you also catch CDN-level blocks, which we covered in robots.txt for AI crawlers). You can run a free audit in about 30 seconds.
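For a quick local approximation of that last check, the sketch below pulls JSON-LD out of raw HTML without executing any JavaScript, which is roughly what a non-rendering crawler sees. It is an assumption-laden stand-in, not the audit itself: fetch_html takes whatever User-Agent string you pass, and the exact strings each bot sends should be taken from the vendors' own documentation.

```python
import json
import urllib.request
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self._buffer: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self.blocks.append("".join(self._buffer))
            self._inside = False


def extract_jsonld(html: str) -> list:
    """Parse every JSON-LD block found in the raw, unrendered HTML."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return [json.loads(block) for block in parser.blocks]


def fetch_html(url: str, user_agent: str) -> str:
    """Fetch a page with a caller-supplied User-Agent; no JavaScript is run."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


server_rendered = (
    '<html><head><script type="application/ld+json">'
    '{"@type": "Organization", "name": "Example Co"}'
    "</script></head><body></body></html>"
)
client_only = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
print(extract_jsonld(server_rendered))
print(extract_jsonld(client_only))
```

Calling extract_jsonld(fetch_html(url, ua)) once per User-Agent and comparing the results would show whether a CDN rule or a client-only render is hiding the markup from some bots.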
FAQ
Do I need every schema.org type, or just these five?
Just these five for most sites. Recipe, Event, HowTo and a handful of others are valuable for specific niches, but the five above cover identity, content, Q&A, commerce and citable snippets, the categories AI assistants actually use.
JSON-LD vs Microdata vs RDFa — which one?
JSON-LD. It is what Google recommends, what the rich-results tester prefers, and what most AI crawler pipelines normalise to internally. Microdata and RDFa still work but offer no advantage and are harder to maintain.
Will bad schema get me penalised?
Dishonest schema (hidden FAQ markup, fake ratings, mismatched dates) can absolutely cost you rich results on Google and trust on the AI side. Honest, minimal schema is strictly upside; over-claiming is the only failure mode.
Check your structured data the way an AI crawler sees it: run a free audit, or pair this with the llms.txt guide for the editorial layer on top.