Why schema matters more for AI search than for Google
Google can extract product information from your unstructured HTML if it has to. Crawlers have been doing this for two decades, and the algorithms are good at it. AI search engines work differently. When ChatGPT or Perplexity answers a shopper query like "best running shoes for flat feet under $150", the model is synthesizing an answer in real time from retrieved content. It needs to find clear, machine-extractable facts — brand name, price, attribute, review score — and attribute them to a source. Unstructured HTML is noisy. Schema.org structured data is the signal that cuts through.
The practical implication: a Shopify store with comprehensive Schema.org markup is materially more likely to be cited by AI engines than an identical store without it. We've seen the gap show up in our own audits and in the citation patterns Perplexity exposes through its citations array. Structured data is the cheapest, highest-leverage lever available to a new e-commerce brand trying to win AI visibility.
This guide covers what to ship and where. It assumes you have access to your Shopify theme code (Online Store → Themes → Edit code). If you're on a no-code stack and can only add custom HTML through the theme editor, you can still ship most of these schemas; they just go into a single custom block instead of being distributed across templates.
The 7-schema stack
The minimum viable Schema.org stack for a Shopify store optimized for AI search. Each one answers a different question an AI engine asks while building a response.
1. Organization + WebSite (theme.liquid)
These two go in your layout/theme.liquid so they render on every page. They're the brand's identity graph — every other schema on the site can reference them via @id, which is how AI engines build a coherent picture of the brand across pages.
Open layout/theme.liquid, scroll to the <head> section, and add the following before the closing </head> tag:
layout/theme.liquid

```liquid
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "@id": "{{ shop.url }}/#organization",
  "name": "{{ shop.name | escape }}",
  "url": "{{ shop.url }}",
  {%- if shop.brand.logo %}
  "logo": "https:{{ shop.brand.logo | image_url: width: 600 }}",
  {%- endif %}
  "description": "{{ shop.description | strip_newlines | escape }}",
  {% comment %}JSON has no comment syntax, so this note lives in Liquid. Add real social URLs only; an empty array is safer than wrong URLs.{% endcomment %}
  "sameAs": []
}
</script>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "@id": "{{ shop.url }}/#website",
  "url": "{{ shop.url }}",
  "name": "{{ shop.name | escape }}",
  "publisher": { "@id": "{{ shop.url }}/#organization" },
  "inLanguage": "{{ request.locale.iso_code }}",
  "potentialAction": {
    "@type": "SearchAction",
    "target": "{{ shop.url }}/search?q={search_term_string}",
    "query-input": "required name=search_term_string"
  }
}
</script>
```
Two things to notice. First, the @id values use full canonical URLs. This is intentional — AI engines that build entity graphs use these as stable identifiers. If you change them later, you'll fragment your entity in the model's representation. Lock them in once and don't churn.
Second, the sameAs array is empty. Resist the temptation to fill it with placeholder URLs. An empty array tells AI engines "we have no verified social presence yet" — which is true and fine. A sameAs array pointing to a non-existent or wrong social profile actively damages the entity graph and can take months to clear out of a model's representation.
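Because every other entity points back at these two through @id, a single typo in a reference quietly detaches it from the brand. If you also generate or test JSON-LD outside Liquid, a short script can catch that before an engine does. The sketch below is a hypothetical Python helper (walk and unresolved_ids are our names, and the example.com URLs are placeholders); it assumes that an object whose only key is @id is a reference, and reports references that no block on the page declares.

```python
import json
from typing import Any, Iterator


def walk(node: Any) -> Iterator[dict]:
    """Yield every JSON object nested anywhere inside a parsed JSON-LD value."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item)


def unresolved_ids(blocks: list[str]) -> set[str]:
    """Return @id references that no entity in the given JSON-LD blocks declares.

    An object whose only key is "@id" is treated as a reference; an object with
    "@id" plus any other key (usually "@type") is treated as a declaration.
    """
    declared: set[str] = set()
    referenced: set[str] = set()
    for raw in blocks:
        for node in walk(json.loads(raw)):
            if "@id" not in node:
                continue
            if len(node) == 1:
                referenced.add(node["@id"])
            else:
                declared.add(node["@id"])
    return referenced - declared


org = """{"@context": "https://schema.org", "@type": "Organization",
          "@id": "https://example.com/#organization", "name": "Example Co"}"""
site = """{"@context": "https://schema.org", "@type": "WebSite",
           "@id": "https://example.com/#website",
           "publisher": {"@id": "https://example.com/#organization"}}"""
typo = """{"@context": "https://schema.org", "@type": "WebSite",
           "publisher": {"@id": "https://example.com/#organisation"}}"""

print(unresolved_ids([org, site]))  # set()
print(unresolved_ids([org, typo]))  # {'https://example.com/#organisation'}
```

Run it over every JSON-LD block extracted from one rendered page; an empty set means every reference lands on a declared entity.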
2. Product schema (product.liquid)
This is where most of the AI citation value lives, and it's where Shopify's default markup falls short. Replace your theme's existing Product JSON-LD with this fuller version rather than adding it alongside; two Product entities on one page is one of the common mistakes covered below. Place it at the bottom of sections/main-product.liquid (or whatever your theme calls the product section), above the section's {% schema %} tag.
sections/main-product.liquid

```liquid
{%- assign current_variant = product.selected_or_first_available_variant -%}
{% comment %}reviews.rating and reviews.rating_count are Shopify's standard product-review metafields, which many review apps populate. Adjust if yours differs.{% endcomment %}
{%- assign rating = product.metafields.reviews.rating.value -%}
{%- assign rating_count = product.metafields.reviews.rating_count.value -%}
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "@id": "{{ shop.url }}{{ product.url }}#product",
  "name": "{{ product.title | escape }}",
  "description": "{{ product.description | strip_html | strip_newlines | escape }}",
  "image": [
    {%- for image in product.images limit: 6 -%}
      "https:{{ image | image_url: width: 1200 }}"{%- unless forloop.last -%},{%- endunless -%}
    {%- endfor -%}
  ],
  "sku": "{{ current_variant.sku | escape }}",
  "brand": {
    "@type": "Brand",
    "name": "{{ product.vendor | default: shop.name | escape }}"
  },
  "category": "{{ product.type | escape }}",
  {%- if product.metafields.custom.material -%}
  "material": "{{ product.metafields.custom.material | escape }}",
  {%- endif -%}
  "offers": {
    "@type": "Offer",
    "url": "{{ shop.url }}{{ product.url }}",
    "priceCurrency": "{{ shop.currency }}",
    "price": "{{ current_variant.price | money_without_currency | replace: ',', '' }}",
    "availability": "{% if current_variant.available %}https://schema.org/InStock{% else %}https://schema.org/OutOfStock{% endif %}",
    "seller": { "@id": "{{ shop.url }}/#organization" }
  }
  {%- if rating and rating_count > 0 -%}
  ,"aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "{{ rating.rating }}",
    "bestRating": "{{ rating.scale_max }}",
    "reviewCount": "{{ rating_count }}"
  }
  {%- endif -%}
}
</script>
```
A few non-obvious decisions in this schema worth understanding:
- The @id appends #product. A single URL can host several typed entities, and the fragment identifier tells them apart. This matters because the same page also carries BreadcrumbList and Review entities; any entity that declares an @id needs one that is unique on the page.
- Image is an array, not a single URL. AI engines that retrieve product information often want to see multiple angles. Up to six images is the sweet spot; beyond that, returns drop and you bloat the page.
- The brand.name falls back to shop.name. Most Shopify stores don't fill in product.vendor on their own-brand products (vendor is typically set when reselling). The fallback ensures the brand is always declared.
- Price is stripped of currency symbol and commas. Schema.org expects price as a plain decimal number with a period as the decimal separator: $1,499.00 fails validation; 1499.00 passes.
- The aggregateRating block is conditional. Don't ship empty rating fields with zeros; Google's Rich Results Test will flag them, and AI engines can read them as deceptive. The {%- if ... -%} guard means the rating only ships when there are real reviews.
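The same decisions can be mirrored in Python, for example to generate test fixtures or to check a headless storefront's output against the Liquid template. This is a sketch under stated assumptions: ProductData, build_product, and cents_to_price are hypothetical names, the input shape is invented for illustration, and the price helper assumes a currency with two decimal places. It relies on Shopify exposing prices in the currency's subunit (149900 means 1,499.00), so formatting from the integer avoids symbol and separator stripping altogether, including the trap where a store's money format uses comma decimals and removing commas turns 1.499,00 into 1.49900.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional

SCHEMA = "https://schema.org"


def cents_to_price(cents: int) -> str:
    """149900 -> "1499.00": no currency symbol, no thousands separator.

    Assumes a currency with two decimal places.
    """
    whole, frac = divmod(cents, 100)
    return f"{whole}.{frac:02d}"


@dataclass
class ProductData:
    """Hypothetical input shape; map your own data source onto these fields."""
    url: str                 # absolute product URL
    title: str
    vendor: str              # may be "" for own-brand products
    image_urls: list[str]
    price_cents: int         # price in the currency's subunit
    currency: str            # ISO 4217 code, e.g. "USD"
    available: bool
    rating: Optional[float] = None
    review_count: int = 0


def build_product(p: ProductData, shop_name: str, org_id: str) -> dict[str, Any]:
    data: dict[str, Any] = {
        "@context": SCHEMA,
        "@type": "Product",
        "@id": f"{p.url}#product",      # fragment keeps the @id unique on the page
        "name": p.title,
        "image": p.image_urls[:6],        # always an array, capped at six
        "brand": {"@type": "Brand", "name": p.vendor or shop_name},  # vendor falls back to shop name
        "offers": {
            "@type": "Offer",
            "url": p.url,
            "priceCurrency": p.currency,
            "price": cents_to_price(p.price_cents),
            "availability": f"{SCHEMA}/{'InStock' if p.available else 'OutOfStock'}",
            "seller": {"@id": org_id},
        },
    }
    # Emit aggregateRating only when real reviews exist; never ship zeros.
    if p.rating is not None and p.review_count > 0:
        data["aggregateRating"] = {
            "@type": "AggregateRating",
            "ratingValue": p.rating,
            "reviewCount": p.review_count,
        }
    return data


shoe = ProductData(
    url="https://example.com/products/trail-runner",
    title="Trail Runner",
    vendor="",
    image_urls=[f"https://cdn.example.com/trail-runner-{i}.jpg" for i in range(9)],
    price_cents=149900,
    currency="USD",
    available=True,
)
product = build_product(shoe, "Example Co", "https://example.com/#organization")
print(json.dumps(product, indent=2))
```

With no reviews on the sample product, the output has no aggregateRating key at all, which is the behavior the Liquid guard aims for.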
3. Review schema (loop inside product.liquid)
If you have product reviews — Shopify's free Product Reviews app, Judge.me, Stamped, Yotpo, or any other — emit a Review schema for each one. This is the schema AI engines weight most heavily for product recommendations, because reviews are first-person evaluative content.
sections/main-product.liquid (inside reviews loop)

```liquid
{%- for review in product.metafields.reviews.list.value limit: 12 -%}
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Review",
  "itemReviewed": { "@id": "{{ shop.url }}{{ product.url }}#product" },
  "author": { "@type": "Person", "name": "{{ review.author | escape }}" },
  "datePublished": "{{ review.date | date: '%Y-%m-%d' }}",
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": "{{ review.rating }}",
    "bestRating": "5"
  },
  "reviewBody": "{{ review.body | strip_html | strip_newlines | escape }}"
}
</script>
{%- endfor -%}
```
The metafield path (product.metafields.reviews.list.value) is illustrative; your actual path depends on which review app you use. Judge.me exposes reviews differently from Stamped, and the Shopify Product Reviews app stores them in yet another structure. The shape of the loop is what matters: emit one Review entity per actual review, with the itemReviewed.@id pointing back to the Product entity.
Limit to 12 reviews maximum per page. AI engines don't gain meaningful extra signal beyond that, and the page bloat starts costing you Core Web Vitals.
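Whatever your review app's data looks like, the per-review mapping is the same. Here is a minimal Python sketch assuming a hypothetical list of dicts with author, date, rating, and body keys; build_reviews is our name, not a review-app API. It applies the 12-review cap, collapses newlines in review bodies (a raw newline inside a JSON string is invalid), and leaves quote escaping to json.dumps at serialization time.

```python
import json
from datetime import date
from typing import Any

MAX_REVIEWS = 12  # cap per page, per the guidance above


def build_reviews(product_id: str, reviews: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """One Review entity per actual review, each pointing back at the Product @id.

    `reviews` is a hypothetical shape: dicts with "author", "date" (datetime.date),
    "rating" (1 to 5) and "body" keys.
    """
    entities = []
    for review in reviews[:MAX_REVIEWS]:
        entities.append({
            "@context": "https://schema.org",
            "@type": "Review",
            "itemReviewed": {"@id": product_id},
            "author": {"@type": "Person", "name": review["author"]},
            "datePublished": review["date"].isoformat(),  # YYYY-MM-DD
            "reviewRating": {
                "@type": "Rating",
                "ratingValue": review["rating"],
                "bestRating": 5,
            },
            # Collapse newlines and repeated whitespace into single spaces.
            "reviewBody": " ".join(review["body"].split()),
        })
    return entities


sample = [
    {"author": f"Buyer {i}", "date": date(2024, 5, i + 1), "rating": 5,
     "body": 'Great fit.\nRuns "true to size".'}
    for i in range(20)
]
reviews = build_reviews("https://example.com/products/trail-runner#product", sample)
print(len(reviews))  # 12
# json.dumps escapes the inner quotes when the entity is serialized into the page.
print(json.dumps(reviews[0]["reviewBody"]))  # "Great fit. Runs \"true to size\"."
```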
4. FAQPage (per-page, not per-store)
FAQPage schema is the highest-yield schema you can ship for AI extraction. AI engines pull direct answer snippets from FAQPage disproportionately, because the question/answer format mirrors the conversational query format the AI itself is processing.
Don't ship FAQPage globally. Ship it on pages that have a visible FAQ section — your About page, your Shipping & Returns page, your top three product pages where buyers ask the same questions repeatedly. Each FAQ item in the schema must mirror an FAQ visible to the user on that page. Google flags FAQPage with hidden answers as misleading.
templates/page.faq.liquid (or any page with visible FAQ)

```liquid
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How long does shipping take?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Domestic US orders ship within 1-2 business days and arrive in 3-5 business days. International shipping takes 7-14 business days depending on destination."
      }
    },
    {
      "@type": "Question",
      "name": "What's your return policy?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Unused items in original packaging can be returned within 30 days for a full refund. We cover return shipping for defective items."
      }
    }
  ]
}
</script>
```
For dynamic FAQ content backed by a Shopify metafield or page sections, iterate with {%- for question in page.metafields.custom.faqs.value -%} instead of hardcoding. Either way, the constraint stays: visible Q&A on the page must match the schema entries.
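The visible-content rule is easy to enforce mechanically. The Python sketch below (TextCollector, visible_text, and build_faq_page are hypothetical helpers) builds FAQPage JSON-LD from question/answer pairs and raises if any pair is missing from the page's text outside script and style tags. It works on the HTML alone: text inside a collapsed accordion such as a details element passes, which matches the rule, but an answer present in the markup and hidden with CSS would also pass.

```python
from html.parser import HTMLParser
from typing import Any


class TextCollector(HTMLParser):
    """Collect text nodes from an HTML document, skipping <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag: str, attrs: list) -> None:
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag: str) -> None:
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data: str) -> None:
        if not self._skip_depth:
            self.parts.append(data)


def normalize(text: str) -> str:
    return " ".join(text.split())


def visible_text(html: str) -> str:
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return normalize(" ".join(collector.parts))


def build_faq_page(pairs: list[tuple[str, str]], page_html: str) -> dict[str, Any]:
    """Build FAQPage JSON-LD, refusing any Q&A pair that isn't in the page's HTML text."""
    text = visible_text(page_html)
    missing = [q for q, a in pairs if normalize(q) not in text or normalize(a) not in text]
    if missing:
        raise ValueError(f"Q&A not present in page HTML: {missing}")
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": q,
                "acceptedAnswer": {"@type": "Answer", "text": a},
            }
            for q, a in pairs
        ],
    }


page = """
<details>
  <summary>How long does shipping take?</summary>
  <p>Domestic US orders ship within 1-2 business days.</p>
</details>
"""
faq = build_faq_page(
    [("How long does shipping take?", "Domestic US orders ship within 1-2 business days.")],
    page,
)
print(faq["mainEntity"][0]["name"])  # How long does shipping take?
```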
5. BreadcrumbList (collection.liquid, product.liquid)
BreadcrumbList tells AI engines how the current page fits into the catalog hierarchy. It's a cheap schema — under 20 lines — and it pays off in two ways. First, AI engines extract category context from it ("this product is in the Eyewear → Reading Glasses category"). Second, Google rich results use it to render breadcrumb navigation in search snippets, which improves click-through.
sections/main-product.liquid

```liquid
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {
      "@type": "ListItem",
      "position": 1,
      "name": "Home",
      "item": "{{ shop.url }}"
    }
    {%- if collection -%},
    {
      "@type": "ListItem",
      "position": 2,
      "name": "{{ collection.title | escape }}",
      "item": "{{ shop.url }}{{ collection.url }}"
    }
    {%- endif -%},
    {
      "@type": "ListItem",
      "position": {% if collection %}3{% else %}2{% endif %},
      "name": "{{ product.title | escape }}",
      "item": "{{ shop.url }}{{ product.url }}"
    }
  ]
}
</script>
```
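The Liquid template hard-codes the last item's position with an if/else. If you generate breadcrumbs in code instead, deriving position from the list index keeps the numbering correct whenever a level is optional. A minimal Python sketch, with build_breadcrumbs and product_trail as hypothetical helper names and example.com as a placeholder domain:

```python
from typing import Any, Optional


def build_breadcrumbs(shop_url: str, trail: list[tuple[str, str]]) -> dict[str, Any]:
    """trail is a root-first list of (name, path) pairs; positions come from the index."""
    base = shop_url.rstrip("/")
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": i, "name": name, "item": base + path}
            for i, (name, path) in enumerate(trail, start=1)
        ],
    }


def product_trail(
    product: tuple[str, str], collection: Optional[tuple[str, str]] = None
) -> list[tuple[str, str]]:
    """Home, then the collection only if the visitor arrived through one, then the product."""
    trail = [("Home", "/")]
    if collection is not None:
        trail.append(collection)
    trail.append(product)
    return trail


via_collection = build_breadcrumbs(
    "https://example.com",
    product_trail(("Reader 2.0", "/products/reader-2"),
                  ("Reading Glasses", "/collections/reading-glasses")),
)
direct = build_breadcrumbs("https://example.com", product_trail(("Reader 2.0", "/products/reader-2")))
print([e["position"] for e in via_collection["itemListElement"]])  # [1, 2, 3]
print([e["position"] for e in direct["itemListElement"]])          # [1, 2]
```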
6. Article (article.liquid)
Blog posts are AI training fodder. Even if your traffic is dominated by product pages, the editorial content on your blog is what AI engines reference when buyers ask category questions ("what's the difference between X and Y?"). Article schema makes blog posts machine-extractable.
sections/main-article.liquid

```liquid
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "{{ article.title | escape }}",
  "description": "{{ article.excerpt_or_content | strip_html | strip_newlines | truncate: 200 | escape }}",
  {%- if article.image %}
  "image": "https:{{ article.image | image_url: width: 1200 }}",
  {%- endif %}
  {% comment %}Emit the store's real UTC offset (e.g. -05:00); a literal Z would claim UTC.{% endcomment %}
  "datePublished": "{{ article.published_at | date: '%Y-%m-%dT%H:%M:%S%:z' }}",
  "dateModified": "{{ article.updated_at | date: '%Y-%m-%dT%H:%M:%S%:z' }}",
  "author": { "@type": "Person", "name": "{{ article.author | escape }}" },
  "publisher": { "@id": "{{ shop.url }}/#organization" },
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "{{ shop.url }}{{ article.url }}"
  }
}
</script>
```
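A note on article dates: a trailing Z asserts UTC. As far as we can tell, Liquid's date filter renders Shopify timestamps in the store's configured timezone, so appending a literal Z to that output mislabels the time for any store not set to UTC. The Python sketch below (iso8601 is a hypothetical helper) shows the difference for a store assumed to run on UTC-5.

```python
from datetime import datetime, timedelta, timezone


def iso8601(dt: datetime) -> str:
    """Format an aware datetime as ISO 8601, keeping its real UTC offset."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: attach the store's timezone first")
    return dt.isoformat(timespec="seconds")


# A post published at 09:30 local time in a store assumed to run on UTC-5.
store_tz = timezone(timedelta(hours=-5))
published = datetime(2024, 1, 15, 9, 30, tzinfo=store_tz)

wrong = published.strftime("%Y-%m-%dT%H:%M:%SZ")  # claims 09:30 UTC, five hours off
right = iso8601(published)
print(wrong)  # 2024-01-15T09:30:00Z
print(right)  # 2024-01-15T09:30:00-05:00
print(iso8601(published.astimezone(timezone.utc)))  # 2024-01-15T14:30:00+00:00
```

Both correct forms describe the same instant; only the string with the literal Z points at the wrong one.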
Validation workflow
Ship to a staging theme first if you can. Then run each templated URL through three validators in this order:
- Google Rich Results Test: Catches the most issues fastest. Look for "Detected items" matching what you shipped (Organization, Product, Breadcrumbs, and so on). It only lists types Google supports for rich results, so not every entity you shipped will appear there. Zero errors required; warnings are usually fine.
- Schema.org Validator: Checks every entity against the full Schema.org vocabulary, not only the types Google supports for rich results. It flags unknown types and misspelled property names that Google's tool can miss. Use this to catch latent issues before they bite.
- JSON-LD Playground: For when one of the first two flags a syntax error you can't see. Paste the raw JSON-LD into the playground; its parser error points you to the malformed spot.
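For ongoing monitoring, re-run the three template URLs (homepage, one product, one blog post) through Google Search Console's URL Inspection tool after each theme change. Between changes, watch Search Console's enhancement reports (Product snippets, Breadcrumbs, and so on), which re-check structured data as Google recrawls your pages and email you when new issues appear.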
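Before pasting anything into the online tools, a local pre-check can confirm that every JSON-LD block on a saved page parses at all, which is where a stray // comment or an unescaped newline shows up. A minimal sketch in Python using only the standard library; JsonLdExtractor and check_jsonld are hypothetical names, and you would feed it the page source yourself (for example, saved from view-source).

```python
import json
from html.parser import HTMLParser
from typing import Optional


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._buffer: Optional[list[str]] = None

    def handle_starttag(self, tag: str, attrs: list) -> None:
        attr_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and attr_type == "application/ld+json":
            self._buffer = []

    def handle_data(self, data: str) -> None:
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag: str) -> None:
        if tag == "script" and self._buffer is not None:
            self.blocks.append("".join(self._buffer))
            self._buffer = None


def check_jsonld(html: str) -> list[str]:
    """Return one message per JSON-LD block that fails to parse as JSON."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    problems = []
    for n, raw in enumerate(extractor.blocks, start=1):
        try:
            json.loads(raw)
        except json.JSONDecodeError as err:
            problems.append(f"block {n}: {err.msg} (line {err.lineno}, column {err.colno})")
    return problems


page = """
<script type="application/ld+json">
{"@type": "Organization", "sameAs": [ // comments are not valid JSON
]}
</script>
<script type="application/ld+json">{"@type": "WebSite", "name": "Example Co"}</script>
"""
for problem in check_jsonld(page):
    print(problem)
```

An empty result only means the blocks are valid JSON; the validators above are still what check them as Schema.org.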
Common mistakes (the top 5 we see)
1. sameAs array filled with broken social URLs
The default Shopify themes that ship social-icon URL fields often inject those URLs straight into Organization schema's sameAs. If you haven't actually created the Twitter / Facebook / Instagram profile yet, you're shipping a schema entity that references nonexistent URLs. Either leave the array empty or include only profiles you've verified live.
2. Currency symbol or thousand-separator in price
"price": "$1,499.00" fails. "price": "1499.00" passes. Always strip with {{ price | money_without_currency | replace: ',', '' }}.
3. aggregateRating with zero reviews
Shipping {"ratingValue": 0, "reviewCount": 0} looks like data, but it's the worst kind — Google flags it as deceptive, and AI engines have learned to discount or distrust the entire Product entity when this pattern shows up. Use a conditional {%- if review_count > 0 -%} guard.
4. Multiple Product entities with the same @id
If you use a product page builder app that injects its own Product schema in addition to your theme's, you can end up with two Product entities at the same @id. AI engines pick one and discard the other arbitrarily. Audit by viewing the source of a product page and searching for "@type": "Product", plus the unspaced "@type":"Product" that minified app output uses. Between the two searches there should be exactly one match.
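Searching view-source for a literal string misses variants with different spacing or quoting. A parser-based check counts Product entities regardless of formatting, including ones nested inside an @graph. This Python sketch (product_audit is a hypothetical helper) pulls JSON-LD out with a regular expression, which assumes the type attribute value is quoted; it also raises on a block that isn't valid JSON, so fix parse errors first.

```python
import json
import re
from collections import Counter
from typing import Any, Iterator

# Assumes the type attribute value is quoted, as themes normally emit it.
JSONLD_RE = re.compile(
    r"""<script[^>]*type=["']application/ld\+json["'][^>]*>(.*?)</script>""",
    re.IGNORECASE | re.DOTALL,
)


def nodes(value: Any) -> Iterator[dict]:
    """Yield every JSON object nested anywhere in a parsed value (covers @graph too)."""
    if isinstance(value, dict):
        yield value
        for child in value.values():
            yield from nodes(child)
    elif isinstance(value, list):
        for child in value:
            yield from nodes(child)


def product_audit(html: str) -> tuple[int, list[str]]:
    """Return (number of Product entities, @ids that more than one Product declares)."""
    products = []
    for raw in JSONLD_RE.findall(html):
        for node in nodes(json.loads(raw)):
            types = node.get("@type", [])
            if not isinstance(types, list):
                types = [types]
            if "Product" in types:
                products.append(node)
    counts = Counter(p["@id"] for p in products if "@id" in p)
    return len(products), sorted(i for i, c in counts.items() if c > 1)


page = """
<script type="application/ld+json">
{"@type": "Product", "@id": "https://example.com/products/x#product", "name": "X"}
</script>
<script type='application/ld+json'>{"@type":"Product","@id":"https://example.com/products/x#product","name":"X"}</script>
"""
count, duplicates = product_audit(page)
print(count, duplicates)  # 2 ['https://example.com/products/x#product']
```

A healthy product page should report a count of 1 and an empty duplicate list.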
5. FAQPage schema with questions that aren't visible on the page
Google explicitly forbids this and AI engines downweight pages that do it. The schema is supposed to be a machine-readable mirror of visible content, not a way to sneak extra questions in front of crawlers. If a Q&A pair is in the schema, the same Q&A must be in the rendered DOM (it can be inside an accordion, but it has to be in the HTML).
Where Schema.org ends and llms.txt begins
Schema.org is structured data for machines. llms.txt is unstructured prose written for machines. They complement each other and serve different extraction patterns.
AI engines that retrieve and parse the full page (Perplexity, Gemini with grounding, ChatGPT with browsing) use Schema.org. AI engines that pre-process site context before answering (Claude with its emerging llms.txt support, some smaller niche AI search tools) use llms.txt. A store with both is covered for both patterns.
We've covered llms.txt setup in our own manifest as a working example, and we'll publish a dedicated post on the file format and conventions soon.
What we're shipping next
This guide is the practical layer of GeoNexa's Case Study Zero — we ship the same schemas on our own site that we're describing here. Our 4/100 baseline AI search score and the public 90-day commitment to reach 60/100 live at Case Study Zero. If you're considering implementing this yourself and want a second pair of eyes on the result, GeoNexa's founding cohort still has spots open.
Want this shipped for your store?
Book a free 30-minute AI visibility audit. We'll run the same schema test on your store, show you what's missing, and quote a foundation build.