We Crawled 2,174 Business Pages. Only 6% Had Structured Data.
We pulled the numbers from our own platform. 132 of 2,174 crawled business pages had JSON-LD. Here’s what that means for getting cited by AI.
Dan Johnson
Co-Founder · June 4, 2026 · 5 min read

Here are the actual numbers from our platform, pulled today. Full sample sizes shown so you can decide for yourself whether the conclusions hold.
Drilling in:
- 2,174 pages crawled across our customer base
- 132 of those pages had JSON-LD structured data (6.1%)
- 2,042 did not (93.9%)
- 7 brands have been through our AI audit agent's full pipeline so far
- Every one of those 7 brands had at least one structured data gap flagged in its top recommendations
- 125 total recommendations generated by the agent across those 7 brands
- 22 of the 125 recs were structured data fixes (17.6%, the second most common type behind content gaps)
That's a small brand sample. It's a meaningful page sample. We'll come back and update this post as the dataset grows.
What it suggests today: structured data is the single most common technical gap we see, and the deficit on the open web is larger than most operators realize.
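For readers who want to check the arithmetic, the percentages above can be reproduced from the raw counts. This is a minimal sketch; the variable and function names are ours for illustration, and only the four counts come from the platform data quoted in this post.

```python
# Counts quoted in this post.
pages_crawled = 2174
pages_with_jsonld = 132
total_recs = 125
structured_data_recs = 22


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


jsonld_share = share(pages_with_jsonld, pages_crawled)
no_jsonld_share = share(pages_crawled - pages_with_jsonld, pages_crawled)
rec_share = share(structured_data_recs, total_recs)

print(jsonld_share, no_jsonld_share, rec_share)  # 6.1 93.9 17.6
```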
What "structured data" actually is
Structured data is a small block of JSON in the <head> of your page that tells machines, including AI assistants, what your page is about, in a vocabulary they already understand.
The vocabulary is schema.org[1], an open standard launched in 2011 by Google, Microsoft, and Yahoo, with Yandex joining later that year. The format is JSON-LD, which Google's own structured data documentation[2] recommends as the preferred markup format.
A page selling a Class V hitch can either describe the product in human-readable HTML and hope the parser figures it out, or it can ship this:
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Class V Receiver Hitch",
  "brand": { "@type": "Brand", "name": "Acme Towing" },
  "offers": {
    "@type": "Offer",
    "price": "249.00",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  }
}
</script>

The first version asks the AI engine to guess. The second hands it the answer.
Why this matters more for AI than it did for SEO
When Google began supporting JSON-LD about a decade ago, the payoff was rich results: star ratings in the SERP, FAQ accordions, event cards. Useful, but not load-bearing.
For AI engines, it's different. ChatGPT, Perplexity, and Google's AI Overviews don't have a SERP to decorate. They have to decide, in real time, which businesses to cite by name when someone asks "best HVAC company in Austin" or "compare Stripe vs Adyen for B2B."
That decision runs on machine-readable signals. JSON-LD is the cleanest available. A page that declares @type: LocalBusiness with a name, address, phone, and service area is doing the AI engine's job for it. A page that buries the same information in marketing prose is asking the model to infer, and inference fails quietly.
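As a rough illustration of the LocalBusiness declaration described above, the sketch below builds the block in Python and wraps it in a script tag. The helper names, the business details, and the choice of PostalAddress, telephone, and areaServed to carry the address, phone, and service area are our assumptions for the example, not output from the platform.

```python
import json


def local_business_jsonld(name, street, city, region, postal_code, phone, area_served):
    """Build a LocalBusiness JSON-LD dict (hypothetical helper for illustration)."""
    return {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": name,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
            "addressRegion": region,
            "postalCode": postal_code,
        },
        "telephone": phone,
        "areaServed": area_served,
    }


def to_script_tag(data):
    """Serialize a dict as a JSON-LD script tag for the page <head>."""
    body = json.dumps(data, indent=2)
    # Escape "</" so a stray "</script>" inside a value cannot end the tag early.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Placeholder business details, not a real company.
example = local_business_jsonld(
    name="Example HVAC Co.",
    street="100 Example St",
    city="Austin",
    region="TX",
    postal_code="78701",
    phone="+1-512-555-0100",
    area_served="Austin, TX",
)
print(to_script_tag(example))
```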
This isn't only our take. Google's structured data guide[2] is explicit that schema markup helps search engines understand the content of the page and create rich results. Google's AI Overviews ride on the same parsing pipeline that produces those rich results. The schema.org spec predates LLMs by a decade, but the spec is exactly what an LLM wants to read.
The 4 schema types our agent recommends most often
Across the 22 structured-data recommendations the audit agent has generated so far, four schema types dominate. Numbers below are out of those 22 recs.
| Schema type | Recommended in | What it does |
|---|---|---|
| Organization | 16 of 22 (73%) | Declares your company exists, with name, logo, social links, contact. AI engines use it to disambiguate you from same-named entities. |
| Product | 9 of 22 (41%) | One per product or service line. Without it, you're invisible to comparison queries. |
| FAQPage | 6 of 22 (27%) | Lets AI engines lift your FAQ answers verbatim into responses. Direct citation pathway. |
| BreadcrumbList | 5 of 22 (23%) | Tells parsers your site hierarchy. Underrated, cheap, helps internal page authority distribution. |
If you ship nothing else, ship Organization on your homepage. It's the foundation that every other schema type implicitly references.
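The Organization and Product shapes appear elsewhere in this post. For the other two rows of the table, here is a hedged sketch of what FAQPage and BreadcrumbList blocks might look like when generated in Python. The function names, questions, and URLs are placeholders we made up; the property names (mainEntity, acceptedAnswer, itemListElement, position, item) are the standard schema.org ones.

```python
import json


def faq_page(qa_pairs):
    """Build a FAQPage dict from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }


def breadcrumb_list(trail):
    """Build a BreadcrumbList dict from (name, url) pairs, ordered root first."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": position, "name": name, "item": url}
            for position, (name, url) in enumerate(trail, start=1)
        ],
    }


# Placeholder content for illustration only.
faq = faq_page([("Do you ship internationally?", "Yes, to most countries.")])
crumbs = breadcrumb_list([
    ("Home", "https://yourcompany.com/"),
    ("Hitches", "https://yourcompany.com/hitches/"),
])
print(json.dumps(faq, indent=2))
print(json.dumps(crumbs, indent=2))
```

Each dict would go into its own script tag of type application/ld+json on the page it describes.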
What an actual audit recommendation looks like
The audit agent isn't generic. Here's a real recommendation it generated for a brand whose primary product is in-person events. Names and URLs anonymized:
Signal: the Event schema type is entirely absent across all 22 crawled pages. The company's primary commercial product is in-person events, yet no event page carries Event structured data. Google's event rich results program explicitly uses Event JSON-LD to surface event dates, venues, and ticket links directly in search.

Action: for each event page, inject a JSON-LD block in the page <head> with @type: Event, name, startDate and endDate in ISO 8601 format, location as a nested Place object containing name (venue name) and address (street, city, state, postal code), and organizer.
That level of specificity is what makes a recommendation actionable. "Add some schema" doesn't move anything. "Add an Event block with these six fields to every page under /events/" is a Tuesday-afternoon engineering ticket.
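To make that ticket concrete, here is one way the recommended Event block could be generated per page. Treat it as a sketch under assumptions: the function name, the event details, the use of PostalAddress for the address, and Organization as the organizer type are ours, since the recommendation names the fields but not every nested type.

```python
import json
from datetime import datetime, timedelta, timezone


def event_jsonld(name, start, end, venue_name, street, city, region,
                 postal_code, organizer_name):
    """Build an Event JSON-LD dict with the fields named in the recommendation."""
    return {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": name,
        # isoformat() on a timezone-aware datetime yields ISO 8601 with an offset.
        "startDate": start.isoformat(),
        "endDate": end.isoformat(),
        "location": {
            "@type": "Place",
            "name": venue_name,
            "address": {
                "@type": "PostalAddress",  # assumed type for the nested address
                "streetAddress": street,
                "addressLocality": city,
                "addressRegion": region,
                "postalCode": postal_code,
            },
        },
        "organizer": {"@type": "Organization", "name": organizer_name},
    }


# Placeholder event, not taken from the audited brand.
central = timezone(timedelta(hours=-5))
event = event_jsonld(
    name="Example Summit 2026",
    start=datetime(2026, 9, 15, 9, 0, tzinfo=central),
    end=datetime(2026, 9, 15, 17, 0, tzinfo=central),
    venue_name="Example Convention Center",
    street="500 Example Ave",
    city="Austin",
    region="TX",
    postal_code="78701",
    organizer_name="Your Company Name",
)
print(json.dumps(event, indent=2))
```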
The 4-line minimum every site should ship today
If you read this far and want to act, here's the smallest useful step. Drop this into your homepage <head> and replace the four placeholders with your real values.
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Your Company Name",
  "url": "https://yourcompany.com",
  "logo": "https://yourcompany.com/logo.png",
  "sameAs": [
    "https://twitter.com/yourcompany",
    "https://linkedin.com/company/yourcompany"
  ]
}
</script>

Then run Google's Rich Results Test[3] against your URL. If it passes, you've already crossed out of the 93.9% bucket.
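A quick local check can catch a typo in the block before you reach for Google's tool. The sketch below uses only the Python standard library to pull every application/ld+json script out of an HTML string and confirm it parses as JSON. It is our own illustration, not the platform's crawler, and it does not validate the markup against schema.org or Google's requirements, so the Rich Results Test remains the real check.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []  # successfully parsed JSON-LD payloads
        self.errors = []  # messages for blocks that are not valid JSON

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError as exc:
                self.errors.append(str(exc))


def extract_jsonld(html):
    """Return (parsed_blocks, errors) for every JSON-LD script tag in html."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks, parser.errors


# Placeholder page for illustration.
page = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization", "name": "Your Company Name"}
</script>
</head><body></body></html>"""

blocks, errors = extract_jsonld(page)
print(len(blocks), "JSON-LD block(s) found;", len(errors), "failed to parse")
```

To run it against a live page, fetch the HTML however you normally would and pass the text to extract_jsonld.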
What to measure
Three signals tell you it's working:
- Rich Results Test passes with no warnings. Same-day.
- The count of pages with structured data climbs in your Google Search Console "Enhancements" tab. Two to four weeks.
- AI assistants name your brand when asked about your category. The slow signal, and the one that pays. Three to six months, depending on how often models retrain on your content.
If you want a free read on which schemas are missing from your site, run a scan. The audit agent generates the same structured-data recommendations referenced above, against your actual pages, in about three minutes.
References
- [1]
- [2] Introduction to structured data markup in Google Search. Google Search Central. https://developers.google.com/search/docs/appearance/structured-data/intro-structured-data (accessed Jun 4, 2026)
- [3] Rich Results Test. Google Search Central. https://search.google.com/test/rich-results (accessed Jun 4, 2026)
About Dan Johnson
Dan Johnson is the co-founder of GeoReputation, where he handles the engineering. Posts on this blog are usually grounded in data pulled live from the platform.