We Crawled 2,174 Business Pages. Only 6% Had Structured Data.

We pulled the numbers from our own platform. 132 of 2,174 crawled business pages had JSON-LD. Here’s what that means for getting cited by AI.

Dan Johnson

Co-Founder · June 4, 2026 · 5 min read

Here are the actual numbers from our platform, pulled today. Full sample sizes shown so you can decide for yourself whether the conclusions hold.

93.9%
of crawled business pages had no JSON-LD structured data
n = 2,174 pages crawled across our customer base

Drilling in:

  • 2,174 pages crawled across our customer base
  • 132 of those pages had JSON-LD structured data (6.1%)
  • 2,042 did not (93.9%)
  • 7 brands have been through our AI audit agent's full pipeline so far
  • Every one of those 7 brands had at least one structured data gap flagged in its top recommendations
  • 125 total recommendations generated by the agent across those 7 brands
  • 22 of the 125 recs were structured data fixes (17.6%, the second most common type behind content gaps)

That's a small brand sample. It's a meaningful page sample. We'll come back and update this post as the dataset grows.

What it suggests today: structured data is the single most common technical gap we see, and the deficit, at least across the pages we crawl, is larger than most operators realize.

What "structured data" actually is

Structured data, as we use the term here, is a small block of JSON embedded in a <script> tag on your page, usually in the <head>. It tells machines, including AI assistants, what your page is about, in a vocabulary they already understand.

The vocabulary is schema.org[1], an open standard that Google, Microsoft, Yahoo, and Yandex co-founded in 2011. The format is JSON-LD. Google's own structured data documentation[2] recommends it as the preferred markup format.

A page selling a Class V hitch can either describe the product in human-readable HTML and hope the parser figures it out, or it can ship this:

class-v-hitch product page
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Class V Receiver Hitch",
  "brand": { "@type": "Brand", "name": "Acme Towing" },
  "offers": {
    "@type": "Offer",
    "price": "249.00",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  }
}
</script>

The first version asks the AI engine to guess. The second hands it the answer.

Why this matters more for AI than it did for SEO

When Google began recommending JSON-LD roughly a decade ago, the payoff was rich results: star ratings in the SERP, FAQ accordions, event cards. Useful, but not load-bearing.

For AI engines, it's different. ChatGPT, Perplexity, and Google's AI Overviews don't have a SERP to decorate. They have to decide, in real time, which businesses to cite by name when someone asks "best HVAC company in Austin" or "compare Stripe vs Adyen for B2B."

That decision runs on machine-readable signals. JSON-LD is the cleanest available. A page that declares @type: LocalBusiness with a name, address, phone, and service area is doing the AI engine's job for it. A page that buries the same information in marketing prose is asking the model to infer, and inference fails quietly.
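As a rough illustration of the LocalBusiness declaration described above, a minimal block might look like the following. Every value here (business name, address, phone number, URL) is a made-up placeholder, and using a City object for areaServed is only one of several ways schema.org allows a service area to be expressed.

```html
<!-- Illustrative sketch only: all names, addresses, numbers, and URLs are placeholders. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "LocalBusiness",
  "name": "Example Heating & Cooling",
  "url": "https://example.com",
  "telephone": "+1-512-555-0100",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "123 Example St",
    "addressLocality": "Austin",
    "addressRegion": "TX",
    "postalCode": "78701",
    "addressCountry": "US"
  },
  "areaServed": { "@type": "City", "name": "Austin" }
}
</script>
```

The point of the sketch is that name, address, phone, and service area each sit in a named field, so nothing has to be inferred from surrounding copy.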

This isn't only our take. Google's structured data guide[2] is explicit that schema markup helps search engines understand the content of the page and create rich results. We assume Google's AI Overviews draw on that same understanding of the page, though Google does not document that pipeline in detail. The schema.org spec predates LLMs by a decade, but the spec is exactly what an LLM wants to read.

The 4 schema types our agent recommends most often

Across the 22 structured-data recommendations the audit agent has generated so far, four schema types dominate. Numbers below are out of those 22 recs.

Most-recommended schema types across 22 audit-agent recommendations
| Schema type | Recommended in | What it does |
|---|---|---|
| Organization | 16 of 22 (73%) | Declares your company exists, with name, logo, social links, contact. AI engines use it to disambiguate you from same-named entities. |
| Product | 9 of 22 (41%) | One per product or service line. Without it, you're invisible to comparison queries. |
| FAQPage | 6 of 22 (27%) | Lets AI engines lift your FAQ answers verbatim into responses. Direct citation pathway. |
| BreadcrumbList | 5 of 22 (23%) | Tells parsers your site hierarchy. Underrated, cheap, helps internal page authority distribution. |

If you ship nothing else, ship Organization on your homepage. It's the foundation that every other schema type implicitly references.
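FAQPage and BreadcrumbList are the two types in the table that don't get an example elsewhere in this post, so here is a minimal sketch of each. The question text, page names, and example.com URLs are placeholders we made up for illustration. The structure follows the schema.org vocabulary, but treat it as a starting point rather than a complete implementation.

```html
<!-- Illustrative sketch only: question, answer, and URLs are placeholders. -->
<!-- FAQPage: one Question entry per FAQ item that is visible on the page. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "Do you offer installation?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Replace this with the answer exactly as it appears on the page."
      }
    }
  ]
}
</script>

<!-- BreadcrumbList: positions start at 1 and run from the site root down to the current page. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://example.com/" },
    { "@type": "ListItem", "position": 2, "name": "Hitches", "item": "https://example.com/hitches/" },
    { "@type": "ListItem", "position": 3, "name": "Class V Receiver Hitch" }
  ]
}
</script>
```

The last breadcrumb omits item because it represents the current page. Keep the FAQ text in the markup identical to what visitors see.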

What an actual audit recommendation looks like

The audit agent isn't generic. Here's a real recommendation it generated for a brand whose primary product is in-person events. Names and URLs anonymized:

Signal: the Event schema type is entirely absent across all 22 crawled pages. The company's primary commercial product is in-person events, yet no event page carries Event structured data. Google's event rich results program explicitly uses Event JSON-LD to surface event dates, venues, and ticket links directly in search.
Action: for each event page, inject a JSON-LD block in the page <head> with @type: Event, name, startDate and endDate in ISO 8601 format, location as a nested Place object containing name (venue name) and address (street, city, state, postal code), and organizer.

That level of specificity is what makes a recommendation actionable. "Add some schema" doesn't move anything. "Add an Event block with these six fields to every page under /events/" is a Tuesday-afternoon engineering ticket.
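To make that ticket concrete, here is a sketch of what one such Event block could look like, using the fields named in the recommendation. The event name, dates, venue, address, and organizer are all invented placeholders, not the anonymized brand's data.

```html
<!-- Illustrative sketch only: event, venue, dates, and organizer are placeholders. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Event",
  "name": "Example Annual Summit",
  "startDate": "2026-09-15T09:00:00-05:00",
  "endDate": "2026-09-16T17:00:00-05:00",
  "location": {
    "@type": "Place",
    "name": "Example Convention Center",
    "address": {
      "@type": "PostalAddress",
      "streetAddress": "500 Example Ave",
      "addressLocality": "Austin",
      "addressRegion": "TX",
      "postalCode": "78701",
      "addressCountry": "US"
    }
  },
  "organizer": {
    "@type": "Organization",
    "name": "Example Events Co.",
    "url": "https://example.com"
  }
}
</script>
```

The dates carry an explicit UTC offset so the ISO 8601 timestamps are unambiguous regardless of where the parser runs.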

The 4-property minimum every site should ship today

If you read this far and want to act, here's the smallest useful step. Drop this into your homepage <head> and replace the placeholder values in the four properties (name, url, logo, sameAs) with your real ones.

index.html — Organization schema minimum
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Your Company Name",
  "url": "https://yourcompany.com",
  "logo": "https://yourcompany.com/logo.png",
  "sameAs": [
    "https://twitter.com/yourcompany",
    "https://linkedin.com/company/yourcompany"
  ]
}
</script>

Then run Google's Rich Results Test[3] against your URL. If it passes, you've already crossed out of the 93.9% bucket.

What to measure

Three signals tell you it's working:

  1. Rich Results Test passes with no warnings. Same-day.
  2. The count of pages with structured data climbs in your Google Search Console "Enhancements" tab. Two to four weeks.
  3. AI assistants name your brand when asked about your category. The slow signal, and the one that pays. Three to six months, depending on how often models retrain on your content.

If you want a free read on which schemas are missing from your site, run a scan. The audit agent generates the same structured-data recommendations referenced above, against your actual pages, in about three minutes.

References

  [1] schema.org. https://schema.org. Accessed Jun 4, 2026.
  [2] Introduction to structured data markup in Google Search. Google Search Central. https://developers.google.com/search/docs/appearance/structured-data/intro-structured-data. Accessed Jun 4, 2026.
  [3] Rich Results Test. Google Search Central. https://search.google.com/test/rich-results. Accessed Jun 4, 2026.

About Dan Johnson

Dan Johnson is the co-founder of GeoReputation, where he handles the engineering. Posts on this blog are usually grounded in data pulled live from the platform.