
How to allow AI crawlers (GPTBot, ClaudeBot, Google-Extended)

If AI crawlers can't read your site, you don't show up in their answers. It's the most common and silent mistake in GEO: many sites block these bots without knowing it, often because of an option that's switched on by default at their CDN.

AEON42 unifies SEO, GEO, and AEO on your real Search Console data, so the first thing it checks is the most basic one — whether the engines that power AI answers can even reach your pages.

The AI crawlers that matter

  • GPTBot (OpenAI's training crawler) · OAI-SearchBot (ChatGPT search) · ChatGPT-User (pages fetched on a user's request in ChatGPT).
  • ClaudeBot (Anthropic).
  • PerplexityBot (Perplexity).
  • Google-Extended (Google's robots.txt token for Gemini training and grounding. It is not a separate crawler: Google's existing crawlers honor it, and AI Overviews in Search depend on Googlebot access instead).

Apart from Google-Extended, each of these reads your HTML the way a search crawler does. If any of them is blocked, that engine can't see your content, can't index it, and can't cite you, no matter how good the page is.

Step 1: check your robots.txt

Open https://yourdomain.com/robots.txt and look for rules like:

User-agent: GPTBot
Disallow: /

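If you find them for the bots you want to allow, remove them or change them to Allow: /. To allow everyone, a single User-agent: * with Allow: / and no specific disallows is enough. A group that names a bot takes precedence over the * group for that bot, so a leftover User-agent: GPTBot block still applies even when * allows everything.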

A few things to watch for:

  • A generic User-agent: bot rule can accidentally cover GPTBot if a crawler's parser matches agent names loosely, so be explicit about each agent you intend to block or allow.
  • Commented-out rules and stale copies are common. The robots.txt that counts is the one served at the URL above, not the copy in your repo.
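
To check several bots at once, the same reading of robots.txt can be done in a few lines of Python. This is a minimal sketch using the standard library's urllib.robotparser. SAMPLE_ROBOTS is a made-up file and check_agents is a hypothetical helper, not part of any tool mentioned here. Each crawler applies its own parser, so treat the result as a first pass rather than proof.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content, used only for illustration.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# Agent tokens taken from the list earlier in this guide.
AI_AGENTS = [
    "GPTBot",
    "OAI-SearchBot",
    "ChatGPT-User",
    "ClaudeBot",
    "PerplexityBot",
    "Google-Extended",
]


def check_agents(robots_text, url, agents=AI_AGENTS):
    """Return {agent: True/False} for whether each agent may fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_text.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    results = check_agents(SAMPLE_ROBOTS, "https://yourdomain.com/")
    for agent, allowed in results.items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

To run it against your live file, paste in the text served at https://yourdomain.com/robots.txt rather than the copy in your repo.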

Step 2: check your CDN (this is what almost nobody checks)

Cloudflare and other CDNs include features like "Block AI bots" that return a 403 to AI crawlers at the edge, regardless of what your robots.txt says. If you use Cloudflare, go to Security → Bots / AI Crawl Control and allow the crawlers you want. Also confirm that "Manage robots.txt" isn't overriding your own file.

This is the trap that catches most sites. Your robots.txt can be perfectly permissive while a managed setting at the edge quietly serves a 403 to every AI bot. Because the block happens before the request ever reaches your origin, nothing in your code or your robots.txt will reveal it — you have to look at the CDN dashboard.
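
One way to reason about this trap is to fetch the same URL twice, once with a normal browser user-agent and once with a crawler user-agent, and compare the results with what robots.txt says. The sketch below only encodes that comparison. diagnose and its return strings are hypothetical names chosen for illustration, and the verdict is a hint to go look at the CDN dashboard, not a confirmation.

```python
from typing import Optional


def diagnose(
    robots_allows: bool,
    browser_status: Optional[int],
    crawler_status: Optional[int],
) -> str:
    """Interpret one URL fetched with a browser UA and with a crawler UA.

    A status of None means no response (timeout or connection error).
    """
    if not robots_allows:
        return "blocked by robots.txt"
    if crawler_status == 200:
        return "reachable"
    if browser_status == 200:
        # Same URL, only the User-Agent differs: something in front of
        # the page is filtering on bot identity.
        return "likely filtered by user-agent (check the CDN or firewall)"
    return "failing for everyone (not specific to AI crawlers)"
```

For example, diagnose(True, 200, 403) points at user-agent filtering, which is the case where robots.txt looks permissive but the bot still gets a 403.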

Step 3: verify

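Confirm that an AI crawler receives a 200 (not a 403) and that the robots.txt being served is the one you control. The fastest manual test is to request your homepage while sending one of the crawler user-agent strings and inspect the status code: a 403, 401, 429, or a timeout all mean blocked. Two caveats: a CDN that verifies crawler IP addresses may treat your spoofed request differently from the real bot, and Google-Extended has no user-agent string of its own, so it can only be checked in robots.txt.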
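
The manual test can be scripted with Python's standard library. This is a rough sketch, not a definitive checker: the user-agent strings in CRAWLER_UAS are simplified placeholders that only carry each bot's token (the vendors publish the exact current strings), and fetch_status and classify are hypothetical helper names.

```python
import urllib.error
import urllib.request
from typing import Optional

# Simplified placeholder strings containing each bot's token.
# Assumption: replace these with the exact strings from each vendor's docs.
CRAWLER_UAS = {
    "GPTBot": "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "ClaudeBot": "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "PerplexityBot": "Mozilla/5.0 (compatible; PerplexityBot/1.0)",
}


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> Optional[int]:
    """Return the final HTTP status for url, or None if there was no response."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses are raised as HTTPError; keep the code.
        return error.code
    except OSError:
        # Covers URLError, timeouts, and connection failures.
        return None


def classify(status: Optional[int]) -> str:
    """Map a status to the verdicts used in this step."""
    if status is None:
        return "blocked (no response)"
    if status in (401, 403, 429):
        return "blocked"
    if 200 <= status < 300:
        return "allowed"
    return "inconclusive"
```

Calling classify(fetch_status("https://yourdomain.com/", CRAWLER_UAS["GPTBot"])) runs the check for one bot; loop over CRAWLER_UAS to cover the rest.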

AEON42 checks your site's AI-crawler policy automatically (allowed / blocked / not verifiable) and tells you which engines can reach you, so you don't have to test each bot by hand. When a CDN or firewall returns an error instead of a clean allow/deny, it's flagged as "not verifiable" rather than a false "missing." See pricing, or read what GEO is.

Should you block any bot?

Allowing the crawlers that feed AI answers is the move if you want visibility: they're how ChatGPT, Claude, Perplexity, and Gemini discover and cite you (Google AI Overviews rely on regular Googlebot access). That's separate from abusive scrapers, which you can throttle or block individually in your firewall by their own user-agent or by behavior. The goal isn't to open everything; it's to make sure the engines that can send you traffic and citations aren't blocked by accident.

Once access is in place, the next levers are citable content and structured data so the engines can not only reach your pages but understand and quote them.

Frequently asked questions

How do I know if I'm blocking AI crawlers?
Open your robots.txt at yourdomain.com/robots.txt and look for Disallow rules for GPTBot, ClaudeBot, PerplexityBot, or Google-Extended. Important: your CDN (Cloudflare, for example) can block AI bots at the edge even when your robots.txt allows them, so check the CDN dashboard too.

Should I block any bot?
It depends on your strategy. To gain visibility in AI answers, you want to allow the crawlers that feed answers and citations. You can block abusive scrapers separately in your firewall, by user-agent or behavior, without touching the engines that can cite you.

Want to measure this on your site? AEON42 connects your Search Console and tracks your AI visibility alongside your SEO.
