
GPTBot

GPTBot is OpenAI's public web crawler. It fetches publicly accessible pages for two purposes: building training corpora for future GPT models, and indexing for SearchGPT, OpenAI's real-time web search. GPTBot respects robots.txt, does NOT execute JavaScript, and (per Q1 2026 policy) honors HTTP 402 Payment Required.

Quick facts

User-Agent contains: GPTBot/
Owner: OpenAI
Respects robots.txt: Yes
Executes JavaScript: No (raw HTML only)
Honors HTTP 402: Yes (Q1 2026 policy)
Public IP ranges: openai.com/gptbot.json
Documentation: platform.openai.com/docs/gptbot

How to allow GPTBot

Most sites should allow GPTBot by default — it's how your content gets cited in ChatGPT and ranks in SearchGPT. Add this to your robots.txt:

User-agent: GPTBot
Allow: /

# Optional: deny specific paths
Disallow: /admin/
Disallow: /private/
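Under the Robots Exclusion Protocol (RFC 9309), the longest matching path wins regardless of line order, so Disallow: /admin/ overrides Allow: / for anything under /admin/. Python's standard urllib.robotparser applies the first matching rule instead, which would let /admin/ through for the file above. The minimal sketch below therefore evaluates the rules by hand. It assumes you have already extracted the rule lines of the group that applies to GPTBot, supports the * and $ wildcards, and skips percent-encoding normalization; the function names are our own, not part of any library.

```python
import re


def _rule_regex(path: str) -> "re.Pattern[str]":
    """Translate robots.txt path syntax: '*' matches any run of characters, a trailing '$' anchors the end."""
    anchored = path.endswith("$")
    if anchored:
        path = path[:-1]
    pattern = ".*".join(re.escape(part) for part in path.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Longest-match evaluation of (directive, path) rules from one robots.txt group.

    The most specific (longest) matching rule wins; on a tie, Allow wins.
    No matching rule means the path is allowed.
    """
    best_len, best_allow = -1, True
    for directive, value in rules:
        if not value:  # an empty Allow/Disallow value matches nothing
            continue
        if _rule_regex(value).match(path):  # re.match anchors at the start of the path
            allow = directive.lower() == "allow"
            if len(value) > best_len or (len(value) == best_len and allow):
                best_len, best_allow = len(value), allow
    return best_allow


if __name__ == "__main__":
    example = [("Allow", "/"), ("Disallow", "/admin/"), ("Disallow", "/private/")]
    for p in ("/blog/post", "/admin/users", "/admin"):
        print(p, is_allowed(example, p))
```

Note that /admin (no trailing slash) stays allowed under this file, because Disallow: /admin/ only matches paths that include the slash.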

If GPTBot is being blocked at the CDN layer (Cloudflare's Bot Fight Mode is the common culprit), allowing it in robots.txt is not enough on its own: you also need to allowlist its IP ranges at the CDN. Cloudflare users: Security → Bots → Configure Super Bot Fight Mode → AI Scrapers and Crawlers → Allow.

How to block GPTBot

Blocking GPTBot prevents your content from being used in OpenAI's training and from appearing in SearchGPT answers. Note: this does NOT affect Google search.

User-agent: GPTBot
Disallow: /

For granular control — block only specific sections, or charge GPTBot per request via HTTP 402 — use Bot Paywall, bundled free on every paid plan.

GPTBot and JavaScript rendering

GPTBot does not execute JavaScript. It fetches the raw HTML response and parses what's in the initial document. This is the single most common reason sites are invisible to ChatGPT: a React/Vue/Svelte app that renders content client-side returns an empty shell to GPTBot.

Fix: server-side render (SSR), static-generate (SSG), or pre-render the content you want indexed. Next.js, Astro, Remix, and SvelteKit all render on the server or at build time by default, as long as the content itself isn't fetched or rendered only in the browser. Plain HTML works too.

Our audit compares your raw HTML to your rendered HTML and reports the gap. If GPTBot sees an empty <body> while a browser sees full content, you'll see a critical issue tagged with a per-agent impact badge for GPTBot.
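You can get a rough first read on the raw-HTML gap yourself by measuring how much visible text the initial HTML response contains. The sketch below uses only Python's standard html.parser. The 200-character threshold and the choice to ignore <script>, <style>, <noscript>, and <template> content are our assumptions, not a documented GPTBot rule, and this is a simple heuristic rather than a description of how our audit works.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collects text nodes that sit outside non-content elements."""

    SKIP = {"script", "style", "noscript", "template"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(raw_html: str) -> str:
    """Text a non-JavaScript client could read from the initial HTML document."""
    parser = _VisibleText()
    parser.feed(raw_html)
    parser.close()
    return " ".join(parser.chunks)


def looks_like_empty_shell(raw_html: str, min_chars: int = 200) -> bool:
    """Heuristic: very little visible text usually means content is rendered client-side."""
    return len(visible_text(raw_html)) < min_chars
```

Feed it the body of a plain HTTP GET (for example via urllib.request) rather than HTML saved from a browser, since only the unrendered response reflects what a crawler that doesn't execute JavaScript receives.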

Verifying real GPTBot traffic

Bad actors spoof the GPTBot User-Agent to scrape sites under cover. To verify a real GPTBot request:

  1. Fetch https://openai.com/gptbot.json — list of public IP ranges
  2. Check the request's source IP against that list (CIDR match)
  3. Perform a forward-confirmed reverse-DNS lookup: the PTR record for the source IP should be a hostname under *.openai.com, and resolving that hostname forward should return the same source IP
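Steps 1 and 2 can be sketched with Python's standard library. The JSON layout is an assumption on our part (a Google-style {"prefixes": [{"ipv4Prefix": ...}]} list), so check the live file before relying on it. The function names are hypothetical, and the addresses in the tests come from the reserved documentation ranges, not from OpenAI's list.

```python
import ipaddress
import json
import urllib.request

GPTBOT_RANGES_URL = "https://openai.com/gptbot.json"


def parse_networks(doc: dict) -> list:
    """Extract CIDR blocks from the published JSON.

    ASSUMPTION: Google-style layout, e.g.
    {"prefixes": [{"ipv4Prefix": "192.0.2.0/24"}, {"ipv6Prefix": "2001:db8::/32"}]}
    """
    networks = []
    for entry in doc.get("prefixes", []):
        for key in ("ipv4Prefix", "ipv6Prefix"):
            if key in entry:
                # strict=False tolerates a prefix written with host bits set
                networks.append(ipaddress.ip_network(entry[key], strict=False))
    return networks


def fetch_networks(url: str = GPTBOT_RANGES_URL) -> list:
    """Step 1: download and parse the list. Cache the result; don't fetch per request."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_networks(json.load(resp))


def ip_in_ranges(source_ip: str, networks) -> bool:
    """Step 2: CIDR match. An IPv4 address never matches an IPv6 network (and vice versa)."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in networks)
```

Refresh the cached list on a schedule, since the published ranges can change.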

Cloudflare's Bot Management does this automatically and exposes a verified-bot flag on request.cf.botManagement.verifiedBot.

Audit your site for GPTBot accessibility

Run a free audit. Get a per-agent severity report for GPTBot specifically: which checks pass, which fail, and the paste-ready fixes for the critical ones.

Related

Block or charge ClaudeBot

Anthropic's crawler. Same robots.txt respect as GPTBot — gate or monetize it per URL with Bot Paywall.

Block or charge PerplexityBot

Powers Perplexity Answers and has a mixed compliance reputation, so enforce access rules at the edge with Bot Paywall.

Compare to Profound

We audit accessibility; they measure citations. Complementary.

Pricing

Free (Visible) tier audits GPTBot at full fidelity.

FAQ

What is GPTBot's User-Agent string?

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot). The trailing URL points to OpenAI's official documentation page. We recommend matching on the literal substring "GPTBot/" rather than the full string, because the version number may change without notice.
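Substring matching takes only a few lines in Python. The helper names are hypothetical, and a match tells you only what the client claims to be: the header is trivially spoofed, so pair it with the IP and DNS checks in the verification section.

```python
import re
from typing import Optional

GPTBOT_TOKEN = "GPTBot/"
_VERSION = re.compile(r"GPTBot/(\d+(?:\.\d+)*)")


def claims_gptbot(user_agent: Optional[str]) -> bool:
    """True if the User-Agent contains the literal token "GPTBot/" (case-sensitive)."""
    return bool(user_agent) and GPTBOT_TOKEN in user_agent


def claimed_version(user_agent: Optional[str]) -> Optional[str]:
    """Version after "GPTBot/", or None. Useful for logging; don't match on it."""
    match = _VERSION.search(user_agent or "")
    return match.group(1) if match else None
```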

Does GPTBot respect robots.txt?

Yes. As the standard (RFC 9309) specifies, GPTBot obeys the group addressed to it (User-agent: GPTBot) and falls back to the wildcard group (User-agent: *) only when no GPTBot-specific group exists. OpenAI's policy is documented at https://platform.openai.com/docs/gptbot. If you block GPTBot in robots.txt, content from your site will not be used in OpenAI's training data, and new pages will not be ingested.
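Group selection is easy to get wrong: a GPTBot group replaces the * group entirely rather than adding to it. A minimal sketch of that selection step, following RFC 9309's rules (case-insensitive token match, multiple groups naming the same token are merged). The function is hypothetical and reads only User-agent, Allow, and Disallow lines.

```python
def rules_for(robots_txt: str, token: str) -> list[tuple[str, str]]:
    """Return the (directive, path) rules a crawler identified by `token` should obey."""
    groups: list[tuple[list[str], list[tuple[str, str]]]] = []
    agents: list[str] = []
    rules: list[tuple[str, str]] = []
    seen_rule = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # a User-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules, seen_rule = [], [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            seen_rule = True
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))

    token = token.lower()
    # Groups naming the token win; the "*" group applies only if none exist.
    chosen = [r for a, r in groups if token in a] or [r for a, r in groups if "*" in a]
    return [rule for group in chosen for rule in group]
```

With a file that disallows everything for * but allows GPTBot, GPTBot receives only its own group's rules, so the blanket Disallow: / never applies to it.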

Does GPTBot execute JavaScript?

No, GPTBot fetches raw HTML only. JavaScript-rendered content is invisible to it. If your site is a single-page app that renders content client-side, GPTBot sees an empty shell. Use server-side rendering, static generation, or a hybrid approach (Next.js SSR, Astro, plain HTML) to make content visible. We audit this gap automatically.

How do I verify a request is actually from GPTBot (not a spoof)?

OpenAI publishes the public IP ranges that GPTBot crawls from at https://openai.com/gptbot.json. Verify by checking the request's source IP against this list. Reverse DNS (PTR record) of GPTBot's IPs maps to *.openai.com — chain a forward-confirmed reverse-DNS check for higher confidence.
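A forward-confirmed reverse-DNS check can be sketched with the standard socket module. The *.openai.com suffix comes from this page and should be confirmed against OpenAI's documentation. The resolver functions are injectable (our own design choice) so the logic can be tested without network access. Run this in addition to the IP-range check, not instead of it.

```python
import ipaddress
import socket
from typing import Callable, Iterable


def _ptr(ip: str) -> str:
    """Reverse lookup: the PTR hostname for an IP (raises socket.herror if none)."""
    return socket.gethostbyaddr(ip)[0]


def _resolve(host: str) -> list[str]:
    """Forward lookup: every address a hostname resolves to (raises socket.gaierror on failure)."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def forward_confirmed_rdns(
    source_ip: str,
    suffixes: Iterable[str] = (".openai.com",),
    reverse: Callable[[str], str] = _ptr,
    forward: Callable[[str], Iterable[str]] = _resolve,
) -> bool:
    try:
        host = reverse(source_ip).rstrip(".").lower()
    except OSError:  # no PTR record
        return False
    # Suffix starts with "." so lookalikes such as "evilopenai.com" fail.
    if not any(host.endswith(s) for s in suffixes):
        return False
    try:
        addresses = forward(host)
    except OSError:  # hostname does not resolve
        return False
    # Compare parsed addresses so equivalent IPv6 spellings match.
    target = ipaddress.ip_address(source_ip)
    return any(ipaddress.ip_address(a) == target for a in addresses)
```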

Does GPTBot honor HTTP 402 Payment Required?

Per OpenAI policy as of Q1 2026, GPTBot will back off on HTTP 402 responses. Real payment verification (e.g. x402) is not yet enforced — the response is currently treated as "please do not crawl" rather than as a payment challenge. As payment-rail standards mature, this is expected to evolve.
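If you want to experiment without Bot Paywall, a small WSGI middleware can answer GPTBot with 402 on selected paths. This is a hypothetical sketch: the function name and the paid_prefixes parameter are ours. It trusts the User-Agent header, so combine it with IP verification before relying on it. It also sends a plain 402 with no payment challenge, which matches the "please do not crawl" treatment described here rather than any x402 flow.

```python
from typing import Callable, Iterable

WSGIApp = Callable  # (environ, start_response) -> iterable of bytes


def gptbot_402(app: WSGIApp, paid_prefixes: Iterable[str]) -> WSGIApp:
    """Wrap a WSGI app so GPTBot gets 402 Payment Required on the given path prefixes."""
    prefixes = tuple(paid_prefixes)

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        path = environ.get("PATH_INFO", "/")
        # Header-only check: spoofable, so verify the source IP before trusting it.
        if "GPTBot/" in user_agent and path.startswith(prefixes):
            body = b"Payment required for automated access.\n"
            start_response("402 Payment Required", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return app(environ, start_response)  # everyone else passes through unchanged

    return middleware
```

Wrap your existing application object (for example, app = gptbot_402(app, ["/premium/"])) and serve it as usual. Human visitors and other crawlers never see the 402.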

Does blocking GPTBot affect my Google search ranking?

No. GPTBot is OpenAI's crawler; it has nothing to do with Googlebot or Google's search index. They are separate user-agents from separate companies. Blocking GPTBot will not affect your rankings in Google search. If you want to opt out of Google's AI training while staying in search, block Google-Extended (a separate token from Googlebot).

GPTBot vs OAI-SearchBot vs ChatGPT-User — what's the difference?

GPTBot crawls for training data and SearchGPT's index. OAI-SearchBot specifically powers the SearchGPT real-time fetch ("browse the web") flow. ChatGPT-User fetches a URL when a ChatGPT user explicitly clicks or instructs ChatGPT to load it. All three are OpenAI; you can manage them independently in robots.txt.
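Because each OpenAI crawler reads its own User-agent group, you can generate the three policies from one table and keep them from drifting. A sketch, assuming you express policy as a map from token to the path prefixes that token must not crawl; the function and its input format are hypothetical.

```python
def build_robots(policy: dict[str, list[str]]) -> str:
    """Render one robots.txt group per user-agent token.

    An empty list means the token may crawl everything.
    """
    groups = []
    for token, disallowed in policy.items():
        lines = [f"User-agent: {token}"]
        if disallowed:
            lines += [f"Disallow: {prefix}" for prefix in disallowed]
        else:
            lines.append("Allow: /")
        groups.append("\n".join(lines))
    return "\n\n".join(groups) + "\n"  # blank line between groups
```

For example, build_robots({"GPTBot": ["/drafts/"], "OAI-SearchBot": [], "ChatGPT-User": []}) keeps GPTBot out of /drafts/ while leaving the other two tokens unrestricted.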

Last reviewed 2026-05-15 · We update bot references as their owners publish changes. Spot something out of date? Tell us.