Robots.txt Builder

Generate a clean, valid robots.txt file in seconds. Block specific paths, set your sitemap location, and allow or disallow individual user agents, including AI crawlers like GPTBot, ClaudeBot, PerplexityBot, and Google-Extended. Critical for both SEO and AEO/GEO control.


Why robots.txt matters more in 2026

Beyond classic search crawler control, robots.txt is now your first lever for AI training-data permissions. Major AI companies respect specific user-agent directives — GPTBot for OpenAI, ClaudeBot for Anthropic, PerplexityBot for Perplexity, Google-Extended for Google’s Gemini training. Whether you allow or block these is a strategic decision: blocking protects content from training; allowing increases the chance your brand becomes part of the AI knowledge base (a GEO benefit).
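For example, a mixed policy that opts out of AI training while staying available for Gemini grounding might look like this (illustrative only; pick the stance that fits your strategy):

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Allow: /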

Crawler control

Block private paths from all crawlers (admin, checkout, internal).
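For instance, a typical block applied to every crawler (paths are placeholders; use your own):

User-agent: *
Disallow: /admin/
Disallow: /checkout/
Disallow: /internal/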

AI training opt-out

Block GPTBot, ClaudeBot, and PerplexityBot if you don't want your content used for AI training.

AEO opt-in

Allow Google-Extended to keep your content available for Gemini training and grounding. (AI Overviews follow standard Googlebot crawling, so they aren't controlled by this token.)

Sitemap discoverability

The Sitemap line tells crawlers where your XML sitemap lives.
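For example (swap in your own sitemap URL):

Sitemap: https://example.com/sitemap.xml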

Free + private

The file is generated locally in your browser; no data is sent to any server.

Validation

Test the result with the robots.txt report in Google Search Console, or curl your live file.
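For example, from a terminal (replace the domain with your own):

curl https://example.com/robots.txt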

FAQ

Where do I put robots.txt?

At the root of your domain (e.g., https://example.com/robots.txt). Most CMS platforms (WordPress, Shopify, Webflow) generate one automatically; you can override it with this output.

Should I block AI crawlers?

It's a strategic call. Blocking protects your content but reduces your presence in AI answers. For most marketing-driven sites, allowing AI crawlers is the right move because it boosts AEO/GEO visibility.

What’s the difference between Disallow and noindex?

Disallow (a robots.txt rule) stops crawling; noindex (a meta tag or X-Robots-Tag header) stops indexing. They work at different layers, so use the right tool for the job.

Will robots.txt prevent indexing?

Not always. If a blocked page has external links, Google may still show the URL (without content). For true exclusion, use noindex on the page itself, and make sure that page isn't also disallowed, or crawlers will never see the tag.
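A minimal noindex tag, placed in the page's <head> (standard HTML, shown here for contrast with a robots.txt Disallow rule):

<meta name="robots" content="noindex">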

How do I block all crawlers temporarily?

User-agent: *
Disallow: /

But be careful: this kills SEO. Use it only for staging environments.

Need a full SEO/AEO/GEO audit?

Riman Agency runs technical SEO and AEO programs across enterprise sites.

Book a Strategy Call