Overview

Build a robots.txt file from per-user-agent Allow/Disallow rules, with an optional Sitemap line, a crawl delay, and one-click presets such as a standard setup or blocking AI bots.

📘 How to Use

  1. Pick a preset or click Add Rule, then set a User-agent and Allow/Disallow
  2. Type the target path and, if needed, a Sitemap URL and crawl delay
  3. Review the generated robots.txt text on the right

robots.txt Generator | Visual Editor to Build Crawl Rules

Build a robots.txt by stacking Allow/Disallow rules per user-agent, with an optional sitemap line and crawl-delay. One-click presets cover allow-all, block-all, a standard setup, and blocking AI bots, so you get valid syntax without memorizing the format.

💡 About this tool

A robots.txt file lives at your site root and tells crawlers which paths they may fetch. The syntax looks simple, but the details bite: rules are grouped per User-agent, Allow and Disallow precedence depends on match length, and a missing or extra trailing slash changes what gets blocked. Hand-editing it is where most mistakes creep in.

This editor rebuilds the output every time you change a rule. Rules that share a user-agent are merged into one group automatically, and any Sitemap: line is appended at the end. The "Standard (Recommended)" preset blocks the usual /admin/, /private/, and /tmp/ paths; "Block AI Bots" drops in GPTBot, ChatGPT-User, CCBot, and Google-Extended in one go. Copy the text from the output box and save it as robots.txt at your domain root.
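The merging behavior described above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual code: the function name `build_robots_txt`, the tuple-based rule format, and the choice to repeat the crawl delay in every group (with 0 meaning "no delay") are all assumptions made for the example.

```python
def build_robots_txt(rules, sitemap=None, crawl_delay=0):
    """Build robots.txt text from (user_agent, directive, path) tuples.

    Hypothetical helper: rules that share a user-agent are merged into
    one group, and the Sitemap line (if any) goes at the end.
    """
    # Group rules by user-agent; dicts keep first-seen order.
    groups = {}
    for agent, directive, path in rules:
        groups.setdefault(agent, []).append((directive, path))

    lines = []
    for agent, entries in groups.items():
        lines.append(f"User-agent: {agent}")
        for directive, path in entries:
            # rstrip() so an empty path renders as "Disallow:" with no trailing space
            lines.append(f"{directive}: {path}".rstrip())
        if crawl_delay > 0:  # assumption: 0 means "emit no Crawl-delay line"
            lines.append(f"Crawl-delay: {crawl_delay}")
        lines.append("")  # blank line between groups

    if sitemap:
        lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines).rstrip("\n") + "\n"


# Example input (made up for illustration): two "*" rules entered apart
# from each other still end up in the same group.
example = build_robots_txt(
    [
        ("*", "Disallow", "/admin/"),
        ("GPTBot", "Disallow", "/"),
        ("*", "Disallow", "/private/"),
    ],
    sitemap="https://example.com/sitemap.xml",
)
print(example)
```

With that input, the two `*` rules land in a single group, GPTBot gets its own group, and the Sitemap line comes last.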

🧐 Frequently Asked Questions

Will blocking a page in robots.txt remove it from Google? No. robots.txt controls crawling, not indexing. A blocked URL can still appear in results if other pages link to it, just without a snippet. To keep a page out of results, use a noindex meta tag or password protection instead. Note that a noindex tag only works if the page is not blocked in robots.txt, because the crawler has to fetch the page to see the tag.

What does an empty Disallow mean? Disallow: with nothing after it means "block nothing" — effectively allow all. That is what the Allow All preset emits. By contrast, Disallow: / tells crawlers to stay off the entire site.
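One way to check the difference yourself is Python's standard-library `urllib.robotparser`. The sketch below is only a quick sanity check; the helper name `allowed` and the example URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, url, agent="*"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


allow_all = "User-agent: *\nDisallow:\n"    # empty Disallow: block nothing
block_all = "User-agent: *\nDisallow: /\n"  # Disallow: / blocks the whole site

print(allowed(allow_all, "https://example.com/any/page"))  # True
print(allowed(block_all, "https://example.com/any/page"))  # False
```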

Allow and Disallow both match a URL — which wins? Major search engines use the most specific (longest) matching rule. So Disallow: /folder/ plus Allow: /folder/public/ blocks the folder but still permits everything under /folder/public/.
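The longest-match rule can be sketched as a small function. This is a simplified illustration under stated assumptions: it handles plain path prefixes only (no `*` or `$` wildcards), the name `is_allowed` is hypothetical, and ties between equally long rules go to Allow, which is what RFC 9309 recommends. Python's built-in `urllib.robotparser` is not used here because it applies rules in file order rather than by length.

```python
def is_allowed(path, rules):
    """Decide whether `path` may be fetched under one user-agent group.

    `rules` is a list of (directive, prefix) pairs, for example
    ("Disallow", "/folder/"). Only plain prefixes are supported.
    """
    best_len = -1
    best_allow = True  # no matching rule means the path may be fetched
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty patterns match nothing; skip non-matching prefixes
        allow = directive.lower() == "allow"
        # Longer match wins; on equal length, prefer Allow.
        if len(prefix) > best_len or (len(prefix) == best_len and allow):
            best_len, best_allow = len(prefix), allow
    return best_allow


rules = [("Disallow", "/folder/"), ("Allow", "/folder/public/")]

print(is_allowed("/folder/secret.html", rules))       # False
print(is_allowed("/folder/public/page.html", rules))  # True
print(is_allowed("/other/page.html", rules))          # True
```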

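Does Crawl-delay work for Googlebot? No. Google ignores the Crawl-delay line and adjusts its crawl rate automatically based on how your server responds; the old crawl-rate limiter in Search Console has been retired. Some crawlers such as Bing do honor it, so keep a value if you target those.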
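For crawlers that do honor Crawl-delay, reading the value might look like the following sketch, which uses `crawl_delay()` from Python's standard `urllib.robotparser`. The robots.txt content and the bot name `SomeOtherBot` are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: only the bingbot group carries a Crawl-delay.
robots_txt = """\
User-agent: bingbot
Crawl-delay: 10
Disallow: /tmp/

User-agent: *
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

bing_delay = parser.crawl_delay("bingbot")        # 10 (seconds between requests)
other_delay = parser.crawl_delay("SomeOtherBot")  # None: falls back to the * group, which sets no delay

print(bing_delay, other_delay)
```

A polite crawler would sleep for that many seconds between requests, and treat `None` as "no delay requested".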

Can I write rules for more than one bot? Yes. Add a rule for each user-agent name and each becomes its own group in the output. Use * for a rule that applies to every crawler.

📚 Why robots.txt still matters

The robots.txt convention, formally the Robots Exclusion Protocol, was a de facto standard for decades before being published as RFC 9309 in 2022. One rule that trips people up: the file only works from the root of a host (https://example.com/robots.txt). Dropping it in a subfolder does nothing, and each subdomain needs its own file.
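To make the root-of-host rule concrete, here is a minimal sketch that derives the robots.txt location a crawler would check for any page URL. The function name `robots_txt_url` and the example URLs are illustrative only.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL that governs `page_url`.

    Keeps the scheme and host (including any port) and replaces
    the path, query, and fragment with /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://example.com/blog/post?id=1"))
# https://example.com/robots.txt
print(robots_txt_url("https://shop.example.com:8443/a/b"))
# https://shop.example.com:8443/robots.txt
```

The second call shows why a subdomain needs its own file: its robots.txt URL has a different host, so the file on the main domain is never consulted.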

It has had a second life in the AI era. Site owners now add user-agents like GPTBot and CCBot, plus Google-Extended (a control token that opts content out of Google's AI training without affecting Search crawling), to keep their content from being scraped for model training. Worth remembering, though: robots.txt is a request, not a lock. Well-behaved crawlers respect it, but nothing technically stops a bot that chooses to ignore it.