Robots.txt Generator
The robots.txt file — the thing that's always someone else's job… until it breaks. Then it's YOUR problem. It's 4:00 PM. And yes, it's Friday. Set crawl rules for each bot. Block directories you don't want indexed. Add your sitemap URL. Copy it or download it instantly. No syntax errors. No guesswork. No panic.
How to use this generator
- Choose a starting point. For most websites, start with 'Allow all robots' and add specific restrictions. If you're on WordPress, WooCommerce, or another supported CMS, click the preset button to pre-fill the most common rules for your platform.
- Add your sitemap URL. Paste in the full URL of your sitemap (e.g. https://example.com/sitemap.xml). This tells crawlers where to find your content — it's optional but strongly recommended.
- Set crawl-delay if needed. Large sites that experience server strain from crawlers can add a delay. For most sites, leave this at the default. Note: Google ignores crawl-delay — it's respected by Bing, Yandex, and others.
- Block specific bots. Use the per-bot rows to allow or disallow individual crawlers. To block AI training bots like GPTBot or ClaudeBot, set those rows to 'Disallow'.
- Restrict directories. Add any paths you want to block from crawling — one per line, with a leading slash (e.g. /admin/). The tool validates syntax as you type.
- Copy or download. Click 'Copy' to copy the output to your clipboard, or 'Download' to save it as a file. Upload it to the root of your site so it's reachable at yourdomain.com/robots.txt.
If you see Disallow: / in your output and didn't intend to block everything, stop. This line tells every search engine to ignore your entire website. The tool warns you when this is set — take it seriously.
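You can verify the effect of a full block yourself. This is a minimal sketch using Python's standard-library robots.txt parser; the file content and URLs are illustrative placeholders:

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that blocks everything -- the "accidental full block".
robots_txt = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# With "Disallow: /" under "User-agent: *", no crawler may fetch any URL.
print(parser.can_fetch("Googlebot", "https://example.com/"))        # False
print(parser.can_fetch("Bingbot", "https://example.com/any-page"))  # False
```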
What is a robots.txt file?
A robots.txt file is a plain text file that sits in the root directory of your website and tells search engine crawlers which pages or directories they are and aren't allowed to access. Every major search engine checks for it before crawling your site.
It is part of the Robots Exclusion Protocol — an industry standard followed by Google, Bing, Yandex, and most legitimate crawlers. It's one of the first things a crawler reads when it visits your domain.
The file lives at a fixed location: https://yourdomain.com/robots.txt. You cannot change this path.
What robots.txt does not do
This is one of the most misunderstood points in technical SEO, and getting it wrong can cause real problems:
- It does not prevent indexing. A page you disallow can still appear in Google search results if another website links to it. Google may index the URL without ever crawling the page. To prevent a page from being indexed, use a noindex meta tag — not robots.txt.
- It is not a security measure. The instructions in robots.txt are advisory. Malicious bots, scrapers, and bad actors ignore it entirely. Never put sensitive content behind a robots.txt disallow rule and call it protected.
- It does not apply to subdomains. A robots.txt at example.com/robots.txt does not cover shop.example.com. Each subdomain needs its own robots.txt file.
Why robots.txt matters for SEO
Crawl budget
Search engines allocate a crawl budget to every website — roughly the number of pages they'll crawl in a given period. For small sites (under a few hundred pages), crawl budget is rarely a concern. For large sites — ecommerce stores, news sites, content-heavy platforms — it matters significantly.
If crawlers spend their budget on low-value URLs (thank-you pages, filtered category pages, internal search results, login pages), they may not get around to crawling the content you actually want indexed. A well-configured robots.txt file directs that budget to your most important pages.
Duplicate content
Many CMS platforms — particularly WordPress and ecommerce platforms — automatically generate multiple URLs for the same content: tag pages, author archives, paginated versions, filtered URLs with tracking parameters. When search engines see the same content on multiple URLs, it dilutes your authority and can cause ranking confusion.
Disallowing these duplicate-generating patterns in robots.txt prevents crawlers from ever indexing them. This works best in combination with canonical tags — robots.txt stops the crawl, canonical tags handle any already-indexed duplicates.
Keeping private sections private (from search engines)
Staging environments, admin panels, internal tools, and development directories should never appear in search results. Disallowing these in robots.txt is your first line of defence — not a substitute for authentication, but a sensible additional layer.
Robots.txt syntax reference
A robots.txt file is made up of blocks, each starting with a User-agent line followed by one or more directives. Here are all the directives you need to know.
| Directive | What it does |
|---|---|
| User-agent | Specifies which bot the following rules apply to. Use * to target all crawlers. |
| Disallow | Blocks the specified path from being crawled. An empty value means allow everything. |
| Allow | Explicitly permits a path — useful for carving out exceptions inside a Disallowed directory. |
| Sitemap | Points crawlers to the absolute URL of your XML sitemap. Sits outside any User-agent group. |
| Crawl-delay | Requests a pause (in seconds) between requests. Ignored by Google — respected by Bing, Yandex, and others. |
Syntax rules — what makes a valid robots.txt file
- One directive per line. Each rule must be on its own line. You cannot combine multiple paths on one line.
- Case-sensitive paths. File paths in robots.txt are case-sensitive. /Admin/ and /admin/ are treated as different directories. Match the exact casing of your actual folders.
- Groups start with User-agent. Each rule block starts with one or more User-agent lines, followed by the directives for that bot. A blank line separates groups.
- No quotes, semicolons, or inline comments on directive lines. Comments use # but must be on their own line or at the end after a space.
- Trailing slash for directories. Use Disallow: /admin/ rather than Disallow: /admin. Rules are prefix matches, so without the trailing slash the rule also blocks unrelated paths that share the prefix, such as /admin-panel or /administrator.
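The case-sensitivity and trailing-slash rules above can be checked with Python's standard-library parser — a small sketch with illustrative paths:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /private/
Disallow: /Media/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Trailing slash: the rule matches everything inside the directory...
print(parser.can_fetch("*", "https://example.com/private/file.html"))  # False
# ...but not a path that merely shares the prefix without the slash.
print(parser.can_fetch("*", "https://example.com/private-notes/"))     # True

# Case sensitivity: /Media/ does not block /media/.
print(parser.can_fetch("*", "https://example.com/Media/pic.jpg"))      # False
print(parser.can_fetch("*", "https://example.com/media/pic.jpg"))      # True
```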
Examples
Allow everything, block admin, include sitemap:
```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://example.com/sitemap.xml
```
Block AI training bots while allowing all search engines:
```
User-agent: *
Disallow:

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
```
WordPress site — block non-essential areas, preserve crawlability:
```
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /?s=
Disallow: /search/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://example.com/sitemap.xml
```
How to block AI bots from your website
Since 2023, a new category of bots has emerged: AI training crawlers. These are operated by AI companies and use your content to train large language models — not to send you search traffic. Many website owners want to opt out.
You can block these bots in robots.txt using their official user-agent names. The generator above includes all of them in the per-bot list.
| User-agent | Operated by |
|---|---|
| GPTBot | OpenAI — training crawler for ChatGPT and GPT models |
| ClaudeBot | Anthropic — training crawler for Claude |
| OAI-SearchBot | OpenAI — search-oriented crawler |
| PerplexityBot | Perplexity AI |
| Meta-ExternalAgent | Meta — used for AI product development and training |
| Applebot-Extended | Apple — AI-specific training crawler, separate from standard Applebot |
| CCBot | Common Crawl — dataset used by many AI models |
To block all AI training bots while keeping search engines active, add a separate block for each bot after your main allow-all rule, as shown in the example in the syntax section above.
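If you maintain the bot list elsewhere, the same file can be generated programmatically. A sketch in Python — the bot names come from the table above; the sitemap URL is a placeholder:

```python
# User-agent names of AI training crawlers, from the table above.
AI_BOTS = [
    "GPTBot", "ClaudeBot", "OAI-SearchBot", "PerplexityBot",
    "Meta-ExternalAgent", "Applebot-Extended", "CCBot",
]

def build_robots_txt(ai_bots, sitemap_url):
    """Allow all crawlers, then add a Disallow-all block per AI bot."""
    blocks = ["User-agent: *\nDisallow:"]  # empty Disallow = allow everything
    for bot in ai_bots:
        blocks.append(f"User-agent: {bot}\nDisallow: /")
    blocks.append(f"Sitemap: {sitemap_url}")
    return "\n\n".join(blocks) + "\n"      # blank line separates groups

print(build_robots_txt(AI_BOTS, "https://example.com/sitemap.xml"))
```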
Common robots.txt mistakes — and how to fix them
| Mistake | What to do instead |
|---|---|
| The accidental full block | Setting Disallow: / for User-agent: * tells every search engine to ignore your entire website. It's the single most damaging robots.txt mistake — and it's easy to do by accident. Always check your output before uploading. The generator warns you when this is set. |
| Blocking CSS and JavaScript files | Blocking /wp-content/themes/ or /wp-content/plugins/ was once common practice. Do not do this today. Google needs to render your pages exactly as a user would. If it can't load your CSS and JS, it can't assess your content properly — and your rankings will suffer. |
| Case mismatch in directory paths | robots.txt paths are case-sensitive. If your folder is /Media/ but you write Disallow: /media/, the rule does nothing. Match the exact casing of your actual directory names. |
| Disallowing a page to prevent indexing | Disallowing a page stops crawling — it does not prevent indexing. If the page receives links from other sites, Google can still index the URL without visiting it, and it may appear in search results with no snippet. Use noindex to prevent indexing. |
| Missing trailing slash on directory paths | Disallow: /private is a prefix match — it blocks /private, /private/file.html, and also unrelated paths like /private-notes/ that merely share the prefix. Use Disallow: /private/ (with a trailing slash) to limit the rule to that directory. |
| One robots.txt trying to cover subdomains | Your robots.txt at example.com/robots.txt has no authority over shop.example.com or staging.example.com. Each subdomain needs its own robots.txt file in its own root directory. |
| Forgetting to include the sitemap | Your robots.txt is the most reliable place to tell crawlers where your sitemap lives. Without a Sitemap: directive, bots have to discover it another way — and they may not. Always include the full absolute URL to your sitemap. |
Robots.txt vs noindex: what's the difference?
This is one of the most common points of confusion in technical SEO. Both control what appears in search results, but they work at completely different stages of the process — and using the wrong one leads to unexpected results.
| | robots.txt Disallow | noindex meta tag |
|---|---|---|
| What it controls | Whether a crawler visits and reads the page | Whether the page appears in search results |
| Mechanism | Crawler instruction — checked before the page is fetched | HTML tag — read after the page is fetched and rendered |
| Can prevent indexing? | No — only prevents crawling | Yes — prevents the page appearing in search results |
| Risk if page has inbound links | Page may be indexed with no snippet | Page will not be indexed regardless of inbound links |
| Use for | Managing crawl budget, blocking admin areas, protecting server resources | Preventing specific pages from appearing in search: thank-you pages, filtered URLs, duplicate content, login pages |
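For the noindex side of the comparison, the tag is a single line in the page's HTML head (shown here as a generic example):

```html
<!-- Placed in the page's head; the page must stay crawlable so Google can read this tag -->
<meta name="robots" content="noindex">
```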
How to validate your robots.txt file after publishing
After uploading your robots.txt file, it's worth checking it works as intended. Here are three ways to do it.
1. Check it's accessible
Visit https://yourdomain.com/robots.txt in a browser. If you see your file as plain text, it's correctly placed and publicly accessible. If you get a 404, the file isn't in the root directory — check your upload location.
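The same check can be scripted. A sketch using Python's standard library — check_robots is a hypothetical helper, and the URL is a placeholder for your own domain:

```python
from urllib.error import HTTPError
from urllib.request import urlopen

def check_robots(url):
    """Return (HTTP status, content type) for a robots.txt URL."""
    try:
        with urlopen(url) as resp:
            return resp.status, resp.headers.get_content_type()
    except HTTPError as err:
        return err.code, None  # e.g. (404, None) if the file is missing

# status, ctype = check_robots("https://yourdomain.com/robots.txt")
# A correctly placed file returns 200 with a text/plain content type;
# 404 means the file is not in the root directory.
```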
2. Use Google Search Console's robots.txt report
If your site is verified in Google Search Console, navigate to Settings → robots.txt. Google shows you the last version it fetched, when it was last read, and any warnings. This is the most authoritative check for how Google specifically is interpreting your file.
3. Test specific URLs
Google Search Console's URL Inspection tool lets you test whether a specific page is blocked by robots.txt. Enter any URL and the tool tells you if Googlebot is allowed to crawl it — useful for confirming your Disallow rules are working as intended without accidental collateral blocking.
Signs something is wrong
- Traffic drops suddenly — check if a new robots.txt deployment accidentally added Disallow: /
- Pages aren't being indexed — check robots.txt isn't blocking them before assuming the issue is elsewhere
- Google Search Console shows 'blocked by robots.txt' — your rules may be too broad
- New content is slow to be indexed — your sitemap may be missing from robots.txt
Frequently asked questions
Does every website need a robots.txt file?
Not strictly — if no robots.txt file is found, crawlers assume they can access everything. But having one is best practice for any website, even simple ones. At minimum, it's useful for pointing crawlers to your sitemap. For CMS-based sites (WordPress, Magento, etc.), it's essential for blocking auto-generated low-value pages.
What happens if I don't have a robots.txt file?
Crawlers will attempt to access all publicly available pages on your site. For small simple sites, this is usually fine. For larger sites with admin areas, staging pages, or CMS-generated duplicate URLs, it means crawlers may waste their budget on pages that add no SEO value — and potentially expose admin paths to bots.
Will robots.txt stop a page from showing in Google search?
Not reliably. Disallowing a page prevents crawling, but if the page has inbound links from other websites, Google can still index the URL without crawling it — it just won't have a content snippet. To reliably prevent a page from appearing in search results, use a noindex meta tag in the page's HTML head and allow crawling so Google can read the tag.
How do I block Googlebot specifically without affecting other bots?
Create a separate rule block using User-agent: Googlebot followed by your Disallow rules. End the block with a blank line, then add your rules for other bots (or a wildcard User-agent: * block). Googlebot will follow its specific rules; other bots will follow theirs.
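As a concrete file, that layout looks like this (the blocked path is a hypothetical example):

```
User-agent: Googlebot
Disallow: /not-for-google/

User-agent: *
Disallow:
```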
How do I block AI training bots like ChatGPT and Claude?
Use the official user-agent names in separate blocks: GPTBot for OpenAI, ClaudeBot for Anthropic, PerplexityBot for Perplexity, OAI-SearchBot for OpenAI's search crawler, and CCBot for Common Crawl. Set Disallow: / for each. The generator above includes all of these in the per-bot list. Note this relies on those operators respecting robots.txt — most major AI companies do, but it is not a technical enforcement.
Does crawl-delay work with Google?
No. Google ignores the Crawl-delay directive in robots.txt and sets its crawl rate automatically based on how your server responds; the old crawl rate limiter tool in Google Search Console was retired in early 2024. Crawl-delay is respected by Bing, Yandex, and some other crawlers.
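For crawlers that do honour it, Crawl-delay sits inside the relevant User-agent group — a sketch with an illustrative 10-second delay:

```
User-agent: Bingbot
Crawl-delay: 10

User-agent: *
Disallow:
```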
Where do I upload the robots.txt file?
The file must be placed in the root directory of your domain — accessible at https://yourdomain.com/robots.txt. For WordPress, this is the same directory as wp-config.php. For most hosting setups, this is the public_html folder. If you have subdomains, each needs its own robots.txt in its own root.
Can I have multiple User-agent rules in one robots.txt file?
Yes. You can have as many rule blocks as you need. Each block starts with one or more User-agent lines followed by the directives for those bots, then a blank line separating it from the next block. More specific bot rules take precedence over wildcard rules when both apply.
What's the difference between robots.txt and a sitemap?
They do opposite things. robots.txt tells crawlers which parts of your site to avoid. A sitemap tells crawlers which parts of your site to prioritise. Most sites need both: robots.txt to block low-value areas, and a sitemap to guide crawlers to your important content. Adding your sitemap URL to your robots.txt using the Sitemap: directive connects the two.
Is robots.txt case-sensitive?
The directive names (User-agent, Disallow, Allow) are not case-sensitive. But file paths and directory names are case-sensitive on most servers. If your folder is named /Blog/, then Disallow: /blog/ (lowercase) will not block it. Always match the exact casing of your actual directory structure.
More tools for technical SEO
robots.txt is one piece of the crawlability puzzle. These tools handle the rest:
- Hreflang Tag Generator — Generate correct hreflang markup for multilingual and multi-regional websites. Supports HTML, XML sitemap, and HTTP header formats.
- GSC Regex Generator — Build Google Search Console filters using RE2-compatible regex, without writing a line of regex yourself.
- XML Sitemap Generator (coming soon) — Generate a valid XML sitemap from a list of URLs.