Robots.txt Generator
Build a valid robots.txt visually — set User-agent rules, Allow/Disallow paths, Crawl-delay, and Sitemap. Download or copy in one click.
* for all bots, or a specific bot name like Googlebot.
Directives Reference
Every valid directive and what it does in a robots.txt file.
* matches all bots.
/Admin ≠ /admin
How to Use This Tool
From generating your file to placing it on your server.
https://example.com/sitemap.xml). This is optional, but it helps crawlers such as Googlebot discover all your pages efficiently.
* for all bots, or name specific ones like Googlebot. Add as many groups as you need.
/admin/ to block a folder, /*.pdf$ for file types, or leave blank to allow everything.
robots.txt. It must be accessible at https://yourdomain.com/robots.txt.
What's Built In
Everything you need to build, validate, and deploy a robots.txt file.
Frequently Asked Questions
Everything you need to know about robots.txt files.
A robots.txt file is a plain text file placed at the root of your website (yoursite.com/robots.txt) that instructs web crawlers which pages or sections they should and should not crawl. It uses the Robots Exclusion Protocol — a standard that well-behaved bots like Googlebot follow voluntarily. It does not enforce access control; it's purely advisory.
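As a rough illustration of how a cooperative crawler reads such a file, the sketch below parses a small robots.txt with Python's standard urllib.robotparser module. The file contents, the bot name ExampleBot, and the URLs are hypothetical. Note that this parser uses simple prefix matching and first-match ordering, so it will not reproduce Google's behavior in every case.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file contents, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS_TXT.splitlines())

# A well-behaved bot asks before fetching; nothing forces it to.
blocked = parser.can_fetch("ExampleBot", "https://example.com/admin/settings")
allowed = parser.can_fetch("ExampleBot", "https://example.com/blog/post")
sitemaps = parser.site_maps()  # available in Python 3.8+

print(blocked, allowed, sitemaps)
```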
Not reliably. Disallowing a page in robots.txt tells Google not to crawl it, but if another site links to that blocked URL, Google may still index it and show it in search results, just without any snippet content. To actually prevent a page from appearing in search results, you need a <meta name="robots" content="noindex"> tag on the page, or an X-Robots-Tag: noindex HTTP response header. Robots.txt controls crawling. Noindex controls indexing. They serve different purposes, and they do not combine on the same URL: a crawler has to fetch the page to see the noindex directive, so a page you want removed from search results must not be disallowed in robots.txt.
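One way to confirm that a page actually carries a noindex signal is to check both places it can appear. The sketch below is a simplified check, assuming the standard robots meta tag form (name="robots" with a comma-separated content attribute) and a plain X-Robots-Tag header value. The function and class names are hypothetical, and bot-specific variants such as name="googlebot" are not handled.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def is_noindexed(html, headers):
    """Return True if the header or the meta tag contains noindex."""
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_directives = [d.strip().lower() for d in header_value.split(",")]

    meta = RobotsMetaParser()
    meta.feed(html)
    return "noindex" in header_directives or "noindex" in meta.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(is_noindexed(page, {}))
```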
The file must be named exactly robots.txt (lowercase) and placed at the root of your domain — accessible at https://yourdomain.com/robots.txt. It cannot be in a subdirectory. If your site is on a subdomain like blog.example.com, that subdomain needs its own robots.txt at blog.example.com/robots.txt. The server must return it with a 200 OK status and Content-Type: text/plain. A 404 response means "no restrictions" to most crawlers. A 500 or 503 may cause crawlers to pause and retry later.
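The status handling described above can be summarized as a small lookup. This is a simplified sketch of the cases named in this answer only. The function name and return labels are hypothetical, and individual crawlers may treat other status codes differently.

```python
def robots_fetch_outcome(status):
    """Map the HTTP status of /robots.txt to a simplified crawler action."""
    if status == 200:
        return "parse_rules"      # file found: read and apply its rules
    if status == 404:
        return "no_restrictions"  # no file: most crawlers assume nothing is blocked
    if status in (500, 503):
        return "retry_later"      # server trouble: crawlers may pause and retry
    return "unspecified"          # other codes are not covered in this sketch


for code in (200, 404, 503):
    print(code, robots_fetch_outcome(code))
```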
Google applies the most specific matching rule, where specificity is measured by the length of the rule's path. If two matching rules are equally specific, Allow takes precedence over Disallow. For example, if you have Disallow: /cms/ and Allow: /cms/public/, then /cms/public/ is allowed (the longer, more specific match) while all other /cms/ paths are blocked. The older rule of "Disallow wins on ties" no longer applies for Google: Allow beats Disallow when both rules have the same path length. Other crawlers may behave differently; some simply apply the first matching rule in file order.
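The precedence rule can be sketched in a few lines. This is a minimal illustration that uses plain prefix matching and ignores wildcards. The function name and rule format are hypothetical, not Google's implementation.

```python
def is_allowed(path, rules):
    """Decide whether path may be crawled.

    rules is a list of (directive, pattern) pairs, where directive is
    "Allow" or "Disallow". The longest matching pattern wins; on a tie
    in length, Allow wins. With no matching rule, the path is allowed.
    """
    best = None  # (pattern length, is_allow)
    for directive, pattern in rules:
        if pattern and path.startswith(pattern):
            candidate = (len(pattern), directive.lower() == "allow")
            # Tuples compare by length first, then True (Allow) beats False.
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


rules = [("Disallow", "/cms/"), ("Allow", "/cms/public/")]
print(is_allowed("/cms/public/page.html", rules))
print(is_allowed("/cms/drafts/page.html", rules))
```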
Crawl-delay tells a bot how many seconds to wait between requests. It's useful for bots that crawl aggressively and overload your server. However, Googlebot ignores Crawl-delay entirely; Google manages its crawl rate automatically, based largely on how your server responds. Crawl-delay is respected by Bingbot and some other bots. Yandex stopped honoring it in 2018 in favor of the crawl rate setting in Yandex Webmaster. The crawl rate limiter in Google Search Console was retired in early 2024, so if you need to slow Googlebot down temporarily, Google's documentation recommends returning 500, 503, or 429 responses for a short period.
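For bots that do honor the directive, a crawler written in Python could read the value with the standard urllib.robotparser module, as in the sketch below. The file contents and the ExampleBot name are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file contents, for illustration only.
DELAY_ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 10

User-agent: *
Disallow: /tmp/
"""

delay_parser = RobotFileParser()
delay_parser.parse(DELAY_ROBOTS_TXT.splitlines())

# crawl_delay() returns the value for the matching group,
# or None when that group sets no Crawl-delay.
bing_delay = delay_parser.crawl_delay("bingbot")
other_delay = delay_parser.crawl_delay("ExampleBot")

print(bing_delay, other_delay)
```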
Google and most modern crawlers support two pattern-matching characters: * (matches zero or more characters) and $ (anchors the pattern to the end of the URL). For example, Disallow: /*.pdf$ blocks all URLs ending in .pdf. Disallow: /search* blocks any URL starting with /search. Not all crawlers support wildcards — the original robots.txt spec didn't include them, but Google explicitly supports both patterns.
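To make the two pattern characters concrete, the sketch below translates a robots.txt path pattern into a regular expression. It is an illustrative approximation, not any crawler's actual matcher, and the function names are hypothetical.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' becomes '.*' (zero or more characters); a trailing '$'
    anchors the pattern to the end of the URL path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal text so characters like '.' are not treated as regex syntax.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern, path):
    # re.match anchors at the start, mirroring robots.txt prefix semantics.
    return pattern_to_regex(pattern).match(path) is not None


print(matches("/*.pdf$", "/files/report.pdf"))
print(matches("/*.pdf$", "/files/report.pdf?download=1"))
print(matches("/search*", "/search/results"))
```

The second call shows why the $ anchor matters: a URL with a query string after .pdf no longer ends in .pdf, so the rule does not apply to it.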
Robots.txt only works on well-behaved, cooperative bots. Malicious crawlers, scrapers, and most spam bots simply ignore it. If you need to actually block bots, use server-level solutions: IP blocking, rate limiting, CAPTCHAs, or bot-management services like Cloudflare Bot Management. The robots.txt file is intended as a crawl management tool for legitimate search engines — it's not a security measure.
Disallow: /admin blocks /admin, /admin/, and /administrator, because it matches any path starting with that string. Disallow: /admin/ (with trailing slash) blocks /admin/ and all its contents, but not the URL /admin by itself and not /administrator. Both forms block everything inside the directory; the slashless form simply casts a wider net. The trailing slash makes intent clearer for directories; omit it only when you want to match every path that begins with the stem.
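Because these rules are prefix matches, the difference is easy to check. The sketch below assumes plain prefix matching without wildcards. The helper name and the sample paths are illustrative.

```python
def blocked_by(disallow_path, url_path):
    """A Disallow rule applies when the URL path starts with the rule's path."""
    return url_path.startswith(disallow_path)


sample_paths = ["/admin", "/admin/", "/admin/users", "/administrator"]

# Compare the two rule forms side by side.
for path in sample_paths:
    print(
        f"{path:<16}"
        f"Disallow: /admin -> {blocked_by('/admin', path)!s:<6}"
        f"Disallow: /admin/ -> {blocked_by('/admin/', path)}"
    )
```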
Use Google Search Console → Settings → robots.txt report. It shows the robots.txt files Google found for your site, when each was last crawled, and any warnings or errors in the file. The older standalone robots.txt Tester, which let you test individual URLs, was retired in 2023; to see whether a specific URL is blocked, use the URL Inspection tool instead. You can also visit https://yourdomain.com/robots.txt directly in a browser to confirm the file is serving correctly. Google generally caches robots.txt for up to 24 hours, so changes can take about a day to be picked up; the report lets you request a recrawl if you need it sooner.
Robots.txt directly affects how efficiently Google's crawl budget is spent on your site. Blocking irrelevant or duplicate pages (like faceted navigation URLs, session-based URLs, admin pages, and pagination variations) means Googlebot spends its crawl budget on pages you actually want indexed. This matters more for large sites with thousands of pages. For small sites with clean architecture, a minimal robots.txt is usually sufficient. Robots.txt is not a ranking factor itself — but helping Google crawl your important pages efficiently has indirect SEO benefits.