Does robots.txt block pages from Google's index?

No. Disallowing a page in robots.txt tells Google not to crawl it, but if another site links to that page, Google may still index it. To prevent a page from appearing in search results, use a noindex meta tag or X-Robots-Tag header instead. Robots.txt and noindex serve different purposes.

What is the correct format for robots.txt?

A valid robots.txt file consists of one or more groups. Each group starts with one or more User-agent lines, followed by Disallow and Allow directives. Lines beginning with # are comments. The file must be UTF-8 encoded, placed exactly at the root of your domain (not a subdirectory), and served with Content-Type text/plain.
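As a rough illustration of that layout, the sketch below renders groups and Sitemap lines into robots.txt text. The `Group` type and `render_robots_txt` function are hypothetical names made up for this example; they are not the generator's actual code.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Group:
    """One robots.txt group: user agents plus their rules (hypothetical type)."""
    user_agents: List[str]
    # Each rule is a (directive, path) pair, e.g. ("Disallow", "/admin/").
    rules: List[Tuple[str, str]] = field(default_factory=list)


def render_robots_txt(groups: Sequence[Group], sitemaps: Sequence[str] = ()) -> str:
    """Render groups and Sitemap lines as robots.txt text."""
    lines: List[str] = []
    for group in groups:
        for agent in group.user_agents:
            lines.append(f"User-agent: {agent}")
        for directive, path in group.rules:
            lines.append(f"{directive}: {path}")
        lines.append("")  # a blank line separates groups
    for url in sitemaps:
        lines.append(f"Sitemap: {url}")
    return "\n".join(lines).rstrip("\n") + "\n"


text = render_robots_txt(
    [
        Group(["*"], [("Disallow", "/admin/"), ("Allow", "/admin/public/")]),
        Group(["Googlebot"], [("Disallow", "/*.pdf$")]),
    ],
    sitemaps=["https://example.com/sitemap.xml"],
)
print(text)
```

When saving the result, write it with an explicit encoding, for example `open("robots.txt", "w", encoding="utf-8")`, so the file is UTF-8 as required.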

Robots.txt Generator

Build a valid robots.txt visually — set User-agent rules, Allow/Disallow paths, Crawl-delay, and Sitemap. Download or copy in one click.

Directives Reference

Every valid directive and what it does in a robots.txt file.

Core Directives

User-agent: Specifies the bot this group applies to. * matches all bots.

Disallow: Path the bot must not crawl. Empty value means "allow all".

Allow: Explicitly permits crawling a path, even within a Disallowed tree.

Crawl-delay: Seconds to wait between requests. Not supported by Googlebot.

Sitemap: Absolute URL to your XML sitemap. Can appear multiple times.

Host: Preferred domain for Yandex. Not a Google directive.

Path Pattern Examples

/admin/: Block the /admin/ directory and all its contents

/: Block the entire site

(empty): Allow the entire site (Disallow with empty value)

/*.pdf$: Block all URLs ending in .pdf (uses $ anchor)

/search*: Block any URL starting with /search

/*?*: Block all URLs with query strings
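One way to see how these patterns behave is to translate them into regular expressions. The sketch below assumes Google-style matching (`*` as a wildcard, a trailing `$` as an end anchor); `pattern_to_regex` and `matches` are illustrative names, not part of any library.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Assumes Google-style matching: '*' matches any run of characters and a
    trailing '$' anchors the pattern to the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern: str, url_path: str) -> bool:
    """Return True if the pattern matches from the start of url_path."""
    if pattern == "":
        return False  # an empty Disallow value blocks nothing
    return pattern_to_regex(pattern).match(url_path) is not None


for pattern, url in [
    ("/admin/", "/admin/users"),
    ("/*.pdf$", "/docs/report.pdf"),
    ("/*.pdf$", "/docs/report.pdf?download=1"),
    ("/*?*", "/page?id=1"),
]:
    print(pattern, url, matches(pattern, url))
```

Matching starts at the beginning of the path, which is why `/search` and `/search*` cover the same URLs.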

Common Bot Names

Googlebot: Google's main web crawler

Bingbot: Microsoft Bing crawler

Slurp: Yahoo! crawler

DuckDuckBot: DuckDuckGo crawler

facebot: Facebook's crawler. Open Graph tags for link previews are fetched by a separate user agent, facebookexternalhit.

Twitterbot: Twitter's card crawler

Path Priority Rules

More specific wins: Google picks the most specific matching rule

Allow beats Disallow: When rules are equally specific, Allow wins

Case-sensitive: Paths are case-sensitive; /Admin ≠ /admin

$ = end of URL: Anchors the pattern to the end of the URL string

* = wildcard: Matches zero or more characters at that position

How to Use This Tool

From generating your file to placing it on your server.

Pick a Preset

Start from a preset that matches your platform — WordPress, e-commerce, Next.js, or a minimal template. It pre-fills the most common rules for that setup.

Set Your Sitemap URL

Enter the full URL to your XML sitemap (e.g. https://example.com/sitemap.xml). A Sitemap line is optional, but it helps crawlers such as Googlebot discover your pages efficiently.

Configure User-agent Groups

Each group targets specific bots. Use * for all bots, or name specific ones like Googlebot. Add as many groups as you need.

Add Allow/Disallow Rules

Click + Disallow or + Allow to add path rules. Use /admin/ to block a folder, /*.pdf$ for file types, or leave a Disallow path blank to allow everything.

Check the Validation Panel

The validation panel flags common mistakes — missing sitemap, paths not starting with /, blocking your entire site. Fix any errors before downloading.
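The checks named above could be sketched roughly as follows. This is not the tool's actual validation logic; the `validate` function, its input shape, and the warning messages are assumptions made for illustration.

```python
def validate(groups, sitemaps):
    """Return a list of warning strings for common robots.txt mistakes.

    groups: list of (user_agents, rules) tuples, where rules is a list of
    (directive, path) pairs. This input shape is an assumption of the sketch.
    """
    warnings = []
    if not sitemaps:
        warnings.append("No Sitemap URL set")
    for url in sitemaps:
        if not url.startswith(("http://", "https://")):
            warnings.append(f"Sitemap is not an absolute URL: {url}")
    for user_agents, rules in groups:
        for directive, path in rules:
            # Empty paths are valid; '*' may lead a wildcard pattern.
            if path and not path.startswith(("/", "*")):
                warnings.append(f"Path should start with '/': {path}")
            if directive == "Disallow" and path == "/" and "*" in user_agents:
                warnings.append("Disallow: / blocks the entire site for all bots")
    return warnings


issues = validate(
    groups=[(["*"], [("Disallow", "/"), ("Disallow", "admin/")])],
    sitemaps=[],
)
for issue in issues:
    print(issue)
```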

Download & Deploy

Click Download to save the file, then upload it to the root of your domain as robots.txt. It must be accessible at https://yourdomain.com/robots.txt.

What's Built In

Everything you need to build, validate, and deploy a robots.txt file.

🏗️

Visual Builder

Add and remove rules with buttons — no need to remember directive syntax or worry about formatting.

👥

Multiple Groups

Add as many User-agent groups as you need. Target specific bots with custom rules while keeping general rules for all others.

⚡

Presets

Seven ready-to-use presets covering WordPress, e-commerce, Next.js, minimal, block-all, allow-all, and bad-bot blocking.

✅

Inline Validation

Real-time checks catch missing sitemaps, incorrect path formats, accidental full-site blocks, and other common mistakes.

💾

Download as robots.txt

Download the generated file with the correct filename, ready to upload directly to your domain root.

🔒

100% Private

Everything runs in your browser. Your domain and paths never leave your device — no server calls, no data collection.

🔄

Toggle Rule Types

Flip any rule between Allow and Disallow with one click. Duplicate groups to test different configurations side by side.

🌐

Sitemap Integration

Set your sitemap URL and it's added to the top of the file. Sitemap lines are not tied to any User-agent group, so crawlers read them wherever they appear.

Frequently Asked Questions

Everything you need to know about robots.txt files.

A robots.txt file is a plain text file placed at the root of your website (yoursite.com/robots.txt) that instructs web crawlers which pages or sections they should and should not crawl. It uses the Robots Exclusion Protocol — a standard that well-behaved bots like Googlebot follow voluntarily. It does not enforce access control; it's purely advisory.

Not reliably. Disallowing a page in robots.txt tells Google not to crawl it, but if another site links to that blocked URL, Google may still index it and show it in search results, just without any snippet content. To actually prevent a page from appearing in search results, you need a <meta name="robots" content="noindex"> tag on the page, or an X-Robots-Tag: noindex HTTP response header. Robots.txt controls crawling. Noindex controls indexing. They serve different purposes, and they should not be combined on the same URL: a crawler can only see a noindex directive if it is allowed to fetch the page, so a page you want kept out of the index must not be disallowed in robots.txt.
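To check whether a page actually carries a noindex signal, a small standard-library sketch like the one below can help. `has_noindex` is a hypothetical helper; it does a simple substring check and ignores bot-specific meta names such as `googlebot`.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collects the content of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            self.directives.append((attr.get("content") or "").lower())


def has_noindex(html: str, headers: dict) -> bool:
    """Return True if the page opts out of indexing via meta tag or header."""
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value.lower()
    parser = _RobotsMetaParser()
    parser.feed(html)
    in_meta = any("noindex" in d for d in parser.directives)
    return "noindex" in header_value or in_meta


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(has_noindex(page, {}))
print(has_noindex("<p>PDF landing page</p>", {"X-Robots-Tag": "noindex"}))
```

The header form is the only option for non-HTML files such as PDFs, since they have no place for a meta tag.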

The file must be named exactly robots.txt (lowercase) and placed at the root of your domain — accessible at https://yourdomain.com/robots.txt. It cannot be in a subdirectory. If your site is on a subdomain like blog.example.com, that subdomain needs its own robots.txt at blog.example.com/robots.txt. The server must return it with a 200 OK status and Content-Type: text/plain. A 404 response means "no restrictions" to most crawlers. A 500 or 503 may cause crawlers to pause and retry later.
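The status-code behavior described above can be summarized in a small lookup. This sketch only encodes the cases named in the text; the function name and return labels are made up for illustration, and real crawlers differ in the details.

```python
def interpret_robots_response(status: int, content_type: str = "") -> str:
    """Classify a robots.txt fetch result using the cases described above."""
    if status == 200:
        if not content_type.lower().startswith("text/plain"):
            return "served, but Content-Type should be text/plain"
        return "rules apply"
    if status == 404:
        return "no restrictions"
    if status in (500, 503):
        return "crawler may pause and retry later"
    return "not covered by this sketch"


print(interpret_robots_response(200, "text/plain; charset=utf-8"))
print(interpret_robots_response(404))
print(interpret_robots_response(503))
```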

Google applies the most specific matching rule, where specificity is the length of the rule's path. If two matching rules are equally specific, Allow takes precedence over Disallow. For example, if you have Disallow: /cms/ and Allow: /cms/public/, then /cms/public/ is allowed (more specific match) while all other /cms/ paths are blocked. Other crawlers may behave differently; some older parsers apply the first matching rule in file order, so test against each crawler that matters to you.
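A minimal sketch of that precedence logic, treating each rule path as a plain prefix (wildcards are left out to keep it short). `is_allowed` is a hypothetical helper, not Google's implementation.

```python
def is_allowed(rules, url_path):
    """Decide whether url_path may be crawled under Google-style precedence.

    rules: list of (directive, path) pairs from one group. Paths are treated
    as plain prefixes here; wildcard handling is omitted from this sketch.
    """
    best_len = -1
    allowed = True  # no matching rule means the URL may be crawled
    for directive, path in rules:
        if not path or not url_path.startswith(path):
            continue
        is_allow = directive.lower() == "allow"
        # The longer path wins; on a tie, Allow wins.
        if len(path) > best_len or (len(path) == best_len and is_allow):
            best_len = len(path)
            allowed = is_allow
    return allowed


rules = [("Disallow", "/cms/"), ("Allow", "/cms/public/")]
print(is_allowed(rules, "/cms/public/page.html"))
print(is_allowed(rules, "/cms/drafts/page.html"))
print(is_allowed(rules, "/blog/"))
```

Because the comparison uses path length rather than file order, swapping the two rules in the file gives the same result.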

Crawl-delay tells a bot how many seconds to wait between requests. It's useful for bots that crawl aggressively and overload your server. However, Googlebot ignores Crawl-delay entirely: Google adjusts its crawl rate automatically based on how your server responds. Crawl-delay is honored by Bingbot and some other bots, but support varies, so check each crawler's documentation. The crawl rate limiter that Search Console used to offer was retired in early 2024; to slow Googlebot down temporarily, Google's documentation recommends returning 500, 503, or 429 status codes.
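If you write your own crawler in Python, the standard library's `urllib.robotparser` can read the Crawl-delay declared for a given user agent. The robots.txt content and the `SomeOtherBot` name below are made-up examples; the parser reports what the file says, not whether a particular search engine honors it.

```python
from urllib.robotparser import RobotFileParser

# Example file: a delay for bingbot only, plus a general group without one.
ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 10

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Delay declared in the bingbot group.
bing_delay = parser.crawl_delay("bingbot")
# Any other bot falls back to the * group, which declares no delay.
default_delay = parser.crawl_delay("SomeOtherBot")

print(bing_delay, default_delay)
```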

Google and most modern crawlers support two pattern-matching characters: * (matches zero or more characters) and $ (anchors the pattern to the end of the URL). For example, Disallow: /*.pdf$ blocks all URLs ending in .pdf. Disallow: /search* blocks any URL starting with /search. Not all crawlers support wildcards — the original robots.txt spec didn't include them, but Google explicitly supports both patterns.

Robots.txt only works on well-behaved, cooperative bots. Malicious crawlers, scrapers, and most spam bots simply ignore it. If you need to actually block bots, use server-level solutions: IP blocking, rate limiting, CAPTCHAs, or bot-management services like Cloudflare Bot Management. The robots.txt file is intended as a crawl management tool for legitimate search engines — it's not a security measure.

Disallow: /admin blocks /admin, /admin/, and /administrator, that is, anything starting with that string. Disallow: /admin/ (with trailing slash) blocks /admin/ and all its contents, but not the URL /admin by itself. Both forms block everything inside the directory. Use the trailing slash when you mean a directory, so that unrelated paths such as /administrator stay crawlable; omit it when you want to match every URL sharing that prefix.

Use the robots.txt report in Google Search Console (under Settings). It shows the robots.txt files Google found for your site, when each was last crawled, and any warnings or errors. The older standalone robots.txt Tester has been retired; to check whether a specific URL is blocked, use the URL Inspection tool. You can also visit https://yourdomain.com/robots.txt directly in a browser to confirm the file is serving correctly. Google generally caches robots.txt for up to 24 hours, so changes may take about a day to be picked up.

Robots.txt directly affects how efficiently Google's crawl budget is spent on your site. Blocking irrelevant or duplicate pages (like faceted navigation URLs, session-based URLs, admin pages, and pagination variations) means Googlebot spends its crawl budget on pages you actually want indexed. This matters more for large sites with thousands of pages. For small sites with clean architecture, a minimal robots.txt is usually sufficient. Robots.txt is not a ranking factor itself — but helping Google crawl your important pages efficiently has indirect SEO benefits.
