Question 1

What does robots.txt actually do?

Accepted Answer

robots.txt is a plain-text file at the root of your domain (https://example.com/robots.txt) that tells well-behaved crawlers which URL paths they may or may not fetch. It is a set of crawl directives organised into User-agent groups, each containing Allow and Disallow rules plus optional Crawl-delay, with Sitemap declarations at the file level. Compliant bots — Googlebot, Bingbot and most reputable crawlers — read it before crawling and honour it. It controls crawling, not indexing, and it is advisory: it relies on the crawler choosing to obey it.
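As a rough illustration of what a compliant crawler does with the file, the sketch below feeds a small, made-up robots.txt to Python's standard-library urllib.robotparser and asks whether two URLs may be fetched. The file contents, the ExampleBot name and the URLs are hypothetical. The standard-library parser uses simple prefix matching, so it is not a model of Google's longest-match or wildcard handling.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS_TXT.splitlines())

# A polite crawler asks before fetching each URL.
private_ok = parser.can_fetch("ExampleBot", "https://example.com/private/report")
blog_ok = parser.can_fetch("ExampleBot", "https://example.com/blog/post")
print(private_ok, blog_ok)  # False True
```

Nothing here stops a crawler that never calls can_fetch, which is why the file is advisory.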

Question 2

Is robots.txt a security control? Can I hide private pages with it?

Accepted Answer

No — and this is the single most dangerous misconception. robots.txt is publicly readable by anyone (it is literally served at /robots.txt) and it only asks polite crawlers not to fetch listed paths. Malicious bots ignore it entirely, and listing a sensitive path like /admin or /internal-export actually advertises its existence to attackers. Never use robots.txt to protect private data. Use real authentication, authorisation, and server-side access controls. Treat robots.txt purely as a crawl-budget and politeness tool.
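To make the "advertising" point concrete, here is a minimal sketch of how little effort it takes to turn a robots.txt into a list of interesting paths. The helper name disallowed_paths and the sample file are assumptions made up for this example.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Collect every non-empty Disallow value, ignoring comments."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical file that tries to "hide" sensitive areas.
SAMPLE = """\
User-agent: *
Disallow: /admin
Disallow: /internal-export  # meant to be secret
"""

print(disallowed_paths(SAMPLE))  # ['/admin', '/internal-export']
```

Anyone can run the equivalent of this against a public site, so the listed paths need real access controls behind them.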

Question 3

What is the difference between Disallow and noindex?

Accepted Answer

Disallow tells a crawler not to fetch a URL; noindex tells a search engine not to show a URL in results. They are not interchangeable, and combining them incorrectly backfires. If you Disallow a page, the crawler never fetches it, so it never sees a noindex tag on that page — meaning a Disallowed URL can still appear in search results (as a bare link, if other sites point to it) because Google knows the URL exists but cannot read its instructions. To reliably keep a page out of the index, allow crawling and add a <meta name="robots" content="noindex"> tag or an X-Robots-Tag header, then optionally Disallow it later once it has dropped out.
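One way to send noindex without touching the page markup is the X-Robots-Tag response header. The sketch below is a minimal WSGI application using only the Python standard library; the app and call_app names and the /old-page path are hypothetical, and a real site would set the header in its own framework or web server configuration.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Serve a page that stays crawlable but asks to be left out of the index."""
    body = b"<html><body>Retired page</body></html>"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        ("X-Robots-Tag", "noindex"),  # read only if the URL is NOT Disallowed
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(path):
    """Invoke the app in-process and return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


status, headers, _ = call_app("/old-page")
print(status, headers["X-Robots-Tag"])  # 200 OK noindex
```

The header only works because the crawler is allowed to request the URL and therefore sees the response.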

Question 4

How does the longest-match rule work?

Accepted Answer

When several Allow and Disallow rules match the same URL within the chosen user-agent group, Google does not use the first rule or the last rule. It uses the most specific one, defined as the matching rule with the longest path pattern. So with Disallow: /folder/ and Allow: /folder/public, the path /folder/public/page is allowed because the Allow pattern is longer (more specific). If two matching rules are exactly the same length, Allow wins the tie. The Test mode in this tool implements exactly this algorithm and shows you which rule won and why.
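The rule can be sketched in a few lines. This is a simplified illustration, not the tool's actual source: the Rule type and decide function are hypothetical names, and matching is plain prefix matching with no wildcard support.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    allow: bool  # True for Allow, False for Disallow
    path: str    # the rule's path pattern (prefix only in this sketch)


def decide(rules, url_path):
    """Return (allowed, winning_rule); a URL with no matching rule is allowed."""
    matches = [r for r in rules if r.path and url_path.startswith(r.path)]
    if not matches:
        return True, None
    # Longest pattern wins; on equal length, Allow (True) outranks Disallow (False).
    winner = max(matches, key=lambda r: (len(r.path), r.allow))
    return winner.allow, winner


rules = [Rule(False, "/folder/"), Rule(True, "/folder/public")]
print(decide(rules, "/folder/public/page")[0])  # True
print(decide(rules, "/folder/secret")[0])       # False
```

Sorting on the tuple (length, allow) keeps the tie-break in the same expression as the length comparison, so the order of rules in the file does not matter.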

Question 5

What do the * and $ wildcards mean?

Accepted Answer

Inside a path, * matches any sequence of characters (including none), so Disallow: /*.pdf$ blocks every URL ending in .pdf, and Disallow: /search/*?sort= blocks sorted search URLs. The $ anchor matches the end of the URL, query string included, so Disallow: /*.php$ blocks /page.php but not /page.php?id=2. Without $, a pattern matches as a prefix. These two wildcards are supported by Google and most major crawlers; the matcher in this tool implements both, translating each pattern into an exact matching rule rather than a naive substring check.
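One common way to implement these two wildcards is to translate each pattern into a regular expression. The sketch below assumes that approach; pattern_to_regex and matches are hypothetical names, and the tool's own implementation may differ.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then rejoin the pieces with ".*" for every *.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    # \Z pins the match to the very end of the URL when $ was present.
    return re.compile(body + (r"\Z" if anchored else ""))


def matches(pattern, url):
    # re.match anchors at the start, which gives prefix semantics without $.
    return pattern_to_regex(pattern).match(url) is not None


print(matches("/*.php$", "/page.php"))                        # True
print(matches("/*.php$", "/page.php?id=2"))                   # False
print(matches("/search/*?sort=", "/search/shoes?sort=price")) # True
```

Escaping the literal pieces first matters: characters such as . and ? are ordinary text in robots.txt but special in a regular expression.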

Question 6

Is Crawl-delay actually honoured?

Accepted Answer

It depends on the crawler. Crawl-delay: 5 asks a bot to wait five seconds between requests, and Bing and several other crawlers respect it. Google, however, ignores Crawl-delay entirely: Googlebot auto-tunes its crawl rate based on your server's response times, and the legacy crawl-rate limiter in Google Search Console was retired in early 2024. If Googlebot is overloading a server, temporarily returning 500, 503 or 429 responses makes it slow down. Because support is inconsistent, treat Crawl-delay as a hint for the crawlers that read it, not a guarantee. This tool emits it in the generated file and parses it, but flags that its effect varies by crawler.
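From the crawler's side, honouring the directive is a matter of reading the value for its own group and falling back to a default when none is set. The sketch below uses Python's standard-library urllib.robotparser; the bot names, the sample file and the one-second default are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: only one named bot is given a Crawl-delay.
SAMPLE = """\
User-agent: ExampleSlowBot
Crawl-delay: 5
Disallow: /tmp/

User-agent: *
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())


def delay_for(agent, default=1.0):
    """Seconds a polite crawler would wait between requests for this agent."""
    delay = parser.crawl_delay(agent)  # None when the group sets no delay
    return float(delay) if delay is not None else default


print(delay_for("ExampleSlowBot"))  # 5.0
print(delay_for("OtherBot"))        # 1.0
```

A crawler that never looks up the value, as Googlebot does not, is unaffected by the line.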

Question 7

How do per-user-agent groups work?

Accepted Answer

A robots.txt can contain multiple groups, each beginning with one or more User-agent lines followed by that group's rules. A crawler reads only the single most specific group that matches its product token: an exact or substring match on a named token like Googlebot beats the catch-all User-agent: *. Rules are not merged across groups for different user agents (Google does combine separate groups that name the same agent). That means if you create a Googlebot group, Googlebot ignores the * group entirely, so any rule you want Googlebot to follow must be repeated in its own group. This tool's Test mode performs that group-selection step before matching paths.
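The group-selection step can be sketched as follows. This is an illustration under stated assumptions, not the tool's source: groups are modelled as a dictionary from product token to rule lines, select_group is a hypothetical name, and "most specific" is taken to mean the longest token that appears in the user-agent string.

```python
def select_group(groups, user_agent):
    """Return the token of the single group a crawler would obey, or None."""
    ua = user_agent.lower()
    # Named tokens match case-insensitively as a substring of the user agent.
    candidates = [t for t in groups if t != "*" and t.lower() in ua]
    if candidates:
        return max(candidates, key=len)  # longest token = most specific
    return "*" if "*" in groups else None


# Hypothetical parsed file: the Googlebot group does not repeat /tmp/.
groups = {
    "*": ["Disallow: /tmp/", "Disallow: /search/"],
    "Googlebot": ["Disallow: /search/"],
    "Googlebot-Image": ["Disallow: /photos/raw/"],
}

print(select_group(groups, "Googlebot-Image/1.0"))  # Googlebot-Image
print(select_group(groups, "Mozilla/5.0 (compatible; Googlebot/2.1)"))  # Googlebot
print(select_group(groups, "ExampleBot/0.1"))       # *
```

In this example Googlebot would be free to crawl /tmp/, because that rule lives only in the * group it never reads.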

Question 8

What does the Sitemap directive do?

Accepted Answer

Sitemap: https://example.com/sitemap.xml tells crawlers where to find your XML sitemap — a list of the URLs you want discovered, with optional metadata like last-modified dates. Unlike Allow/Disallow it is not tied to a User-agent group; it is a file-level declaration and you can list several. It does not force indexing, but it helps crawlers find pages that are weakly linked, and it pairs naturally with robots.txt: Disallow tells bots where not to go, Sitemap tells them where you want them to go. Declaring it costs nothing and is a recommended baseline.
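Because Sitemap lines are file-level, a parser collects them wherever they appear. The sketch below shows this with Python's standard-library urllib.robotparser (site_maps requires Python 3.8 or later); the sample file and URLs are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file with Sitemap lines before and after a group.
SAMPLE = """\
Sitemap: https://example.com/sitemap.xml

User-agent: *
Disallow: /search/

Sitemap: https://example.com/news-sitemap.xml
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())

sitemaps = parser.site_maps()  # a list, or None when no Sitemap line exists
print(sitemaps)
```

Both URLs are returned regardless of which group they sit next to.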

Question 9

Why should I test robots.txt before deploying it?

Accepted Answer

A single wrong line can deindex an entire site. The classic catastrophe is shipping Disallow: / (which blocks everything) left over from a staging environment, or Disallowing a directory that contains your CSS and JS so Google can no longer render your pages. Because robots.txt is consulted before any crawl, mistakes propagate fast and recovery takes time. Testing specific user-agent and path combinations against the file — exactly what the Test mode here does, using the real longest-match algorithm — lets you confirm your intent before the file ever reaches production.
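A pre-deploy check can be as simple as a table of expectations run against the candidate file. The sketch below stands in for the tool's Test mode using Python's standard-library urllib.robotparser, which does prefix matching only, so it suits simple rule sets without wildcards. The function name, sample files and expectations are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def check_expectations(robots_txt, expectations):
    """Return a list of (agent, path, expected, actual) for every mismatch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    failures = []
    for agent, path, should_allow in expectations:
        actual = parser.can_fetch(agent, path)
        if actual != should_allow:
            failures.append((agent, path, should_allow, actual))
    return failures


# What the site owner intends, independent of any particular file.
EXPECTATIONS = [
    ("Googlebot", "/", True),
    ("Googlebot", "/assets/site.css", True),
    ("Googlebot", "/admin/", False),
]

STAGING_LEFTOVER = "User-agent: *\nDisallow: /\n"
INTENDED = "User-agent: *\nDisallow: /admin/\n"

print(len(check_expectations(STAGING_LEFTOVER, EXPECTATIONS)))  # 2
print(check_expectations(INTENDED, EXPECTATIONS))               # []
```

Running a check like this in a deployment pipeline catches a leftover Disallow: / before any crawler sees it.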

Question 10

Is anything I paste here uploaded to a server?

Accepted Answer

No. The entire suite — building, validating and testing — runs locally in your browser in JavaScript. Your robots.txt, the user-agent strings and the paths you test never leave the page, there is no account, no telemetry and no logging, and it works offline once loaded. That makes it safe to test rules for unreleased or internal sites. The copy buttons use your local clipboard only.
