Robots.txt Pro
Build, validate and test a robots.txt with the real Google-style matching algorithm: editable user-agent groups, Allow/Disallow rules, crawl-delay and sitemaps; a line-by-line validator; and a path tester that applies longest-match with * and $ wildcards. Everything runs in your browser; nothing is sent.
User-agent: *
Allow: /
Disallow: /admin
Disallow: /cart
Disallow: /checkout
Crawl-delay: 5
User-agent: Googlebot
Allow: /
Disallow: /admin
Sitemap: https://example.com/sitemap.xml
2 groups · 1 sitemap · deploy to /robots.txt at your domain root.
Parsed structure & match detail
- 1 · Group selection. The crawler picks the most specific matching user-agent group (matched Googlebot). A named token beats the catch-all *.
- 2 · Longest match. Within that group, every Allow and Disallow whose pattern matches the path competes; the one matching the most characters wins, and Allow wins exact ties.
- 3 · Verdict. Disallowed: "Disallow: /admin" (6 chars) is the longest match, beating any Allow (1 char).
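The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the names GROUPS, select_group and verdict are hypothetical, the user-agent match is a simple case-insensitive substring test (an assumption), and wildcards are left out so that only plain prefix rules like the ones in the example file are handled.

```python
# Hypothetical sketch: group selection plus longest-match on plain prefix rules.
# Wildcards (* and $) are deliberately omitted to keep the example short.

GROUPS = {
    "*": [("allow", "/"), ("disallow", "/admin"),
          ("disallow", "/cart"), ("disallow", "/checkout")],
    "googlebot": [("allow", "/"), ("disallow", "/admin")],
}

def select_group(groups, user_agent):
    """Step 1: a named token found in the user agent beats the catch-all '*'."""
    ua = user_agent.lower()
    named = [token for token in groups if token != "*" and token in ua]
    if named:
        return max(named, key=len)  # most specific = longest matching token
    return "*" if "*" in groups else None

def verdict(groups, user_agent, path):
    """Steps 2 and 3: return (kind, winning_pattern, group_token) for a path."""
    token = select_group(groups, user_agent)
    if token is None:
        return ("allow", None, None)  # no applicable group: nothing is blocked
    matching = [(kind, pattern) for kind, pattern in groups[token]
                if pattern and path.startswith(pattern)]
    if not matching:
        return ("allow", None, token)  # no rule matches: crawling is allowed
    # Longest pattern wins; on an exact length tie, Allow beats Disallow.
    kind, pattern = max(matching, key=lambda r: (len(r[1]), r[0] == "allow"))
    return (kind, pattern, token)

print(verdict(GROUPS, "Googlebot", "/admin/users"))
print(verdict(GROUPS, "Googlebot", "/cart"))
print(verdict(GROUPS, "Bingbot", "/cart"))
```

Note the second call: because Googlebot has its own group, only that group's rules apply to it, so /cart is allowed for Googlebot even though the catch-all group disallows it.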
What robots.txt does — and the mistakes that deindex sites
robots.txt is a crawl directive, not a security control. It is a public file at your domain root that asks well-behaved crawlers which paths to skip, and that is all it does. Anyone can read it, malicious bots ignore it, and listing a sensitive directory there merely advertises it. The right tools for keeping data private are authentication and server-side authorisation — robots.txt belongs to crawl budget and politeness, not to access control. Treat every line you add as something the whole world can see.
The trap that catches the most people is conflating Disallow with noindex. Disallow stops a crawler from fetching a URL; it does nothing to stop that URL from appearing in search results. In fact a Disallowed page can still rank as a bare link, because the engine knows the URL exists from other sites but is forbidden from fetching it to read a noindex tag. To remove a page from the index you must allow crawling and serve a noindex meta tag or header — and only Disallow it later, once it has dropped out. Blocking the page is the opposite of de-indexing it.
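One way to serve the noindex signal as a header is sketched below, assuming a Python WSGI application. The middleware name, the demo app and the /old-campaign prefix are hypothetical; X-Robots-Tag: noindex is the header form of the noindex directive. The affected paths must stay crawlable in robots.txt, otherwise the crawler never sees the header.

```python
# Hypothetical WSGI middleware; paths and names are illustrative only.
NOINDEX_PREFIXES = ("/old-campaign",)

def noindex_middleware(app, prefixes=NOINDEX_PREFIXES):
    """Wrap a WSGI app and add 'X-Robots-Tag: noindex' for matching paths."""
    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def start(status, headers, exc_info=None):
            if path.startswith(prefixes):
                # Copy the header list rather than mutating the app's own list.
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, start)
    return wrapped

def demo_app(environ, start_response):
    """Stand-in application used only to exercise the middleware."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<p>page</p>"]

app = noindex_middleware(demo_app)
```

Once the pages have dropped out of the index, the corresponding Disallow lines can be added to robots.txt.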
When several rules match a URL, real crawlers use the longest-match rule, not first-match or last-match: the pattern that matches the most characters of the path wins, and on an exact length tie Allow beats Disallow. The two wildcards that make this expressive are *, which matches any run of characters, and $, which anchors the end of the path — so Disallow: /*.pdf$ blocks PDF URLs precisely. The Test mode here implements that exact algorithm, including group selection, so you can confirm intent before a single wrong line — a stray Disallow: / or a blocked CSS directory — quietly removes your site from search.
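A rough sketch of wildcard matching with the longest-match tie-break follows. It is not the tool's own code: pattern_to_regex and decide are hypothetical names, and specificity is approximated by the length of the pattern string, which is an assumption of this sketch.

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex matched from the path start."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape each literal piece; every '*' becomes '.*' (any run of characters).
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    # A trailing '$' anchors the end of the path; otherwise it is a prefix match.
    return re.compile(regex + (r"\Z" if anchored else ""))

def decide(rules, path):
    """rules is a list of (kind, pattern). Return the winning (kind, pattern)."""
    matches = [(kind, p) for kind, p in rules
               if p and pattern_to_regex(p).match(path)]
    if not matches:
        return ("allow", None)  # nothing matches: crawling is allowed
    # Longest pattern wins; on an exact length tie, Allow beats Disallow.
    return max(matches, key=lambda r: (len(r[1]), r[0] == "allow"))

RULES = [
    ("allow", "/"),
    ("disallow", "/*.pdf$"),
    ("allow", "/docs/public.pdf"),
]

print(decide(RULES, "/files/report.pdf"))             # blocked by /*.pdf$
print(decide(RULES, "/files/report.pdf?download=1"))  # '$' no longer matches
print(decide(RULES, "/docs/public.pdf"))              # the longer Allow wins
```

The second call shows why the $ anchor matters: a URL with a query string after .pdf does not end in .pdf, so only the shorter Allow: / rule matches it.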
Finally, declare your sitemap. The file-level Sitemap: directive points crawlers at the URLs you actively want discovered, complementing the paths you have told them to avoid. Build the sitemap itself with the Sitemap Pro generator, and once crawling is dialled in, measure the indexing and traffic that actually result in data & analytics.
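To confirm that a deployed file really declares its sitemap, the Sitemap: lines can be read back with Python's standard library. This is a small sketch assuming Python 3.8 or later (where RobotFileParser.site_maps() is available); the ROBOTS_TXT text is a shortened version of the example file, and the snippet is used here only to read sitemap declarations, not to reproduce the path tester.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Allow: /
Disallow: /admin
Crawl-delay: 5

User-agent: Googlebot
Allow: /
Disallow: /admin

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the declared sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)
```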
Trusted by SEO & Platform Engineers
“The Test mode is the part I trust — it actually implements Google's longest-match rule with * and $ instead of a naive 'does the path start with this string' check that every other generator ships. I paste our real robots.txt, throw a dozen URL/agent pairs at it, and see exactly which rule wins before anything goes live. It caught a CSS directory we were accidentally blocking.”
“Build mode gives me clean, correctly-grouped output with the Sitemap at the file level and per-agent crawl-delay, and the validator flags the things that actually deindex sites: rules before any User-agent, duplicate groups, an accidental Disallow: /. It's become a pre-deploy gate in our review checklist. Nothing leaves the browser, which legal likes.”
“I rebuild robots.txt for storefronts constantly — block /cart, /checkout, faceted-search URLs — and the presets plus the Allow/Disallow row editor make it fast. The validator's note that an empty Disallow means allow-all has saved a couple of clients from confusion. I'd like an export-to-file button, but copy-to-clipboard covers it for now.”
“We migrated a large site and the deindex risk terrified me. Pasting the candidate robots.txt into the validator and then walking key paths through the tester — with the explanation of which rule matched and why — turned a scary change into a reviewable one. The per-user-agent group selection logic matches what Googlebot actually does.”
build · validate · test · longest-match engine · in-browser · nothing sent · Last reviewed: 2026-06