build · validate · test — real longest-match engine

Robots.txt Pro

Engineer, validate and test a robots.txt with the real Google-style matching algorithm — editable user-agent groups, Allow/Disallow rules, crawl-delay and sitemaps; a line-by-line validator; and a path tester that applies longest-match with * and $ wildcards. Entirely in your browser, nothing is sent.

robots.txt
User-agent: *
Allow: /
Disallow: /admin
Disallow: /cart
Disallow: /checkout
Crawl-delay: 5

User-agent: Googlebot
Allow: /
Disallow: /admin

Sitemap: https://example.com/sitemap.xml

2 groups · 1 sitemap · deploy to /robots.txt at your domain root.

Deep analysis

Parsed structure & match detail

Parsed groups
User-agent: *
Allow: / (line 2)
Disallow: /admin (line 3)
Disallow: /cart (line 4)
Disallow: /checkout (line 5)
Crawl-delay: 5
User-agent: Googlebot
Allow: / (line 9)
Disallow: /admin (line 10)
Sitemaps
https://example.com/sitemap.xml
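
A parsed view like the one above could be produced along the following lines. This is a minimal Python sketch and not the tool's actual code: the names `Group`, `parse_robots` and `SAMPLE` are hypothetical, and the sketch assumes that consecutive User-agent lines share one group and that Sitemap lines are file-level.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """One user-agent group: its agent tokens and its rules (hypothetical structure)."""
    agents: list = field(default_factory=list)
    rules: list = field(default_factory=list)  # (directive, value, line number)


def parse_robots(text):
    """Split robots.txt text into user-agent groups and file-level sitemaps."""
    groups, sitemaps = [], []
    current = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if ":" not in line:
            continue  # blank or malformed line
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            # A User-agent line that follows rules starts a new group;
            # consecutive User-agent lines extend the same group.
            if current is None or current.rules:
                current = Group()
                groups.append(current)
            current.agents.append(value)
        elif key == "sitemap":
            sitemaps.append(value)  # file-level, belongs to no group
        elif key in ("allow", "disallow", "crawl-delay") and current is not None:
            current.rules.append((key, value, lineno))
    return groups, sitemaps


# The example file from this page, one list entry per line.
SAMPLE = "\n".join([
    "User-agent: *",
    "Allow: /",
    "Disallow: /admin",
    "Disallow: /cart",
    "Disallow: /checkout",
    "Crawl-delay: 5",
    "",
    "User-agent: Googlebot",
    "Allow: /",
    "Disallow: /admin",
    "",
    "Sitemap: https://example.com/sitemap.xml",
])

groups, sitemaps = parse_robots(SAMPLE)
print(len(groups), "groups,", len(sitemaps), "sitemap")
```

Rules that appear before any User-agent line are silently skipped in this sketch; a validator would more likely report them.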
Match detail · Googlebot · /admin/settings
DISALLOWED
  1. Group selection. The crawler picks the most specific matching user-agent group (matched Googlebot). A named token beats the catch-all *.
  2. Longest match. Within that group, every Allow and Disallow whose pattern matches the path competes; the one matching the most characters wins, and Allow wins exact ties.
  3. Verdict. Disallowed: "Disallow: /admin" (6 chars) is the longest match, beating "Allow: /" (1 char).
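
The group-selection step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the tool's implementation: `select_group` is a hypothetical name, and matching a token as a case-insensitive substring of the crawler's name is a simplification of how real crawlers compare product tokens.

```python
def select_group(group_tokens, user_agent):
    """Return the group token a crawler would obey, or None if no group applies.

    Assumption: a named token matches when it appears, case-insensitively,
    inside the crawler's user-agent string; the longest such token wins,
    and "*" is used only when no named token matches.
    """
    ua = user_agent.lower()
    best = None
    for token in group_tokens:
        if token == "*":
            continue  # the catch-all is only a fallback
        if token.lower() in ua and (best is None or len(token) > len(best)):
            best = token
    if best is None and "*" in group_tokens:
        best = "*"
    return best


tokens = ["*", "Googlebot"]
print(select_group(tokens, "Googlebot"))  # the named group
print(select_group(tokens, "Bingbot"))    # falls back to the catch-all
```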
Field notes

What robots.txt does — and the mistakes that deindex sites

robots.txt is a crawl directive, not a security control. It is a public file at your domain root that asks well-behaved crawlers which paths to skip, and that is all it does. Anyone can read it, malicious bots ignore it, and listing a sensitive directory there merely advertises it. The right tools for keeping data private are authentication and server-side authorisation — robots.txt belongs to crawl budget and politeness, not to access control. Treat every line you add as something the whole world can see.

The trap that catches the most people is conflating Disallow with noindex. Disallow stops a crawler from fetching a URL; it does nothing to stop that URL from appearing in search results. In fact a Disallowed page can still rank as a bare link, because the engine knows the URL exists from links on other sites but is forbidden from fetching it to read a noindex tag. To remove a page from the index you must allow crawling and serve noindex, either as a robots meta tag or as an X-Robots-Tag response header, and only Disallow it later, once it has dropped out. Blocking the page does not de-index it; it hides the very instruction that would.
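
The interaction between crawl access and noindex can be written out as a small decision function. This is only a sketch of the reasoning above: `index_outcome` and its two flags are hypothetical names, and the returned strings describe what an engine is able to do, not what any particular engine is guaranteed to do.

```python
def index_outcome(crawl_allowed, serves_noindex):
    """Describe what a search engine can do with a URL it already knows about."""
    if not crawl_allowed:
        # The page is never fetched, so a noindex on it is never read.
        return "may stay indexed as a bare link"
    if serves_noindex:
        # Crawlable and marked noindex: the engine can read the signal.
        return "can be dropped from the index"
    return "crawled and eligible for indexing"


for crawl_allowed in (True, False):
    for serves_noindex in (True, False):
        print(crawl_allowed, serves_noindex, "->",
              index_outcome(crawl_allowed, serves_noindex))
```

Note that the Disallowed case returns the same answer whether or not the page serves noindex, which is exactly the trap.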

When several rules match a URL, real crawlers use the longest-match rule, not first-match or last-match: the pattern that matches the most characters of the path wins, and on an exact length tie Allow beats Disallow. Two wildcards make this expressive: * matches any run of characters, and $ anchors the end of the URL, so Disallow: /*.pdf$ blocks URLs that end in .pdf but not /file.pdf?download=1. The Test mode here implements that exact algorithm, including group selection, so you can confirm intent before deploying. A single wrong line can be costly: a stray Disallow: / blocks the whole site from crawling, and a blocked CSS directory stops crawlers from rendering your pages properly.
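
Longest-match with wildcards can be sketched in Python as below. This is not the tool's code: `pattern_to_regex` and `is_allowed` are hypothetical names, and the sketch assumes that a rule is scored by the length of its pattern (consistent with the "6 chars" count in the match detail) and that a URL matched by no rule is allowed.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Everything else is matched literally from the start of the path.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored else ""))


def is_allowed(rules, path):
    """Apply longest-match to (directive, pattern) pairs from one group."""
    best_len, allowed = -1, True  # assumption: no matching rule means allowed
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            length = len(pattern)  # assumption: score by pattern length
            # Longer pattern wins; on an exact tie, Allow wins.
            if length > best_len or (length == best_len and is_allow):
                best_len, allowed = length, is_allow
    return allowed


googlebot_rules = [("Allow", "/"), ("Disallow", "/admin")]
print(is_allowed(googlebot_rules, "/admin/settings"))
print(is_allowed([("Disallow", "/*.pdf$")], "/docs/guide.pdf"))
```

Scoring by pattern length and scoring by characters matched agree for patterns without wildcards; they can differ once * is involved, so treat the scoring choice here as an assumption to check against the crawler you care about.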

Finally, declare your sitemap. The file-level Sitemap: directive points crawlers at the URLs you actively want discovered, complementing the paths you have told them to avoid. Build the sitemap itself with the Sitemap Pro generator, and once crawling is dialled in, use data & analytics to measure the indexing and traffic that actually result.


Trusted by SEO & Platform Engineers

4.8
Based on 1,480 reviews

The Test mode is the part I trust — it actually implements Google's longest-match rule with * and $ instead of a naive 'does the path start with this string' check that every other generator ships. I paste our real robots.txt, throw a dozen URL/agent pairs at it, and see exactly which rule wins before anything goes live. It caught a CSS directory we were accidentally blocking.

Priya Venkatesh
Technical SEO lead
June 11, 2026

Build mode gives me clean, correctly-grouped output with the Sitemap at the file level and per-agent crawl-delay, and the validator flags the things that actually deindex sites: rules before any User-agent, duplicate groups, an accidental Disallow: /. It's become a pre-deploy gate in our review checklist. Nothing leaves the browser, which legal likes.

Marcus Feldt
Platform engineer
May 19, 2026

I rebuild robots.txt for storefronts constantly — block /cart, /checkout, faceted-search URLs — and the presets plus the Allow/Disallow row editor make it fast. The validator's note that an empty Disallow means allow-all has saved a couple of clients from confusion. I'd like an export-to-file button, but copy-to-clipboard covers it for now.

Hana Sato
E-commerce SEO consultant
April 23, 2026

We migrated a large site and the deindex risk terrified me. Pasting the candidate robots.txt into the validator and then walking key paths through the tester — with the explanation of which rule matched and why — turned a scary change into a reviewable one. The per-user-agent group selection logic matches what Googlebot actually does.

Oluwaseun Adeyemi
Site reliability engineer
February 14, 2026


build · validate · test · longest-match engine · in-browser · nothing sent · Last reviewed: 2026-06