Build a valid robots.txt file in seconds. Set crawl rules for any bot, add your sitemap, and copy the output. Free, no sign-up required.
Leave the path empty to default to "Allow: /" (allow all). Use "/" as the path to block or allow the entire site.
Strongly recommended. Helps all search engines find your sitemap.
Note: Googlebot ignores Crawl-delay. Only use if non-Google bots are overloading your server.
More free tools to cover every aspect of your technical and on-page SEO.
Paste your content and check keyword density instantly. Optimize without over-stuffing. Use Tool →
Analyze your content readability with Flesch Reading Ease and grade-level scores. Use Tool →
Check if your page title fits Google SERP limits and preview how it looks in search results. Use Tool →
Generate perfectly formatted title and meta description tags for any page. Use Tool →
Build valid FAQ JSON-LD schema markup to unlock rich results in Google search. Use Tool →
Create Open Graph and Twitter Card meta tags to control how your pages look when shared. Use Tool →
Turn any title into a clean, SEO-friendly URL slug in seconds. Use Tool →
Analyze your heading hierarchy, catch missing H1s, and fix structural SEO issues. Use Tool →
Count words, characters, sentences, paragraphs, and get reading time estimates instantly. Use Tool →
How It Works
No account needed, no sign-up required. Completely free. Configure your rules, add your sitemap, and copy a valid robots.txt file in seconds.
Choose which bot to target: all bots (*), Googlebot, Bingbot, or type a custom bot name. You can create multiple robots.txt blocks by running the tool again with different user-agents and combining the outputs.
Add as many path rules as you need. Use Disallow to block bots from specific folders or files. Use Allow to grant access to paths within a blocked section. Leave paths blank to default to "Allow: /" which permits full access.
Copy the generated output and save it as a plain text file named exactly "robots.txt". Upload it to your website's root directory so it is accessible at yourdomain.com/robots.txt, and verify access after uploading.
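For reference, a typical generated file looks like this (the /admin/ paths and domain are illustrative placeholders, not values the tool produces for you):

```text
# Save as robots.txt in your site root: https://yourdomain.com/robots.txt
User-agent: *
Disallow: /admin/
Allow: /admin/public-stats/

Sitemap: https://yourdomain.com/sitemap.xml
```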
The Syntax
A robots.txt file uses a simple directive-based syntax. Each block begins with a User-agent and is followed by Allow or Disallow rules that apply to that bot.
Robots.txt Structure
User-agent: [bot] → Allow: [path] → Disallow: [path] → Sitemap: [url]
Example (one directive per line): User-agent: * → Disallow: /admin/ → Sitemap: https://example.com/sitemap.xml
Each robots.txt block starts with a User-agent line that specifies which bot the rules apply to. The wildcard * targets all bots simultaneously. Named bots like Googlebot or Bingbot can have their own separate rule blocks.
Allow and Disallow directives take a URL path as their value. The path is relative to your domain root. For example, Disallow: /private/ tells the bot not to crawl any URL starting with yourdomain.com/private/.
When both Allow and Disallow rules match a URL, the longest (most specific) matching rule wins; if the matching rules are equally specific, the less restrictive Allow rule applies. This lets you block a folder like /admin/ but allow a specific public page within it, like /admin/public-stats/.
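The longest-match rule can be sketched in a few lines of Python. This is an illustrative model, not a full Robots Exclusion Protocol parser (no wildcard or $ support); the paths mirror the example above.

```python
def longest_match_allowed(path, rules):
    """Decide whether `path` may be crawled under longest-match
    precedence, as documented by Google: among all rules whose
    prefix matches the path, the longest prefix wins, and a tie
    goes to Allow. `rules` is a list of ("allow"|"disallow", prefix)
    tuples rather than a parsed robots.txt file.
    """
    best_kind, best_prefix = "allow", ""  # default: everything allowed
    for kind, prefix in rules:
        if path.startswith(prefix):
            if len(prefix) > len(best_prefix) or (
                len(prefix) == len(best_prefix) and kind == "allow"
            ):
                best_kind, best_prefix = kind, prefix
    return best_kind == "allow"

rules = [("disallow", "/admin/"), ("allow", "/admin/public-stats/")]
print(longest_match_allowed("/admin/settings", rules))            # False
print(longest_match_allowed("/admin/public-stats/today", rules))  # True
print(longest_match_allowed("/blog/post", rules))                 # True
```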
The Sitemap directive points to the full URL of your XML sitemap. It is independent of any User-agent block, so it can appear anywhere in the file, though it is conventionally placed at the end. Multiple Sitemap directives are allowed, one per line, for sites with multiple sitemaps.
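For example, a site that splits its sitemap (the URLs here are hypothetical) can list every part:

```text
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap-posts.xml
Sitemap: https://example.com/sitemap-products.xml
```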
Directive Reference
Use this table as a quick reference when writing or auditing a robots.txt file.
| Directive | Value | What It Does |
|---|---|---|
| User-agent | * or specific bot | Specifies which bot the following rules apply to. Use * for all bots or name a specific crawler. |
| Allow | /path/ | Explicitly permits access to a URL path even within a Disallowed section. More specific rules take precedence. |
| Disallow | /private/ | Tells the specified bot not to crawl URLs matching this path. An empty Disallow means allow everything. |
| Sitemap | Full URL | Points crawlers to your XML sitemap. Helps bots discover all your important pages faster. |
| Crawl-delay | Seconds | Asks bots to wait between requests. Googlebot ignores this. Useful for some non-Google crawlers that overload servers. |
Sources: Google Search Central, Robots Exclusion Protocol, 2026.
Bot Behavior Reference
Not all crawlers behave the same way. Use this table to understand how the most common bots respond to your robots.txt directives.
| Bot | Respects Disallow | Respects Crawl-delay | Notes |
|---|---|---|---|
| Googlebot | Yes | No | Primary Google crawler. Set rules in robots.txt and control indexing via noindex tags. |
| Google-Extended | Yes | No | Google AI training crawler. Block with User-agent: Google-Extended + Disallow: / if needed. |
| Bingbot | Yes | Yes | Microsoft Bing crawler. Respects Crawl-delay. Manageable via Bing Webmaster Tools. |
| AhrefsBot | Yes | No | SEO research crawler. Block if you want to reduce crawl footprint from third-party tools. |
| SemrushBot | Yes | No | Semrush research crawler. Block with User-agent: SemrushBot + Disallow: /. |
| GPTBot | Yes | No | OpenAI training crawler. Block with User-agent: GPTBot + Disallow: / to opt out of training data. |
Bot behavior based on official documentation and technical SEO research, 2026.
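Putting the table into practice, a file that stays open to search crawlers while opting out of the AI-training and SEO-research bots above could look like this sketch (each bot gets its own block; adjust to taste):

```text
# Allow everything for all other bots
User-agent: *
Disallow:

# Opt out of AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Reduce third-party SEO tool crawling
User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /
```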
What Kills Your Crawl Strategy
These mistakes range from catastrophic to quietly corrosive. All of them are completely preventable with the right knowledge.
Adding "Disallow: /" for all user-agents is the most catastrophic robots.txt error. It tells every search engine to stop crawling your site. Pages drop out of search results within weeks. Always double-check your rules before uploading, especially on production servers.
Always test with Google Search Console after updating.
Robots.txt is a public file. Anyone can read it. Blocking a URL in robots.txt does not hide sensitive content; it actually advertises the existence of those URLs to anyone who checks the file. Use server-level authentication or access controls to protect sensitive pages.
Robots.txt is public: never list sensitive paths.
Google needs to render your pages to evaluate their content and user experience. Blocking CSS, JavaScript, or font files in robots.txt prevents Google from seeing what your page looks like, which can hurt your Core Web Vitals scores and ranking potential.
Allow all CSS, JS, and font resources for rendering.
Your robots.txt file is one of the fastest ways to surface your sitemap to all search engines simultaneously. Skipping the Sitemap directive means crawlers have to find your sitemap through other means, which can delay indexing of new content by days or weeks.
Add a Sitemap: URL line to every robots.txt file.
Disallow prevents crawling, not de-indexing. If a page is already in Google's index and you add a Disallow rule, Google may still show it in search results. To remove an indexed page, you need a noindex meta tag or the Google Search Console URL removal tool, not robots.txt.
Use noindex to de-index, not Disallow.
A single syntax error can silently break your entire robots.txt file, causing bots to fall back to default behavior. After every change, test your file with the Google Search Console robots.txt report, verify it is accessible at /robots.txt, and check for crawl errors in Search Console within 48 hours.
Test every change in Google Search Console.
Optimize Your Crawl Strategy
A good robots.txt file protects your crawl budget and gives search engines a clear path to your best content. All CommonNinja widgets are free to start.
Third-party SEO bots, content scrapers, and AI training crawlers can consume significant server bandwidth. Add specific User-agent blocks for bots like AhrefsBot, SemrushBot, or GPTBot with Disallow: / if you do not want them crawling your site.
Paths like /admin/, /checkout/, /cart/, /login/, and /account/ should always be blocked for all bots. These pages add no SEO value, waste crawl budget, and expose internal functionality. Block them globally with User-agent: * and Disallow rules.
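A minimal sketch of that global block, using the paths listed above (adjust to match your site's actual URL structure):

```text
User-agent: *
Disallow: /admin/
Disallow: /checkout/
Disallow: /cart/
Disallow: /login/
Disallow: /account/
```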
Accordion FAQ sections pack keyword-rich content into collapsible panels that search engines can fully index when JavaScript rendering is allowed. Pair open crawl rules for your main content with accordion widgets to build topical authority.
Try Accordion widget →
Tab widgets keep multiple content sections on a single URL that crawlers can index as one page. This reduces URL sprawl, keeps your crawl budget focused, and gives search engines more content per URL without creating unnecessary duplicate pages.
Try Tabs widget →
E-commerce filter parameters like ?sort=price, ?color=red, and ?page=2 often create thousands of near-duplicate URLs. Block these in robots.txt with Disallow: /*?* or handle them with canonical tags to prevent crawl budget waste on low-value parameterized pages.
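A sketch of the parameter block described above. Note that the * wildcard is supported by major crawlers such as Googlebot and Bingbot but is not part of the original standard, and this rule blocks every URL containing a query string, so make sure no important pages rely on parameters before using it:

```text
User-agent: *
Disallow: /*?*
```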
Comparison table widgets create structured, keyword-rich pages that earn featured snippets and high-intent organic traffic. Make sure comparison page URLs are in your allowed crawl paths and included in your sitemap for fast indexing.
Try Comparison Tables widget →
Content feed widgets generate new pages regularly. Ensure feed URLs follow a consistent pattern that your robots.txt allows. Add the feed index pages to your sitemap so search engines discover new content as it is published, not weeks later.
Try Feeds widget →
Site redesigns, CMS migrations, and new feature launches often introduce new URL structures. After any major change, review your robots.txt to make sure new paths are correctly allowed or blocked. A stale robots.txt that blocks new pages kills rankings before they ever start.
Technical SEO Glossary
Robots.txt is part of a broader technical SEO ecosystem. Here is how the key concepts relate and when each one matters.
| Term | Definition | Format / Syntax | When to Use |
|---|---|---|---|
| robots.txt | A plain text file at your site root that communicates crawl permissions to bots. It follows the Robots Exclusion Protocol. Every website should have one. | yourdomain.com/robots.txt | Controlling bot crawl access across your entire site |
| Crawl Budget | The number of pages Googlebot will crawl on your site within a given timeframe. Sites with large crawl budgets get new content indexed faster. Wasted crawl budget on low-value pages delays indexing of important content. | Crawl capacity × crawl demand | Optimizing which pages get crawled on large or complex sites |
| noindex | An HTML meta tag or HTTP header that tells search engines not to include a page in their index. Unlike Disallow, noindex lets bots crawl the page but prevents it from appearing in search results. | <meta name="robots" content="noindex"> | Removing specific pages from search results while allowing crawling |
| XML Sitemap | An XML file listing all the important URLs on your site. It helps search engines discover content faster, especially for large sites or pages with few internal links. | yourdomain.com/sitemap.xml | Ensuring all important pages are discoverable and crawled regularly |
| Crawl Depth | The number of clicks from the homepage required to reach a given page. Pages buried at depth 4 or deeper receive less crawl frequency than pages near the homepage. | Click depth from homepage | Improving crawl equity distribution by flattening site architecture |
From the Blog
Dig deeper into robots.txt configuration, crawl budget management, and technical SEO best practices.
In this article, we are going to discuss SEO, explain why it’s important to the success of a website, and suggest ways t...
Read article →
In this article, we will look at some important SEO factors to consider when building a website, for the purpose of incr...
Read article →
In this article, we discuss the importance of readability in SEO, highlighting its impact on search engine rankings, use...
Read article →
In this article, we explore how SEO enhances Instagram visibility, focusing on optimizing profiles with keywords, strate...
Read article →
In this article, we discuss image compression's benefits for web performance and SEO, highlighting faster load times and...
Read article →
Drive more traffic to your Squarespace website with proven SEO strategies that increase visibility and improve search ra...
Read article →