
# robots.txt Guide: Control Search Engine Crawling

## What Is robots.txt?

robots.txt is a plain-text file at your website's root (e.g., `https://yoursite.com/robots.txt`) that tells search engine crawlers which URLs they may fetch and which to skip. Note that it controls crawling, not indexing: a blocked page can still appear in search results if other sites link to it. Every website should have one.

## Basic Template

```
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /private/

Sitemap: https://yoursite.com/sitemap.xml
```
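
Before deploying, you can sanity-check rules like these with Python's standard-library `urllib.robotparser`. A minimal sketch (the site URL is a placeholder); note that `urllib.robotparser` uses first-match semantics rather than Google's longest-match rule, so the `Disallow` lines are listed before the catch-all `Allow` here:

```python
from urllib import robotparser

# Parse the rules directly (no network request needed).
rp = robotparser.RobotFileParser()
rp.parse("""\
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /private/
Allow: /
""".splitlines())

print(rp.can_fetch("*", "https://yoursite.com/"))             # True
print(rp.can_fetch("*", "https://yoursite.com/admin/users"))  # False
```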

## Important Rules

### Allow everything (default)

```
User-agent: *
Allow: /
```

### Block a specific directory

```
User-agent: *
Disallow: /staging/
```

### Block a specific bot

```
User-agent: AhrefsBot
Disallow: /
```
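
One nuance when combining groups: a crawler obeys only the most specific `User-agent` group that matches it, so a bot named in its own group ignores the `*` rules entirely. Blocking one bot while keeping your general rules looks like this:

```
# General crawlers: everything except /staging/
User-agent: *
Disallow: /staging/

# AhrefsBot matches its own group and ignores the * group above
User-agent: AhrefsBot
Disallow: /
```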

## Common Mistakes

1. **Blocking CSS/JS files** — Don't disallow CSS or JavaScript. Google fetches them to render your page, and a page it can't render properly may rank worse (see the carve-out snippet after this list).

2. **Blocking your entire site** — `Disallow: /` under `User-agent: *` blocks every crawler from every page. Only use it deliberately, e.g., on a staging environment.

3. **No Sitemap reference** — Always include a `Sitemap:` directive so crawlers can discover your sitemap directly from robots.txt.

4. **Using robots.txt for security** — robots.txt is public, so listing sensitive URLs here actually advertises them. Protect private pages with authentication instead.
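
For mistake #1, the usual fix is an `Allow:` carve-out rather than unblocking a whole directory. A sketch, assuming your assets live under a hypothetical `/assets/` path (the `*` wildcard is standardized by RFC 9309 and supported by Google and Bing, and the longer `Allow` patterns win under longest-match rules):

```
User-agent: *
Disallow: /assets/
Allow: /assets/*.css
Allow: /assets/*.js
```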

## Testing

Run your URL through [SEO Snapshot](/) — we check if robots.txt exists, count disallow rules, verify sitemap reference, and warn if your page is blocked.
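
If you'd rather script the check yourself, the same standard-library parser can fetch and evaluate a live robots.txt. A minimal sketch; the site URL, paths, and user agent are placeholders:

```python
from urllib import robotparser

# Fetch and parse the live file from your site's root.
rp = robotparser.RobotFileParser("https://yoursite.com/robots.txt")
rp.read()

for path in ("/", "/admin/", "/blog/robots-txt-guide"):
    url = "https://yoursite.com" + path
    verdict = "allowed" if rp.can_fetch("Googlebot", url) else "blocked"
    print(f"{path}: {verdict}")
```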
