# robots.txt Guide: Control Search Engine Crawling
## What Is robots.txt?
robots.txt is a plain-text file served from your website's root (https://yoursite.com/robots.txt) that tells search engine crawlers which URLs they may crawl and which to skip. It's advisory rather than enforced, but every website should have one.
## Basic Template
```
# Rules for all crawlers
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /private/
# Point crawlers at your sitemap
Sitemap: https://yoursite.com/sitemap.xml
```
## Important Rules
### Allow everything (default)
```
User-agent: *
Allow: /
```
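The same result can be written with an empty `Disallow:`, which even older crawlers that predate the `Allow` extension understand:

```
User-agent: *
Disallow:
```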
### Block a specific directory
```
User-agent: *
Disallow: /staging/
```
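Paths match by prefix, so the trailing slash matters. A quick sketch with placeholder paths:

```
User-agent: *
# Blocks /staging/ and everything under it
Disallow: /staging/
# No trailing slash would also block /staging-archive.html:
# Disallow: /staging
```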
### Block a specific bot
```
User-agent: AhrefsBot
Disallow: /
```
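Groups can be combined in one file. For Google and most major crawlers, a bot follows only the most specific `User-agent` group that matches it, so a named bot ignores the `*` group entirely. A sketch with placeholder paths:

```
# Googlebot follows only this group
User-agent: Googlebot
Disallow: /search/

# Every other crawler follows this one
User-agent: *
Disallow: /search/
Disallow: /beta/
```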
## Common Mistakes
1. **Blocking CSS/JS files:** Don't disallow CSS or JavaScript. Google needs them to render your page (see the sketch after this list).
2. **Blocking your entire site:** `Disallow: /` under `User-agent: *` blocks every crawler from every page. Use it only where you truly want nothing crawled, such as a staging environment.
3. **No Sitemap reference:** Always include a `Sitemap:` directive.
4. **Using robots.txt for security:** robots.txt is public, so listing sensitive URLs actually advertises them. Use authentication instead.
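For mistake #1: if a blocked directory also holds your stylesheets and scripts, add `Allow` overrides for them. A sketch, assuming assets live under hypothetical /app/ paths:

```
User-agent: *
Disallow: /app/
# Google resolves conflicts by the longest (most specific) match,
# so these Allow rules override the broader Disallow above
Allow: /app/css/
Allow: /app/js/
```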
## Testing
Run your URL through [SEO Snapshot](/) — we check if robots.txt exists, count disallow rules, verify sitemap reference, and warn if your page is blocked.
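You can also test rules locally before deploying. A minimal sketch using Python's standard-library `urllib.robotparser` (the domain and paths are placeholders):

```python
from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt
rp = RobotFileParser()
rp.set_url("https://yoursite.com/robots.txt")
rp.read()

# Ask whether a given user-agent may crawl a given URL
print(rp.can_fetch("Googlebot", "https://yoursite.com/"))
print(rp.can_fetch("*", "https://yoursite.com/admin/"))
```

One caveat: Python's parser resolves overlapping `Allow`/`Disallow` rules by file order (first match wins), while Google uses the longest match, so double-check edge cases with Google's robots.txt report in Search Console.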