Crawler blocked guide

What to do when your site blocks our crawler

The free sitemap and llms.txt generators need to fetch your public pages and root files. If your host responds with a bot challenge or an HTTP 403, 429, or 503 status, work through this checklist before rerunning the tool.

Allow the audit user agent

If your WAF blocks unknown bots, allow the audit user agent for public HTML, robots.txt, sitemap.xml, and llms.txt.

User-Agent: layzr.ai-agentic-audit/1.0
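
To see how your host treats this user agent, you can send a request that carries the same header. The sketch below uses only the Python standard library. The function names are hypothetical, and it copies only the User-Agent string shown above. The real crawler may differ in other headers and in its source IP, so treat the result as a hint rather than proof.

```python
import urllib.error
import urllib.request

# User agent string taken from this guide.
AUDIT_USER_AGENT = "layzr.ai-agentic-audit/1.0"


def build_audit_request(url: str) -> urllib.request.Request:
    """Build a GET request that identifies itself with the audit user agent."""
    return urllib.request.Request(
        url,
        headers={"User-Agent": AUDIT_USER_AGENT},
        method="GET",
    )


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the host sends back for the audit user agent."""
    try:
        with urllib.request.urlopen(build_audit_request(url), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx; the status code is still what we want.
        return err.code
```

For example, fetch_status("https://example.com/robots.txt") returning 403 while the same URL loads in a browser suggests a rule that filters on user agent.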

Bypass challenges for public files

Security challenges are useful for forms and private routes, but they often block crawler-facing files that should stay public.

/robots.txt
/sitemap.xml
/llms.txt
/llms-full.txt
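
If your challenge logic lives in application code rather than a WAF console, the exemption can be a small allowlist check. This is a minimal sketch, not a drop-in rule for any particular provider: the function name is hypothetical, and treating HEAD the same as GET is an assumption.

```python
# Paths listed in this guide as crawler-facing files that should stay public.
PUBLIC_ROOT_FILES = frozenset(
    {"/robots.txt", "/sitemap.xml", "/llms.txt", "/llms-full.txt"}
)

# Read-only methods; including HEAD is an assumption, not something the guide requires.
READ_ONLY_METHODS = frozenset({"GET", "HEAD"})


def should_skip_challenge(method: str, path: str) -> bool:
    """Return True when a read-only request targets one of the public root files."""
    # Drop any query string so "/robots.txt?x=1" still matches the allowlist.
    bare_path = path.split("?", 1)[0]
    return method.upper() in READ_ONLY_METHODS and bare_path in PUBLIC_ROOT_FILES
```

Matching exact paths instead of prefixes keeps the exemption narrow, so forms and private routes continue to receive the challenge.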

Check CDN bot settings

Cloudflare, Vercel, and other edge providers can challenge automated requests before your app receives them.

In your provider's dashboard, look for Bot Fight Mode, WAF rules, rate limits, and security checkpoints.

Publish static files at the root

When a crawler cannot execute your app, static files served from the site root are the most reliable discovery path.

https://example.com/robots.txt
https://example.com/sitemap.xml
https://example.com/llms.txt
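
A quick way to audit these three URLs is to build them from your site address and label each response. The sketch below is illustrative: the names are hypothetical, and the status lookup is passed in as a function so you can plug in any HTTP client. Status codes alone will not catch a challenge page that is served with a 200.

```python
from typing import Callable, Dict, List
from urllib.parse import urljoin

ROOT_FILES = ("/robots.txt", "/sitemap.xml", "/llms.txt")

# Status codes this guide lists as signs of a block.
BLOCKING_STATUSES = frozenset({403, 429, 503})


def root_file_urls(site: str) -> List[str]:
    """Return absolute URLs for the root discovery files of a site."""
    # A leading slash makes urljoin resolve against the origin, not a sub-path.
    return [urljoin(site, name) for name in ROOT_FILES]


def classify(status: int) -> str:
    """Map an HTTP status code to a short label."""
    if status == 200:
        return "ok"
    if status in BLOCKING_STATUSES:
        return "blocked"
    if status == 404:
        return "missing"
    return "other"


def check_root_files(site: str, get_status: Callable[[str], int]) -> Dict[str, str]:
    """Label every root discovery file using the supplied status lookup."""
    return {url: classify(get_status(url)) for url in root_file_urls(site)}
```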

What happened

Your public page returned a bot challenge

The generators stop when they cannot safely read the site. We do not try to bypass bot protection, solve challenges, or store blocked responses.

Practical path

  1. Confirm /robots.txt, /sitemap.xml, and /llms.txt are reachable in a private browser window.
  2. Check your CDN or host security logs for requests using layzr.ai-agentic-audit/1.0.
  3. Allow public GET requests to root discovery files, then rerun the sitemap or llms.txt generator.
  4. If you cannot change WAF rules, hand-write the sitemap.xml and llms.txt files from the public URLs you already know.
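
For step 4, writing the files by hand can be as simple as a short script over your list of known URLs. The sketch below is one possible approach, not the output format of our generators. It emits a minimal sitemap with only <loc> entries, and a bare llms.txt made of a title and one list of links. The function names and the "Pages" section heading are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: Iterable[str]) -> str:
    """Return a minimal sitemap.xml document with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        # ElementTree escapes characters such as "&" in the URL text.
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"


def build_llms_txt(site_name: str, pages: Iterable[Tuple[str, str]]) -> str:
    """Return a minimal llms.txt: a title line followed by one list of page links."""
    lines = [f"# {site_name}", "", "## Pages", ""]
    lines += [f"- [{title}]({url})" for title, url in pages]
    return "\n".join(lines) + "\n"
```

Save the two strings as sitemap.xml and llms.txt, upload them to the site root, and confirm they load in a private browser window as in step 1.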