Technical SEO
llms.txt and AI Crawlers for SaaS Sites
A practical guide to llms.txt, AI crawler access, robots.txt policy, and the public pages SaaS teams should make easy for answer engines to understand.
Technical AEO (answer engine optimization) starts with a simple question: can answer engines reach and understand the pages that explain your product?
For SaaS sites, the answer depends on four files and signals:
- robots.txt
- sitemap.xml
- canonical tags
- /llms.txt
Each one has a different job. Mixing them up leads to bad decisions, such as treating /llms.txt as a sitemap replacement or as an access control.
What llms.txt Is For
llms.txt is a proposed Markdown file placed at /llms.txt. The original llms.txt proposal describes it as a way to provide information that helps LLMs use a website at inference time.
In practical terms, it is a curated map. It tells AI systems which pages best explain the company, product, use cases, documentation, pricing, and important guides.
It should not contain secrets. It should not include private app routes. It should not replace your sitemap.
A SaaS /llms.txt file can include:
- Company and product summary.
- Homepage.
- Use-case pages.
- Pricing page.
- Product documentation or help center.
- Comparison pages.
- Core educational guides.
- Sitemap link.
- Support or sales contact page.
For AEO Table, that means pages such as AI search monitoring, AI citation tracking, AEO Table vs manual AI visibility tracking, and the AI search visibility baseline.
What robots.txt Is For
robots.txt is an access policy for crawlers that choose to follow it.
Use it to say which paths crawlers may or may not fetch. For SaaS sites, the common pattern is:
- Allow public marketing pages.
- Allow public educational content.
- Allow public documentation if it supports discoverability.
- Disallow app routes, account pages, onboarding flows, private reports, API routes, and auth pages.
OpenAI's crawler overview documents user agents such as OAI-SearchBot and GPTBot. Perplexity's crawler docs document PerplexityBot and recommend allowing it if you want pages to appear in Perplexity search results.
The policy choice is yours. The important part is to make it intentional.
What A Good SaaS Policy Looks Like
For most B2B SaaS sites that want AI search visibility, a reasonable policy is:
- Public marketing pages: allowed.
- Public blog and guides: allowed.
- Public use cases and comparison pages: allowed.
- Pricing, terms, privacy, support: allowed.
- Dashboard, auth, onboarding, API, private reports: disallowed.
- AI search crawlers: allowed for public resources unless legal, licensing, or infrastructure policy says otherwise.
This gives crawlers access to source pages without exposing app surfaces.
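The policy above can be checked before it ships. The following is a minimal sketch, not a definitive implementation: the robots.txt body, the example.com domain, and every path in it are placeholders, and `audit` is a hypothetical helper name. It uses Python's standard `urllib.robotparser`, which applies the first matching rule in file order, so the `Disallow` lines are listed before the catch-all `Allow`. Real crawlers may resolve overlapping rules differently, so treat the result as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a SaaS site; every path is a placeholder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /app/
Disallow: /account/
Disallow: /onboarding/
Disallow: /reports/
Disallow: /api/
Disallow: /auth/
Allow: /

Sitemap: https://www.example.com/sitemap.xml
"""

# Crawler user agents named in the text above.
AI_CRAWLERS = ["OAI-SearchBot", "GPTBot", "PerplexityBot"]


def audit(robots_txt, public_paths, private_paths, agents=AI_CRAWLERS):
    """Return (agent, path, problem) tuples; an empty list means the policy matches intent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []
    for agent in agents:
        for path in public_paths:
            if not parser.can_fetch(agent, path):
                problems.append((agent, path, "public page is blocked"))
        for path in private_paths:
            if parser.can_fetch(agent, path):
                problems.append((agent, path, "private route is fetchable"))
    return problems


if __name__ == "__main__":
    public = ["/", "/pricing", "/en/blog/ai-search-monitoring"]
    private = ["/app/dashboard", "/api/v1/reports", "/auth/login"]
    for problem in audit(ROBOTS_TXT, public, private):
        print(problem)
```

Running a check like this in CI keeps the policy intentional: a new app route or a stray `Disallow: /` shows up as a failed audit instead of a silent visibility change.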
Where Canonicals Fit
Canonical tags tell search systems which URL should represent a page.
If /llms.txt, sitemap, internal links, and canonical tags point to different versions of a URL, you create unnecessary ambiguity. Use one canonical URL pattern.
For example:
- Sitemap: https://www.aeotable.com/en/blog/ai-search-monitoring
- Canonical: https://www.aeotable.com/en/blog/ai-search-monitoring
- Internal links: /en/blog/ai-search-monitoring
- llms.txt: https://www.aeotable.com/en/blog/ai-search-monitoring
That consistency matters more than adding many crawler-specific tricks.
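One way to test that consistency is to resolve every reference to an absolute URL and compare it with the canonical. This is a small sketch under stated assumptions: the origin comes from the example URLs above, while `absolute`, `find_mismatches`, and the dictionary keys are hypothetical names chosen for illustration.

```python
from urllib.parse import urljoin

SITE_ORIGIN = "https://www.aeotable.com"  # origin taken from the example URLs above


def absolute(url_or_path: str, origin: str = SITE_ORIGIN) -> str:
    """Resolve a relative internal link against the site origin; absolute URLs pass through."""
    return urljoin(origin + "/", url_or_path)


def find_mismatches(references: dict[str, str], origin: str = SITE_ORIGIN) -> dict[str, str]:
    """Return each source whose resolved URL differs from the canonical URL."""
    canonical = absolute(references["canonical"], origin)
    resolved = {source: absolute(url, origin) for source, url in references.items()}
    return {source: url for source, url in resolved.items() if url != canonical}


if __name__ == "__main__":
    page = "https://www.aeotable.com/en/blog/ai-search-monitoring"
    refs = {
        "sitemap": page,
        "canonical": page,
        "internal_link": "/en/blog/ai-search-monitoring",
        "llms_txt": page,
    }
    print(find_mismatches(refs))
```

The comparison is deliberately strict: a trailing slash, a missing `www`, or an `http` scheme counts as a different URL, because those are the variants that create ambiguity.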
Where Structured Data Fits
Structured data helps search engines understand page content. Google's structured data introduction recommends using supported formats such as JSON-LD where appropriate.
For SaaS public pages, common types include:
- SoftwareApplication for the homepage or product page.
- Article for blog posts.
- FAQPage where visible FAQ content exists and the page meets Google's FAQ structured data guidance.
- BreadcrumbList for article hierarchy.
Schema is not a substitute for useful content. It is a clarity layer.
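As an illustration of the JSON-LD format, here is a hedged sketch that builds Article and BreadcrumbList markup for a blog post. The property names (`headline`, `mainEntityOfPage`, `itemListElement`, `position`, `name`, `item`) are schema.org vocabulary; the helper names and the example.com URLs are assumptions, and a real page would likely carry more properties (author, dates) than this minimal version.

```python
import json


def article_jsonld(headline: str, url: str) -> dict:
    """Minimal schema.org Article description for a blog post."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "mainEntityOfPage": url,
    }


def breadcrumb_jsonld(trail: list[tuple[str, str]]) -> dict:
    """BreadcrumbList built from (name, url) pairs ordered from the top of the hierarchy down."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": position, "name": name, "item": url}
            for position, (name, url) in enumerate(trail, start=1)
        ],
    }


def script_tag(data: dict) -> str:
    """Serialize to a JSON-LD script tag, escaping '<' so content cannot close the tag early."""
    payload = json.dumps(data, indent=2).replace("<", "\\u003c")
    return '<script type="application/ld+json">' + payload + "</script>"


if __name__ == "__main__":
    # Placeholder URLs; substitute the page's canonical URL.
    post_url = "https://www.example.com/en/blog/llms-txt"
    print(script_tag(article_jsonld("llms.txt and AI Crawlers for SaaS Sites", post_url)))
    print(script_tag(breadcrumb_jsonld([
        ("Blog", "https://www.example.com/en/blog"),
        ("llms.txt and AI Crawlers for SaaS Sites", post_url),
    ])))
```

Generating the markup from the same data that renders the page keeps the structured data in step with the visible content.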
Pages To Include In llms.txt
Start small. Do not list every URL.
Recommended sections:
Product
- Homepage.
- Pricing.
- Core use cases.
- Security or trust page if available.
Use Cases
- AI search monitoring.
- ChatGPT brand monitoring.
- Competitor AI visibility.
- AI citation tracking.
Guides
- Answer Engine Optimization guide.
- AI search visibility baseline.
- AI search query set.
- AI visibility score.
- AI search visibility audit checklist.
Comparison
- Manual AI visibility tracking comparison.
- Competitor or alternative pages where they exist.
Technical Context
- Sitemap.
- Robots policy.
- Support contact.
Technical Checklist
Use this before publishing /llms.txt:
- The file is available at /llms.txt.
- It is plain text or Markdown.
- Links use canonical absolute URLs.
- Listed pages return 200.
- Listed pages are not blocked by robots.
- Listed pages appear in sitemap when they are indexable.
- Private app routes are not listed.
- The file is updated when major public content changes.
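Several of these checks can run offline against the file contents. The sketch below assumes links are written in Markdown `[title](url)` form and that you already have the robots.txt text and the set of sitemap URLs in hand. `check_llms_txt`, `PRIVATE_PREFIXES`, and the example.com URLs are hypothetical. It does not fetch pages, so the "returns 200" item still needs a separate HTTP check.

```python
import re
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")  # Markdown [title](url)
PRIVATE_PREFIXES = ("/app/", "/account/", "/api/", "/auth/")  # hypothetical app routes


def check_llms_txt(llms_txt: str, robots_txt: str, sitemap_urls: set[str], agent: str = "*") -> list[str]:
    """Return human-readable issues for the links found in an llms.txt body."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    issues = []
    for title, url in LINK_RE.findall(llms_txt):
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            issues.append(f"{title}: not an absolute URL ({url})")
            continue  # the remaining checks need an absolute URL
        if parts.path.startswith(PRIVATE_PREFIXES):
            issues.append(f"{title}: private app route listed ({url})")
        if not robots.can_fetch(agent, url):
            issues.append(f"{title}: blocked by robots.txt ({url})")
        if url not in sitemap_urls:
            issues.append(f"{title}: missing from sitemap ({url})")
    return issues


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /app/\n"
    sitemap = {"https://www.example.com/pricing"}
    draft = (
        "# Example\n\n"
        "- [Pricing](https://www.example.com/pricing): Plans.\n"
        "- [Dashboard](https://www.example.com/app/home): Should not be here.\n"
    )
    for issue in check_llms_txt(draft, robots, sitemap):
        print(issue)
```

The sitemap check assumes every listed page is meant to be indexable; drop or soften it for pages that are intentionally kept out of the sitemap.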
Common Mistakes
Do not treat /llms.txt as an ad page. Keep it factual.
Do not list pages that are blocked in robots.txt.
Do not include every blog post. List the pages that explain the product, category, and most important use cases.
Do not promise crawler behavior you cannot control. llms.txt is a proposal and a helpful convention, not a guaranteed inclusion mechanism.
A Simple llms.txt Draft
Here is a practical starting structure for a SaaS site:
```markdown
# AEO Table

AEO Table helps teams monitor how AI answers mention, cite, and compare their brand across ChatGPT, Google AI, Perplexity, and other answer engines.

## Product

- Homepage: AI search visibility monitoring for B2B teams.
- Pricing: Plans and launch credits.
- Support: Contact and product questions.

## Use Cases

- AI search monitoring: Repeatable monitoring across answer engines.
- Competitor AI visibility: Compare brand and competitor mentions.
- AI citation tracking: Review cited domains and source quality.

## Guides

## Technical

- Sitemap
- Robots policy
```
Keep this file shorter than the site itself. It is a hand-curated index, not a second sitemap.
The Bottom Line
For SaaS teams, technical AEO is mostly clarity and access control.
Use robots.txt to manage crawler access. Use sitemap and canonical tags to keep indexable URLs consistent. Use /llms.txt to point AI systems to the public resources that best explain your product and category.
Then measure whether those pages actually show up in answers with an AI search visibility audit and a repeatable AI search monitoring workflow.
FAQ
What is llms.txt?
llms.txt is a proposed Markdown file placed at /llms.txt. It summarizes a website's most important resources so that AI systems can use them at inference time.
Does llms.txt replace robots.txt?
No. robots.txt sets access policy for crawlers that choose to follow it. llms.txt is a guide to important public resources. They solve different problems and should be consistent with each other.
Should SaaS sites allow AI crawlers?
SaaS teams that want visibility in AI search should usually allow crawlers for public marketing, docs, pricing, and educational pages while keeping private app, account, report, and auth routes blocked.