llms.txt and AI Crawlers for SaaS Sites

A practical guide to llms.txt, AI crawler access, robots.txt policy, and the public pages SaaS teams should make easy for answer engines to understand.

AEO Table · June 15, 2026

Technical AEO starts with a simple question: can answer engines reach and understand the pages that explain your product?

For SaaS sites, the answer depends on four files and signals:

  • robots.txt
  • sitemap.xml
  • canonical tags
  • /llms.txt

Each one has a different job. Mixing them up leads to bad SEO decisions, such as treating /llms.txt as an access control or expecting robots.txt to pick the canonical URL.

What llms.txt Is For

llms.txt is a proposed Markdown file placed at /llms.txt. The original llms.txt proposal describes it as a way to provide information that helps LLMs use a website at inference time.

In practical terms, it is a curated map. It tells AI systems which pages best explain the company, product, use cases, documentation, pricing, and important guides.

It should not contain secrets. It should not include private app routes. It should not replace your sitemap.

A SaaS /llms.txt file can include:

  • Company and product summary.
  • Homepage.
  • Use-case pages.
  • Pricing page.
  • Product documentation or help center.
  • Comparison pages.
  • Core educational guides.
  • Sitemap link.
  • Support or sales contact page.

For AEO Table, that means pages such as the AI search monitoring and AI citation tracking use cases, the AEO Table vs manual AI visibility tracking comparison, and the AI search visibility baseline guide.

What robots.txt Is For

robots.txt is an access policy for crawlers that choose to follow it.

Use it to say which paths crawlers may or may not fetch. For SaaS sites, the common pattern is:

  • Allow public marketing pages.
  • Allow public educational content.
  • Allow public documentation if it supports discoverability.
  • Disallow app routes, account pages, onboarding flows, private reports, API routes, and auth pages.

OpenAI's crawler overview documents user agents such as OAI-SearchBot and GPTBot. Perplexity's crawler docs describe PerplexityBot and recommend allowing it if you want your pages to appear in Perplexity search results.

The policy choice is yours. The important part is to make it intentional.

What A Good SaaS Policy Looks Like

For most B2B SaaS sites that want AI search visibility, a reasonable policy is:

  • Public marketing pages: allowed.
  • Public blog and guides: allowed.
  • Public use cases and comparison pages: allowed.
  • Pricing, terms, privacy, support: allowed.
  • Dashboard, auth, onboarding, API, private reports: disallowed.
  • AI search crawlers: allowed for public resources unless legal, licensing, or infrastructure policy says otherwise.

This gives crawlers access to source pages without exposing app surfaces.
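The policy above can be sketched as a small generator plus an offline check. This is a minimal sketch, not a definitive implementation: the path prefixes and the example.com sitemap URL are hypothetical placeholders, and the check uses Python's standard urllib.robotparser, which applies the first matching rule and so may not evaluate rules exactly as every crawler does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical private prefixes; replace them with your own app routes.
PRIVATE_PREFIXES = ["/app/", "/dashboard/", "/auth/", "/onboarding/", "/api/", "/reports/"]
# User agents named in the OpenAI and Perplexity crawler documentation.
AI_SEARCH_AGENTS = ["OAI-SearchBot", "GPTBot", "PerplexityBot"]


def build_robots_txt(private_prefixes, sitemap_url):
    """Return robots.txt text that blocks private prefixes for all crawlers."""
    lines = ["User-agent: *"]
    # Disallow rules come first so first-match parsers also block them.
    lines += [f"Disallow: {prefix}" for prefix in private_prefixes]
    lines += ["Allow: /", "", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt, user_agent, url):
    """Check a URL against robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


robots_txt = build_robots_txt(PRIVATE_PREFIXES, "https://www.example.com/sitemap.xml")
for agent in AI_SEARCH_AGENTS:
    public_ok = is_allowed(robots_txt, agent, "https://www.example.com/pricing")
    private_ok = is_allowed(robots_txt, agent, "https://www.example.com/dashboard/")
    print(agent, "public:", public_ok, "private:", private_ok)
```

If legal, licensing, or infrastructure policy calls for blocking a specific crawler, a separate User-agent group for that crawler is the usual place to express it.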

Where Canonicals Fit

Canonical tags tell search systems which URL should represent a page.

If /llms.txt, the sitemap, internal links, and canonical tags point to different versions of a URL, you create unnecessary ambiguity. Use one canonical URL pattern.

For example:

  • Sitemap: https://www.aeotable.com/en/blog/ai-search-monitoring
  • Canonical: https://www.aeotable.com/en/blog/ai-search-monitoring
  • Internal links: /en/blog/ai-search-monitoring
  • llms.txt: https://www.aeotable.com/en/blog/ai-search-monitoring

That consistency matters more than adding many crawler-specific tricks.
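One way to catch drift between these four sources is to normalize each reference and compare the results. The sketch below is illustrative only: the normalization rules (resolve relative links against the site origin, lowercase the host, drop query strings, fragments, and trailing slashes) are assumptions that may not match every site's canonical pattern.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

SITE_ORIGIN = "https://www.aeotable.com"


def normalize(url, origin=SITE_ORIGIN):
    """Resolve relative links and drop query, fragment, and trailing slash."""
    parts = urlsplit(urljoin(origin, url))
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def find_mismatches(references):
    """Return normalized URLs per source when sources disagree, else {}."""
    normalized = {source: normalize(url) for source, url in references.items()}
    return normalized if len(set(normalized.values())) > 1 else {}


references = {
    "sitemap": "https://www.aeotable.com/en/blog/ai-search-monitoring",
    "canonical": "https://www.aeotable.com/en/blog/ai-search-monitoring",
    "internal link": "/en/blog/ai-search-monitoring",
    "llms.txt": "https://www.aeotable.com/en/blog/ai-search-monitoring",
}
print(find_mismatches(references) or "all sources agree")
```

A non-empty result lists the normalized URL per source, which makes it easy to see which file carries the odd variant, for example a missing www or a different locale prefix.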

Where Structured Data Fits

Structured data helps search engines understand page content. Google's structured data introduction recommends using supported formats such as JSON-LD where appropriate.

For SaaS public pages, common types include:

  • SoftwareApplication for the homepage or product page.
  • Article for blog posts.
  • FAQPage where visible FAQ content exists and the page meets Google's FAQ structured data guidance.
  • BreadcrumbList for article hierarchy.

Schema is not a substitute for useful content. It is a clarity layer.
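As a rough illustration, Article markup for a blog post can be generated as JSON-LD from a handful of fields. The property selection here is a minimal assumption rather than a complete or required set, and the page URL is a hypothetical placeholder.

```python
import json


def article_json_ld(headline, url, date_published, publisher_name):
    """Return a JSON-LD string describing a blog post as a schema.org Article."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "mainEntityOfPage": url,
        "datePublished": date_published,
        "publisher": {"@type": "Organization", "name": publisher_name},
    }
    return json.dumps(data, indent=2)


print(
    article_json_ld(
        headline="llms.txt and AI Crawlers for SaaS Sites",
        url="https://www.example.com/blog/llms-txt-ai-crawlers",  # hypothetical URL
        date_published="2026-06-15",
        publisher_name="AEO Table",
    )
)
```

The resulting string goes inside a script element with type application/ld+json in the page head. The values should match what the visible page says.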

Pages To Include In llms.txt

Start small. Do not list every URL.

Recommended sections:

Product

  • Homepage.
  • Pricing.
  • Core use cases.
  • Security or trust page if available.

Use Cases

  • AI search monitoring.
  • ChatGPT brand monitoring.
  • Competitor AI visibility.
  • AI citation tracking.

Guides

  • Answer Engine Optimization guide.
  • AI search visibility baseline.
  • AI search query set.
  • AI visibility score.
  • AI search visibility audit checklist.

Comparison

  • Manual AI visibility tracking comparison.
  • Competitor or alternative pages where they exist.

Technical Context

  • Sitemap.
  • Robots policy.
  • Support contact.
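The sections above can be kept as a small data structure and rendered into the layout the llms.txt proposal describes: an H1 title, a blockquote summary, and H2 sections containing Markdown link lists. This is a sketch under stated assumptions; the site name, summary, and example.com URLs are hypothetical placeholders.

```python
def render_llms_txt(title, summary, sections):
    """Render llms.txt text from a title, a summary, and sections of links.

    sections maps a heading to a list of (name, url, note) tuples.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            suffix = f": {note}" if note else ""
            lines.append(f"- [{name}]({url}){suffix}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical content; list only pages that explain the product and category.
sections = {
    "Product": [
        ("Homepage", "https://www.example.com/", "What the product does"),
        ("Pricing", "https://www.example.com/pricing", "Plans"),
    ],
    "Technical Context": [
        ("Sitemap", "https://www.example.com/sitemap.xml", ""),
    ],
}
print(render_llms_txt("Example SaaS", "One-sentence product summary.", sections))
```

Generating the file from a reviewed list keeps it short and makes changes easy to diff when public content changes.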

Technical Checklist

Use this before publishing /llms.txt:

  • The file is available at /llms.txt.
  • It is plain text or Markdown.
  • Links use canonical absolute URLs.
  • Listed pages return 200.
  • Listed pages are not blocked by robots.
  • Listed pages appear in the sitemap when they are indexable.
  • Private app routes are not listed.
  • The file is updated when major public content changes.
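Several of these checks can run offline before publishing. The sketch below is a hedged example: it assumes links are written as Markdown links, treats the listed path prefixes as hypothetical private routes, and leaves out the HTTP 200 check because that needs network access. A "missing from sitemap" result is only a warning for files that are not indexable pages, such as the sitemap itself.

```python
import re
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

LINK_PATTERN = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")
# Hypothetical private prefixes; replace them with your own app routes.
PRIVATE_PREFIXES = ("/app/", "/dashboard/", "/auth/", "/api/")


def check_llms_txt(llms_txt, robots_txt, sitemap_urls, user_agent="*"):
    """Return (url, problem) pairs for links that fail the offline checks."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    problems = []
    for _name, url in LINK_PATTERN.findall(llms_txt):
        parts = urlsplit(url)
        if parts.scheme != "https" or not parts.netloc:
            problems.append((url, "not an absolute https URL"))
            continue  # the remaining checks need an absolute URL
        if parts.path.startswith(PRIVATE_PREFIXES):
            problems.append((url, "private app route"))
        if not robots.can_fetch(user_agent, url):
            problems.append((url, "blocked by robots.txt"))
        if url not in sitemap_urls:
            problems.append((url, "missing from sitemap"))
    return problems


llms_txt = "- [Home](https://www.example.com/)\n- [App](https://www.example.com/app/settings)\n"
robots_txt = "User-agent: *\nDisallow: /app/\n"
for url, problem in check_llms_txt(llms_txt, robots_txt, {"https://www.example.com/"}):
    print(f"{url}: {problem}")
```

Running a check like this in CI whenever /llms.txt or robots.txt changes helps keep the two files consistent with each other.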

Common Mistakes

Do not treat /llms.txt as an ad page. Keep it factual.

Do not list pages that are blocked in robots.txt.

Do not include every blog post. List the pages that explain the product, category, and most important use cases.

Do not promise crawler behavior you cannot control. llms.txt is a proposal and a helpful convention, not a guaranteed inclusion mechanism.

A Simple llms.txt Draft

Here is a practical starting structure for a SaaS site:

```markdown
# AEO Table

AEO Table helps teams monitor how AI answers mention, cite, and compare their brand across ChatGPT, Google AI, Perplexity, and other answer engines.

## Product

- Homepage: AI search visibility monitoring for B2B teams.
- Pricing: Plans and launch credits.
- Support: Contact and product questions.

## Use Cases

## Guides

## Technical

- Sitemap
- Robots policy
```

Keep this file short. It is a hand-curated index, not a second sitemap.

The Bottom Line

For SaaS teams, technical AEO is mostly clarity and access control.

Use robots.txt to manage crawler access. Use the sitemap and canonical tags to keep indexable URLs consistent. Use /llms.txt to point AI systems to the public resources that best explain your product and category.

Then measure whether those pages actually show up in answers with an AI search visibility audit and a repeatable AI search monitoring workflow.

FAQ

What is llms.txt?

llms.txt is a proposed Markdown file placed at /llms.txt that summarizes important website resources for AI systems at inference time.

Does llms.txt replace robots.txt?

No. robots.txt sets the access policy for crawlers that choose to follow it. llms.txt is a guide to important public resources. They solve different problems and should be consistent with each other.

Should SaaS sites allow AI crawlers?

SaaS teams that want visibility in AI search should usually allow crawlers for public marketing, docs, pricing, and educational pages while keeping private app, account, report, and auth routes blocked.