
LLMs.txt: The New Robots.txt for AI - Complete Setup Guide

LLMs.txt is an emerging standard that tells AI crawlers what to index on your site - analogous to robots.txt but built for large language models. This complete guide covers what it is, why it matters, and how to set it up today.

Devanshu
6 min read

Just as robots.txt became essential the moment search engines started crawling the web, LLMs.txt is becoming essential as AI crawlers index the web for large language model training and real-time retrieval. It is a simple Markdown-formatted text file placed at your domain root that explicitly guides AI systems to your best content and away from content you don't want indexed.

What Is LLMs.txt?

LLMs.txt is an open standard proposed by Answer.AI (Jeremy Howard et al.) in 2024 and rapidly adopted by AI-aware organizations. The file lives at yoursite.com/llms.txt and contains:

  • A brief description of your site and its purpose

  • A curated list of your most important URLs with one-line descriptions

  • Optional sections organizing content by type or topic

  • An optional llms-full.txt link for systems that want to ingest complete page content

Unlike robots.txt (which uses allow/disallow directives), LLMs.txt is a positive guide - it tells AI systems what to prioritize, not just what to avoid.

Why LLMs.txt Matters for AEO

AI crawlers face the same challenge traditional crawlers did: a site may have thousands of pages but only a fraction represent high-quality, authoritative content. Without guidance, AI crawlers must infer quality from signals that may not capture your best work. LLMs.txt solves this by letting you explicitly signal which pages are your canonical, authoritative content.

Early adopters report that AI crawlers reach and cover their priority pages faster once the file is in place, since crawlers that read it can immediately identify and prioritize the linked content.

The file also helps prevent AI engines from indexing low-quality legacy content, draft pages, or thin content that could dilute your overall authority signal.

LLMs.txt File Format

The file uses Markdown syntax. Here is a minimal working example:


```markdown
# AI Rank Lab

> AI Rank Lab is a SaaS platform for optimizing website visibility in
> AI search engines (ChatGPT, Gemini, Perplexity, Claude) through AEO
> and GEO analysis, content optimization, and citation tracking.

## Core Documentation
- [AEO Complete Guide](/blog/what-is-aeo-complete-guide-answer-engine-optimization-2026): Comprehensive guide to Answer Engine Optimization
- [GEO vs SEO 2026](/blog/geo-vs-seo-2026-what-changed-what-stays-what-matters): Comparison of GEO and SEO strategies

## Product Documentation
- [Dashboard Overview](/docs/dashboard): How to use the AI Rank Lab dashboard
- [Citation Tracking](/docs/citation-tracking): Setting up citation monitoring

## Optional: Full Content
llms-full.txt: https://airanklab.com/llms-full.txt
```
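If your page metadata already lives in code or a CMS, the format above is easy to generate. A minimal Python sketch; the function name, section names, and page list here are illustrative, not part of the standard:

```python
def build_llms_txt(title, description, sections):
    """Render an llms.txt file in the Markdown format shown above.

    sections: list of (heading, [(label, url, description), ...]) tuples.
    """
    lines = [f"# {title}", ""]
    # The blockquote holds the 1-2 sentence site description.
    for text in description.splitlines():
        lines.append(f"> {text}")
    for heading, pages in sections:
        lines.append("")
        lines.append(f"## {heading}")
        for label, url, desc in pages:
            lines.append(f"- [{label}]({url}): {desc}")
    return "\n".join(lines) + "\n"

example = build_llms_txt(
    "AI Rank Lab",
    "SaaS platform for optimizing website visibility in AI search engines.",
    [("Core Documentation",
      [("AEO Complete Guide",
        "/blog/what-is-aeo-complete-guide-answer-engine-optimization-2026",
        "Comprehensive guide to Answer Engine Optimization")])],
)
print(example)
```

Regenerating the file from your content metadata on each deploy keeps it from drifting out of date.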

Step-by-Step Setup

  1. Create the file: Create a plain text file named llms.txt using the Markdown format above

  2. Select your URLs: Choose your 10–30 most authoritative and representative pages - pillar posts, key product pages, core documentation

  3. Write descriptions: Add a concise, accurate one-line description for each URL explaining what the page covers

  4. Upload to root: Place the file at yoursite.com/llms.txt (not a subdirectory)

  5. Verify access: Visit yoursite.com/llms.txt in a browser to confirm it's publicly accessible with Content-Type: text/plain

  6. Keep it updated: Update the file whenever you publish new high-priority content or retire old content
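Steps 1 and 5 can be sanity-checked with a short script. A sketch using only the Python standard library; the structural checks mirror this guide's conventions (title, blockquote description, linked entries) rather than a formal specification:

```python
import re

def validate_llms_txt(body: str, content_type: str = "text/plain"):
    """Sanity-check an llms.txt payload against this guide's conventions.

    Returns a list of problems; an empty list means the file looks good.
    """
    problems = []
    if not content_type.startswith("text/plain"):
        problems.append(f"served as {content_type}, expected text/plain")
    lines = body.splitlines()
    if not lines or not lines[0].startswith("# "):
        problems.append("missing '# Site Name' title on the first line")
    if not any(line.startswith("> ") for line in lines):
        problems.append("missing '> ...' blockquote site description")
    if not re.search(r"^- \[[^\]]+\]\([^)]+\): ", body, flags=re.M):
        problems.append("no '- [Label](url): description' entries found")
    return problems

sample = (
    "# AI Rank Lab\n\n"
    "> SaaS platform for AI search visibility.\n\n"
    "## Core Documentation\n"
    "- [AEO Complete Guide](/blog/aeo-guide): Guide to Answer Engine Optimization\n"
)
print(validate_llms_txt(sample))  # → []
```

Feed it the response body and Content-Type header you get back from yoursite.com/llms.txt to catch the common serving mistakes before crawlers do.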

Platform Support

| AI System | LLMs.txt Support | Status |
| --- | --- | --- |
| ChatGPT (OpenAI) | GPTBot respects it | Supported |
| Claude (Anthropic) | ClaudeBot reads it | Supported |
| Perplexity | PerplexityBot reads it | Supported |
| Google Gemini | Google-Extended partially | Partial |
| Meta AI | Not confirmed | Emerging |

LLMs.txt vs robots.txt vs sitemap.xml: The Full Picture

| File | Purpose | Format | Primary Audience |
| --- | --- | --- | --- |
| robots.txt | Block/allow crawler access | Plain text directives | All crawlers |
| sitemap.xml | List all indexable URLs for comprehensiveness | XML | Search engine crawlers |
| llms.txt | Curate and prioritize best content for AI | Markdown | AI language model crawlers |
| llms-full.txt | Full text content of key pages for direct AI ingestion | Markdown with full content | AI systems needing full context |

Advanced LLMs.txt Strategies

Section Organization for Large Sites

For sites with more than 50 high-quality pages, organize your LLMs.txt into thematic sections. AI crawlers process the file top-to-bottom - place your most authoritative content first:


```markdown
# Your Site Name

> Brief site description (1–2 sentences max)

## Most Important Content
- [URL 1](link): Description of most authoritative page
- [URL 2](link): Description of second most important page

## By Topic: Category A
- [URL 3](link): Description
- [URL 4](link): Description

## By Topic: Category B
- [URL 5](link): Description
```

The llms-full.txt Companion File

The llms-full.txt file is an optional companion that contains the full Markdown text of your key pages. This allows AI systems to ingest your complete content without making individual page requests. It is most valuable for:

  • Documentation sites where AI tools like Cursor and GitHub Copilot query your docs

  • Sites with React/Next.js or JavaScript-heavy pages that may be challenging for AI crawlers to render

  • High-priority content that you want AI systems to have in complete context
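One way to assemble the companion file is to concatenate the full Markdown text of your key pages with clear separators. A hedged sketch; the `# Source:` headers and `---` rules are one reasonable convention, not something the standard mandates:

```python
def build_llms_full(pages):
    """Concatenate key pages into a single llms-full.txt payload.

    pages: mapping of page title -> full Markdown text of that page.
    Each page is introduced by a '# Source:' header, with '---' rules
    between pages so AI systems can tell the documents apart.
    """
    parts = [f"# Source: {title}\n\n{text.strip()}\n"
             for title, text in pages.items()]
    return "\n---\n\n".join(parts)

demo = build_llms_full({
    "AEO Complete Guide": "Answer Engine Optimization is...",
    "Dashboard Overview": "The dashboard shows...",
})
print(demo)
```

In practice you would read the Markdown sources from your docs repository or CMS export and write the result to yoursite.com/llms-full.txt alongside llms.txt.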

Known AI Crawlers and Their LLMs.txt Behavior

| Crawler | User Agent | LLMs.txt Support | robots.txt Respected |
| --- | --- | --- | --- |
| GPTBot (OpenAI) | GPTBot | Yes | Yes |
| ClaudeBot (Anthropic) | ClaudeBot | Yes | Yes |
| PerplexityBot | PerplexityBot | Yes | Yes |
| Google-Extended | Google-Extended | Partial | Yes |
| OAI-SearchBot (ChatGPT Search) | OAI-SearchBot | Yes | Yes |
| Applebot-Extended | Applebot-Extended | Emerging | Yes |

Measuring LLMs.txt Effectiveness

To measure whether your LLMs.txt is working:

  1. Server log analysis: Check for requests to /llms.txt from AI crawler user agents (GPTBot, ClaudeBot, PerplexityBot)

  2. Subsequent page visits: After LLMs.txt requests, check if the listed URLs receive crawler visits - this indicates the file is being followed

  3. Citation rate change: Track AI citation rates for your LLMs.txt-listed pages before and after deployment; expect 4–8 week lag for effects to appear

  4. AI Rank Lab monitoring: Use AI Rank Lab's crawler activity dashboard to visualize which AI crawlers are accessing which pages
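Steps 1 and 2 can be scripted against standard access logs. A sketch assuming the Apache/Nginx combined log format; adjust the pattern and user-agent list to your own server:

```python
import re
from collections import Counter

# User agents of the AI crawlers discussed in this guide.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot",
               "OAI-SearchBot", "Google-Extended")

# Pull the request path and user-agent string out of a combined-format line.
LOG_RE = re.compile(r'"(?:GET|HEAD) (\S+)[^"]*".*"([^"]*)"\s*$')

def ai_crawler_hits(log_lines):
    """Count AI-crawler requests, split into /llms.txt hits and other pages."""
    hits = Counter()
    for line in log_lines:
        match = LOG_RE.search(line)
        if not match:
            continue
        path, user_agent = match.groups()
        for bot in AI_CRAWLERS:
            if bot in user_agent:
                hits[(bot, "llms.txt" if path == "/llms.txt" else "other")] += 1
    return hits

sample = ('203.0.113.7 - - [10/Feb/2025:08:00:01 +0000] '
          '"GET /llms.txt HTTP/1.1" 200 512 "-" '
          '"Mozilla/5.0 (compatible; GPTBot/1.0)"')
print(ai_crawler_hits([sample]))  # one GPTBot hit on /llms.txt
```

Rising "other" counts for URLs listed in your LLMs.txt, following hits on the file itself, are the signal from step 2 that crawlers are acting on it.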

Common LLMs.txt Mistakes

  • Placing it in a subdirectory: The file must be at yoursite.com/llms.txt, not yoursite.com/docs/llms.txt

  • Including low-quality content: Only list your best pages - including thin or outdated content dilutes the signal quality

  • Never updating it: LLMs.txt should be updated when you publish new high-priority content or deprecate old content

  • Wrong Content-Type: Ensure the file is served as text/plain, not text/html

  • Blocking LLMs.txt in robots.txt: Accidentally blocking AI crawlers from reading the LLMs.txt file itself defeats its purpose

Key Takeaways

  • LLMs.txt was proposed by Jeremy Howard of Answer.AI in 2024 and is now supported by GPTBot, ClaudeBot, and PerplexityBot

  • Unlike robots.txt (restrict access) or sitemap.xml (list all pages), LLMs.txt is a curated positive guide for AI systems

  • Setup takes under 30 minutes; maintenance requires occasional updates when publishing new priority content

  • Measure effectiveness by monitoring AI crawler server logs and tracking citation rate changes for listed URLs

  • The llms-full.txt companion file is especially valuable for documentation sites and JavaScript-heavy pages

For a detailed implementation tutorial, see our step-by-step LLMs.txt creation guide. Manage and monitor your LLMs.txt impact with AI Rank Lab.

Frequently Asked Questions

What is LLMs.txt?
LLMs.txt is a Markdown-formatted text file placed at your domain root (yoursite.com/llms.txt) that guides AI crawlers to your most important and authoritative content. It is analogous to robots.txt but designed to provide positive direction to AI systems rather than access restrictions.
Does LLMs.txt actually improve AI citations?
Evidence from early adopters suggests LLMs.txt helps AI crawlers discover and prioritize high-quality content faster. It's particularly effective for newer sites or sites with large volumes of mixed-quality content where AI crawlers might struggle to identify the best pages independently.
Is LLMs.txt an official standard?
LLMs.txt is an open community standard proposed by Answer.AI, not an official specification from a standards body. However, it has received broad adoption signals from AI companies including Anthropic and OpenAI, and is rapidly becoming a de facto best practice for AI-aware websites.
What should I include in my LLMs.txt file?
Include your 10–30 most authoritative and representative pages: pillar blog posts, key product/service pages, core documentation, and any pages with original research. Each URL should have a concise one-line description. Avoid including low-quality, thin, or outdated content.
Where exactly should I place the LLMs.txt file?
Place it at your domain root: yoursite.com/llms.txt. It must not be in a subdirectory. Ensure it is publicly accessible (no authentication required), returned with Content-Type: text/plain, and readable by all bots not blocked by your robots.txt.
How is LLMs.txt different from a sitemap?
A sitemap lists all your pages for crawling comprehensiveness. LLMs.txt is a curated selection of your best pages with context about each one - it's about quality and prioritization, not coverage. Think of a sitemap as a complete inventory and LLMs.txt as your curated highlights reel for AI systems.

