llms.txt Explained: Evidence, Hype, and What to Do

llms.txt promises that AI will understand your site better. The public evidence says otherwise: as of today, no major LLM provider has confirmed that its crawlers or models read it. We separate the original idea from the hype and show what to do instead.

What llms.txt Is and What It Promises

The proposal started in September 2024 with clear authorship. According to Answer.AI, Jeremy Howard proposed "adding a /llms.txt file to websites that are designed for reading by language models, not just humans."

The format is deliberately simple. According to the official specification, it is a plain Markdown file at the domain root that lists curated links to the most relevant pages. The goal: let a model assemble context without crawling the entire site.
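
To make the format concrete, here is a minimal sketch in Python that renders a curated link list into the layout we understand the specification to describe: an H1 with the site name, a blockquote summary, and H2 sections containing Markdown link lists. The Link class, the render_llms_txt function, and the example.com URLs are our own illustrative names, not part of the proposal.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """One curated entry (hypothetical helper type, not defined by the proposal)."""
    title: str
    url: str
    note: str = ""


def render_llms_txt(site_name: str, summary: str, sections: dict[str, list[Link]]) -> str:
    """Render curated links as Markdown: H1 title, blockquote summary, H2 link lists."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for link in links:
            # The note after the colon is optional.
            suffix = f": {link.note}" if link.note else ""
            lines.append(f"- [{link.title}]({link.url}){suffix}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Invented content, used only to show the shape of the output.
example = render_llms_txt(
    "Example Docs",
    "Hypothetical project used only to illustrate the layout.",
    {
        "Docs": [
            Link("Quick start", "https://example.com/docs/quickstart.md", "install and first run"),
            Link("API reference", "https://example.com/docs/api.md"),
        ],
    },
)
print(example)
```

Publishing the result would mean saving it as /llms.txt at the domain root. Whether anyone reads it is the question the rest of this post addresses.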

The problem it tries to solve is real. Context windows are finite, tokens are limited, and turning HTML full of navigation and ads into clean text is imprecise and costly. Howard's proposal has merit as an idea. The debate is not about the proposal itself but about how it has been adopted since.

The proposal goes beyond the root file. According to Answer.AI, it also suggests serving a Markdown version of each page at the same URL with an .md suffix. The intent is consistent: give the model clean text without the noise of markup.
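
The suffix rule is easy to express in code. This sketch appends .md to a URL's path while preserving any query string. The markdown_variant name is ours, and the handling of directory-style URLs (appending index.html.md) reflects our reading of the proposal, so verify it against the specification before relying on it.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_variant(url: str) -> str:
    """Return the URL of the Markdown twin of a page (hypothetical helper)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        # Assumption: URLs without a file name get index.html.md appended.
        path += "index.html.md"
    else:
        path += ".md"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(markdown_variant("https://example.com/docs/install.html"))
print(markdown_variant("https://example.com/docs/"))
```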

Adoption Status in 2026

It helps to separate two things: how many sites host a /llms.txt, and how many bots officially respect it. The second is what decides the return on time invested, and there the official documentation speaks loudly by omission.

OpenAI documents three agents with distinct purposes: GPTBot for training, OAI-SearchBot for ChatGPT search, and ChatGPT-User for user-directed access. According to OpenAI, it uses "OAI-SearchBot and GPTBot robots.txt tags to enable webmasters to manage how their sites and content work with AI." The page never mentions llms.txt.

Anthropic describes an equivalent scheme. According to Anthropic's help center, its bots "respect 'do not crawl' signals by honoring industry standard directives in robots.txt," split across ClaudeBot, Claude-User, and Claude-SearchBot. llms.txt does not appear either.

Google follows the same line. According to Google Search Central, the control for AI uses is the Google-Extended token inside robots.txt. None of the three majors document support for llms.txt.

The pattern is telling. All three providers chose robots.txt as the control point and published named user-agents for each purpose. If llms.txt were the expected way to talk to their models, you would expect to see it on these very pages — and it is not there.

File	What it does	Do major LLMs use it today?	What to prioritize
robots.txt	Controls which bots crawl and for what (training, search, access)	Yes — documented by OpenAI, Anthropic, and Google	Configure it well first
sitemap.xml	Lists every site URL for discovery and indexing	Yes — long-standing discovery standard	Keep it updated
llms.txt	Curates priority content in Markdown so a model can assemble context	No official confirmation from any major engine	Optional and measured, not a priority

The Evidence: Does Any LLM Read It Today?

This post's finding is uncomfortable, but it needs to be stated plainly: there is no public evidence that a major LLM reads llms.txt in production. This is not about social-media rumors — it is about statements and logs.

The most direct quote comes from inside Google. According to Search Engine Journal, John Mueller of Google wrote: "AFAIK none of the AI services have said they're using LLMs.TXT (and you can tell when you look at your server logs that they don't even check for it)."

Logs point the same way at scale. In the same Search Engine Journal coverage, an operator hosting more than 20,000 domains reported that no relevant bots download these files, only niche user-agents. When a proposal has been public for over a year and servers log no consumption by the major crawlers, the burden of proof shifts to those who claim it works.

It helps to define what would count as valid evidence. Either of two things would qualify: an official support announcement from a major engine, or server logs showing its known user-agents requesting /llms.txt. Today neither exists publicly.
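
The second kind of evidence is something you can check yourself. The sketch below, written under the assumption that your server writes the common "combined" access log format, counts requests to /llms.txt per crawler named in this article. The function name, the agent list, and the sample lines are ours; the sample lines are synthetic and exist only to exercise the code, not to report any real traffic. Google-Extended is left out because, as we understand Google's documentation, it is a robots.txt token rather than a user-agent string that shows up in logs.

```python
import re
from collections import Counter

# Crawler names cited in this article; extend the tuple for your own needs.
AI_AGENTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User",
             "ClaudeBot", "Claude-User", "Claude-SearchBot")

# Assumes combined log format: "METHOD path proto" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_llms_txt_hits(lines):
    """Count /llms.txt requests per known AI crawler; everything else is 'other'."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # Skip lines that are not in the assumed format.
        path = match.group("path").split("?", 1)[0]
        if path != "/llms.txt":
            continue
        agent = match.group("agent").lower()
        name = next((a for a in AI_AGENTS if a.lower() in agent), "other")
        hits[name] += 1
    return hits


# Synthetic lines, invented for this example. They are not observations.
sample = [
    '203.0.113.5 - - [01/Jan/2026:00:00:01 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; ExampleNicheBot/1.0)"',
    '203.0.113.6 - - [01/Jan/2026:00:00:02 +0000] "GET /robots.txt HTTP/1.1" 200 128 "-" "Mozilla/5.0 (compatible; GPTBot/1.1)"',
    '203.0.113.7 - - [01/Jan/2026:00:00:03 +0000] "GET /llms.txt HTTP/1.1" 404 0 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
]
hits = count_llms_txt_hits(sample)
print(dict(hits))
```

In practice you would pass an open log file instead of the sample list. A result made up entirely of "other" would match what the operators quoted above describe.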

The llms.txt Cargo Cult

Here is the real problem, and it is not Howard's proposal — it is the pattern of uncritical adoption. Implementing a signal because it "sounds like AI," without verifying anyone reads it, is ritual, not strategy.

The official comparison is harsh. According to Search Engine Journal, Mueller likened llms.txt to the old keywords meta tag: "this is what a site-owner claims their site is about ... Is the site really like that? Well, you can check it. At that point, why not just check the site directly?"

We see the same four anti-patterns again and again: copying a /llms.txt with no real curation, duplicating sitemap.xml in another format, assuming more signals equal more visibility, and claiming crawling gains without a single before-and-after metric.

None of those four habits has a verifiable mechanism behind it. If you cannot name which engine will read the file or which number will move, you are decorating, not optimizing.
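
One way to tell curation from duplication is to measure it. The sketch below compares the links in an llms.txt file with the URLs in a sitemap.xml and reports what share of the sitemap the file repeats. All function names and sample contents are hypothetical, and reading a high ratio as "this is a copy" is a rule of thumb of ours, not an established metric.

```python
import re
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Matches Markdown links of the form [text](http...) and captures the URL.
MD_LINK = re.compile(r"\[[^\]]*\]\((https?://[^)\s]+)\)")


def sitemap_urls(sitemap_xml: str) -> set[str]:
    """Collect every <loc> value from a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text}


def llms_txt_urls(llms_txt: str) -> set[str]:
    """Collect every absolute URL linked from an llms.txt body."""
    return set(MD_LINK.findall(llms_txt))


def coverage_ratio(llms_txt: str, sitemap_xml: str) -> float:
    """Share of sitemap URLs that also appear in llms.txt (1.0 means a full copy)."""
    site = sitemap_urls(sitemap_xml)
    if not site:
        return 0.0
    return len(site & llms_txt_urls(llms_txt)) / len(site)


# Invented sample data.
SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs/quickstart</loc></url>
  <url><loc>https://example.com/blog/post-1</loc></url>
  <url><loc>https://example.com/blog/post-2</loc></url>
</urlset>
"""

SAMPLE_LLMS_TXT = """\
# Example Docs

## Docs

- [Quick start](https://example.com/docs/quickstart): install and first run
"""

ratio = coverage_ratio(SAMPLE_LLMS_TXT, SAMPLE_SITEMAP)
print(f"{ratio:.0%} of sitemap URLs also appear in llms.txt")
```

A ratio close to 100% suggests the file mirrors the sitemap instead of selecting from it.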

There is also an integrity risk. According to Search Engine Journal, nothing stops someone from showing one set of content in llms.txt and a different one to users and search engines — cloaking for LLMs. That possibility gives any engine one more reason not to trust the file.

What Actually Works: robots.txt as the Canonical Standard

The good news is that the standard the engines do respect already exists and is documented by every provider. Nothing needs inventing. It needs to be configured well.

robots.txt is the real control point. According to OpenAI, it lets "a webmaster allow OAI-SearchBot in order to appear in search results while disallowing GPTBot" so content is not used for training. Each decision is independent and verifiable.
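
The split OpenAI describes can be sketched and checked locally. The example below holds a two-rule robots.txt in a string and uses Python's standard urllib.robotparser to confirm which agent may fetch a page. The is_allowed helper and the example.com URL are our own, and Python's parser is only an approximation of how each crawler interprets the file, so treat the result as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Allow the search crawler, block the training crawler.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether the agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for agent in ("OAI-SearchBot", "GPTBot"):
    verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, "https://example.com/post") else "blocked"
    print(f"{agent}: {verdict}")
```

The same check works for any other user-agent token you add to the file, one group per agent.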

The same principle applies to Anthropic and Google. According to Anthropic's help center, one directive per user-agent in robots.txt is enough to manage ClaudeBot, Claude-User, and Claude-SearchBot. That is the lever with measurable effect.

And the effect on citation is explicit. According to OpenAI, sites opted out of OAI-SearchBot "will not be shown in ChatGPT search answers." Allowing the right bot in robots.txt directly influences whether you can be cited — something llms.txt, as of today, does not offer.

When llms.txt Can Make Sense

There are narrow scenarios where manual curation adds value. According to Answer.AI, the original use case is software libraries, where the file gives "a structured overview of documentation, making it easier for LLMs to locate specific features or usage examples."

That family includes doc-heavy sites with a /docs structure, open-source projects, and technical knowledge bases. There the curation is real work, not cosmetic — and it is best treated as an experiment with a measurable metric, not a guarantee.

Even in those cases, the upside is convenience for a known consumer, such as a coding assistant pointed at your docs, not a ranking or citation signal across the major engines. Keep that expectation calibrated before you invest.

When It Does Not Make Sense

For most sites, it does not. A standard corporate blog or marketing site gains no measurable return by declaring a /llms.txt that duplicates what the sitemap already exposes.

If your content is already crawlable and your sitemap is current, llms.txt is an effort with no demonstrable return. Your team's time pays off more on signals the evidence does support.

Opportunity cost is the underlying argument. Every hour spent curating a file no one reads is an hour not spent on structured data, authorship, or speed — levers with documented effect. For a CMO (Chief Marketing Officer) defending a budget, that difference matters.

Where to Invest If You Want AI to Cite You

Conceptually, llms.txt falls under the idea of being findable, the first principle of the Searchability framework we published as a brand piece. Within that principle, though, the evidence favors other tactics that are more mundane and better proven.

Those tactics are well known: clean canonical URLs, a well-configured robots.txt, an updated sitemap.xml, and structured data. According to schema.org, its vocabulary "helps search engines and other applications understand the content" — exactly the readability llms.txt promises without the adoption it lacks.
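
As an illustration of the structured-data tactic, this sketch builds a schema.org Article description as JSON-LD and wraps it in the script tag that pages typically embed. The function names and field values are hypothetical; the property names (headline, author, datePublished, mainEntityOfPage) come from the schema.org vocabulary, and which ones a given engine actually uses is not something we can confirm here.

```python
import json


def article_json_ld(headline: str, author_name: str, date_published: str, url: str) -> str:
    """Serialize a minimal schema.org Article as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,  # ISO 8601 date
        "mainEntityOfPage": url,
    }
    return json.dumps(data, indent=2)


def as_script_tag(json_ld: str) -> str:
    """Wrap JSON-LD for embedding in HTML, escaping '</' so it cannot close the tag early."""
    safe = json_ld.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{safe}\n</script>'


# Invented values for illustration.
markup = as_script_tag(
    article_json_ld("Example headline", "Jane Doe", "2026-01-15", "https://example.com/post")
)
print(markup)
```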

And if your underlying goal is to be cited by AI, authority signals weigh more than any declarative file. We develop this in E-E-A-T for answer engines: models choose sources by experience, expertise, authority, and trust, not by what a site claims about itself in /llms.txt.

In practice, this means treating AI visibility as a measurable discipline, not a list of files to copy. Start with the standards that engines respect; run narrow experiments afterward, and only if you have spare capacity.

The Contrarian Checklist

In priority order, here is what to do instead of — or before — implementing llms.txt:

1. Configure robots.txt per AI bot and purpose, deciding separately on training and search crawlers (GPTBot, OAI-SearchBot, ClaudeBot, Google-Extended).
2. Keep an updated sitemap.xml submitted in Search Console.
3. Implement schema.org structured data (Organization, Article, FAQPage).
4. Use clean canonical URLs with one consistent trailing-slash convention.
5. Invest in E-E-A-T signals: real authorship, external citations, demonstrable authority.
6. Measure the before and after with real data; never assume the impact.
7. If you still implement llms.txt, curate it for real and treat it as a measured experiment.

The free Visibility analyzer checks robots.txt, sitemap, and AI bot policy among its 131 check items — a concrete starting point for the first four steps.

Frequently Asked Questions

Do ChatGPT or OpenAI read llms.txt today?

There is no official confirmation. OpenAI's crawler documentation describes GPTBot, OAI-SearchBot, and ChatGPT-User, all controlled through robots.txt, and never mentions llms.txt as a file its agents check.

Do Claude or Anthropic read llms.txt today?

Also no confirmation. Anthropic's documentation describes ClaudeBot, Claude-User, and Claude-SearchBot, and states they respect standard robots.txt directives. It does not cite llms.txt as a signal its bots use.

Is it the same as robots.txt or sitemap.xml?

No. robots.txt controls which bots may crawl and sitemap.xml lists all your URLs for indexing. llms.txt is a different proposal: curating Markdown content so a model can assemble context. Only robots.txt and sitemap.xml are standards major engines respect today.

Should I implement llms.txt anyway, just in case?

If you have spare time and treat it as a measured experiment, it does no harm. But it does not replace robots.txt, sitemap.xml, or authority signals. Prioritize what the evidence already supports and measure the before and after.

Conclusion

The llms.txt proposal is legitimate, but the 2026 evidence is clear on three points:

No major LLM confirms reading llms.txt in production, and server logs back that up.
robots.txt, sitemap.xml, and schema.org are documented standards that engines do respect.
If you want AI to cite you, invest in measurable authority and readability, not declarative files.

Before adding another signal "just in case," check the one that actually moves the needle.
