Who is using llms.txt files?

gpietersz

Free Member
  • Business Listing
    Sep 10, 2019
    2,755
    2
    728
    Northwich, Cheshire
    pietersz.net
    Optimising for LLMs is a thing now. People do use LLMs instead of search engines, so optimising for them as well as for search makes sense.

    Most big LLMs do seem to prefer markdown as a format. Most HTML is not well structured, as it prioritises visual presentation over semantic structure.

    However, it is not clear that llms.txt solves the problem. Maybe we should be using schema.org markup, more semantic HTML, or something similar instead.
     

    fisicx

    Moderator
    Sep 12, 2006
    46,726
    8
    15,393
    Aldershot
    www.aerin.co.uk
    I’ve been reading reports that schema isn’t as effective as structured content. And above the fold content even better. Schema is so often badly implemented as to be worthless. Automated creation tools are often to blame.
     

    gpietersz

    I have it on very qualified authority from a leading UK AI programmer, that this is where LLMs are heading.
    So if it's someone in the UK with inside knowledge, they probably work for Google DeepMind? All the other big LLMs are very much based in the US or China.

    Also, heading there does not mean llms.txt files are already significant. They are easy to add, so no hurry by the sound of it?
     

    gpietersz

    I’ve been reading reports that schema isn’t as effective as structured content. And above the fold content even better. Schema is so often badly implemented as to be worthless. Automated creation tools are often to blame.
    Interesting. Is that only because it's often badly structured, or for other reasons as well?
     

    fisicx

    Interesting. Is that only because its often badly structured or for other reasons as well?
    It’s the automated tools. They just don’t provide the granularity. And because they are so generic, the schema is often almost identical on every page. If you look at the huge range of item types and itemprops available, most generated schema barely scratches the surface.

    But build a properly structured page with headings and helpful content and the AI bots will soon scrape and deploy it. It’s not really different from the old ‘how to’ articles that were created for the aggregators many years ago.
     
    Upvote 0
    I asked the question:

    Does Gemini look for llms.txt files?


    The short answer is yes, Gemini (and Google’s AI systems) can and do look for llms.txt files, but with a bit of a "it’s complicated" asterisk.
    While Google hasn't officially declared llms.txt a mandatory web standard like robots.txt, the technology is actively moving in that direction. Here is the breakdown of how Gemini interacts with it:

    1. Verification and "Google-Extended"

    Google uses a specific standalone crawler called Google-Extended. This is the primary tool that determines how your site’s content is used for Google’s Gemini and Vertex AI models. While robots.txt is the main way to block this crawler, evidence from server logs and developer testing shows that Google’s systems are beginning to scan for /llms.txt at the root of domains to help interpret site structure.

    2. Implementation in Google’s Own Ecosystem

    In a "lead by example" move, Google actually implemented llms.txt files on many of its own developer and documentation sites (like Firebase and Google Search Central) in late 2025 and early 2026. This is a strong signal that Gemini-powered agents are trained to recognize and prioritize these files when they find them.

    3. How Gemini Uses the File

    When Gemini (or its browsing agents) encounters an llms.txt file, it uses it for:
    • Token Efficiency: It helps the AI find "clean" versions of your content without the clutter of headers, footers, or ads.
    • Discovery: It points Gemini to the most authoritative pages first, which can improve the accuracy of the citations Gemini provides to users.
    • Context: The file provides a high-level summary that helps the AI understand what your site is actually about before it starts reading individual pages.

    4. Is it mandatory?

    No. Google Search still relies on its traditional crawlers and sitemaps for ranking. However, for AI-specific discovery (appearing in Gemini’s generated answers), having an llms.txt is becoming a "best practice."
     

    fisicx

    If that is the case, the schema is redundant (because it lives in the document head).
     

    gpietersz

    @fisicx thanks, that makes sense. It's not something you can usefully do with a very generic approach.

    Are we talking about the same thing? Schema.org semantic markup is often added to the body.

    @Shopclicks Google (and just about everyone else I can find) seems to use llms.txt only for documentation sites or paths. That implies it is useful for certain narrow use cases.
     

    fisicx

    Itemprop can be attached to an item, but it’s not part of the page schema unless you also have itemscope and define the thing, etc.
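    For anyone following along, a minimal sketch of what that looks like (a made-up business, not anyone's real markup): the itemprop attributes only become schema.org data because the outer element carries itemscope and an itemtype that defines the thing.

```html
<!-- Outer element: itemscope starts the item, itemtype names the schema.org type -->
<div itemscope itemtype="https://schema.org/LocalBusiness">
  <!-- These itemprops belong to the LocalBusiness above because they are nested inside it -->
  <h1 itemprop="name">Example Removals Ltd</h1>
  <p itemprop="description">Removals and storage across Example Town.</p>
  <a itemprop="url" href="https://www.example.co.uk">www.example.co.uk</a>
</div>
```

    Without the itemscope/itemtype on the outer div, those itemprop attributes are just dangling properties with no type to attach to.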
     

    fisicx

    Which you can also do in the body. It just has to go in an outer element to all the elements with itemprop for that item.
    Yes you can. But the chances of anyone managing to do this are virtually zero. Apart from you and maybe one or two others, everyone uses a CMS. And most of those will be using some sort of automation to generate their schema.
     

    gpietersz

    Yes you can. But the chances of anyone managing to do this are virtually zero. Apart from you and maybe one or two others everyone uses a CMS. And most of those will be using some sort of automation to generate their schema.
    I do not understand what you are saying here.

    Using a general purpose CMS with no customisation and some sort of generic schema generation? In that case, I have not tried it, but you said it does not produce good results.

    If you are customising even just a template, you should have a lot of data available in the template anyway because you are displaying it. If information is in your database it should be easy to make it available in a template; if it's not, what sort of automation is going to get that information, and from where?

    Even completely custom systems (where you would have a database and possibly other sources with all the relevant data) are not as rare as you suggest. They are not something only one or two people use.
     

    fisicx

    Customising a theme template isn’t easy if you plan to include schema data. For starters, you would need to input all the itemprop information, and your average content editor doesn’t have this as an option. Which is why most people use the automated generators.

    And as far as I can tell llms.txt doesn’t accept schema markup.
     

    gpietersz

    Customising a theme template isn’t easy if you plan to include schema data. For starters, you would need to input all the itemprop information and you average content editor doesn’t have this as an option. Which is why most people use the automated generators.

    And as far as I can tell llms.txt doesn’t accept schema markup.

    If you are using an automated generator, is it worth adding schema markup at all? I would have thought it's better not to do it at all than to do it badly. I think this is a central question for many.

    I think we are looking at this from very different perspectives. You are looking at it from the point of view of someone running a website on a CMS-plus-plugins platform with no or limited skills, whereas I am thinking primarily from the perspective of customised systems. My feeling is that it is the latter that needs schema markup. The former is probably better served in most cases just by using nice clean semantic HTML. As far as I can understand, the reason for using markdown is that complex and non-semantic HTML is harder to understand without vision (which also means it's hard for screen reader users) and needs more processing to extract information.

    Yes, you need to edit the HTML to add schema markup. That does require some knowledge but is not such an obscure skill. Any developer and lots of web designers (IMO, any that are actually web designers - otherwise they are graphic designers who do some web-related things) can do this. A lot of sites do have customised templates, and if you have any sort of backend development going on this is trivial.

    With llms.txt you would usually either point to URLs that provide markdown versions of pages, or use content negotiation to serve a markdown version. What I do not know is how widely either of these (especially the latter) is supported. I tried the Firebase docs, which only do the former.
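    To make the content negotiation idea concrete, here is a rough sketch of the server-side decision (the function and its behaviour are my illustration, not any standard; real Accept handling also deals with q-values and wildcards):

```python
def negotiate(accept_header: str) -> str:
    """Pick a response content type from an HTTP Accept header.

    Simplified sketch: ignores q-values and wildcards, just checks
    whether the client listed text/markdown at all.
    """
    offered = [part.split(";")[0].strip() for part in accept_header.split(",")]
    if "text/markdown" in offered:
        return "text/markdown"  # serve the markdown version of the page
    return "text/html"          # default: the normal HTML page

# A crawler that prefers markdown:
print(negotiate("text/markdown, text/html;q=0.8"))   # text/markdown
# An ordinary browser:
print(negotiate("text/html,application/xhtml+xml"))  # text/html
```

    The alternative (what the Firebase docs do) is simpler: no negotiation, just separate .md URLs listed in llms.txt.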

    That said, you can use llms.txt just to provide extra information about the site and point to HTML URLs. AFAIK it is generic information for LLMs, so you could even say "there is schema.org markup on pages under these paths".
     

    fisicx

    This is quite a good starter article about AI and Schema:


    Schema is great for normal searches (if done properly)

    Schema has limited or no value for AI - because of how the content is tokenised.

    It's further complicated when itemprop is used multiple times within an itemscope. When the itemscope is tokenised, you can also lose the itemtype that defines the thing.

    A comment I've seen a number of times is: if schema/structured data is important for AI all you need to do is spam your schema and ignore the on page content.
     

    fisicx

    Also: schema can use microdata, but you don't need schema to add microdata to your HTML.
     

    gpietersz

    @fisicx that link is based on quite a limited argument: that tokenisation does not explicitly process the structure. However, 1) it may affect processing done before tokenisation, and 2) words and sentence structure are also lost in tokenisation and they clearly do matter.
    A comment I've seen a number of times is: if schema/structured data is important for AI all you need to do is spam your schema and ignore the on page content.
    That would be an argument for expecting LLMs to be set up to prefer microdata markup, where you add scopes and properties to visible body text, to JSON-LD, which is not visible, or to markdown, which may differ.
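    To illustrate the visible/invisible distinction, here is the same made-up fact expressed both ways. The microdata version annotates text the visitor actually reads, while the JSON-LD version sits in a script block the visitor never sees, so it could in principle claim anything:

```html
<!-- Microdata: the properties are attached to visible body text -->
<p itemscope itemtype="https://schema.org/Organization">
  <span itemprop="name">Example Ltd</span> is based in
  <span itemprop="location">Cheshire</span>.
</p>

<!-- JSON-LD: the same claim, but invisible to the reader -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Ltd",
  "location": "Cheshire"
}
</script>
```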


    Schema is great for normal searches (if done properly)
    So you might as well do it anyway. This is something Google has confirmed.

    On top of that there is confirmation that Microsoft's LLMs do use structured data: https://www.linkedin.com/posts/davi...ema-markup-activity-7307785448548941830-AkW8/

    I see conflicting evidence on the usefulness of llms.txt. Sites do use it. On the other hand just a few months ago crawlers were completely ignoring it: https://www.semrush.com/blog/llms-txt/ Maybe it is intended to help agents rather than the crawlers used for training.

    The more I read and think about it the more I think that the starting point should be semantic HTML with simple structures. There seems to be evidence that the crawlers have limitations: https://vercel.com/blog/the-rise-of-the-ai-crawler

    Makes me happy, as I have always preferred simple semantic HTML. I hate seeing things like <div class="header">, and (even more) sites that only load content if you have Javascript enabled.
     

    gpietersz

    Also: schema can use microdata but you don't need schema to add microdata to your html
    Good point, but using a known vocabulary probably makes it more likely to be understood. Maybe other vocabularies will work well too. Google seems to encourage schema.org and everyone supports it, so it's probably the best for SEO.

    Incidentally, Google also says that spamming with schema can result in "manual action", and it looks like they check JSON-LD against visible page content.
     

    BLRemovals

    Business Member
  • Business Listing
    Feb 20, 2024
    13
    15
    London
    bestlondonremovals.co.uk
    Yes, we have been using llms.txt on our site for a few months now and it is something I would recommend any business with a content-heavy site look into seriously in 2026.

    The short version: llms.txt is a plain text file you place at the root of your domain, similar in concept to robots.txt, but written specifically for large language models rather than traditional search crawlers. It gives AI systems a structured, clean summary of what your site is about, what pages exist, and how your content is organised, without them having to interpret your full HTML, navigation, scripts, and everything else that clutters a normal web page.
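    For anyone who has not seen one, here is a minimal sketch of the format as proposed at llmstxt.org (the business details are made up): an H1 title, a blockquote summary, then H2 sections listing key pages as markdown links.

```markdown
# Example Removals

> Family-run removals and storage company covering London and the South East.

## Services
- [House removals](https://www.example.co.uk/house-removals.md): full packing and moving service
- [Storage](https://www.example.co.uk/storage.md): secure short- and long-term storage

## Optional
- [Blog](https://www.example.co.uk/blog.md): moving tips and guides
```

    The file itself is markdown, which is part of the appeal: it is cheap for an LLM to parse compared with a full HTML page.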

    Why it matters now is that a growing share of search behaviour is shifting toward AI-generated answers, Google AI Overviews, ChatGPT browsing, Perplexity, and similar tools. These systems are deciding whether to cite your business based on how well they can read and understand your content. A well-structured llms.txt makes that significantly easier.

    For a local service business in particular, it is a way to clearly state what you do, where you operate, what makes you credible (accreditations, years trading, reviews), and which pages are most relevant, all in a format that AI can parse cleanly without noise.

    Results are difficult to attribute directly since AI citation is not tracked the same way as organic clicks, but since implementing it we have seen an increase in the site appearing in AI Overview results for relevant local search queries. Hard to say definitively that llms.txt alone drove that, but it is part of a broader GEO (generative engine optimisation) approach that seems to be working.

    Worth doing. Takes about an hour to set up properly if your site structure is already clean.
     

    UKSBD

    Moderator
  • Dec 30, 2005
    13,030
    1
    2,830
    I used to manually add schema a lot in the early days, but primarily for improving the looks of results in the serps (review stars, images, videos, recipes, prices, ratings, etc.)

    I wouldn't use a generator that added it to every page though.

    Unfortunately, the manipulation and abuse basically ruined it all. Plus, Google got smarter about adding these things when they benefit the end user, whether you had markup or not.

    I view llms.txt files much like sitemaps: I don't like auto-generated ones that assume they know my sites better than me, and much prefer to manually choose what to put in them, i.e. my important pages/posts rather than a free-for-all.
     

    fisicx

    Takes about an hour to set up properly if your site structure is already clean.
    This is the key part. You need a well structured site with properly organised content that the AI bots can easily index/scrape and tokenise. This is just good SEO. The llms.txt can then do its thing.

    Many moons ago the code-to-content ratio was an important SEO signal - it has long been forgotten but may now be worth looking at again. Less data means faster pages, and everyone wants a fast-loading site.
     
    Upvote 0
    Results are difficult to attribute directly since AI citation is not tracked the same way as organic clicks, but since implementing it we have seen an increase in the site appearing in AI Overview results for relevant local search queries.
    Well done on the forward thinking.

    Two questions:
    (a) Was the file AI generated?
    (b) Did you manually tweak it?
     

    BLRemovals

    Well done on the forward thinking.

    Two questions:
    (a) Was the file AI generated?
    (b) Did you manually tweak it?
    Manually created, and I keep updating it every 3 months.
     

    BLRemovals

    The structure and format came from an llms.txt template, which I adapted to fit our business. I then used AI to help populate and refine the content based on our actual company information, services, coverage, and FAQs. The facts, awards, and details are all manually verified, and I review and update it every few months as the site evolves.

    So not purely manual, nor purely AI-generated. More of a collaborative process where the business knowledge and verification are mine, and the AI helps with structure and wording.
     
    Upvote 0
    So not purely manual, nor purely AI-generated. More of a collaborative process where the business knowledge and verification are mine, and the AI helps with structure and wording.
    Can you share how you first heard about llms.txt files? Interested to know how the message is getting out.
     

    gpietersz

    This is the key part. You need a well structured site with properly organised content that the AI bots can easily index/scrape and tokenise. This is just good SEO. The llms.txt can then do its thing.

    Many moons ago the code to content ratio was an important SEO signal - it has been long forgotten but now may be worth looking at again. Less data means faster pages and everyone wants a fast loading site.
    If you have a clear and clean URL structure it's much easier to write a carefully crafted manual llms.txt.

    I am not sure about the ratio per se, but faster is definitely better in terms of usability for humans so what have you got to lose?
     

    BLRemovals

    Can you share how you first heard about llms.txt files? Interested to know how the message is getting out.
    I first came across it through a video by Imran from Web Squadron on YouTube — he does very practical, no-nonsense content on technical SEO and GEO topics. The video walked through what llms.txt is, why it matters for AI visibility, and the exact steps to create and upload the file. Worth watching if anyone wants a straightforward walkthrough rather than just reading about it.
    https://www.youtube.com/watch?v=rmgudGV7U6Y
     
