Sphinx llms.txt Generator
*************************
A Sphinx extension that generates a
summary ``llms.txt`` file, written in Markdown, and a single combined
documentation ``llms-full.txt`` file, written in reStructuredText.
Demo
====
See this Sphinx project’s own ``llms.txt`` and ``llms-full.txt``
files for a simple example.
Highlights
==========
1. **Content Collection**: Quickly gathers content from the
``_sources`` directory, without needing a separate build
2. **Directive Processing**: Resolves ``include`` directives by
automatically incorporating their content
3. **Path Resolution**: Transforms relative paths in directives to
full paths
4. **Output Generation**: Creates two optional files:
* ``llms.txt``: A concise summary of your documentation, in
Markdown
* ``llms-full.txt``: A comprehensive version with all
documentation content, in reStructuredText
5. **Content Filtering**: Allows you to exclude specific pages or
sections
6. **Source Code**: Allows you to include specific source code files
.. _document-getting-started:
Getting Started
---------------
Installation
~~~~~~~~~~~~
Install with pip or conda:
.. code:: bash
pip install sphinx-llms-txt
.. code:: bash
conda install -c conda-forge sphinx-llms-txt
Usage
~~~~~
Add the extension to your Sphinx configuration (``conf.py``):
.. code:: python
extensions = [
'sphinx_llms_txt',
]
After the HTML build finishes, **sphinx-llms-txt** logs the
location of the output files:
::
sphinx-llms-txt: Created /path/to/_build/html/llms-full.txt with 45 sources and 6879 lines
sphinx-llms-txt: created /path/to/_build/html/llms.txt
Tip: Make sure to confirm the accuracy of the output files after
installs and upgrades.
See Advanced Configuration for more information about how to use
**sphinx-llms-txt**.
.. _document-advanced-configuration:
Advanced Configuration
----------------------
This page covers advanced configuration options for the
sphinx-llms-txt extension.
.. _customizing-llms-files:
Customizing the LLMs Files
~~~~~~~~~~~~~~~~~~~~~~~~~~
By default, the extension generates two files:
1. ``llms.txt`` - A summary file in Markdown format
2. ``llms-full.txt`` - A complete documentation file in
reStructuredText format
You can customize these files in several ways:
.. _changing-filenames:
Changing Filenames
""""""""""""""""""
You can change the default filenames by setting these values in your
``conf.py``:
.. code:: python
llms_txt_filename = "custom-summary.txt"
llms_txt_full_filename = "custom-docs.txt"
.. _disabling-file-generation:
Disabling File Generation
"""""""""""""""""""""""""
If you only want one of the files, you can disable generation of the
other:
.. code:: python
# Disable summary file
llms_txt_file = False
# Disable full documentation file
llms_txt_full_file = False
.. _custom-summary:
Adding a Custom Summary
"""""""""""""""""""""""
The summary file can include a custom description of your project:
.. code:: python
llms_txt_summary = """
This documentation explains how to use MyProject to build amazing
applications. The project provides a comprehensive API for handling
data processing and visualization.
"""
Note: The summary can span multiple lines and will be properly formatted
in the output file.
.. _custom-title:
Custom Title
""""""""""""
By default, the project name from Sphinx is used as the title in
``llms.txt``. You can override this:
.. code:: python
llms_txt_title = "My Custom Project Documentation"
.. _handling-large-documentation:
Handling Large Documentation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For very large documentation sets, generating the full documentation
file might exceed reasonable size limits. You can set a maximum line
count and control what happens when that limit is exceeded:
.. code:: python
llms_txt_full_max_size = 10000 # Maximum 10,000 lines
llms_txt_full_size_policy = "warn_skip" # Default behavior
The ``llms_txt_full_size_policy`` setting controls both the log level
and the action taken when the size limit is exceeded. It uses the format
``"<log_level>_<action>"``:
**Log levels:**
* ``warn``: Log as a warning (default)
* ``info``: Log as an informational message
**Actions:**
* ``skip``: Don’t create the file (default)
* ``keep``: Create the file anyway, ignoring the size limit
* ``note``: Create a placeholder file explaining why the full file
wasn’t generated
Tip: Use `Excluding Content <#excluding-content>`_ to remove less
relevant pages and reduce the file size.
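The policy format described above can be illustrated with a minimal sketch. The helper name ``parse_size_policy`` is hypothetical and is not part of the extension; it only shows how a value such as ``"warn_skip"`` splits into a log level and an action.

```python
VALID_LOG_LEVELS = {"warn", "info"}
VALID_ACTIONS = {"skip", "keep", "note"}


def parse_size_policy(policy: str) -> tuple[str, str]:
    """Split a policy such as "warn_skip" into (log_level, action).

    Hypothetical helper for illustration; the extension's own parsing
    may differ.
    """
    log_level, separator, action = policy.partition("_")
    if (
        not separator
        or log_level not in VALID_LOG_LEVELS
        or action not in VALID_ACTIONS
    ):
        raise ValueError(f"invalid size policy: {policy!r}")
    return log_level, action


print(parse_size_policy("warn_skip"))  # ('warn', 'skip')
print(parse_size_policy("info_note"))  # ('info', 'note')
```

Any of the two log levels can be combined with any of the three actions, which gives six valid policy strings.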
.. _custom-directive-handling:
Custom Directive Handling
~~~~~~~~~~~~~~~~~~~~~~~~~
.. _path-resolution:
Path Resolution
"""""""""""""""
The extension resolves paths in the common directives
``['image', 'figure']`` by default. You can add custom directives to this list:
.. code:: python
llms_txt_directives = [
"my-custom-image-directive",
"another-directive-with-paths",
]
This ensures that paths in your custom directives are properly
resolved in the generated files.
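As a rough illustration of what path resolution means here, the sketch below rewrites relative paths in matching directives so they are relative to the documentation root. The function ``resolve_directive_paths`` and its ``doc_dir`` parameter are assumptions made for this example; the extension's actual implementation and output format may differ.

```python
import posixpath
import re


def resolve_directive_paths(text, doc_dir, directives=("image", "figure")):
    """Rewrite relative directive paths relative to the docs root.

    Hypothetical helper: ``doc_dir`` is the directory of the current
    document, relative to the documentation root.
    """
    names = "|".join(re.escape(name) for name in directives)
    pattern = re.compile(rf"^(\s*\.\. (?:{names})::\s+)(\S+)\s*$")
    resolved_lines = []
    for line in text.splitlines():
        match = pattern.match(line)
        if match:
            path = match.group(2)
            # Leave URLs and absolute paths untouched.
            if "://" not in path and not path.startswith("/"):
                resolved = posixpath.normpath(posixpath.join(doc_dir, path))
                line = match.group(1) + resolved
        resolved_lines.append(line)
    return "\n".join(resolved_lines)


source = ".. image:: ../_static/logo.png\nSome text."
print(resolve_directive_paths(source, "guide"))
```

Passing a longer ``directives`` tuple mirrors the effect of adding names to ``llms_txt_directives``.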
.. _excluding-content:
Excluding Content
~~~~~~~~~~~~~~~~~
There are several ways to exclude content from the generated
``llms-full.txt`` file:
.. _global-exclusion:
Global Page Exclusion
"""""""""""""""""""""
You can exclude specific pages from being included in the generated
files:
.. code:: python
llms_txt_exclude = [
"search", # Exclude the search page
"genindex", # Exclude the index page
"private_*", # Exclude all pages starting with 'private_'
]
This is useful for excluding auto-generated pages, indexes, or content
that isn’t relevant for LLM consumption. It can also be used to reduce
the size of ``llms-full.txt``.
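To see how patterns like these select pages, here is a minimal sketch that assumes shell-style matching against the document name, as provided by Python's ``fnmatch`` module. The helper ``is_excluded`` is hypothetical, and the extension's exact matching rules may differ.

```python
from fnmatch import fnmatchcase


def is_excluded(docname: str, patterns: list[str]) -> bool:
    """Return True if the document name matches any exclusion pattern.

    Hypothetical helper assuming fnmatch-style glob semantics.
    """
    return any(fnmatchcase(docname, pattern) for pattern in patterns)


patterns = ["search", "genindex", "private_*"]
for docname in ["index", "search", "private_notes", "guide/private_api"]:
    print(docname, is_excluded(docname, patterns))
```

Under this assumption a pattern must match the whole document name, so ``private_*`` does not exclude ``guide/private_api``; a pattern such as ``*/private_*`` would be needed for pages in subdirectories.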
.. _page-level-ignore:
Page-Level Ignore Metadata
""""""""""""""""""""""""""
You can exclude individual pages by adding metadata at the top of any
reStructuredText file:
.. code:: restructuredtext
:llms-txt-ignore: true
Page Title
==========
This entire page will be excluded from llms-full.txt
When this metadata is present, the entire page is skipped during
processing.
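The check can be pictured with a short sketch that scans the leading field list of a page for the ``llms-txt-ignore`` field. The function ``is_page_ignored`` is hypothetical; Sphinx parses document metadata itself, so the extension likely relies on that rather than on a regular expression.

```python
import re

# Matches an RST field such as ":name: value" or ":orphan:".
_FIELD = re.compile(r"^:(?P<name>[^:]+):\s*(?P<value>.*)$")


def is_page_ignored(text: str) -> bool:
    """Return True if the page starts with ``:llms-txt-ignore: true``.

    Hypothetical helper for illustration only.
    """
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        match = _FIELD.match(stripped)
        if match is None:
            # The first non-field line ends the metadata block.
            return False
        if match["name"] == "llms-txt-ignore":
            return match["value"].strip().lower() == "true"
    return False


page = ":llms-txt-ignore: true\n\nPage Title\n==========\n"
print(is_page_ignored(page))  # True
```

The sketch stops at the first line that is not a field, which reflects the requirement that the metadata sit at the top of the file.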
.. _block-level-ignore:
Block-Level Ignore Directives
"""""""""""""""""""""""""""""
You can exclude specific sections within a page using ignore
directives:
.. code:: restructuredtext
Page Title
==========
This content will be included in llms-full.txt.
.. llms-txt-ignore-start
This content will be excluded from llms-full.txt.
Section To Ignore
-----------------
This entire section and any nested content will be ignored.
.. code-block:: python
# This code block will also be ignored
def ignored_function():
pass
.. llms-txt-ignore-end
This content will be included again.
Block-level ignores can be useful for:
* Removing internal notes or TODOs
* Hiding implementation details while keeping user-facing
documentation
Note:
* Multiple ignore blocks can be used within the same file
* Ignore directives work at any indentation level
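The filtering behaviour described above can be sketched as a simple line filter. The function ``strip_ignored_blocks`` is a hypothetical illustration, not the extension's code; it assumes the markers appear on their own lines.

```python
def strip_ignored_blocks(text: str) -> str:
    """Drop every line between ignore-start and ignore-end markers.

    Hypothetical helper; markers are matched after stripping
    whitespace, so they work at any indentation level.
    """
    kept = []
    ignoring = False
    for line in text.splitlines():
        marker = line.strip()
        if marker == ".. llms-txt-ignore-start":
            ignoring = True
            continue
        if marker == ".. llms-txt-ignore-end":
            ignoring = False
            continue
        if not ignoring:
            kept.append(line)
    return "\n".join(kept)


sample = "\n".join(
    [
        "Included text.",
        ".. llms-txt-ignore-start",
        "Internal TODO.",
        ".. llms-txt-ignore-end",
        "Included again.",
    ]
)
print(strip_ignored_blocks(sample))
```

Because the flag is simply toggled, several ignore blocks in one file are handled by the same loop.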
.. _including-code-files:
Including Source Code Files
~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can include source code files from your project at the end of
`llms_txt_full_filename <#confval-llms_txt_full_filename>`_.
Use include/exclude syntax to precisely control which files are
included:
.. code:: python
llms_txt_code_files = [
"+:src/**/*.py", # Include all Python files in src
"-:src/**/__pycache__/**", # Exclude Python cache files
]
Pattern syntax:
* **+:pattern**: Include files matching the pattern. Processed first
to collect matching files.
* **-:pattern**: Exclude files matching the pattern. Applied to
filter out unwanted files.
Code files are processed as follows:
* **Glob patterns**: Use standard glob patterns (``*``, ``**``,
``?``) to match files
* **Relative paths**: Patterns are resolved relative to your Sphinx
source directory
* **Formatting**: Each file is presented with a title and
syntax-highlighted code block
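The include-then-exclude behaviour can be approximated with ``pathlib`` globbing, as in the sketch below. The function ``select_code_files`` is hypothetical, and the example patterns are chosen for the demo; the extension's own matching may handle edge cases differently.

```python
import tempfile
from pathlib import Path


def select_code_files(source_dir: Path, patterns: list[str]) -> list[Path]:
    """Collect files matching "+:" patterns, then drop "-:" matches.

    Hypothetical helper; patterns are resolved relative to source_dir.
    """
    included: set[Path] = set()
    excluded: set[Path] = set()
    for entry in patterns:
        prefix, _, pattern = entry.partition(":")
        matches = {p for p in source_dir.glob(pattern) if p.is_file()}
        if prefix == "+":
            included |= matches
        elif prefix == "-":
            excluded |= matches
        else:
            raise ValueError(f"pattern must start with '+:' or '-:': {entry!r}")
    return sorted(included - excluded)


# Demo against a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ["src/app.py", "src/pkg/util.py", "src/pkg/test_util.py"]:
        path = root / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("# placeholder\n")
    selected = select_code_files(root, ["+:src/**/*.py", "-:src/**/test_*.py"])
    print([p.relative_to(root).as_posix() for p in selected])
```

Collecting both sets first and subtracting at the end means the order of ``+:`` and ``-:`` entries in the list does not change the result in this sketch.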
.. _customizing-code-paths:
Customizing Code File Paths
"""""""""""""""""""""""""""
By default, the extension automatically detects the relative path from
your Sphinx source directory to the git root and strips that prefix
from displayed file paths. You can customize this behavior:
.. code:: python
# Manually specify base path to strip
llms_txt_code_base_path = "../../"
# Disable path stripping entirely
llms_txt_code_base_path = ""
This helps create cleaner, more readable file paths in the generated
documentation.
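For a manually specified base path, the stripping step amounts to removing a leading prefix, roughly as sketched here. The function ``display_path`` is hypothetical and covers only explicit values; the automatic git-root detection is not shown.

```python
def display_path(file_path: str, base_path: str) -> str:
    """Strip base_path from the start of file_path, if present.

    Hypothetical helper; an empty base_path disables stripping.
    """
    if base_path and file_path.startswith(base_path):
        return file_path[len(base_path):]
    return file_path


print(display_path("../../src/pkg/mod.py", "../../"))  # src/pkg/mod.py
print(display_path("../../src/pkg/mod.py", ""))        # ../../src/pkg/mod.py
```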
.. _using-html-baseurl:
Using HTML Base URL
~~~~~~~~~~~~~~~~~~~
If you want to include absolute URLs for resources in your
documentation, you can use Sphinx’s built-in ``html_baseurl``
configuration:
.. code:: python
html_baseurl = "https://example.com/docs/"
When this option is set, all resolved paths in directives will be
prefixed with this URL, creating absolute paths in the generated
files.
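The effect of the prefix can be sketched as a simple join of the base URL and a resolved path. The helper ``to_absolute_url`` is an assumption made for illustration and is not taken from the extension.

```python
def to_absolute_url(base_url: str, resolved_path: str) -> str:
    """Prefix a resolved path with base_url, using exactly one slash.

    Hypothetical helper; returns the path unchanged when no base URL
    is configured.
    """
    if not base_url:
        return resolved_path
    return base_url.rstrip("/") + "/" + resolved_path.lstrip("/")


print(to_absolute_url("https://example.com/docs/", "_images/logo.png"))
# https://example.com/docs/_images/logo.png
```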
.. _customizing-uri-links:
Customizing URI Links in llms.txt
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
By default, the ``llms.txt`` file links to source files in the
``_sources`` directory when available, falling back to HTML pages when
sources aren’t available. You can customize this behavior using URI
templates with `llms_txt_uri_template
<#confval-llms_txt_uri_template>`_:
.. code:: python
# Default: Link to source files, if _sources exists
llms_txt_uri_template = "{base_url}_sources/{docname}{suffix}{sourcelink_suffix}"
# Default: Link to HTML pages instead, if _sources doesn't exist
llms_txt_uri_template = "{base_url}{docname}.html"
# Manual: Link to a custom markdown build
llms_txt_uri_template = "{base_url}{docname}.md"
.. _available-template-variables:
Available Template Variables
""""""""""""""""""""""""""""
Your URI template can use the following variables:
* ``{base_url}`` - The base URL from ``html_baseurl`` configuration
(includes trailing slash)
* ``{docname}`` - The document name (e.g., ``index``,
``guide/intro``)
* ``{suffix}`` - The source file suffix (e.g., ``.rst``, ``.md``) -
may be empty if no source file exists
* ``{sourcelink_suffix}`` - The suffix from
``html_sourcelink_suffix`` configuration (e.g., ``.txt``)
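The variables above behave like named fields in a Python format string. The sketch below uses ``str.format`` to show how a template expands; the function ``render_uri`` and the sample values are illustrative assumptions, not the extension's internals.

```python
def render_uri(template, *, base_url, docname, suffix="", sourcelink_suffix=""):
    """Expand a URI template with the four documented variables.

    Hypothetical helper; fields that the template does not use are
    simply ignored by str.format.
    """
    return template.format(
        base_url=base_url,
        docname=docname,
        suffix=suffix,
        sourcelink_suffix=sourcelink_suffix,
    )


values = {
    "base_url": "https://example.com/docs/",
    "docname": "guide/intro",
    "suffix": ".rst",
    "sourcelink_suffix": ".txt",
}
print(render_uri("{base_url}_sources/{docname}{suffix}{sourcelink_suffix}", **values))
# https://example.com/docs/_sources/guide/intro.rst.txt
print(render_uri("{base_url}{docname}.md", **values))
# https://example.com/docs/guide/intro.md
```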
Tip: Instead of linking to ``_sources`` (the default), you can
generate Markdown and/or reStructuredText files from your
documentation and link to those in ``llms.txt``. See this package’s
CMake setup for an example of building HTML alongside Markdown
and/or reStructuredText in parallel. Note that ``_sources`` is still
needed for ``llms-full.txt`` at this time.
.. _integration-examples:
Integration Examples
~~~~~~~~~~~~~~~~~~~~
Complete Configuration Example
""""""""""""""""""""""""""""""
Here’s a complete example showing multiple Project Configuration
Values:
.. code:: python
# File names and generation options
llms_txt_filename = "ai-summary.txt"
llms_txt_full_filename = "ai-full-docs.txt"
llms_txt_full_max_size = 50000
llms_txt_full_size_policy = "warn_note"
# Content customization
llms_txt_title = "Project Documentation for AI Assistants"
llms_txt_summary = """
This is a comprehensive documentation set for our project.
It includes API references, usage examples, and tutorials.
"""
llms_txt_uri_template = "{base_url}{docname}.md"
# Path handling
html_baseurl = "https://docs.example.com/"
llms_txt_directives = ["custom-image", "custom-include"]
# Content filtering
llms_txt_exclude = ["search", "genindex", "404", "private_*"]
# Source code inclusion with include/exclude patterns
llms_txt_code_files = [
"+:../../src/**/*.py", # Include Python files
"+:../../config/*.yaml", # Include config files
"-:../../src/**/__pycache__/**", # Exclude cache files
]
llms_txt_code_base_path = "../../"
.. _document-configuration-values:
Project Configuration Values
----------------------------
``llms_txt_full_file``
* **Type**: boolean
* **Default**: ``True``
* **Description**: Whether to write the single output file. See
`Disabling File Generation <#disabling-file-generation>`_.
Added in version 0.1.0.
``llms_txt_full_filename``
* **Type**: string
* **Default**: ``'llms-full.txt'``
* **Description**: Name of the single output file. See `Changing
Filenames <#changing-filenames>`_.
Added in version 0.1.0.
``llms_txt_full_max_size``
* **Type**: integer or ``None``
* **Default**: ``None`` (no limit)
* **Description**: Sets a maximum line count for
``llms_txt_full_filename``. Behavior when exceeded is controlled
by `llms_txt_full_size_policy
<#confval-llms_txt_full_size_policy>`_. See `Handling Large
Documentation <#handling-large-documentation>`_.
Added in version 0.2.0.
``llms_txt_full_size_policy``
* **Type**: string
* **Default**: ``'warn_skip'``
* **Description**: Controls what happens when
`llms_txt_full_max_size <#confval-llms_txt_full_max_size>`_ is
exceeded. Format is ``<log_level>_<action>``. Log levels:
``warn``, ``info``. Actions: ``skip``, ``keep``, ``note``. See
`Handling Large Documentation <#handling-large-documentation>`_.
Added in version 0.5.0.
``llms_txt_file``
* **Type**: boolean
* **Default**: ``True``
* **Description**: Whether to write the summary information file.
See `Disabling File Generation <#disabling-file-generation>`_.
Added in version 0.2.0.
``llms_txt_filename``
* **Type**: string
* **Default**: ``'llms.txt'``
* **Description**: Name of the summary information file. See
`Changing Filenames <#changing-filenames>`_.
Added in version 0.2.0.
``llms_txt_uri_template``
* **Type**: string or ``None``
* **Default**: ``None``
* **Description**: Template string for generating URIs in
``llms.txt``. See `Customizing URI Links in llms.txt
<#customizing-uri-links>`_.
Added in version 0.7.0.
``llms_txt_directives``
* **Type**: list of strings
* **Default**: ``[]`` (empty list)
* **Description**: List of custom directive names to process for
path resolution. See `Path Resolution <#path-resolution>`_.
Added in version 0.1.0.
``llms_txt_title``
* **Type**: string or ``None``
* **Default**: ``None``
* **Description**: Overrides the Sphinx project name as the
heading in ``llms.txt``. See `Custom Title <#custom-title>`_.
Added in version 0.2.0.
``llms_txt_summary``
* **Type**: string
* **Default**: The first paragraph in the root document, else an
empty string
* **Description**: Optional, but recommended, summary description
for ``llms.txt``. See `Adding a Custom Summary
<#custom-summary>`_.
Added in version 0.2.0.
``llms_txt_exclude``
* **Type**: list of strings
* **Default**: ``[]``
* **Description**: A list of pages to ignore using glob patterns.
See `Excluding Content <#excluding-content>`_.
Added in version 0.2.1.
``llms_txt_code_files``
* **Type**: list of strings
* **Default**: ``[]``
* **Description**: A list of include/exclude glob patterns that
select source code files to append to `llms_txt_full_filename
<#confval-llms_txt_full_filename>`_. See `Including Source Code
Files <#including-code-files>`_.
Added in version 0.4.0.
``llms_txt_code_base_path``
* **Type**: string or ``None``
* **Default**: ``None`` (auto-detect from git root)
* **Description**: Base path to strip from code file paths when
displaying titles. When ``None``, automatically detects the
relative path from the Sphinx source directory to the git root
and strips that prefix from file paths.
Added in version 0.4.0.
.. _document-contributing:
Contributing
------------
You will need to set up a development environment to make and test
your changes before submitting them.
Local development
~~~~~~~~~~~~~~~~~
1. Clone the sphinx-llms-txt repository.
2. Create and activate a virtual environment:
.. code:: console
python3 -m venv .venv
source .venv/bin/activate
3. Install development dependencies:
.. code:: console
pip install -e . --group dev
4. Install pre-commit Git hook scripts:
.. code:: console
pre-commit install
Testing changes
~~~~~~~~~~~~~~~
Run ``pytest`` before committing changes.
Current contributors
~~~~~~~~~~~~~~~~~~~~
Thanks to all who have contributed! The people who have improved the
code:
* jdillard
.. _document-changelog:
Changelog
---------
0.7.0
~~~~~
* Add `llms_txt_uri_template <#confval-llms_txt_uri_template>`_
configuration option to control the link behavior in
`llms_txt_filename <#confval-llms_txt_filename>`_ (#48)
0.6.0
~~~~~
* Improve ``_sources`` directory handling (#47)
0.5.3
~~~~~
* Make Sphinx a required dependency, since the package imports from
Sphinx (#44)
0.5.2
~~~~~
* Remove support for the ``singlehtml`` builder (#40)
0.5.1
~~~~~
* Only allow builders that have a ``_sources`` directory (#38)
0.5.0
~~~~~
* Add `Block-Level Ignore Directives <#block-level-ignore>`_ and
`Page-Level Ignore Metadata <#page-level-ignore>`_ (#33)
* Add `llms_txt_full_size_policy
<#confval-llms_txt_full_size_policy>`_ configuration option to
control behavior when `llms_txt_full_max_size
<#confval-llms_txt_full_max_size>`_ is exceeded (#35)
0.4.1
~~~~~
* Fix include paths and spacing (#31)
0.4.0
~~~~~
* Add support for including source code files with
`llms_txt_code_files <#confval-llms_txt_code_files>`_ and
`llms_txt_code_base_path <#confval-llms_txt_code_base_path>`_
configuration options (#24)
0.3.2
~~~~~
* Fix image paths to deployed images (#30)
0.3.1
~~~~~
* Fix issue when ``source_suffix`` equals ``source_link_suffix`` (#29)
0.3.0
~~~~~
* Use first paragraph as default for ``llms_txt_summary`` (#22)
0.2.4
~~~~~
* Support source file suffix detection (#21)
0.2.3
~~~~~
* Remove ``get_and_resolve_toctree`` method (#19)
* Simplify ``_sources`` lookup (#18)
* Add Sphinx docs (#16)
0.2.2
~~~~~
* Refactor LLMSFullManager with clearer class structure
* Add ``html_baseurl`` to **llms.txt** docs links
* Make glob pattern recursive
0.2.1
~~~~~
* Add ability to exclude pages with ``llms_txt_exclude``
0.2.0
~~~~~
* Add ``llms_txt_full_max_size`` configuration option to limit
*llms-full.txt* file size
* Automatically add content from **include** directives in
**llms-full.txt**
* Add path resolution for a given set of directives in
**llms-full.txt**
* Add **llms.txt** file option, with ``llms_txt_title`` and
``llms_txt_summary`` config values
0.1.0
~~~~~
* Initial release