Karpathy LLM Wiki
A pattern for personal knowledge bases in which an LLM agent incrementally builds and maintains a structured, interlinked wiki sitting between the user and the raw sources.
What it means
A Karpathy LLM Wiki is not a retrieval-augmented chat over a pile of files. It is a persistent, structured artifact that the LLM compiles once and keeps current — a directory of interlinked markdown pages organized by entity type, with cross-references, contradictions flagged, and a schema file that tells the LLM how to maintain it. The user curates sources and asks questions; the LLM does all the bookkeeping.
The pattern is deliberately abstract — it specifies a shape (three layers, three operations) and leaves implementation specifics (directory layout, frontmatter, slug rules, page formats) to be co-evolved between the user and the LLM agent for the user's domain.
How it shows up in sources
- LLM Wiki — the canonical statement of the pattern. Defines the three layers, three operations, and the principle that the wiki is "a persistent, compounding artifact."
Mechanism / how it works
The pattern has three structural elements:
- A three-layer architecture — see Raw immutable layer, Wiki curated layer, Schema as program.
- Three operations — see Ingest, query, lint operations.
- Two navigational primitives —
index.md(content-oriented catalog) andlog.md(append-only chronological journal). At moderate scale (~100 sources, hundreds of pages) these replace the need for embedding-based RAG.
The compounding behavior comes from a fourth, derived property: maintenance cost approaches zero (the LLM does the bookkeeping that historically killed human-maintained wikis), so the substrate gets structurally richer with every ingest rather than decaying. See Self-improving substrate.
Related concepts
- Raw immutable layer — first of the three layers.
- Wiki curated layer — second of the three layers.
- Schema as program — third of the three layers; the discipline mechanism.
- Ingest, query, lint operations — the operational surface.
- Self-improving substrate — the compounding property derived from near-zero maintenance cost.
- Citation discipline — the structural property that makes query answers defensible.
Related vendors / sectors
(this concept is pre-domain — it does not yet wiki-link to specific vendors or sectors. The whole point of this vault is to instantiate the pattern in the digital-asset vendor research domain; that instantiation appears across every vendor / sector / source page.)
Open questions
- At what corpus size does index-only navigation break down? Karpathy cites "moderate scale (~100 sources, ~hundreds of pages)" as the regime where this works without a search tool.
- For domains with proprietary mixed-source corpora (e.g. 51 Terminal's audit-context research), how does the pattern adapt to enforce citation-and-licensing discipline structurally? (This vault's Path A licensing posture is one answer.)