Ingest, query, lint operations
The three operations a Karpathy LLM Wiki supports against its substrate: ingest a new source, query the substrate, lint it for drift.
What it means
The wiki is not a passive document store — it is a substrate with a small operational surface. Three operations run against it:
- Ingest: read a raw source, discuss takeaways with the user, write a source page, update or create the relevant entity pages, update index.md, and append to log.md. A single ingest can touch 10–15 pages.
- Query: read index.md first to scope the question, drill into relevant pages, follow wiki-links, and return a synthesized answer with citations to specific wiki pages. Good answers can be filed back as new wiki pages so that explorations compound alongside ingested sources.
- Lint: health-check the substrate. Find contradictions, orphan pages, concepts mentioned in 3+ pages without their own page, and stale claims (last_updated > 90 days AND newer sources contradict). Write findings to a dated lint file, propose changes, and wait for approval before applying them.
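Two of the lint checks listed above (orphan pages, and concepts mentioned in 3+ pages without their own page) are mechanical enough to sketch in code. This is a minimal sketch under several assumptions: pages are held in memory as a name-to-body dict, links use Obsidian-style [[Target]] syntax, and the pages named "index" and "log" are skipped as link sources because they point at everything. A "mention" is approximated as a wiki-link to a page that does not exist; plain-text mentions would need an LLM pass. The names lint_links, find_links, NAV_PAGES, and WIKI_LINK are illustrative only. Contradictions and stale claims require judgment and are not covered here.

```python
import re

# Assumption: Obsidian-style [[Target]] / [[Target|alias]] / [[Target#heading]] links.
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]")

# Assumption: navigation pages link to everything, so a link from them
# does not count as evidence that a page is connected to the wiki.
NAV_PAGES = {"index", "log"}


def find_links(body: str) -> set[str]:
    """Distinct wiki-link targets in one page body."""
    return {target.strip() for target in WIKI_LINK.findall(body)}


def lint_links(pages: dict[str, str], threshold: int = 3) -> dict[str, list[str]]:
    """Report orphan pages and link targets that deserve their own page."""
    inbound: dict[str, set[str]] = {}
    for name, body in pages.items():
        if name in NAV_PAGES:
            continue
        for target in find_links(body):
            if target != name:
                inbound.setdefault(target, set()).add(name)

    # Orphan: an existing content page that no other content page links to.
    orphans = sorted(n for n in pages if n not in NAV_PAGES and not inbound.get(n))
    # Missing concept: a non-existent target linked from `threshold`+ pages.
    missing = sorted(t for t, srcs in inbound.items()
                     if t not in pages and len(srcs) >= threshold)
    return {"orphans": orphans, "missing_concept_pages": missing}


pages = {
    "index": "[[Alpha]] [[Beta]] [[Gamma]]",
    "Alpha": "See [[Beta]] and [[Vector DB]].",
    "Beta": "Relates to [[Vector DB]].",
    "Gamma": "Mentions [[Vector DB|vector stores]].",
}
print(lint_links(pages))
```

Here Alpha and Gamma are reachable only from the index, so they come back as orphans, and Vector DB is linked from three pages without existing, so it is proposed as a missing concept page. In keeping with the lint rule, the function only reports; it changes nothing.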
These three operations are the entire surface area the user interacts with. Together with the schema (see Schema as program), they make the LLM a disciplined maintainer.
How it shows up in sources
- LLM Wiki: "Ingest. You drop a new source into the raw collection… Query. You ask questions against the wiki… Lint. Periodically, ask the LLM to health-check the wiki."
Mechanism / how it works
Each operation has a procedural definition in the schema (see this vault's CLAUDE.md). The procedures are:
Ingest — interactive: read raw → propose 3–5 takeaways and entity classifications → wait for user's "go" → write source page + entity pages → update index + log → commit → brief report.
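The ingest gate can be sketched as a function that refuses to write anything until an approval callback returns true. This is a hypothetical, in-memory sketch: Proposal, Wiki, ingest, and the approve callback stand in for the LLM's proposal and the user's "go". The commit step is left out, and the returned list of touched pages stands in for the brief report.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Proposal:
    """What the LLM shows the user before writing anything (hypothetical shape)."""
    takeaways: list[str]
    entities: list[str]  # entity pages to create or update


@dataclass
class Wiki:
    """In-memory stand-in for the vault: pages, index.md entries, log.md lines."""
    pages: dict[str, str] = field(default_factory=dict)
    index: list[str] = field(default_factory=list)
    log: list[str] = field(default_factory=list)


def ingest(wiki: Wiki, source: str, proposal: Proposal,
           approve: Callable[[Proposal], bool]) -> list[str]:
    """Gated ingest: write only after approval; return the pages touched."""
    if not 3 <= len(proposal.takeaways) <= 5:
        raise ValueError("a proposal should carry 3-5 takeaways")

    # The gate: without the user's "go", nothing is written.
    if not approve(proposal):
        return []

    source_page = f"sources/{source}"
    wiki.pages[source_page] = "\n".join(f"- {t}" for t in proposal.takeaways)
    touched = [source_page]

    for entity in proposal.entities:
        # Create the entity page if it is new; either way, link back to the source.
        body = wiki.pages.get(entity, f"# {entity}\n")
        wiki.pages[entity] = body + f"\n- See [[{source_page}]]"
        touched.append(entity)

    # Update index.md with any new pages, then append one line to log.md.
    for page in touched:
        if page not in wiki.index:
            wiki.index.append(page)
    wiki.log.append(f"ingest {source}: {len(touched)} pages touched")
    return touched


wiki = Wiki()
proposal = Proposal(["a", "b", "c"], ["Alpha", "Beta"])
ingest(wiki, "paper-1", proposal, approve=lambda _: False)  # declined: no writes
print(ingest(wiki, "paper-1", proposal, approve=lambda _: True))
```

Because the gate comes before the first write, a declined proposal leaves the pages, the index, and the log exactly as they were.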
Query — read index → follow links → synthesize cited answer → optionally offer to file the answer as a new page. Append a log entry summarizing the query.
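The scoping half of the query procedure (read the index, then follow links) is deterministic enough to sketch; the synthesis step is not. This is a minimal sketch under stated assumptions: index entries are page names, the index is scoped by simple word overlap with the question, and link-following stops after max_hops hops. The names gather_context, answer, and words are hypothetical, and the synthesize callback is a stand-in for the LLM call. Every page the answer drew on is returned as a citation, and a log entry is appended. The optional "file it back as a page" step is left out.

```python
import re
from collections import deque
from typing import Callable

# Assumption: Obsidian-style [[Target]] / [[Target|alias]] / [[Target#heading]] links.
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]")


def words(text: str) -> set[str]:
    return {w.lower() for w in re.findall(r"\w+", text)}


def gather_context(pages: dict[str, str], index: list[str], question: str,
                   max_hops: int = 1) -> list[str]:
    """Scope via the index, then follow wiki-links; return pages in visit order."""
    # Step 1: read the index first; seed with pages whose names share a word with the question.
    seeds = [p for p in index if p in pages and words(p) & words(question)]
    seen = list(seeds)
    queue = deque((p, 0) for p in seeds)
    # Step 2: breadth-first over wiki-links, at most max_hops away from a seed.
    while queue:
        page, hops = queue.popleft()
        if hops >= max_hops:
            continue
        for target in WIKI_LINK.findall(pages[page]):
            target = target.strip()
            if target in pages and target not in seen:
                seen.append(target)
                queue.append((target, hops + 1))
    return seen


def answer(pages: dict[str, str], index: list[str], log: list[str], question: str,
           synthesize: Callable[[str, dict[str, str]], str]) -> str:
    """Synthesize a cited answer and append a log entry summarizing the query."""
    cited = gather_context(pages, index, question)
    body = synthesize(question, {p: pages[p] for p in cited})
    log.append(f"query: {question} ({len(cited)} pages cited)")
    return body + "\n\nSources: " + ", ".join(f"[[{p}]]" for p in cited)


pages = {
    "Ingest operation": "Updates [[Entity pages]] and [[Log]].",
    "Entity pages": "One page per entity. See [[Schema]].",
    "Log": "Append-only.",
    "Schema": "Defines procedures.",
}
log: list[str] = []
print(answer(pages, list(pages), log, "What does ingest update?",
             lambda q, ctx: f"{len(ctx)} pages consulted"))
```

Capping the hop count keeps the context small and the citation list reviewable. Raising max_hops trades precision for recall; with max_hops=2 the Schema page above would also be pulled in and cited.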
Lint — scan the wiki for the standard drift patterns; write findings to wiki/_lint-<YYYY-MM-DD>.md; do not auto-fix. Lint is the closest the pattern gets to an automated eval suite.
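The stale-claim rule is a conjunction: a page is flagged only when its last_updated date is more than 90 days old AND a source newer than that date contradicts it. This is a minimal sketch that assumes the contradiction judgment has already been made (for example by an LLM pass) and recorded as (source, date) pairs. PageMeta, stale_claims, and lint_report are illustrative names. The report function only returns the path and contents of the dated wiki/_lint-<YYYY-MM-DD>.md file; it modifies nothing, which matches the no-auto-fix rule.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

STALE_AFTER = timedelta(days=90)


@dataclass
class PageMeta:
    name: str
    last_updated: date
    # (source page, source date) pairs that an earlier LLM pass judged to
    # contradict this page; the judgment itself is out of scope here.
    contradictions: list[tuple[str, date]] = field(default_factory=list)


def stale_claims(pages: list[PageMeta], today: date) -> list[str]:
    """Flag a page only if it is older than 90 days AND a newer source contradicts it."""
    stale = []
    for p in pages:
        too_old = today - p.last_updated > STALE_AFTER
        newer_conflict = any(d > p.last_updated for _, d in p.contradictions)
        if too_old and newer_conflict:
            stale.append(p.name)
    return stale


def lint_report(pages: list[PageMeta], today: date) -> tuple[str, str]:
    """Return (path, contents) of the dated lint file; nothing is auto-fixed."""
    path = f"wiki/_lint-{today.isoformat()}.md"
    lines = [f"# Lint {today.isoformat()}", "", "## Stale claims (proposed, awaiting approval)"]
    lines += [f"- [[{name}]]" for name in stale_claims(pages, today)] or ["- none"]
    return path, "\n".join(lines) + "\n"


today = date(2025, 6, 1)
pages = [
    PageMeta("A", date(2025, 1, 10), [("sources/new", date(2025, 5, 1))]),
    PageMeta("B", date(2025, 5, 1), [("sources/newer", date(2025, 5, 20))]),
    PageMeta("C", date(2024, 12, 1)),
    PageMeta("D", date(2024, 12, 1), [("sources/old", date(2024, 11, 1))]),
]
path, text = lint_report(pages, today)
print(path)
print(text)
```

Requiring both conditions keeps two kinds of page out of the report: old pages that nothing contradicts (C), and recently updated pages with a pending conflict (B). A contradiction from a source older than the page (D) does not count either.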
The interactivity of ingest is structural, not optional. The "discuss takeaways → go → write" gate is what keeps the substrate aligned with the user's intent and prevents the LLM from over-fragmenting (creating a concept page from every passing mention). It also surfaces calibration questions early, which feed schema iteration (see Schema as program).
Related concepts
- Karpathy LLM Wiki — parent pattern.
- Schema as program — defines the operational procedures.
- Wiki curated layer — what these operations write to.
- Citation discipline — the structural property query answers carry forward.
- Self-improving substrate — the compounding effect of running ingest repeatedly.
Related vendors / sectors
(pre-domain — these operations apply to any domain instantiation)
Open questions
- Lint is described in the gist as "primitive" — at what corpus size does it warrant a dedicated automated tool (cron + scripted checks) rather than an interactive operation invoked on demand?
- The "file good answers back as wiki pages" branch of the query operation creates a class of synthetic pages distinct from ingested-source pages. How should they be flagged in frontmatter (e.g. synthetic: true) and treated by future lint passes?