Design decisions
ADR-style record of the load-bearing choices: file-first over a knowledge graph, SQLite first-class with WAL, never sync the raw database, git-as-sync now, and a swappable reflection model.
This page is the canonical record of the architecture decisions Anamnesis is built on, written in an ADR
(Architecture Decision Record) style: for each decision, what we decided, why, and the specific condition
under which we would revisit it. These are the choices stated in CLAUDE.md under "Architecture decisions
(do not relitigate without evidence)," reconciled with how they actually show up in server/src/anamnesis/.
The through-line behind every decision is one sentence: markdown is the source of truth, and everything else is either derived from it or moves it between your machines. Each decision below is a consequence of holding that line and refusing to add machinery that does not earn its keep.
The bar for changing any of these is evidence, not preference. CLAUDE.md frames them as decisions not to
relitigate "without evidence," and each one below names the concrete signal that would constitute that
evidence. Where the project plans to grow, it grows by adding behind an existing seam (for example the
SyncBackend protocol), not by rewriting the core.
The decisions at a glance
| Decision | Status | Revisit when |
|---|---|---|
| File-first, not a knowledge graph | Adopted (v1) | Recall quality on real usage demonstrably suffers |
| SQLite is first-class, with WAL | Adopted | (WAL stays; vectors are the conditional part) |
Add sqlite-vec vectors | Deferred | Keyword search measurably fails on paraphrase queries |
| Never sync the raw DB; sync markdown via git | Adopted | (Load-bearing; not expected to change) |
| Git-as-sync now | Adopted (v0) | Multi-user lands (then Turso/libSQL), or true multi-writer (then CRDTs) |
| Reflection/compression model is swappable | Adopted | (Never hardcode it; the frontier moves weekly) |
Decision 1: File-first, not a knowledge graph
Decision. Memory is plain markdown files, one note per file, and they are the single source of truth. Anamnesis does not build a knowledge-graph memory or a custom context compressor for v1.
Why. The publicly stated rationale in CLAUDE.md: "the research shows files + keyword search are
competitive and graphs add cost without proportional benefit." A graph layer is real, permanent complexity
(a schema, an ingestion pipeline, a query language, its own corruption and migration story) added on top of
something users already understand. Plain files keep the whole system inspectable and recoverable: you can
read a note in any editor, git diff it, review it in a pull request, and hand-edit it without going through
Anamnesis at all.
This shows up concretely in the store. A note is a YAML front-matter block followed by a markdown body,
serialized by _serialize and parsed back by _deserialize in server/src/anamnesis/store.py. Reads go to
the file, not the database: MemoryStore.get looks up the file path in the index and then reads and
deserializes the markdown, so the file stays canonical.
# server/src/anamnesis/store.py (MemoryStore.get)
def get(self, memory_id: str) -> Memory:
"""Read a memory back from its markdown file (the source of truth)."""
row = self._db.execute(
"SELECT body_path, scope FROM memories WHERE id = ?", (memory_id,)
).fetchone()
if row is None:
raise KeyError(memory_id)
base = self._dir_for_scope(row["scope"])
text = (base / row["body_path"]).read_text(encoding="utf-8")
return _deserialize(text)Note the column the database stores: body_path, not the body itself. The schema (_SCHEMA in store.py)
indexes structured metadata and a derived FTS copy of the text, but the authoritative body lives only on
disk. The memories table even encodes body_path TEXT NOT NULL, making "the file is where the body lives"
a structural fact, not a convention.
The General conventions in CLAUDE.md restate the same posture as a rule: "Don't introduce a database
server, a graph DB, or a vector DB 'just in case' - stay local-first and simple."
When we would revisit. CLAUDE.md is explicit: "Revisit only if recall quality on real usage demonstrably
suffers." That is a measurable bar, not a vibe. The instrument that would measure it already exists:
server/src/anamnesis/eval.py runs recall_at_k(store, cases, ks=(1, 3, 5, 8)) and reports Recall@k plus MRR
over a set of paraphrase queries. Files-plus-keyword-search stays until that number, on real usage, drops far
enough to justify the cost of a graph.
"Files + keyword search are competitive" is a claim about the current generation of techniques, recorded in
the local-only docs/research/model-landscape.md and kept fresh. The decision is held against evidence, so
the right response to "should we add a graph" is to run the eval, not to argue from intuition.
Decision 2: SQLite is first-class, with WAL
Decision. SQLite is a deliberate, first-class part of the design, not an afterthought. It provides FTS5-backed keyword and BM25 recall, and the connection runs in WAL mode so multiple concurrent Claude Code sessions do not hit file-locking conflicts.
Why. CLAUDE.md: "FTS5 for keyword/BM25 recall; use WAL mode so multiple concurrent Claude Code sessions
don't hit file-locking conflicts." In practice, several Claude Code sessions on one machine can all touch the
same store at once (mid-session MCP queries, a background sync hook, the dashboard). The classic SQLite default
(rollback journal) serializes readers against a writer hard enough to produce lock errors under that pattern.
WAL lets readers proceed concurrently with a writer, and a busy timeout absorbs the brief windows where a lock
genuinely must be waited for.
The store opens its connection with exactly these pragmas, and the comment in store.py spells out why the
connection is even shared across threads:
# server/src/anamnesis/store.py (MemoryStore.__init__)
# check_same_thread=False: the FastMCP server runs sync tools in a worker
# threadpool, so the connection is shared across threads. SQLite's
# serialized threadsafety + WAL + busy_timeout (below) keep that safe.
self._db = sqlite3.connect(self.db_path, check_same_thread=False)
self._db.row_factory = sqlite3.Row
self._db.execute("PRAGMA journal_mode=WAL")
self._db.execute("PRAGMA busy_timeout=5000")So the three concrete settings are: journal_mode=WAL, busy_timeout=5000 (5 seconds), and
check_same_thread=False. The full-text side is an FTS5 virtual table over the searchable fields:
-- _SCHEMA in server/src/anamnesis/store.py
CREATE VIRTUAL TABLE IF NOT EXISTS memories_fts USING fts5(
id UNINDEXED, title, body, tags, tokenize='porter unicode61'
);Search ranks with BM25 (ORDER BY bm25(memories_fts), m.updated_at DESC) and the default result budget is
k=8. The recall mechanics, including the deliberate OR-of-tokens query that recovered recall from about 0%
to about 94% on the eval set, are covered in depth in Recall.
A corollary decision is what is deferred: vectors. CLAUDE.md says "Add sqlite-vec vectors only when
keyword search measurably fails on paraphrase queries." Today nothing in the package imports sqlite-vec; it
exists only as an optional vectors packaging extra and is not wired into store.py.
When we would revisit. Add sqlite-vec embeddings only when the eval harness shows keyword search
measurably failing on paraphrase queries. The threshold is empirical: recall_at_k with ks=(1, 3, 5, 8)
over LLM-generated paraphrase cases (build_eval_candidates) is the gate. Until BM25 demonstrably falls short
there, the simpler FTS5 path stands and the vector dependency stays out of the default install.
Decision 3: Never sync the raw database; sync markdown via git
Decision. The raw SQLite database file is never synced over cloud folders or any other file-mirroring mechanism. Only markdown travels (via git); each machine rebuilds its own index locally.
Why. CLAUDE.md: "Never sync the raw DB file over cloud folders (the claude-brain corruption lesson).
Sync markdown via git; rebuild the index locally on each machine." A SQLite database is binary pages plus a
write-ahead log, not a single value that merges cleanly. Mirroring that live file between machines through a
folder-sync tool interleaves partial writes and corrupts it. The project refers to this repeatedly as the
"claude-brain corruption lesson," and it is the reason the rule is absolute rather than a preference.
The guarantee is enforced by topology, not by a .gitignore entry, which is stronger. The git repository
is memory/, while the database lives one level up at the store root, physically outside the git working
tree, so git literally never sees it:
~/.anamnesis/ # store root (MemoryStore.root)
memory/ # the git repo (MemoryStore.memory_dir) -- SYNCED
<type>/<id>.md # one markdown file per note, source of truth
local/ # machine-local notes (MemoryStore.local_dir) -- NEVER SYNCED
<type>/<id>.md
index.db # derived SQLite index (WAL + FTS5) -- NEVER SYNCEDBecause the index is fully derived, it is disposable: MemoryStore.reindex walks both trees (memory/ as
portable, local/ as machine-local), reads every *.md, and rebuilds the FTS5 tables from scratch. The
sync callers run that rebuild right after every pull, so search reflects the markdown that just arrived. Even
schema upgrades exploit this: the migration path in MemoryStore.__init__ drops the derived tables, recreates
them at the current _SCHEMA_VERSION (currently 1, tracked in PRAGMA user_version), and reindexes, so
there is never a risky in-place migration of user data in the database.
Do not put index.db inside memory/, and do not configure any folder-sync tool (Dropbox, iCloud, a naive
git of the binary) to mirror it between machines. The index is derived state. Syncing it is the exact failure
mode this design exists to avoid. If an index is ever damaged or stale, delete it and reindex; the files are
authoritative and recovery is total.
When we would revisit. This one is load-bearing and not expected to change. Even the planned sync evolution
(Decision 5) keeps it intact: a Turso/libSQL embedded-replica path replicates a managed database under its own
consistency protocol; it does not mean mirroring ~/.anamnesis/index.db over a cloud folder. The markdown stays
the source of truth regardless of what carries it. See Sync for the full mechanics and the durability
tests that guard this.
Decision 4: Git-as-sync now (and the planned evolution)
Decision. The v0 sync layer is plain git over a private Tailscale mesh: commit local changes, integrate
the remote with pull --rebase, push. The deliberate plan for the future is staged: Turso/libSQL embedded
replicas when multi-user lands, and CRDTs only if true concurrent multi-writer editing ever appears.
Why. CLAUDE.md: "Sync evolution: git-as-sync for the MVP -> Turso/libSQL embedded replicas when
multi-user lands -> CRDTs only if true concurrent multi-writer editing appears." Git is simple, already
battle-tested, version-controlled, human-readable, and good enough for a single user's own fleet of machines.
For one person syncing one machine at a time, there is no concurrent-multi-writer problem to solve, so the
heavier machinery would be cost without benefit, the same logic as Decision 1.
The implementation is GitSyncBackend in server/src/anamnesis/sync.py, and crucially it sits behind a
Protocol so the mechanism can evolve without touching the server, the CLI, or the dashboard:
# server/src/anamnesis/sync.py
class SyncBackend(Protocol):
"""Pluggable sync mechanism (git-over-Tailscale today; P2P/libSQL later)."""
def init(self) -> None: ...
def sync(self) -> SyncResult: ...
def state(self) -> SyncState: ...That seam is the decision made concrete. Every consumer depends only on SyncBackend and the SyncResult /
SyncState shapes, so a future Turso/libSQL backend or a direct peer-to-peer backend can slot in as another
implementation without a rewrite. The branch is always main (the module constant _BRANCH = "main"), and
integration is rebase-only, so the shared history stays linear.
The v0 conflict policy is "surface, never silently drop": if a rebase cannot apply cleanly, the backend runs
git rebase --abort (your local edits stay exactly as they are on disk) and returns a SyncResult with
conflicted=True, pushing nothing. That is a normal return value, not an exception; a human or the dashboard
reconciles the two versions, then syncs again.
When we would revisit. Each arrow above is gated on a real condition, not a calendar:
- Move past pure git when multi-user lands (the trigger for a Turso/libSQL embedded-replica backend).
- Reach for CRDTs only if true concurrent multi-writer editing appears, which the current single-user, one-machine-at-a-time pattern does not demand.
The guiding rule from CLAUDE.md's conventions still applies: stay local-first and simple, and do not add a
database server "just in case." The current design earns its keep; anything heavier waits for evidence it is
needed. Full sync behavior, the bare-repo Tailscale topology, and the durability tests are in
Sync and the Across machines guide.
Decision 5: The reflection/compression model is swappable
Decision. The model used for session-end summarization and for reflecting episodic notes into durable ones is a swappable configuration value. It is never hardcoded; provider, model, and endpoint all come from configuration, and the default path needs no API key.
Why. CLAUDE.md: "The reflection/compression model is swappable. Never hardcode it - the price/quality
frontier moves weekly (see docs/research/model-landscape.md, local-only)." Pinning a specific model into the
code would mean shipping a new release every time the frontier moves. Instead the choice is a runtime config
knob, and the project keeps a living record of current model/technique tradeoffs locally.
The module docstring of server/src/anamnesis/llm_summarizer.py states the contract directly: it "Sends a
redacted, size-bounded transcript to any OpenAI-compatible endpoint and parses a strict-JSON summary.
Provider, model, and URL come entirely from config; nothing about any provider is hardcoded. Any failure falls
back to the deterministic heuristic so capture never breaks session teardown."
Configuration is read from the environment by resolve_reflection_config, which is machine-local and never
synced (it is reflection config, not memory):
| Variable | Default | Purpose |
|---|---|---|
ANAMNESIS_REFLECTION_PROVIDER | heuristic | provider label; anything other than a complete config falls back to the deterministic heuristic |
ANAMNESIS_REFLECTION_MODEL | "" | model id sent in the request payload |
ANAMNESIS_REFLECTION_BASE_URL | "" | OpenAI-compatible base URL (/chat/completions is appended) |
ANAMNESIS_REFLECTION_API_KEY | (falls back to DEEPSEEK_API_KEY, then OPENAI_API_KEY) | bearer token |
ANAMNESIS_REFLECTION_TIMEOUT | 30 (seconds) | HTTP timeout |
ANAMNESIS_REFLECTION_MAX_TOKENS | 120000 | drives the transcript char window (max_tokens * 4) |
make_llm_summarizer builds an LLMSummarizer only when model, base_url, and api_key are all present;
otherwise it returns the deterministic HeuristicSummarizer. So the model is not just swappable, it is
optional, and its absence degrades gracefully rather than breaking teardown.
The HTTP client is a thin stdlib urllib POST to <base_url>/chat/completions with temperature=0.2 and
stream=False, so the base hook install needs no extra dependency. Provenance is recorded on the resulting
note: a model-summarized note carries prov_model = "<provider>/<model>", so you can always tell which model
produced a given memory. How distilling episodic notes into durable ones works end to end is in
Reflection.
When we would revisit. There is no fixed model to revisit; the design's entire point is that the choice is
a config value that moves with the frontier. The living evaluation lives in docs/research/model-landscape.md
(local-only), and the CLAUDE.md "Staying current" section makes keeping it fresh part of the workflow. Pin a
model only at the edges (your own env), never in the code.
Because the reflection config is machine-local environment configuration, it is read fresh on each invocation
and is never written into the synced memory/ tree. Set it per machine. The provenance on each note records
which model actually produced it, so you can audit and re-reflect later if you change models.
How these decisions reinforce each other
The five decisions are not independent; they compose into one coherent stance.
- File-first (1) is what makes "never sync the DB" (3) safe: there is exactly one authoritative copy of memory (the files), so the index can be thrown away and rebuilt freely.
- SQLite-with-WAL (2) is the fast, derived read path over those files, and keeping it derived is what lets it be local and disposable rather than something that has to be synced.
- Git-as-sync (4) moves only the markdown, which is only tractable because the source of truth is plain text with clean diffs and a clear conflict story.
- The swappable model (5) keeps the one genuinely fast-moving dependency (the LLM) out of the code and behind config, so the stable core does not churn when the model frontier does.
Each decision also names its own escape hatch: the recall eval for files-vs-graph and for keyword-vs-vectors,
the SyncBackend protocol for sync evolution, and the reflection config plus model-landscape.md for the
model. Nothing here is permanent by assertion; it is permanent until the named evidence shows up.