Notes, the markdown format, and the store
The Memory record, the markdown-with-YAML-frontmatter format, the three note types, scope, the on-disk store layout, and the rebuildable SQLite FTS5 index.
This is the canonical reference for how a single memory note is represented in Anamnesis: the in-memory Memory record, the markdown file it serializes to, the directory it lives in, and the SQLite index derived from it. Everything here is grounded in server/src/anamnesis/store.py, server/src/anamnesis/inject.py, and server/src/anamnesis/config.py.
The governing rule, repeated throughout: markdown files are the source of truth, and the SQLite index is fully derived. The index can always be deleted and rebuilt from the markdown with no loss.
The Memory record
A note is modeled in code as the Memory dataclass (store.py). Every field, with its default:
| Field | Type | Default | Notes |
|---|---|---|---|
| id | str | (required) | A ULID string, generated on write via str(ULID()). |
| type | str | (required) | One of procedural, semantic, episodic. |
| title | str | (required) | Short human-readable label. |
| body | str | (required) | The markdown body (everything after the frontmatter). |
| project | str | "global" | Project key (see project resolution). |
| machine_id | str | "unknown" | The machine that authored the note. |
| scope | str | "portable" | portable or machine-local (see scope). |
| tags | list[str] | [] | Free-form tags. |
| created_at | str | "" | UTC ISO-8601, seconds precision. |
| updated_at | str | "" | UTC ISO-8601, seconds precision. |
| prov_source | str | "human" | One of human, session-end, reflection, import. |
| prov_model | str | "" | Model id, when a model produced the note. |
| prov_session | str | "" | Originating session id, when known. |
| confidence | float | 1.0 | Used to break recency ties during injection. |
| supersedes | str | "" | Id of a note this one replaces. |
Two type aliases document the constrained string fields: MemoryType = str ("procedural" | "semantic" | "episodic") and Scope = str ("portable" | "machine-local").
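For orientation, the table above can be read as a dataclass along these lines. This is a sketch reconstructed from the documented fields and defaults, not a copy of store.py; field order and the use of default_factory for tags are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    # Required fields (no default in the table above).
    id: str
    type: str   # "procedural" | "semantic" | "episodic"
    title: str
    body: str
    # Optional fields with their documented defaults.
    project: str = "global"
    machine_id: str = "unknown"
    scope: str = "portable"   # "portable" | "machine-local"
    tags: list[str] = field(default_factory=list)  # avoids a shared mutable default
    created_at: str = ""
    updated_at: str = ""
    prov_source: str = "human"
    prov_model: str = ""
    prov_session: str = ""
    confidence: float = 1.0
    supersedes: str = ""


# Only the four required fields need to be supplied.
note = Memory(id="01J9Z8YPM7Q3X2V4WT6B5N0KGD", type="semantic", title="Example", body="Body text.")
```

Constructing a note with only the required fields yields a portable, human-sourced, global note with confidence 1.0.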
Timestamps come from _utcnow(), which returns datetime.now(UTC).isoformat(timespec="seconds"), so the format is for example 2026-06-24T18:33:07+00:00.
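A minimal sketch of that timestamp helper follows. The function name utcnow is illustrative, and timezone.utc is used in place of the UTC alias (which was added to the datetime module in Python 3.11) so the snippet also runs on older interpreters.

```python
from datetime import datetime, timezone


def utcnow() -> str:
    # timespec="seconds" drops microseconds; the UTC offset is rendered as +00:00.
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


stamp = utcnow()
```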
The markdown format
Each note is one markdown file: a YAML frontmatter block delimited by ---\n (the _FM_DELIM constant), followed by the body. _serialize(mem) writes the frontmatter with yaml.safe_dump(meta, sort_keys=False, allow_unicode=True), so the key order is fixed by insertion order, not alphabetical.
Frontmatter fields, in write order
_serialize builds the metadata dict in exactly this order:
- id
- type
- title
- project
- machine_id
- scope
- prov_source
- confidence
- prov_model (only if non-empty)
- prov_session (only if non-empty)
- supersedes (only if non-empty)
- created_at
- updated_at
- tags
prov_model, prov_session, and supersedes are omitted entirely when empty, so a hand-written or human-sourced note typically has none of them. created_at, updated_at, and tags are always written, and always come after the optional provenance keys.
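The write order can be sketched as follows. build_meta is a hypothetical helper that works on a plain dict rather than the real Memory record, and it stops short of the yaml.safe_dump call; it only illustrates how insertion order and the optional keys interact.

```python
def build_meta(mem: dict) -> dict:
    """Assemble frontmatter keys in the documented write order (illustrative only)."""
    meta = {
        key: mem[key]
        for key in ("id", "type", "title", "project", "machine_id",
                    "scope", "prov_source", "confidence")
    }
    # Optional provenance keys are written only when non-empty.
    for key in ("prov_model", "prov_session", "supersedes"):
        if mem.get(key):
            meta[key] = mem[key]
    # These three are always written and always come last.
    meta["created_at"] = mem["created_at"]
    meta["updated_at"] = mem["updated_at"]
    meta["tags"] = list(mem.get("tags", []))
    return meta


human = {
    "id": "A", "type": "semantic", "title": "T", "project": "global",
    "machine_id": "thinkpad", "scope": "portable", "prov_source": "human",
    "confidence": 1.0, "prov_model": "", "created_at": "2026-06-24T18:33:07+00:00",
    "updated_at": "2026-06-24T18:33:07+00:00", "tags": ["css"],
}
human_keys = list(build_meta(human))
reflected_keys = list(build_meta({**human, "supersedes": "B"}))
```

Because Python dicts preserve insertion order and sort_keys=False is passed to the dumper, the key order in the file follows the order in which the dict was built.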
A representative file (~/.anamnesis/memory/semantic/<ULID>.md) looks like:
---
id: 01J9Z8YPM7Q3X2V4WT6B5N0KGD
type: semantic
title: Dashboard grid minmax convention
project: github.com/oscardvs/anamnesis
machine_id: thinkpad
scope: portable
prov_source: human
confidence: 1.0
created_at: '2026-06-24T18:33:07+00:00'
updated_at: '2026-06-24T18:33:07+00:00'
tags:
- dashboard
- css
---
Wrap every Tailwind grid-cols track in minmax(0, ...) so wide content does not blow out the layout.
A reflection-derived note that replaces an earlier one adds the optional keys between confidence and created_at:
---
id: 01J9ZB0C4F8H2K6M3P9R7S5T1W
type: procedural
title: Run reflect safely
project: github.com/oscardvs/anamnesis
machine_id: thinkpad
scope: portable
prov_source: reflection
confidence: 0.8
prov_model: claude-opus-4-8
prov_session: 3bf75f14-4c3f
supersedes: 01J9Z8YPM7Q3X2V4WT6B5N0KGD
created_at: '2026-06-24T19:01:55+00:00'
updated_at: '2026-06-24T19:01:55+00:00'
tags:
- reflection
---
Commit immediately after reflect, or a concurrent sync can wipe the output.
Round-tripping (serialize and deserialize)
_serialize appends one trailing newline to the body (f"{_FM_DELIM}{front}{_FM_DELIM}{mem.body}\n"). _deserialize reverses this exactly:
- It requires the text to start with ---\n, otherwise it raises ValueError("memory file missing YAML front-matter").
- It splits on the closing \n---\n delimiter (text[len(_FM_DELIM):].partition("\n" + _FM_DELIM)).
- It strips the single trailing newline that _serialize added (if body.endswith("\n"): body = body[:-1]).
- Missing optional keys fall back to the same defaults as the dataclass: project="global", machine_id="unknown", scope="portable", tags=[], prov_source="human", confidence=1.0, and empty strings for prov_model, prov_session, supersedes. confidence is coerced with float(...).
Because deserialization tolerates missing optional keys and supplies defaults, you can hand-author a minimal note with just id, type, and title in the frontmatter and it will index correctly. The full set of keys is what the writer produces, not what the reader requires.
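The delimiter handling described above can be sketched with the standard library alone. split_front_matter is a hypothetical name; it returns the raw frontmatter text and the body, and leaves out the YAML parsing and default-filling that the real _deserialize performs.

```python
FM_DELIM = "---\n"  # mirrors the documented _FM_DELIM constant


def split_front_matter(text: str) -> tuple[str, str]:
    if not text.startswith(FM_DELIM):
        raise ValueError("memory file missing YAML front-matter")
    # Everything up to the closing delimiter is frontmatter; the rest is body.
    front, _, body = text[len(FM_DELIM):].partition("\n" + FM_DELIM)
    # Drop the single trailing newline that the writer appended.
    if body.endswith("\n"):
        body = body[:-1]
    return front, body


text = "---\nid: X\ntype: semantic\ntitle: T\n---\nHello\n"
front, body = split_front_matter(text)
```

Only one trailing newline is removed, so a body that itself ends in a blank line survives the round trip unchanged.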
Note types
There are exactly three note types, enforced at the SQLite layer by a CHECK (type IN ('procedural','semantic','episodic')) constraint on the memories table:
- procedural - how to do something (steps, commands, conventions). Durable.
- semantic - facts and stable knowledge about the world or the project. Durable.
- episodic - what happened in a session ("what I last did"). Treated as transient continuity.
The distinction matters at injection time. In inject.py, _DURABLE = ("procedural", "semantic") are the note types that fill the main injection budget, while episodic notes are capped by _MAX_EPISODIC = 2 and serve only as a short "what I last did" continuity thread. Once an episodic note has been folded into durable notes by reflection, it is tagged reflected and dropped from injection ("reflected" not in m.tags), since its content now lives in the durable notes.
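A rough sketch of the episodic side of that rule is below. pick_episodic and the dict-shaped notes are hypothetical, and the newest-first ordering by updated_at is an assumption; the actual selector in inject.py may rank differently.

```python
MAX_EPISODIC = 2  # mirrors the documented _MAX_EPISODIC


def pick_episodic(notes: list[dict]) -> list[dict]:
    # Keep episodic notes that reflection has not yet folded into durable notes.
    live = [n for n in notes if n["type"] == "episodic" and "reflected" not in n["tags"]]
    # Assumption: newest first. Same-format ISO-8601 UTC strings sort chronologically.
    live.sort(key=lambda n: n["updated_at"], reverse=True)
    return live[:MAX_EPISODIC]


notes = [
    {"id": "e1", "type": "episodic", "tags": [], "updated_at": "2026-06-24T18:00:00+00:00"},
    {"id": "e2", "type": "episodic", "tags": ["reflected"], "updated_at": "2026-06-24T19:00:00+00:00"},
    {"id": "e3", "type": "episodic", "tags": [], "updated_at": "2026-06-24T20:00:00+00:00"},
    {"id": "e4", "type": "episodic", "tags": [], "updated_at": "2026-06-24T17:00:00+00:00"},
    {"id": "s1", "type": "semantic", "tags": [], "updated_at": "2026-06-24T21:00:00+00:00"},
]
picked = [n["id"] for n in pick_episodic(notes)]
```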
Scope: portable vs machine-local
scope answers one question: does this note travel to your other machines? There are two values:
- portable (the default) - the note belongs to the synced corpus.
- machine-local - the note stays on the machine that created it.
The tree is authoritative for scope
Scope is not trusted from the frontmatter when rebuilding the index. It is determined by which directory tree the file is in. MemoryStore._dir_for_scope(scope) returns self.local_dir for machine-local and self.memory_dir for everything else:
def _dir_for_scope(self, scope: Scope) -> Path:
    return self.local_dir if scope == "machine-local" else self.memory_dir
On reindex, the store walks both trees and overwrites the in-memory scope from the tree it found the file in, regardless of what the frontmatter said:
for base, scope in ((self.memory_dir, "portable"), (self.local_dir, "machine-local")):
    for path in sorted(base.rglob("*.md")):
        mem = _deserialize(path.read_text(encoding="utf-8"))
        mem.scope = scope  # tree wins
        self._index(mem, str(path.relative_to(base)))
Moving a file between memory/ and local/ changes its scope on the next reindex, even if the frontmatter still says otherwise. The directory is the authority. The frontmatter scope value is a convenience for readers and for the freshly-written file; the reindex path reconciles it to the tree.
get mirrors this: it reads the stored scope from the index, picks the base directory with _dir_for_scope, and reads the body from base / body_path. The body_path stored in the index is relative to the scope's base directory, not to the store root.
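To make the "tree wins" rule concrete, here is a self-contained sketch that runs against a throwaway directory. scan_scopes is a hypothetical stand-in for the reindex walk: it ignores the frontmatter entirely and records only the scope implied by each file's location. It uses as_posix() for a platform-independent key, whereas the source passes str(path.relative_to(base)).

```python
import tempfile
from pathlib import Path


def scan_scopes(root: Path) -> dict[str, str]:
    """Map each note's tree-relative path to the scope implied by its tree."""
    found = {}
    for base, scope in ((root / "memory", "portable"), (root / "local", "machine-local")):
        for path in sorted(base.rglob("*.md")):
            found[path.relative_to(base).as_posix()] = scope  # tree wins over frontmatter
    return found


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "memory").mkdir()
    note = root / "local" / "semantic" / "A.md"
    note.parent.mkdir(parents=True)
    # The frontmatter claims portable, but the file sits under local/.
    note.write_text(
        "---\nid: A\ntype: semantic\ntitle: T\nscope: portable\n---\nBody\n",
        encoding="utf-8",
    )
    scopes = scan_scopes(root)
```

The note is reported as machine-local even though its frontmatter says portable, which is the reconciliation the reindex path performs.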
Store layout under ~/.anamnesis
The store root defaults to ~/.anamnesis. config.resolve_home() resolves it from ANAMNESIS_HOME if set, otherwise Path.home() / ".anamnesis".
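That resolution order amounts to something like the sketch below. The env parameter is added here only to make the function easy to exercise; whether the real resolve_home() expands ~ or normalizes the path is not stated in this page, so the sketch does neither.

```python
import os
from pathlib import Path


def resolve_home(env=None) -> Path:
    env = os.environ if env is None else env
    override = env.get("ANAMNESIS_HOME")
    # An unset or empty variable falls back to the default location.
    return Path(override) if override else Path.home() / ".anamnesis"


custom = resolve_home({"ANAMNESIS_HOME": "/srv/anamnesis"})
default = resolve_home({})
```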
~/.anamnesis/
  memory/                   # SOURCE OF TRUTH, portable, git-synced
    procedural/<ULID>.md
    semantic/<ULID>.md
    episodic/<ULID>.md
  local/                    # SOURCE OF TRUTH, machine-local, NEVER synced
    procedural/<ULID>.md
    semantic/<ULID>.md
    episodic/<ULID>.md
  index.db                  # DERIVED, SQLite (WAL + FTS5), rebuildable
  config.json               # machine-local config, never synced
MemoryStore.__init__ wires these paths and creates both note trees if missing:
self.memory_dir = self.root / "memory"
self.local_dir = self.root / "local"
self.db_path = self.root / "index.db"
self.memory_dir.mkdir(parents=True, exist_ok=True)
self.local_dir.mkdir(parents=True, exist_ok=True)
Within each tree, the relative path of a note is <type>/<id>.md (set on write as rel_path = f"{mem.type}/{mem.id}.md"). So a portable procedural note lives at ~/.anamnesis/memory/procedural/<ULID>.md and a machine-local one at ~/.anamnesis/local/procedural/<ULID>.md.
config.json
config.json is machine-local and never synced. It lives at <home>/config.json, deliberately outside the synced memory/ tree, because the git remote URL differs per machine. It is written by anamnesis init (onboard.write_store_config) and holds two keys:
{
  "machine_id": "thinkpad",
  "remote": "git@example.com:you/anamnesis-memory.git"
}
remote is omitted when you run local-only. config.py reads these via _store_config() (a best-effort JSON read that returns {} on any OSError/ValueError so resolution never crashes on a bad file), and exposes:
- resolve_machine_id() - ANAMNESIS_MACHINE_ID, else config.json's machine_id, else socket.gethostname(), else "unknown".
- resolve_remote() - ANAMNESIS_GIT_REMOTE, else config.json's remote, else None. The config.json fallback is what lets the MCP server (launched from .mcp.json without inline env) and the dashboard find the remote so an in-session memory_sync can push rather than only commit locally.
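The two fallback chains can be sketched as follows. The function signatures here take env and home explicitly so the example is testable in isolation; the real functions in config.py presumably read the process environment and the resolved store root themselves. The isinstance guard is an extra precaution in this sketch, not something the source describes.

```python
import json
import socket
import tempfile
from pathlib import Path


def read_store_config(home: Path) -> dict:
    """Best-effort read: any unreadable or malformed file yields {}."""
    try:
        data = json.loads((home / "config.json").read_text(encoding="utf-8"))
    except (OSError, ValueError):  # json.JSONDecodeError is a ValueError subclass
        return {}
    return data if isinstance(data, dict) else {}


def resolve_machine_id(env: dict, home: Path) -> str:
    return (env.get("ANAMNESIS_MACHINE_ID")
            or read_store_config(home).get("machine_id")
            or socket.gethostname()
            or "unknown")


def resolve_remote(env: dict, home: Path):
    return env.get("ANAMNESIS_GIT_REMOTE") or read_store_config(home).get("remote") or None


with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp)
    (home / "config.json").write_text(json.dumps({"machine_id": "thinkpad"}), encoding="utf-8")
    from_file = resolve_machine_id({}, home)
    from_env = resolve_machine_id({"ANAMNESIS_MACHINE_ID": "desktop"}, home)
    remote = resolve_remote({}, home)  # no remote key: local-only
    (home / "config.json").write_text("{not json", encoding="utf-8")
    broken = read_store_config(home)
```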
Never sync the raw index.db over a cloud folder. Sync the markdown under memory/ via git and rebuild the index locally on each machine. config.json and the entire local/ tree are intentionally outside the synced corpus.
The SQLite index
index.db is a derived cache. It is opened with sqlite3.connect(self.db_path, check_same_thread=False) because the FastMCP server runs sync tools in a worker threadpool and shares the connection across threads. Two PRAGMAs make that safe:
self._db.execute("PRAGMA journal_mode=WAL")
self._db.execute("PRAGMA busy_timeout=5000")
- WAL mode lets concurrent Claude Code sessions read while one writes, avoiding the file-locking conflicts a rollback journal would cause.
- busy_timeout=5000 (5 seconds) makes a blocked writer wait and retry rather than fail immediately under contention.
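A quick way to confirm both settings took effect is to read them back, as in this sketch against a temporary database file. WAL requires an on-disk database, so an in-memory connection would not demonstrate it.

```python
import sqlite3
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    db = sqlite3.connect(Path(tmp) / "index.db", check_same_thread=False)
    # Setting journal_mode returns the mode that is now in effect.
    mode = db.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    db.execute("PRAGMA busy_timeout=5000")
    # Reading busy_timeout without a value returns the current setting in milliseconds.
    timeout = db.execute("PRAGMA busy_timeout").fetchone()[0]
    db.close()
```

journal_mode is persisted in the database file, while busy_timeout applies per connection and has to be set each time the store opens the index.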
Schema
The schema (_SCHEMA in store.py) is three objects: a structured memories table, a memory_tags join table, and a memories_fts FTS5 virtual table.
CREATE TABLE IF NOT EXISTS memories (
id TEXT PRIMARY KEY,
type TEXT NOT NULL CHECK (type IN ('procedural','semantic','episodic')),
title TEXT NOT NULL,
body_path TEXT NOT NULL,
project TEXT NOT NULL DEFAULT 'global',
machine_id TEXT NOT NULL,
scope TEXT NOT NULL DEFAULT 'portable' CHECK (scope IN ('portable','machine-local')),
created_at TEXT NOT NULL,
updated_at TEXT NOT NULL,
prov_source TEXT NOT NULL DEFAULT 'human'
CHECK (prov_source IN ('human','session-end','reflection','import')),
prov_model TEXT,
prov_session TEXT,
confidence REAL NOT NULL DEFAULT 1.0,
supersedes TEXT
);
CREATE INDEX IF NOT EXISTS idx_mem_scope ON memories(project, type, scope);
CREATE INDEX IF NOT EXISTS idx_mem_recency ON memories(updated_at DESC);
CREATE INDEX IF NOT EXISTS idx_mem_prov ON memories(prov_source);
CREATE TABLE IF NOT EXISTS memory_tags (
memory_id TEXT NOT NULL REFERENCES memories(id) ON DELETE CASCADE,
tag TEXT NOT NULL,
PRIMARY KEY (memory_id, tag)
);
CREATE VIRTUAL TABLE IF NOT EXISTS memories_fts USING fts5(
id UNINDEXED, title, body, tags, tokenize='porter unicode61'
);
Note what the memories table does and does not store. It holds all the structured metadata plus body_path (the relative path to the markdown file), but not the body itself. The body lives only in the markdown file and, for search, in the FTS5 table.
The memories_fts virtual table indexes title, body, and tags (with id carried UNINDEXED so it can be selected back). The tokenizer is porter unicode61: unicode61 provides Unicode-aware tokenization and diacritic folding, and porter adds English stemming so "running" matches "run". Tags are stored in FTS as a single space-joined string (" ".join(mem.tags)).
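The stemming behaviour is easy to check in isolation. The sketch below creates the same virtual table in an in-memory database and assumes the local SQLite build includes FTS5, which is the case for most Python distributions; the sample row is invented.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE VIRTUAL TABLE memories_fts USING fts5("
    "id UNINDEXED, title, body, tags, tokenize='porter unicode61')"
)
db.execute(
    "INSERT INTO memories_fts (id, title, body, tags) VALUES (?, ?, ?, ?)",
    ("A", "Running reflect", "Commit right after running it.", " ".join(["reflection", "git"])),
)
# "run" and "running" reduce to the same stem, so the query matches.
hits = [row[0] for row in db.execute(
    "SELECT id FROM memories_fts WHERE memories_fts MATCH ?", ("run",)
)]
# id is UNINDEXED: it can be selected back but is not searchable.
id_hits = db.execute(
    "SELECT id FROM memories_fts WHERE memories_fts MATCH ?", ("A",)
).fetchall()
```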
How a row is written
_index(mem, rel_path) does an idempotent upsert for one note:
- INSERT OR REPLACE INTO memories (...) with all structured columns. Empty prov_model, prov_session, and supersedes are stored as SQL NULL (mem.prov_model or None, etc.).
- DELETE FROM memory_tags WHERE memory_id = ? then re-insert the current tags.
- DELETE FROM memories_fts WHERE id = ? then re-insert the FTS row.
This delete-then-insert pattern keeps memory_tags and memories_fts consistent on rewrites. write and put both call _index and then self._db.commit(). On any indexing failure they unlink the just-written markdown file before re-raising, so a half-written note never lingers on disk:
abs_path.write_text(_serialize(mem), encoding="utf-8")
try:
    self._index(mem, rel_path)
except Exception:
    abs_path.unlink(missing_ok=True)
    raise
self._db.commit()
write generates the id and timestamps for you; put takes a fully-formed Memory (caller-supplied id and timestamps) and upserts by id, which the native-memory importer uses to make re-imports overwrite in place rather than duplicate.
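The idempotence of the delete-then-insert upsert can be illustrated with a trimmed-down schema. index_note and the two-column memories table are simplifications for this sketch; the real _index writes every structured column plus the FTS row.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE memories (id TEXT PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE memory_tags (
    memory_id TEXT NOT NULL, tag TEXT NOT NULL, PRIMARY KEY (memory_id, tag)
);
""")


def index_note(note_id: str, title: str, tags: list[str]) -> None:
    db.execute("INSERT OR REPLACE INTO memories (id, title) VALUES (?, ?)", (note_id, title))
    # Clear old tags first so removed tags do not linger after a rewrite.
    db.execute("DELETE FROM memory_tags WHERE memory_id = ?", (note_id,))
    db.executemany(
        "INSERT INTO memory_tags (memory_id, tag) VALUES (?, ?)",
        [(note_id, tag) for tag in tags],
    )


index_note("A", "First title", ["css", "dashboard"])
index_note("A", "Second title", ["css"])  # same id: overwrites in place
db.commit()
rows = db.execute("SELECT id, title FROM memories").fetchall()
tags = db.execute("SELECT tag FROM memory_tags WHERE memory_id = 'A'").fetchall()
```

Indexing the same id twice leaves one row and only the latest tags, which is the property the importer relies on when it re-imports through put.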
Supersession
A note with a non-empty supersedes pointing at another note's id hides that older note from recall. superseded_ids() collects every non-empty supersedes value, and both search and the injection selector exclude those ids:
AND m.id NOT IN
    (SELECT supersedes FROM memories WHERE supersedes IS NOT NULL AND supersedes <> '')
Superseded notes are hidden from recall and injection but remain on disk and browsable via list.
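The exclusion clause can be tried against a reduced table, as in this sketch. The three-column schema and sample ids are invented for illustration; only the NOT IN subquery is taken from the text above.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memories (id TEXT PRIMARY KEY, title TEXT NOT NULL, supersedes TEXT)")
db.executemany(
    "INSERT INTO memories (id, title, supersedes) VALUES (?, ?, ?)",
    [("OLD", "Old advice", None), ("NEW", "New advice", "OLD"), ("OTHER", "Unrelated", None)],
)
visible = [row[0] for row in db.execute(
    "SELECT m.id FROM memories m WHERE m.id NOT IN "
    "(SELECT supersedes FROM memories WHERE supersedes IS NOT NULL AND supersedes <> '') "
    "ORDER BY m.id"
)]
```

The IS NOT NULL filter is load-bearing: in SQL, NOT IN against a set that contains NULL never evaluates to true, so without it the query would return no rows at all.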
Schema version and rebuild from markdown
_SCHEMA_VERSION = 1 is stored in SQLite's PRAGMA user_version. On open, the store compares the DB's recorded version against the constant and rebuilds the index when they differ.
Because the index is fully derived, a version bump needs no hand-written migration: the store drops memories, memory_tags, and memories_fts, recreates them from _SCHEMA, sets user_version, and calls reindex().
reindex() is the canonical "rebuild from source of truth" path and can be run any time (it returns the number of notes indexed):
- DELETE FROM memories, DELETE FROM memory_tags, DELETE FROM memories_fts.
- Walk memory/ as portable and local/ as machine-local, in that order, over sorted(base.rglob("*.md")).
- Deserialize each file, force scope from the tree, and _index it under its path relative to that tree's base.
- Commit.
If index.db is ever lost or corrupted, deleting it and re-opening the store (or running a reindex) reconstructs it entirely from the markdown. No memory is lost, because the markdown is the source of truth.
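The version check can be sketched as below. open_index, SCHEMA_VERSION, and the one-table SCHEMA are stand-ins for this example; the real store drops and recreates all three derived objects and then calls reindex(), which is omitted here. How the store treats a brand-new database is not described on this page, so the fresh-database rebuild shown is an assumption of the sketch.

```python
import sqlite3

SCHEMA_VERSION = 1
SCHEMA = "CREATE TABLE IF NOT EXISTS memories (id TEXT PRIMARY KEY, title TEXT NOT NULL);"


def open_index(db: sqlite3.Connection) -> bool:
    """Return True when the derived tables were dropped and recreated."""
    current = db.execute("PRAGMA user_version").fetchone()[0]
    if current == SCHEMA_VERSION:
        return False
    db.execute("DROP TABLE IF EXISTS memories")
    db.executescript(SCHEMA)
    # PRAGMA values cannot be bound as parameters, so the constant is formatted in.
    db.execute(f"PRAGMA user_version = {SCHEMA_VERSION}")
    db.commit()
    return True


db = sqlite3.connect(":memory:")
first = open_index(db)   # fresh database reports user_version 0
second = open_index(db)  # version now matches, nothing to do
```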
Project resolution
Notes are scoped to a project key. The default is "global", and global notes are always injected in full. For a working directory, the project key is resolved by inject.resolve_project_key(cwd) in a fixed precedence order:
- .anamnesis/project marker. _read_marker(cwd) searches from cwd upward through its parents and returns the first non-empty line of the nearest .anamnesis/project file. The search stops below the home directory and the filesystem root, so a stray marker at $HOME cannot hijack every project. This is the explicit, cross-machine-stable override, useful for non-git workspaces where a subdirectory would otherwise resolve to a bare basename.
- Normalized git origin remote. Runs git -C <cwd> remote get-url origin; on success the URL is normalized by _normalize_remote: strip the scheme (https://, ssh://, git://), strip a leading user@, convert the scp form host:path to host/path, strip a trailing .git, then strip trailing slashes and lowercase. So git@github.com:oscardvs/anamnesis.git and https://github.com/oscardvs/anamnesis both normalize to github.com/oscardvs/anamnesis.
- Repo-root directory name. If there is no origin, git -C <cwd> rev-parse --show-toplevel gives the repo root, and its directory name is used, lowercased.
- cwd basename. Outside any git repo, the basename of cwd lowercased, or "global" if that is empty.
Normalizing the remote is what makes a project key stable across machines: the same repo cloned over SSH on one machine and HTTPS on another resolves to the same key, so its notes group together. The inject.py docstring flags the fuller cross-machine identity work as a deliberate follow-up isolated to this one function.
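The normalization steps listed above translate fairly directly into code. normalize_remote below is a sketch written from that description, not the body of _normalize_remote; the initial strip() and the exact regular expressions are assumptions, and URLs with an explicit port are not handled.

```python
import re


def normalize_remote(url: str) -> str:
    s = url.strip()
    s = re.sub(r"^(?:https|ssh|git)://", "", s)  # strip the scheme
    s = re.sub(r"^[^@/]+@", "", s)               # strip a leading user@
    s = re.sub(r"^([^/:]+):", r"\1/", s)         # scp form host:path -> host/path
    if s.endswith(".git"):                       # strip a trailing .git
        s = s[:-4]
    return s.rstrip("/").lower()                 # strip trailing slashes, lowercase


ssh_key = normalize_remote("git@github.com:oscardvs/anamnesis.git")
https_key = normalize_remote("https://github.com/oscardvs/anamnesis")
```

Both clone URLs collapse to the same key, so notes written from either checkout land in the same project.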
Related pages
- Recall and search - how the FTS5 BM25 query is built and ranked.
- Capture and injection - how notes are selected and rendered at SessionStart.
- Sync - how the memory/ tree travels over git on a Tailscale mesh.
- Architecture overview - the file-first design in one place.