PKMA is a desktop AI assistant for your Obsidian vault. Semantic search, a streaming agent, wikilink-aware retrieval — running entirely against your own files and your own model. No accounts. No upload. No telemetry.
Your conventions live in Frontmatter — house rules and are referenced from Atomic note template. The short version:
Every note gets a title, a kebab-case id, an ISO created date, and a flat tags: list. type is one of atomic, map, journal, or reference.
The examples in Notes on Eberhardt 2019 drift — they nest tags under a parent key. Want me to propose a fix and show the diff?
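The house rules in that reply are easy to check mechanically. Here is a minimal sketch, assuming the frontmatter has already been parsed into a plain dict (YAML parsing is left out). The function name and the error strings are hypothetical, not PKMA's API.

```python
import re
from datetime import date

ALLOWED_TYPES = {"atomic", "map", "journal", "reference"}
KEBAB = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def check_frontmatter(meta):
    """Return a list of rule violations for one note's parsed frontmatter."""
    problems = []
    title = meta.get("title")
    if not isinstance(title, str) or not title.strip():
        problems.append("title: missing or empty")
    note_id = meta.get("id")
    if not isinstance(note_id, str) or not KEBAB.match(note_id):
        problems.append("id: expected kebab-case")
    created = meta.get("created")
    try:
        if not isinstance(created, str):
            raise ValueError("not a string")
        date.fromisoformat(created)  # accepts YYYY-MM-DD
    except ValueError:
        problems.append("created: expected an ISO date")
    tags = meta.get("tags")
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        problems.append("tags: expected a flat list of strings")
    if meta.get("type") not in ALLOWED_TYPES:
        problems.append("type: expected atomic, map, journal, or reference")
    return problems


good = {"title": "Bootstrap basics", "id": "bootstrap-basics",
        "created": "2024-03-01", "tags": ["stats"], "type": "atomic"}
# Same note, but with tags nested under a parent key (the drift described above).
drifted = dict(good, tags={"topic": ["stats"]})
```

Here `check_frontmatter(good)` comes back empty, while `check_frontmatter(drifted)` reports only the tags rule.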
Most "chat with your notes" tools require uploading your vault to a third party. PKMA was designed from day one so that the question of where your data goes never comes up.
Your vault, the index, the vector store, the model — everything runs on your machine. The server lives on 127.0.0.1:8008. There's nowhere else for your data to go.
Pair it with Ollama or LM Studio and PKMA runs with the network unplugged. Board a flight without Wi-Fi and the agent, retrieval, and embeddings keep working.
No accounts. No telemetry. No analytics SDK. Source-available so you can read the few hundred lines of network code yourself and confirm there's no phone-home.
First-class understanding of wikilinks, frontmatter, tags, and the standard .md vault layout. Plus a real agent loop on top.
FTS5 keyword search and semantic vector search fused together with reciprocal-rank fusion. Catches the note that uses different words AND the note that uses your exact phrase.
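For the curious, reciprocal-rank fusion fits in a few lines. This is a sketch of the general technique, not PKMA's code: each list contributes 1 / (k + rank) per note and the sums decide the final order. The constant k = 60 is the commonly used default and is an assumption here, as are the function name and the sample note ids.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=None):
    """Fuse best-first ranked lists of note ids with reciprocal-rank fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, note_id in enumerate(ranking, start=1):
            scores[note_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id so the order is stable.
    fused = sorted(scores, key=lambda note_id: (-scores[note_id], note_id))
    return fused[:top_n] if top_n else fused


keyword_hits = ["exact-phrase", "shared", "keyword-only"]  # e.g. from FTS5
vector_hits = ["paraphrase", "shared", "vector-only"]      # e.g. from embeddings
fused = rrf_fuse([keyword_hits, vector_hits])
```

A note that appears in both lists ("shared") outranks notes that top only one of them, which is why the exact-phrase hit and the paraphrase hit both survive into the fused result.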
A petgraph-backed link traversal lets the agent walk the [[graph]] up to three hops deep, surfacing connected ideas the embeddings missed.
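A hop-limited walk is a breadth-first search with a depth cap. The sketch below uses a plain dict of outgoing links as a stand-in for the petgraph structure (petgraph is a Rust crate). The function name and the sample graph are hypothetical.

```python
from collections import deque


def walk_links(graph, start, max_hops=3):
    """Return {note: hops} for every note reachable within max_hops links."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        note = queue.popleft()
        if hops[note] == max_hops:
            continue  # do not expand past the hop limit
        for target in graph.get(note, ()):
            if target not in hops:
                hops[target] = hops[note] + 1
                queue.append(target)
    del hops[start]  # the starting note is not a "connected idea"
    return hops


links = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"]}
reachable = walk_links(links, "a")
```

With the chain a → b → c → d → e, the walk from "a" reaches b, c, and d at one, two, and three hops and stops before e.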
Token-by-token responses with live tool-call status, thinking steps, and an interruptible composer.
File watcher + blake3 content-hash dedupe. Only changed notes get re-embedded — and quarantine catches the ones that don't parse.
A single Tauri binary. No Electron, no Docker, no Python service, no daemon to babysit. Native window chrome, ~8 MB at idle.
Each vault gets its own SQLite database. Switching vaults is cancellation-aware — in-flight indexing aborts cleanly.
Optionally swap the search/read surface for live obsidian-cli tools: backlinks, properties, tasks, outline, daily note.
The agent's replies cite Note Title inline. Each pill is resolved with exact / FTS / LIKE fallbacks, so you see at a glance whether the match is solid or a guess. Hover to preview; click to open in Obsidian.
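The fallback chain can be pictured as three attempts, each weaker than the last, with the winning tier reported alongside the match. In this sketch the middle and last tiers are pure-Python stand-ins (whole-token match for FTS, substring match for LIKE) so it runs without a database. The function name, tier labels, and note titles are illustrative assumptions.

```python
def resolve_pill(cited, titles):
    """Return (matched_title, quality) for a title cited by the agent."""
    wanted = cited.strip().lower()
    # Tier 1: exact title match, ignoring case.
    for title in titles:
        if title.lower() == wanted:
            return title, "exact"
    # Tier 2: every cited token appears as a whole token (stand-in for FTS).
    tokens = set(wanted.split())
    for title in titles:
        if tokens and tokens <= set(title.lower().split()):
            return title, "fts"
    # Tier 3: plain substring (stand-in for SQL LIKE '%...%').
    for title in titles:
        if wanted and wanted in title.lower():
            return title, "like"
    return None, "unresolved"


vault_titles = ["Eberhardt 2019", "Resampling house notes", "Concerns: small-n"]
```

The quality label is what a pill would surface: "exact" is a solid match, "like" is closer to a guess.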
Anything speaking the OpenAI chat-completions wire format works for chat. For embeddings, either Ollama's /api/embed or the OpenAI-compatible /v1/embeddings. Swap models per vault from inside the app.
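The two embedding endpoints take the same request body and differ mainly in the path and the response shape. Here is a sketch of a thin adapter, with no network call included. The helper names are hypothetical, and the base URL and model name in the example are placeholders for whatever you run locally.

```python
import json


def embed_request(base_url, model, texts, flavor):
    """Build (url, body) for an embeddings call in either wire format."""
    paths = {"ollama": "/api/embed", "openai": "/v1/embeddings"}
    if flavor not in paths:
        raise ValueError(f"unknown flavor: {flavor}")
    url = base_url.rstrip("/") + paths[flavor]
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return url, body


def parse_embeddings(payload, flavor):
    """Extract one vector per input text from a decoded JSON response."""
    if flavor == "ollama":
        return payload["embeddings"]
    # OpenAI-compatible servers return rows tagged with the input index.
    rows = sorted(payload["data"], key=lambda row: row["index"])
    return [row["embedding"] for row in rows]


url, body = embed_request("http://localhost:11434", "nomic-embed-text",
                          ["first note", "second note"], "ollama")
```

The same pair of functions covers both servers, so switching providers only changes the flavor argument.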
Each mode has its own tool budget and retrieval cap. Set a default in settings; override per-thread when one query needs more depth.
Single retrieval, no tool follow-ups. For lookups you could almost do with grep.
Multi-step retrieval with graph walks and quote-checks. The mode you actually use.
Aggressive tool budgets, deep graph traversal, cross-note reasoning. Pour a coffee.
Wikilinks rendered as pills. Match quality visible at a glance. Hover to preview, click to open in Obsidian. No more "the model said so, somewhere".
Assistant · synthesis
The methodology you settled on in the second draft borrows the sampling frame from Eberhardt 2019 but swaps the bootstrap routine for the one in Resampling — house notes. There's also a note suggesting you reconsider — see Concerns: small-n.
PKMA only retrieves from the vault you pointed it at. There is no web search tool, no scraping, no training data leak. When the assistant cites a source, it's a note you wrote — and the pill tells you exactly how confidently it matched.
A Rust core boots an Axum HTTP server on 127.0.0.1:8008; a Next.js webview talks to it over HTTP + SSE. That's the whole architecture.
VaultDb is 4 read-only connections + 1 writer, all inside spawn_blocking. The chat hot path never blocks a Tokio worker on a SQLite lock.
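The pool shape is simple to reproduce. The sketch below illustrates it with Python's sqlite3 module: four connections opened read-only through a SQLite URI, plus one writer guarded by a lock. It is not the Rust implementation, has no equivalent of spawn_blocking, and the table layout is an assumption.

```python
import os
import queue
import sqlite3
import tempfile
import threading
from contextlib import contextmanager
from pathlib import Path


class VaultDb:
    """One writer connection plus a small pool of read-only connections."""

    def __init__(self, path, readers=4):
        self._write_lock = threading.Lock()
        self._writer = sqlite3.connect(path, check_same_thread=False)
        self._writer.execute(
            "CREATE TABLE IF NOT EXISTS notes (id TEXT PRIMARY KEY, body TEXT)")
        self._writer.commit()
        self._readers = queue.Queue()
        ro_uri = Path(path).resolve().as_uri() + "?mode=ro"
        for _ in range(readers):
            self._readers.put(
                sqlite3.connect(ro_uri, uri=True, check_same_thread=False))

    @contextmanager
    def reader(self):
        conn = self._readers.get()  # blocks while all readers are busy
        try:
            yield conn
        finally:
            self._readers.put(conn)

    def write(self, sql, params=()):
        with self._write_lock, self._writer:  # commit on success
            self._writer.execute(sql, params)


db = VaultDb(os.path.join(tempfile.mkdtemp(), "vault.db"))
db.write("INSERT INTO notes VALUES (?, ?)", ("n1", "hello"))
with db.reader() as conn:
    rows = conn.execute("SELECT id, body FROM notes").fetchall()
```

Readers opened with mode=ro reject writes outright, so a stray UPDATE on the read path fails instead of contending for the write lock.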
Tokens, thinking steps, tool-call start/done, message complete — they're all distinct SSE event types. The composer disables while streaming; retry pulls the prior turn and resends.
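On the client side, distinct event types mean the stream can be routed by name. Here is a minimal sketch of a parser for the text/event-stream format. The event names (token, tool_start, message_complete) and the JSON payload shapes are assumptions for illustration, not PKMA's actual schema.

```python
import json


def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of event-stream lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it has data.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


stream = [
    "event: token\n", 'data: {"text": "Hel"}\n', "\n",
    "event: token\n", 'data: {"text": "lo"}\n', "\n",
    "event: tool_start\n", 'data: {"tool": "search"}\n', "\n",
    "event: message_complete\n", "data: {}\n", "\n",
]
events = list(parse_sse(stream))
text = "".join(json.loads(d)["text"] for name, d in events if name == "token")
```

Token events accumulate into the visible reply while tool and completion events drive the status indicators separately.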
blake3 hashes every note body. Unchanged bodies skip embedding entirely. Files the parser can't handle land in a quarantine table so they don't loop in "changed" forever.
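The skip-and-quarantine logic can be sketched as follows. Two loud assumptions: blake3 is not in the Python standard library, so hashlib.blake2b stands in for it, and the quarantine "table" is a plain dict keyed by path. All function names and the toy parser are hypothetical.

```python
import hashlib


def body_hash(body):
    # Stand-in for blake3: any stable content hash works for this sketch.
    return hashlib.blake2b(body.encode("utf-8"), digest_size=32).hexdigest()


def plan_reindex(notes, embedded, quarantined, parse):
    """Return the paths that need embedding; update both hash tables."""
    to_embed = []
    for path, body in notes.items():
        digest = body_hash(body)
        if embedded.get(path) == digest or quarantined.get(path) == digest:
            continue  # unchanged since it was embedded or quarantined
        try:
            parse(body)
        except ValueError:
            quarantined[path] = digest  # do not retry until the body changes
            continue
        quarantined.pop(path, None)
        embedded[path] = digest
        to_embed.append(path)
    return to_embed


def toy_parse(body):
    # Hypothetical parser: rejects frontmatter that is opened but never closed.
    if body.startswith("---") and body.count("---") < 2:
        raise ValueError("unterminated frontmatter")


notes = {"a.md": "alpha", "b.md": "---\ntitle: broken"}
embedded, quarantined = {}, {}
first = plan_reindex(notes, embedded, quarantined, toy_parse)
second = plan_reindex(notes, embedded, quarantined, toy_parse)
notes["a.md"] = "alpha, edited"
third = plan_reindex(notes, embedded, quarantined, toy_parse)
```

The first pass embeds a.md and quarantines b.md, the second pass does nothing, and the third picks up only the edited note.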
Switching to a different vault aborts the in-flight indexer, filters buffered events, and reconnects the SSE stream against the new vault id.
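One way to picture the switch is a cancellation token per vault plus a vault-id filter on buffered events. The sketch below runs sequentially to stay deterministic and leaves out the SSE reconnect. The class, the function, and the event shape are hypothetical, not PKMA's internals.

```python
import threading


class VaultSession:
    """Tracks the active vault and the cancel token for its indexer."""

    def __init__(self):
        self.vault_id = None
        self._cancel = threading.Event()

    def switch(self, new_vault_id):
        self._cancel.set()                # abort the previous vault's indexer
        self._cancel = threading.Event()  # fresh token for the new vault
        self.vault_id = new_vault_id
        return self._cancel

    def accept(self, event):
        # Drop buffered events that belong to a vault we already left.
        return event.get("vault_id") == self.vault_id


def index_notes(paths, cancel, done):
    """Index paths one by one; stop early once the token is cancelled."""
    for path in paths:
        if cancel.is_set():
            return False
        done.append(path)
    return True


session = VaultSession()
token_a = session.switch("vault-a")
indexed = []
index_notes(["1.md"], token_a, indexed)
token_b = session.switch("vault-b")  # cancels token_a
finished = index_notes(["2.md", "3.md"], token_a, indexed)
buffered = [{"vault_id": "vault-a", "kind": "indexed"},
            {"vault_id": "vault-b", "kind": "indexed"}]
kept = [event for event in buffered if session.accept(event)]
```

After the switch, the old token stops further indexing for vault-a and only vault-b events pass the filter.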
PKMA is under active development. APIs and on-disk schemas may change between commits. The index can be safely rebuilt at any time.
obsidian-cli tool profile
Free for personal use. No account, no telemetry, no subscription. Pick a build and point it at a folder of Markdown.