Chat with your notes.
Not in the cloud. On your machine.

PKMA is a desktop AI assistant for your Obsidian vault. Semantic search, a streaming agent, wikilink-aware retrieval — running entirely against your own files and your own model. No accounts. No upload. No telemetry.

  • 0 KB of data uploaded
  • ~8 MB of RAM at idle
  • 1 binary: no Electron, no Docker
  • OpenAI-compatible providers
[Screenshot: the PKMA desktop app's chat panel, showing an example conversation about Obsidian frontmatter conventions]
The Three Promises

Built for people who would rather not upload their second brain.

Most "chat with your notes" tools require uploading your vault to a third party. PKMA was designed from day one so that question never comes up.

Local

Your vault, the index, the vector store, and (with a local provider) the model all run on your machine. The server listens only on 127.0.0.1:8008; unless you point PKMA at a remote model, there's nowhere else for your data to go.

SQLite + FTS5 + sqlite-vec on disk · No cloud bucket, no remote DB · blake3-hashed, incremental index

Offline

Pair it with Ollama or LM Studio and PKMA runs with the network unplugged. Board a flight without Wi-Fi and the agent, retrieval, and embeddings keep working.

Works fully air-gapped · No remote auth, no licence ping · Embeddings run locally too

Private

No accounts. No telemetry. No analytics SDK. Source-available, so you can read the few hundred lines of network code yourself and confirm there's no phone-home.

Zero telemetry, ever · No login wall, no licence key · Source-available · PKMA PUL
Bring your own model: any OpenAI-compatible endpoint, for both chat and embeddings.
Ollama · LM Studio · OpenRouter · llama.cpp · Custom (OpenAI-compat)
Everything in the binary

A research partner that actually reads your notes.

First-class understanding of wikilinks, frontmatter, tags, and the standard .md vault layout. Plus a real agent loop on top.

vault_search

Hybrid retrieval, fused with RRF

FTS5 keyword search and semantic vector search, fused with reciprocal-rank fusion. Catches both the note that uses different words and the note that uses your exact phrase.
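A minimal sketch of reciprocal-rank fusion, written in Rust to match the core. Each ranked list (keyword hits, vector hits) adds 1 / (k + rank) to a note's score, and notes are re-sorted by the summed score. The constant k = 60 comes from the original RRF paper; PKMA's actual value isn't stated here, so treat it as an assumption, and `rrf_fuse` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Smoothing constant from the original RRF paper (assumed; PKMA's value isn't stated).
const RRF_K: f64 = 60.0;

/// Fuse ranked lists of note ids (best first) with reciprocal-rank fusion:
/// each list adds 1 / (k + rank) to a note's score, with rank starting at 1.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in rankings {
        for (i, id) in list.iter().enumerate() {
            let rank = (i + 1) as f64;
            *scores.entry(id.to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by id so the order is deterministic.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}
```

A note that ranks well in both lists can outrank one that tops only a single list, which is the point of fusing rather than concatenating.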

graph_neighbors

Wikilink-graph aware

A petgraph-backed link traversal lets the agent walk the [[graph]] up to three hops deep, surfacing connected ideas the embeddings missed.
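The traversal can be sketched as a depth-limited breadth-first search. This std-only version uses a plain adjacency map (`LinkGraph`, hypothetical) in place of petgraph, and returns each reachable note with its hop distance so the agent can favor closer ideas.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical adjacency map: note title -> titles it links to via [[wikilinks]].
type LinkGraph = HashMap<String, Vec<String>>;

/// Breadth-first walk from `start`, returning every note reachable within
/// `max_depth` hops with its hop distance (the start note itself is excluded).
fn neighbors_within(graph: &LinkGraph, start: &str, max_depth: usize) -> Vec<(String, usize)> {
    let mut seen: HashSet<String> = HashSet::new();
    seen.insert(start.to_string());
    let mut queue: VecDeque<(String, usize)> = VecDeque::new();
    queue.push_back((start.to_string(), 0));
    let mut out = Vec::new();
    while let Some((note, depth)) = queue.pop_front() {
        if depth == max_depth {
            continue; // at the hop limit: record it, but don't expand further
        }
        for next in graph.get(&note).into_iter().flatten() {
            // `insert` returns false for notes already visited, so cycles are safe.
            if seen.insert(next.clone()) {
                out.push((next.clone(), depth + 1));
                queue.push_back((next.clone(), depth + 1));
            }
        }
    }
    out
}
```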

SSE

Streaming agent loop

Token-by-token responses with live tool-call status, thinking steps, and an interruptible composer.

watcher

Incremental indexing

File watcher + blake3 content-hash dedupe. Only changed notes get re-embedded, and a quarantine table catches the ones that don't parse.

tauri v2

One desktop binary

A single Tauri binary. No Electron, no Docker, no Python service, no daemon to babysit. Native window chrome, ~8 MB at idle.

multi-vault

Per-vault isolation

Each vault gets its own SQLite database. Switching vaults is cancellation-aware: in-flight indexing aborts cleanly.

obsidian-cli

Live Obsidian tools

Optionally swap the search/read surface for live obsidian-cli tools: backlinks, properties, tasks, outline, daily note.

notes/lookup

Smart citation pills

The agent's replies cite notes inline by title, rendered as pills. Each pill is resolved with exact / FTS / LIKE fallbacks, so you see at a glance whether the match is solid or a guess. Hover to preview; click to open in Obsidian.

OpenAI-compat

Provider-agnostic, both ways

Anything speaking the OpenAI chat-completions wire format works for chat. For embeddings, either Ollama's /api/embed or the OpenAI-compatible /v1/embeddings. Swap models per vault from inside the app.
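A small sketch of the two embedding routes, assuming the base URL is configured per vault. Both take a JSON body with `model` and `input`; Ollama's native POST /api/embed returns vectors under `embeddings`, while OpenAI-compatible servers' POST /v1/embeddings return them under `data[i].embedding`. `EmbedApi` and its methods are hypothetical names, and the trailing-`/v1` handling reflects a common convention for OpenAI-compatible base URLs rather than PKMA's confirmed behavior.

```rust
/// Which embedding wire format a vault is configured for (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EmbedApi {
    /// Ollama's native API: POST /api/embed.
    Ollama,
    /// OpenAI-compatible API: POST /v1/embeddings.
    OpenAiCompat,
}

impl EmbedApi {
    /// Build the POST URL from a base such as "http://127.0.0.1:11434".
    fn url(self, base: &str) -> String {
        let base = base.trim_end_matches('/');
        match self {
            EmbedApi::Ollama => format!("{base}/api/embed"),
            // Assumption: accept OpenAI-compat bases given with or without /v1.
            EmbedApi::OpenAiCompat if base.ends_with("/v1") => format!("{base}/embeddings"),
            EmbedApi::OpenAiCompat => format!("{base}/v1/embeddings"),
        }
    }

    /// Where the vectors live in the JSON response.
    fn vectors_field(self) -> &'static str {
        match self {
            EmbedApi::Ollama => "embeddings",             // {"embeddings": [[...], ...]}
            EmbedApi::OpenAiCompat => "data[].embedding", // {"data": [{"embedding": [...]}, ...]}
        }
    }
}
```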

Three speeds

Agent modes for the question you're actually asking.

Each mode has its own tool budget and retrieval cap. Set a default in settings; override per-thread when one query needs more depth.

Fast

One-hop answers.

Single retrieval, no tool follow-ups. For lookups you could almost do with grep.

tool calls: ≤ 1
retrieval k: 8
typical latency: 1-2 s
Deep Research

Long-horizon synthesis.

Aggressive tool budgets, deep graph traversal, cross-note reasoning. Pour a coffee.

tool calls: ≤ 20
retrieval k: 60
graph depth: 3
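The mode cards translate naturally into a budget table plus an override rule. A minimal sketch, assuming hypothetical `AgentMode` and `ModeBudget` types: only Fast and Deep Research are described on this page, and Fast's graph depth of 0 is an assumption drawn from its "no tool follow-ups" rule.

```rust
/// Agent modes described on this page (hypothetical type names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AgentMode {
    Fast,
    DeepResearch,
}

/// Per-mode caps, using the figures from the mode cards.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ModeBudget {
    max_tool_calls: u32,
    retrieval_k: usize,
    graph_depth: usize,
}

impl AgentMode {
    fn budget(self) -> ModeBudget {
        match self {
            // Assumption: Fast does no graph traversal (its card lists no depth).
            AgentMode::Fast => ModeBudget { max_tool_calls: 1, retrieval_k: 8, graph_depth: 0 },
            AgentMode::DeepResearch => ModeBudget { max_tool_calls: 20, retrieval_k: 60, graph_depth: 3 },
        }
    }
}

/// A per-thread override, when present, wins over the default from settings.
fn effective_mode(default: AgentMode, thread_override: Option<AgentMode>) -> AgentMode {
    thread_override.unwrap_or(default)
}

/// The agent loop would check this before each tool call.
fn may_call_tool(budget: &ModeBudget, calls_so_far: u32) -> bool {
    calls_so_far < budget.max_tool_calls
}
```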
Citations you can actually trust

Every claim links back to a note you wrote.

Wikilinks rendered as pills. Match quality visible at a glance. Hover to preview, click to open in Obsidian. No more "the model said so, somewhere".

Assistant · synthesis

The methodology you settled on in the second draft borrows the sampling frame from Eberhardt 2019 but swaps the bootstrap routine for the one in Resampling — house notes. There's also a note suggesting you reconsider: see Concerns: small-n.

Cited notes: Methodology — final · Eberhardt 2019 · Resampling — house notes · Concerns: small-n
Pill legend: exact · fts · like / fuzzy · unresolved

Hover preview · Eberhardt 2019
type: reference · tags: methods, sampling
A re-read of Eberhardt's framework. Useful for the boundary conditions; the bootstrap implementation here is the part I actually reuse...

The agent quotes you, not the internet.

PKMA only retrieves from the vault you pointed it at. There is no web search tool, no scraping, no training data leak. When the assistant cites a source, it's a note you wrote, and the pill tells you exactly how confidently it matched.

  • exact — title matched verbatim. The pill is solid.
  • fts — full-text fallback. Confident but not perfect.
  • like / fuzzy — fuzzy match. Rendered dashed; treat with care.
  • unresolved — agent invented a title that doesn't exist. Broken-link pill.
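A sketch of the fallback chain behind the pills. The app resolves against SQLite (exact, FTS, LIKE); to stay self-contained, this version approximates each tier in memory, so its FTS and LIKE tiers are stand-ins rather than the real queries. `resolve_citation` and `MatchQuality` are hypothetical names.

```rust
/// How a cited title was matched, mirroring the pill styles above.
#[derive(Debug, Clone, PartialEq, Eq)]
enum MatchQuality {
    Exact,
    Fts,
    Like,
    Unresolved,
}

/// Resolve a title the agent cited against the vault's note titles,
/// trying progressively looser tiers and stopping at the first hit.
fn resolve_citation(cited: &str, titles: &[&str]) -> (MatchQuality, Option<String>) {
    // Tier 1: exact, verbatim title match.
    if let Some(t) = titles.iter().find(|t| **t == cited) {
        return (MatchQuality::Exact, Some(t.to_string()));
    }
    let needle = cited.to_lowercase();
    // Tier 2 (stand-in for FTS): every word of the citation appears in the title, any order.
    let words: Vec<&str> = needle.split_whitespace().collect();
    if !words.is_empty() {
        if let Some(t) = titles.iter().find(|t| {
            let hay = t.to_lowercase();
            words.iter().all(|w| hay.contains(*w))
        }) {
            return (MatchQuality::Fts, Some(t.to_string()));
        }
    }
    // Tier 3 (stand-in for LIKE '%...%'): one string contains the other.
    if let Some(t) = titles.iter().find(|t| {
        let hay = t.to_lowercase();
        !hay.is_empty() && !needle.is_empty() && (hay.contains(&needle) || needle.contains(&hay))
    }) {
        return (MatchQuality::Like, Some(t.to_string()));
    }
    // Nothing matched: the agent cited a note that doesn't exist.
    (MatchQuality::Unresolved, None)
}
```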
Under the hood

One Tauri binary. Two things in the same process.

A Rust core boots an Axum HTTP server on 127.0.0.1:8008; a Next.js webview talks to it over HTTP + SSE. That's the whole architecture.

webview: Next.js 16 · React 19 · OKLCH shell
    ↑↓ HTTP + SSE
rust core (127.0.0.1:8008): Axum · Tokio · Tower · Rig agent loop
  • storage (VaultDb): SQLite + FTS5 + sqlite-vec
  • embed (HTTP client): OpenAI-compat / Ollama
  • watcher (notify + blake3): incremental hash diff
  • llm (bring-your-own): chat completions
01

Read-only pool, single writer

VaultDb is a pool of 4 read-only connections plus 1 writer, and every query runs inside spawn_blocking. The chat hot path never blocks a Tokio worker on a SQLite lock.
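The reader/writer split can be sketched with std types alone. `Conn` is a placeholder for a real SQLite handle, `VaultDbSketch` is a hypothetical name, and the round-robin, prefer-an-idle-reader policy is an assumption. In the real app each guard would be taken inside `tokio::task::spawn_blocking`; that call is omitted here to avoid the dependency.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Mutex, MutexGuard};

/// Placeholder for a real SQLite connection handle (hypothetical).
struct Conn {
    read_only: bool,
}

/// "N readers + 1 writer": reads spread across several connections,
/// while every write serializes through one mutex-guarded connection.
struct VaultDbSketch {
    readers: Vec<Mutex<Conn>>,
    write_conn: Mutex<Conn>,
    next: AtomicUsize,
}

impl VaultDbSketch {
    fn new(n_readers: usize) -> Self {
        assert!(n_readers > 0, "need at least one reader");
        let readers: Vec<Mutex<Conn>> =
            (0..n_readers).map(|_| Mutex::new(Conn { read_only: true })).collect();
        VaultDbSketch {
            readers,
            write_conn: Mutex::new(Conn { read_only: false }),
            next: AtomicUsize::new(0),
        }
    }

    /// Start at a round-robin position and take the first idle reader;
    /// block on the starting reader only if all of them are busy.
    fn reader(&self) -> MutexGuard<'_, Conn> {
        let n = self.readers.len();
        let start = self.next.fetch_add(1, Ordering::Relaxed) % n;
        for offset in 0..n {
            if let Ok(guard) = self.readers[(start + offset) % n].try_lock() {
                return guard;
            }
        }
        self.readers[start].lock().unwrap()
    }

    /// The single write connection; concurrent writers queue on this mutex.
    fn writer(&self) -> MutexGuard<'_, Conn> {
        self.write_conn.lock().unwrap()
    }
}
```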

02

SSE all the way down

Tokens, thinking steps, tool-call start/done, and message complete are all distinct SSE event types. The composer is disabled while a response streams; retry takes the prior turn and resends it.
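A sketch of how distinct event types map onto the SSE wire format: each frame is an `event:` line, one `data:` line per payload line, and a blank-line terminator. The variants mirror the kinds listed above, but the wire names (`token`, `tool_call_start`, and so on) and the `ChatEvent` type are hypothetical.

```rust
/// Stream events named in the text; the wire names are hypothetical.
enum ChatEvent {
    Token(String),
    Thinking(String),
    ToolCallStart { name: String },
    ToolCallDone { name: String },
    MessageComplete,
}

impl ChatEvent {
    fn event_name(&self) -> &'static str {
        match self {
            ChatEvent::Token(_) => "token",
            ChatEvent::Thinking(_) => "thinking",
            ChatEvent::ToolCallStart { .. } => "tool_call_start",
            ChatEvent::ToolCallDone { .. } => "tool_call_done",
            ChatEvent::MessageComplete => "message_complete",
        }
    }

    fn payload(&self) -> &str {
        match self {
            ChatEvent::Token(s) | ChatEvent::Thinking(s) => s.as_str(),
            ChatEvent::ToolCallStart { name } | ChatEvent::ToolCallDone { name } => name.as_str(),
            ChatEvent::MessageComplete => "",
        }
    }

    /// One SSE frame: an `event:` line, a `data:` line per payload line
    /// (a data field can't hold a raw newline), then a blank line.
    fn to_sse(&self) -> String {
        let mut frame = format!("event: {}\n", self.event_name());
        for line in self.payload().split('\n') {
            frame.push_str("data: ");
            frame.push_str(line);
            frame.push('\n');
        }
        frame.push('\n');
        frame
    }
}
```

Because the client dispatches on the event name, the UI can route tokens to the message body and tool-call events to the status line without inspecting payloads.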

03

Hash-diff indexing

blake3 hashes every note body. Unchanged bodies skip embedding entirely. Files the parser can't handle land in a quarantine table so they don't loop in "changed" forever.
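The hash-diff and quarantine rules can be sketched as a per-note decision. To stay dependency-free this uses std's `DefaultHasher` as a stand-in for blake3; it is deterministic within one run but not guaranteed stable across Rust releases, so it would not suit a persisted index. `IndexState` and `Action` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in for blake3 so the sketch has no dependencies.
fn content_hash(body: &str) -> u64 {
    let mut h = DefaultHasher::new();
    body.hash(&mut h);
    h.finish()
}

#[derive(Debug, PartialEq, Eq)]
enum Action {
    Skip,
    Embed,
    Quarantine,
}

/// Hash of the last body seen per note path, split by outcome.
#[derive(Default)]
struct IndexState {
    indexed: HashMap<String, u64>,
    quarantined: HashMap<String, u64>,
}

impl IndexState {
    /// Decide what one indexing pass should do with a note.
    fn process(&mut self, path: &str, body: &str, parses: impl Fn(&str) -> bool) -> Action {
        let h = content_hash(body);
        // Same body as last time, whether it embedded or failed: nothing to do.
        if self.indexed.get(path) == Some(&h) || self.quarantined.get(path) == Some(&h) {
            return Action::Skip;
        }
        if parses(body) {
            self.quarantined.remove(path);
            self.indexed.insert(path.to_string(), h);
            Action::Embed // real code would re-embed the note here
        } else {
            self.indexed.remove(path);
            self.quarantined.insert(path.to_string(), h);
            Action::Quarantine
        }
    }
}
```

Storing the hash of a body that failed to parse is what stops a broken note from being re-flagged as changed on every pass; editing the note changes its hash and gives it another try.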

04

Cancellation-aware vault switching

Switching to a different vault aborts the in-flight indexer, filters buffered events, and reconnects the SSE stream against the new vault id.
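A std-only sketch of cancellation-aware switching, assuming threads and channels in place of the async runtime: the indexer checks a shared cancel flag between notes, and every event carries its vault id so anything still buffered from the old vault can be filtered out. The SSE reconnect step is left out, and `spawn_indexer`, `drain_for`, and `IndexEvent` are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;

/// Indexing progress, tagged with the vault it came from.
#[derive(Debug)]
struct IndexEvent {
    vault_id: u32,
    note: String,
}

/// Index `notes` for one vault, checking the cancel flag before each note.
/// Returns how many notes were processed before finishing or being cancelled.
fn spawn_indexer(
    vault_id: u32,
    notes: Vec<String>,
    cancel: Arc<AtomicBool>,
    tx: mpsc::Sender<IndexEvent>,
) -> thread::JoinHandle<usize> {
    thread::spawn(move || {
        let mut done: usize = 0;
        for note in notes {
            if cancel.load(Ordering::Relaxed) {
                break; // the user switched vaults: stop cleanly between notes
            }
            // Real code would parse and embed the note here.
            if tx.send(IndexEvent { vault_id, note }).is_err() {
                break; // receiver gone, nobody is listening any more
            }
            done += 1;
        }
        done
    })
}

/// Take whatever is buffered, dropping events from any vault but the active one.
fn drain_for(rx: &mpsc::Receiver<IndexEvent>, active_vault: u32) -> Vec<IndexEvent> {
    rx.try_iter().filter(|e| e.vault_id == active_vault).collect()
}
```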

Status · alpha

Shipped. Shipping. Next.

PKMA is under active development. APIs and on-disk schemas may change between commits. The index can be safely rebuilt at any time.

Shipped
  • Hybrid retrieval with RRF fusion
  • Three agent modes with per-thread overrides
  • Live obsidian-cli tool profile
  • Incremental indexing + quarantine
  • Wikilink resolution with match quality
  • Multi-vault with clean cancellation
Next
  • Plugin surface — user-extensible tools
  • Structured edits on Obsidian properties
  • Multi-modal: PDF + image notes
  • Encrypted index export / import
  • Windows + Linux signed installers

Your second brain. Yours to keep.

Free for personal use. No account, no telemetry, no subscription. Pick a build and point it at a folder of Markdown.