In-memory projection: implementation plan

Status: Planned 2026-04-25; updated 2026-04-26 to use narrow ports per target-architecture.md. Supersedes the implementation-order section of event-sourced-state.md, which described the what and why. This doc is the how — concrete shape, files, sequencing, and the unified-log decision that came out of the planning round.

What changed since `event-sourced-state.md`

Unified event log decision: yes. Collapse the four parallel JSONL streams (entries, interactions, firings, classifications) into one events.jsonl with a t discriminator. Reducers project four views from the one stream. Rationale below.
Concrete reducer + lock layout. One RwLock<ProjectionData>, one Projection::apply(event) switch, fanned out to view-specific upserts.
Narrow ports, not &dyn AppendLog. Per target-architecture.md, services depend on &dyn EventStore (write side) and &Projection (read side), not on the umbrella Device trait. The pure sublayer (ProjectionData::apply, query fns) takes no traits at all.
Migration is a one-time replay tool, not an in-app fork.
Stage gating. Five stages, each independently shippable, each reversible until stage 4.

For the architectural framing this plan implements — what’s “pure” vs. “hexagonal” in this codebase, why we didn’t go strict FC/IS, and the no-&dyn Device-in-core rule — see target-architecture.md. This doc assumes that target and shows how the projection refactor lands us there.

Decisions

Question	Decision	Why
Unified log?	Yes — one `events.jsonl` per user with `t` discriminator	One merge surface for sync; one apply switch; cross-stream invariants enforceable
Lock granularity	Single `RwLock<ProjectionData>`	5:1 read:write at this scale; no torn reads on cross-view joins
Hydration timing	Sync, at Tauri startup, before `app.manage`	Sub-10k records replays in <100ms; solo-prototype, blocking is fine
Write path owner	`Projection::append_*` does both phases	Two-phase invariant lives in one place; reducer = hydrator
Reducer state shape	Materialized per-view (`HashMap<id, Entry>`, `Vec<Interaction>`, `Vec<Firing>`) + materialized SRS-per-entry inputs	Reads are O(view-size); SRS offset applied at read because `utc_offset_seconds` is read-time
Edit semantics	Full snapshots, never diffs	One event type per concept; reducer doesn’t need prior state to apply
Recovery on phase-2 panic	Log is truth → next boot re-hydrates	No app-level rollback; solo-prototype simplicity
Out-of-order on sync	Tolerate; latest-wins by `(id, timestamp)`	No vector clocks; clock skew tolerated within seconds

The unified event log

Shape

One file per user: events.jsonl. Each line is one event, internally tagged on t, PascalCase EntityVerbed past-tense:

// Entries (entity events; latest-wins by id)
{ "t": "EntryCaptured",      "id": "e-...", "capturedAt": 0, "source": "...", "content": "...", "shape": { "t": "Unsure" } }
{ "t": "EntryEdited",        "id": "e-...", "editedAt":   0, "source": "...", "content": "...", "shape": { ... } }
{ "t": "EntryDeleted",       "id": "e-...", "deletedAt":  0 }

// Classifier provenance (append-only; latest per entryId is current pick)
{ "t": "EntryClassified",    "id": "cls-...", "entryId": "e-...", "classifierShape": "train", "confidence": "high", "reasoning": "...", "classifiedAt": 0 }

// Practice rituals (append-only; idempotent by id)
{ "t": "Practiced",          "id": "ix-...", "entryId": "e-...", "occurredAt": 0, "response": null }
{ "t": "Integrated",         "id": "ix-...", "entryId": "e-...", "occurredAt": 0, "response": null }

// Real-world firings (append-only; idempotent by id)
{ "t": "FiringLogged",       "id": "fir-...", "entryId": "e-...", "occurredAt": 0, "note": null }

Why unified beats per-stream here

One merge surface for sync. Dropbox conflict files fan out per file; one log = one conflict.
Cross-stream invariants stay enforceable. EntryView joins entries × interactions × SRS today; with a totally-ordered stream the join becomes a deterministic left-fold rather than a multi-source dance.
One apply switch, N projections. The reducer is one match; fan-out to view-specific upserts is structural, not behavioral.
Schema evolution lives in one place. Adding EntryArchived is one new arm.

Why we still keep `EntryEdited` distinct from `EntryCaptured`

Structurally identical (full snapshot), semantically distinct. Analytics and the audit trail read better. Cost is one extra arm. Defensible to collapse into EntryWritten later — keep both for now.

Schema evolution rules

Never rename a field. Use #[serde(rename = "diskName")] if the Rust name needs to drift.
Never change a field’s meaning. Add a new field with #[serde(default)], or mint a new event variant (EntryCapturedV2).
Unknown t values: log + skip. Mirrors today’s silent-skip on malformed lines (if let Ok(...) in list_entries). Forward-compat for free at solo-prototype scale.
No _ wildcard on the apply match. A new event variant is a compile error at every reducer site (project rule).

What goes in the log: facts, never commands

EntryClassified (with the result baked in), never ClassifyEntry (which would re-fire the LLM during replay). The side-effect orchestration rule (side-effect-orchestration.md) stays intact: Rust runs the side effect, then appends the resulting fact. Replay never re-runs the side effect. No now() or random() inside a reducer.

The runtime layer

// crates/core/src/projection.rs (new module)

pub struct Projection {
    inner: RwLock<ProjectionData>,
}

struct ProjectionData {
    // Latest-wins by id; tombstones already removed.
    entries_by_id: HashMap<String, Entry>,
    // File-order, no dedup.
    interactions: Vec<Interaction>,
    firings: Vec<Firing>,
    classifications: Vec<Classification>,
    // SRS inputs per entry: last-action timestamp + practiced count
    // + last action. Offset is read-time (next_review_millis depends
    // on user's *current* local offset, not write-time offset).
    srs_inputs_by_entry: HashMap<String, SrsInputs>,
}

struct SrsInputs {
    latest_action: Action,
    latest_timestamp: i64,
    practiced_count: u32,
}

Wrapped in Arc<Projection> and stored in tauri::State. The classify spawn task and the dispatch handler each clone the Arc; inner RwLock serializes writes against reads.

Reducer sketch

impl ProjectionData {
    fn apply(&mut self, event: Event) {
        match event {
            Event::EntryCaptured { id, captured_at, source, content, shape }
          | Event::EntryEdited   { id, edited_at: captured_at, source, content, shape } => {
                self.entries_by_id.insert(id.clone(), Entry {
                    id, created_at: captured_at, source, content, shape,
                    is_deleted: false,
                });
            }
            Event::EntryDeleted { id, .. } => {
                self.entries_by_id.remove(&id);
                // SRS inputs deliberately retained — if a tombstone
                // is later reversed by a fresh EntryCaptured/Edited
                // (resurrection), the practice history rejoins.
            }
            Event::EntryClassified { id, entry_id, classifier_shape, confidence, reasoning, classified_at } => {
                self.classifications.push(Classification {
                    id, entry_id, classifier_shape, confidence, reasoning, classified_at,
                });
            }
            Event::Practiced { id, entry_id, occurred_at, response } => {
                self.interactions.push(Interaction {
                    id, entry_id: entry_id.clone(), timestamp: occurred_at,
                    action: Action::Practiced, response,
                });
                self.bump_srs(entry_id, Action::Practiced, occurred_at);
            }
            Event::Integrated { id, entry_id, occurred_at, response } => {
                self.interactions.push(Interaction {
                    id, entry_id: entry_id.clone(), timestamp: occurred_at,
                    action: Action::Integrated, response,
                });
                self.bump_srs(entry_id, Action::Integrated, occurred_at);
            }
            Event::FiringLogged { id, entry_id, occurred_at, note } => {
                self.firings.push(Firing { id, entry_id, timestamp: occurred_at, note });
            }
        }
    }
}

Notes:

No _ wildcard — new event = compile error here.
bump_srs is internal to the reducer; recomputes the per-entry SRS inputs from current state.
Reads compute the offset-dependent next_review_millis at query time, not in the reducer. Otherwise DST/timezone changes silently invalidate the cache.

Read paths after the refactor

Command::EntriesList => {
    let utc_offset_seconds = chrono::Local::now().offset().local_minus_utc();
    let views = ctx.projection.list_entry_views(utc_offset_seconds);
    (Response::ok_json("Entries.List", serde_json::to_value(&views).unwrap()),
     DispatchSideEffect::None)
}

list_entry_views holds the read lock once, walks entries_by_id, joins each entry against srs_inputs_by_entry to compute next_review_millis, returns the Vec<EntryView>. Microseconds at <10k records.

dispatch_with_device is renamed dispatch and takes a DispatchCtx<'a> carrying the narrow ports it needs:

pub struct DispatchCtx<'a> {
    pub projection:    &'a Projection,
    pub store:         &'a dyn EventStore,
    pub classifier:    &'a dyn Classifier,
    pub session:       &'a dyn SessionWriter,
    pub clock:         &'a dyn Clock,
    pub notifications: &'a dyn Notifications,
}

Device retreats to a shell-side aggregate that bundles concrete adapters; it is no longer imported by crates/core or crates/api.

Write paths after the refactor

The write API takes a narrow EventStore port (not &dyn AppendLog, not &dyn Device) per the application-core target. EventStore exposes exactly two methods: append(&Event) and read_all() -> Vec<Event>. The shell-side FsEventStore / CloudEventStore implement it on top of today’s filesystem and R2 backings.

// crates/core/src/services/capture.rs (use-case service)
pub fn record_entry_captured(
    projection: &Projection,
    store:      &dyn EventStore,
    event:      EntryCaptured,
) -> Result<(), EventStoreError> {
    let event = Event::EntryCaptured(event);
    // Phase 1: durability anchor. Append the fact to the log.
    store.append(&event)?;
    // Phase 2: in-memory apply. Same reducer the hydrator uses.
    let mut data = projection.write_data();
    data.apply(event);
    Ok(())
}

Projection::write_data is a thin shell over RwLock::write. ProjectionData::apply is pure — no traits, no I/O — and is the same code path used at hydration. If phase 2 panics, the log is correct and next boot re-hydrates.

Hydration

impl Projection {
    pub fn hydrate(store: &dyn EventStore) -> Result<Self, EventStoreError> {
        let mut data = ProjectionData::default();
        for event in store.read_all()? {
            data.apply(event);
        }
        Ok(Projection { inner: RwLock::new(data) })
    }
}

EventStore::read_all returns Vec<Event> (already deserialized); malformed-line handling moves into the store impl, where it can log + skip with the right context. The pure apply never has to think about JSON.

In tauri::Builder::setup, before app.manage:

let store: Arc<dyn EventStore> = pick_event_store(&session);
let projection = Projection::hydrate(&*store).expect("hydrate");
app.manage(AppState { store, projection: Arc::new(projection), /* + other ports */ });

Sub-10k records replays in <100ms locally. With cloud (R2) backing, hydration is one network round-trip per log file — measure once, accept the cold-boot latency, document it. A future “rehydrate on sign-in” path can warm the cache without blocking startup.

Stages (each shippable independently)

Stage 1 — Scaffold the projection + EventStore port (no behavior change)

New module: crates/core/src/projection/{mod,data,apply,query}.rs with Projection, ProjectionData (pure), apply (pure), query fns (pure), and hydrate.
New module: crates/core/src/ports.rs defining EventStore (and stubs for Classifier, SessionReader, Clock, etc., even if unused at this stage).
Shell-side adapter: FsEventStore (and later CloudEventStore) in src-tauri/src/. For Stage 1 it adapts on top of the existing per-stream JSONL files — read_all() reads each file, decodes, and yields synthetic Event::EntryCaptured etc. so the reducer has something to fold.
Wire into AppState in src-tauri/src/lib.rs. Don’t read from the projection yet.
Test: Projection::hydrate(&store) matches list_entries / list_interactions / list_firings / list_classifications for a scenario seeded with all four streams.

Reversible: deleting the new modules restores prior behavior.

Stage 2 — Switch reads to projection (one command at a time)

EntriesList first. Switch the dispatch arm to call projection.list_entry_views(offset). Keep the legacy list_entry_views(device, offset) as a fallback path for one release, gated behind a debug flag if the projection diverges.
Then InteractionsList, FiringsList. Trivial — just a clone() of the Vec.
Test: new dispatch tests against the projection-backed handler; existing wire-format tests stay green.

Reversible: revert each dispatch arm.

Stage 3 — Two-phase writes via use-case services

Add use-case services in crates/core/src/services/ (e.g. services::capture::record_entry_captured(projection, store, event)). Each service does phase 1 (store.append) then phase 2 (projection.apply) atomically.
Switch every EntriesAppend, EntriesSoftDelete, InteractionsAppend, FiringsAppend dispatch arm to call the matching service.
Switch the classify_and_supersede task to a services::classify::classify_entry service that takes &dyn Classifier, &dyn EventStore, &dyn SessionReader, &dyn Clock, &Projection — not &dyn Device. Per target-architecture.md. Also fix the stale-classification race here (see “Known correctness debt” in the target-architecture doc — pin the classification to the captured event id/version; reducer refuses if a later edit landed).
Drop the post-capture 2.5s EntriesList refetch in Elm (src/Main.elm — the timer-driven refetch added 2026-04-25 to make the classifier supersede visible). Reads are now cheap and fresh.
Drop the on-Detail-nav EntriesList refetch for the same reason.

Partially reversible: you can revert the Elm-side cleanup independently from the Rust-side wiring.

Stage 4 — Unified event log (shipped as a single move, no dual-write)

The plan originally called for a dual-write window for safety. Solo prototype, single user (Justin) — we skipped the dual-write and went straight to unified-only. What actually shipped:

Event enum + From<Entry/Interaction/Firing/Classification> conversions: in core since Stage 1.
Write side: FsEventStore::append writes only to events.jsonl. The per-stream JSONL files are no longer touched.
Read side: read_all prefers events.jsonl; falls back to synthesizing events from the per-stream files only when the unified file is empty (the pre-backfill window).
One-shot backfill: FsEventStore::backfill_unified_if_needed runs at startup (Tauri host + dev-web core-server) and after every /dev/seed. Reads legacy per-stream files, writes synthesized events to events.jsonl. Idempotent; no-op once the unified file has data.
Per-stream files become read-only fossils after backfill — never written to, ignored on read. We left them on disk rather than deleting; if anything ever goes wrong the data is recoverable.

If multi-user shows up later and we need a true safety window before a destructive migration, revisit this — re-introduce dual-write + delayed cleanup. For one user it’s not worth the code.

Stage 5 — Schema-version sync hooks

Out of scope for the initial refactor, but the unified log makes it easy:

Re-hydrate on file-mtime change (Dropbox watcher).
Per-event R2 objects for cheap remote sync (the planned follow-up).
tauri::ipc::Channel<Event> push from Rust to Elm so the UI can reactively update without polling.

Failure modes

Mode	What happens	Recovery
Phase 2 panic (poisoned `RwLock`)	Dispatch returns error envelope to Elm	Next boot re-hydrates from disk
Malformed event line	`apply` skips with `eprintln!`	Log + carry on; investigate via dev-web `/dev/state`
`events.jsonl` partially written (power loss mid-line)	Last line fails to deserialize	Skipped; data after the partial write is gone but earlier lines are intact
Migration tool produces wrong projection	Stage 4 dual-write means legacy files still authoritative	Re-run tool, fix bugs, repeat
R2 unreachable at boot	Hydration fails	Fall back to local cache (or empty projection); user sees an error banner; retry on next dispatch

Files that change

File	Scope
`crates/core/src/projection/{mod,data,apply,query}.rs`	New — `Projection` (RwLock holder), `ProjectionData` (pure), `apply` (pure), query fns (pure)
`crates/core/src/events.rs`	New (Stage 4) — `Event` enum
`crates/core/src/ports.rs`	New — narrow port traits: `EventStore`, `Classifier`, `SessionReader`, `SessionWriter`, `Clock`, `Notifications`
`crates/core/src/services/{capture,classify,notifications,auth}.rs`	New — use-case services taking narrow ports
`crates/core/src/lib.rs`	Re-export `Projection`, `Event`, ports + services. Stop re-exporting `Device`.
`crates/core/src/domain.rs`	Keep types; drop `list_` / `append_` once services replace them
`crates/core/src/append_log.rs`	Delete — `AppendLog` trait is shell-internal now (replaced by `EventStore`)
`crates/api/src/lib.rs`	`dispatch` takes `DispatchCtx<'a>`; every arm calls a service; `DispatchSideEffect` stays as the host-spawn intent
`crates/api/src/bin/core_server.rs`	Builds a `DispatchCtx` from concrete shell adapters
`src-tauri/src/lib.rs`	`AppState` carries `Arc<Projection>` + concrete port impls; `core_dispatch` builds a `DispatchCtx` per call; `Device` (the aggregate) lives here
`src/Main.elm`	Drop the 2.5s post-capture refetch + on-Detail-nav `EntriesList` calls (Stage 3)

Open questions / non-decisions

Lamport clocks for sync. Deferred. Single-device for now; clock skew tolerated to seconds. Revisit when the second device lands.
Snapshot-and-tail for log size. Deferred. Today’s lifetime record count fits in memory. Revisit when boot replay exceeds 1s.
Push-to-Elm via tauri::ipc::Channel. Deferred. Elm refetches on user actions; the projection is fresh on every read so the current pattern works without a push channel.
Encrypting the unified log on R2. Tracked separately (project_e2e_encryption memory).

Why not a database

SQLite would give us indexes + transactions. We don’t need them at this scale (HashMap is faster than B-tree for sub-10k entries; phase 1

2 ordering gives us write atomicity-enough). Adding SQLite is reversible later if we outgrow this; today it’s overhead.

Prior art

The agent round 2026-04-25 surfaced these references — read these before re-litigating any of the above:

Greg Young — Versioning in an Event Sourced System — weak-schema rule, never rename, mint new variants. https://leanpub.com/esversioning/read
Oskar Dudycz — Simple events versioning patterns — concrete upcasting/default-values recipes. https://event-driven.io/en/simple_events_versioning_patterns/
Linear’s sync engine reverse-engineered (wzhudev) — SyncActions = changesets applied to a materialized object graph; LWW for ordering, no CRDTs. https://github.com/wzhudev/reverse-linear-sync-engine
Replicache “How it works” — Causal+ ordering, mutator replay against newer base versions; the reference for “what to do when events arrive late.” https://doc.replicache.dev/concepts/how-it-works
Riffle (UIST ‘23, Litt et al.) — single local relational store as truth, reactive queries on top; the academic version of what we’re building. https://riffle.systems/essays/prelude/
Tauri v2 State Management — canonical tauri::State<Mutex<T>> / RwLock<T> patterns. https://v2.tauri.app/develop/state-management/
Confluent — Single vs. Multiple Event Streams + EventSourcingDB Designing Aggregates — the steelman against unification (read before re-arguing). https://developer.confluent.io/courses/event-design/single-vs-multiple-event-streams/
Kleppmann et al. — Local-first software (Ink & Switch, 2019) — the seven ideals; addresses filesystem/Dropbox-style sync as a viable substrate. https://www.inkandswitch.com/essay/local-first/
Kleppmann — Designing Data-Intensive Applications, 2nd ed., Ch. 11 “Stream Processing” — sections “Event Sourcing” and “Databases and Streams” are the definitive framing of “log is truth, state is a derived view.”