n-lawOS — the build, end to end
A working engineering document: every step in build order, with the actual data model, contracts, commands, config, failure modes and the reasoning behind each choice. Two columns at a time — click anywhere in either column to switch the whole document between Lay | Dev and Customer | Compliance.
§0 · How to read — one doc, four angles
Two columns. Left = plain words, right = the technical build. Click anywhere in either column (or the bar that follows you down) and both columns switch to a different pair of readers.
Default = Lay | Dev; click → Customer | Compliance. Dev is the spine; this is a build-from-it doc — schemas, contracts, commands, env, edge cases, rationale. Order is P0→P9 (real build order). §A data model and §B repo layout are referenced throughout; §C env vars, §D services index at the end.
Customer: what the buying firm gets and what changes for them at this step (value, control, what stays in their hands).
Compliance: the data-protection / regulatory consequence of this step (lawful basis, where data sits, what a regulator or auditor would check). Grounded in D3/D4. Not legal advice.
Print/PDF shows all four angles at once. Selecting text won't trigger a switch — only a plain click does.
§A · Core data model (the one Postgres spine)
Everything the firm has — clients, calls, notes, time, the record of who did what — lives in one organised filing system, with strict rules on who can open which drawer.
One Postgres database. Row-Level Security on every table from migration #1. pgvector for semantic search. Drizzle schema + SQL migrations in infra/migrations. Core tables:
Invariants baked into the schema: (1) redaction_token is readable only by the tenant role, never selected into audit_log/usage_event; (2) audit_log has no UPDATE/DELETE grant (append-only — enforced by a trigger + revoked privileges); (3) job_run.masked_in only ever holds masked text (CI check, P1.4).
One trustworthy record of the whole firm. Staff see only what their role allows. The "real names" key-sheet is a separate locked drawer that never leaves your cloud, and the activity log can never be quietly edited.
Access control at the database (RLS), not just the UI — defensible least-privilege. The re-identification key (redaction_token) is isolated and tenant-only, supporting the "de-identified in the model's hands" position (D3). Append-only audit_log = tamper-evidence. One store = one place to evidence retention, SARs and erasure.
§B · Repository & deploy layout
The whole system lives in one project folder so it's built, tested and shipped as a single installable unit. The secret "recipe" (the AI prompts) lives in a separate, private folder that never ships.
Why a monorepo: one version, one CI, one deployable; the egress-gate invariant can be checked across the whole tree. Why the Engine is separate: the shipped/escrowed image must contain no IP and no back door (D8).
You receive one installable system that runs in your cloud. The vendor's "secret sauce" isn't in it and can't be — what you run is auditable, what makes it clever is licensed.
Single deployable = one auditable boundary for a DPIA / escrow review. Engine-repo separation means the customer image is inspectable and contains no hidden data path. CI lives with the code so controls (redact-before-model, secret-scan) are evidenced on every change.
Foundations & repository
docker compose up brings the stack live on one VM; a seeded user can sign in via the firm's SSO; an empty matter can be created and is visible only to its owner (RLS proven with a second user).
Create the monorepo
Set up the single project that holds the whole system, built and shipped together.
Strict TS everywhere ("strict": true, noUncheckedIndexedAccess). Shared contracts as zod schemas in packages/shared so the web app and the Python services agree on shapes (generate JSON Schema for the Python side). Engine prompts: git init a second private repo.
Nothing visible yet — but it means one clean install later, not a pile of parts to wire up.
One repo = one auditable artefact. Prompts/IP kept out of it from commit one, so escrow/handover never leaks the Engine or a back door.
App shell & routes
The window everyone works in — the screens, menus and buttons.
App Router; one route group (app) with a route per surface (see §B tree). Tenant data is client-rendered against the local sync store (P0.4), not SSR'd off-instance, so nothing renders server-side outside the tenant. A thin server layer only sets the RLS request context (P0.5) and proxies to the redactor/relay/MCP. Layout = left surface nav + top bar + content (the windows you saw in v2).
One fast app your team logs into, branded to your firm.
No tenant data leaves the instance to render; the UI executes inside the firm's deployment. Server endpoints are thin and logged.
Database, RLS & migrations
The single filing system, with locked drawers per role.
Postgres 16 + CREATE EXTENSION vector. Drizzle schema in packages/db; migrations in infra/migrations. RLS enabled on every table in the first migration — never "add security later". Each request opens a transaction and sets the actor:
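One way the per-request actor could be set is sketched below. It is an illustration under assumptions, not the project's code: the setting names app.user_id and app.role come from the identity step of this document, while the role list and the helper name actor_statements are hypothetical. It relies on Postgres's set_config(name, value, is_local), which with is_local = true has the same transaction scope as SET LOCAL but accepts bound parameters.

```python
from typing import List, Tuple

# Hypothetical role names; the real set would come from the identity table.
ALLOWED_ROLES = {"partner", "associate", "paralegal", "service"}

Statement = Tuple[str, Tuple[str, ...]]


def actor_statements(user_id: str, role: str) -> List[Statement]:
    """Return the statements that open a transaction and set the RLS actor."""
    if not user_id:
        raise ValueError("user_id is required")
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return [
        ("BEGIN", ()),
        # Third argument true = transaction-local, the same scope as SET LOCAL.
        ("SELECT set_config('app.user_id', %s, true)", (user_id,)),
        ("SELECT set_config('app.role', %s, true)", (role,)),
    ]
```

A policy can then compare a row's owner with current_setting('app.user_id'). Because the settings are transaction-local, they vanish at COMMIT or ROLLBACK and cannot leak into the next request on a pooled connection.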
Tables per §A. audit_log: revoke UPDATE/DELETE; add a trigger raising on modify. Edge case: background jobs (no human) run as a service role with its own narrow policies.
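The append-only behaviour can be demonstrated with a small runnable sketch. It uses SQLite from the Python standard library purely as a stand-in so that it runs anywhere; in Postgres the same idea would be a BEFORE UPDATE OR DELETE trigger plus revoked privileges. The column names are assumptions, not the real §A schema.

```python
import sqlite3


def make_audit_db() -> sqlite3.Connection:
    """Create an in-memory audit_log that rejects UPDATE and DELETE."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE audit_log (
            id     INTEGER PRIMARY KEY,
            actor  TEXT NOT NULL,
            action TEXT NOT NULL
        );
        CREATE TRIGGER audit_no_update BEFORE UPDATE ON audit_log
        BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
        CREATE TRIGGER audit_no_delete BEFORE DELETE ON audit_log
        BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
        """
    )
    return db


def try_modify(db: sqlite3.Connection, sql: str) -> bool:
    """Return True if the statement succeeded, False if the trigger blocked it."""
    try:
        db.execute(sql)
        return True
    except sqlite3.DatabaseError:
        return False
```

Inserts still succeed, so the log keeps growing, while any attempt to rewrite or remove a row fails loudly rather than silently.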
One trustworthy record; staff see only what their role allows; the activity log can't be secretly changed.
Least-privilege enforced at the data layer; append-only audit; single store simplifies retention + SAR + erasure evidence. DPIA references the RLS policy set and the audit trigger.
Local-first sync
The app keeps working offline and updates live for everyone.
ElectricSQL syncs Postgres ↔ a local store in the browser (offline-first, realtime). Define "shapes" (filtered subscriptions) per surface, e.g. matters where owner or member = me. Writes queue locally and replicate on reconnect; conflicts surface to a resolve UI (keep-mine / use-theirs / merge). Replaces Firebase/Firestore — Google-hosted, can't run in-tenant, forbidden.
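The three resolve choices can be sketched as a plain function. This is a hypothetical illustration of keep-mine / use-theirs / merge as a field-level three-way merge; it does not use or describe any ElectricSQL API, and it assumes all three versions of the row carry the same columns.

```python
from typing import Any, Dict

Row = Dict[str, Any]


def resolve(base: Row, mine: Row, theirs: Row, choice: str) -> Row:
    """Resolve a conflicting row given the last common version (base)."""
    if choice == "keep-mine":
        return dict(mine)
    if choice == "use-theirs":
        return dict(theirs)
    if choice != "merge":
        raise ValueError(f"unknown choice: {choice!r}")
    merged: Row = {}
    for key in base:
        b, m, t = base[key], mine[key], theirs[key]
        if m == t or t == b:
            merged[key] = m          # same edit, or only my side changed
        elif m == b:
            merged[key] = t          # only their side changed
        else:
            # Both sides changed the same field differently: needs a human.
            raise ValueError(f"field {key!r} changed on both sides")
    return merged
```

Merge succeeds automatically only when the two sides touched different fields; a true same-field clash is raised so the UI can ask the user.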
No lost work — it queues offline and syncs when you reconnect; teammates' changes appear live.
Sync stays inside the tenant boundary; no third-party (e.g. Google) processor in the data path — a clean line in the DPIA.
Identity, SSO & the RLS bridge
Staff sign in with their normal work login.
Auth.js with an OIDC provider = the firm's Microsoft Entra (or Google; Keycloak self-hosted for big firms). On each request, map the session → identity.role → SET LOCAL app.user_id/app.role (the RLS bridge from P0.3). No Firebase auth. Edge cases: leaver = IdP revokes → session dies; role change propagates on next token refresh.
Single sign-on with your existing Microsoft/Google accounts — no new passwords; leavers lose access through your normal process.
Identity stays with the firm's IdP; access tied to the firm's joiner/leaver controls. SSO/SAML expected at enterprise tier; every auth event in the audit log.
Deploy bundle (in the firm's cloud)
Package the system so it installs and runs the same inside the firm's own cloud.
One .env per tenant (§C). Healthchecks + restart policies; nightly pg_dump → encrypted to R2. Single VM (e.g. Hetzner CCX) to start; Kubernetes only at scale. This single-tenant in-tenant deploy is the control behind D6.
It runs on your infrastructure — we never host your data; you hold the keys.
Single-tenant, in-tenant deployment is the core D6 control: data never leaves the firm's cloud; vendor is not a host or a joint controller of content.
The egress gate (redactor) — before any model call
POST /redact returns masked text + a tenant-only token map; low-confidence spans queue for human confirm; a measured recall figure exists on a representative transcript set; CI fails any model call that bypasses it.
Detection sidecar & contract
The part that finds every personal detail in the text before anything is sent on.
Python FastAPI sidecar wrapping Presidio AnalyzerEngine (spaCy en_core_web_lg NER) + custom recognizers (P1.2), ensembled. Stateless HTTP; the only caller is the web server, never the model.
Recall > precision (β=2). Treat vanilla Presidio as a baseline — it misses PII; tuning + custom recognizers lift it materially. Evaluate with presidio-research on representative transcripts; record recall per type.
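"Recall > precision (β=2)" refers to the F-beta score, which weights recall β² times as heavily as precision. The sketch below computes it from raw counts; the function name and the example counts are illustrative, not measured results.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With 90 true hits, 60 false flags and 10 misses (precision 0.6, recall 0.9), F2 is about 0.82 while F1 is 0.72: the β=2 score forgives over-masking far more readily than it forgives a missed detail, which is the trade-off the redactor wants.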
Built to catch names, numbers and IDs before anything leaves — and to ask a human when it's unsure.
Detection quality is measured, not assumed (Q2). Target anonymisation under the ICO "motivated intruder" test, not mere pseudonymisation (D3). The measured residual rate is the evidence the risk is managed; marketing must say "designed to minimise re-identification risk", never "all PII removed".
Custom UK recognizers
Extra catchers for UK things: National Insurance, NHS numbers, postcodes, sort codes, case references.
Validators (checksums/format) raise confidence so true hits clear threshold and noise stays low. Store recognizer configs in services/redactor/recognizers/; version them; re-run eval on every change.
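As an example of a checksum validator, the sketch below checks an NHS number with its Modulus 11 check digit and uses the result to raise a recognizer's confidence. The validation rule is the published NHS algorithm; the score values and the helper name scored are assumptions for illustration, and the code is standalone rather than wired into Presidio.

```python
import re


def nhs_number_is_valid(raw: str) -> bool:
    """Validate a 10-digit NHS number using the Modulus 11 check digit."""
    digits = re.sub(r"[\s-]", "", raw)
    if not re.fullmatch(r"\d{10}", digits):
        return False
    # Weights 10 down to 2 apply to the first nine digits.
    total = sum(int(d) * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    if check == 10:
        return False  # no valid number has a check value of 10
    return check == int(digits[9])


def scored(raw: str, base_score: float = 0.4, boost: float = 0.5) -> float:
    """Hypothetical scoring: a passing checksum lifts the match above threshold."""
    return min(1.0, base_score + boost) if nhs_number_is_valid(raw) else base_score
```

A random 10-digit string passes the checksum only about one time in eleven, so most noise stays at the low base score while genuine numbers clear the threshold.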
Tuned for UK legal data, not a generic filter.
Per-recognizer recall figures are the audit evidence for residual re-identification risk; checksum validation reduces both misses and false flags.
Token map (tenant-only) · deny-by-default · human review
Each detail becomes a placeholder; the "placeholder = real value" sheet never leaves the firm. Anything uncertain goes to a person before sending.
Persist the map in redaction_token (scope = job or matter), readable only by the tenant role; never selected into audit_log/usage_event. Deny-by-default: spans with score < threshold are masked and queued (lowConfidence) for a human to confirm/relabel before the relay call proceeds. Structured fields (DOB, NINO columns) masked by rule, not just ML. Re-hydration (P3.2) reads this table firm-side only.
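The masking, token map and deny-by-default behaviour can be sketched as follows. Detection itself is out of scope here: spans are passed in as if an analyzer had already produced them. The placeholder format, the threshold value and every name in the block are hypothetical, and spans are assumed not to overlap.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Span:
    start: int
    end: int
    kind: str
    score: float


@dataclass
class RedactResult:
    masked_text: str
    token_map: Dict[str, str]                       # stays firm-side only
    low_confidence: List[str] = field(default_factory=list)


def redact(text: str, spans: List[Span], threshold: float = 0.6) -> RedactResult:
    """Mask every span; queue those under the threshold for human review."""
    token_map: Dict[str, str] = {}
    low: List[str] = []
    counters: Dict[str, int] = {}
    parts: List[str] = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        counters[span.kind] = counters.get(span.kind, 0) + 1
        token = f"[{span.kind}_{counters[span.kind]}]"
        token_map[token] = text[span.start:span.end]
        if span.score < threshold:
            low.append(token)                       # masked anyway, then reviewed
        parts.append(text[cursor:span.start])
        parts.append(token)
        cursor = span.end
    parts.append(text[cursor:])
    return RedactResult("".join(parts), token_map, low)


def rehydrate(masked: str, token_map: Dict[str, str]) -> str:
    """Swap placeholders back to real values (firm-side only)."""
    for token, value in token_map.items():
        masked = masked.replace(token, value)
    return masked
```

Note that a low score never means "leave it in": the span is masked either way, and the review queue only decides whether the label was right before the relay call proceeds.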
Real identities never leave your walls; a person checks anything uncertain before anything is sent.
The token map is the re-identification key — isolating it in-tenant is what lets the outbound text be treated as de-identified (D3). Human-in-loop on low confidence is a documented, evidenced control.
CI invariant — no model call without masked text
A built-in check that makes it impossible to wire the AI to raw text by mistake.
A CI check fails the build on any model call that does not pass through the redactor. Alongside it run Gitleaks (secrets) and a runtime assertion in the relay client that rejects input not tagged {masked:true}. This is the machine form of "redactor-before-engine".
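The runtime assertion in the relay client might look like the sketch below. The Payload type, the exception and the transport callable are all hypothetical stand-ins; the point is only that the send path refuses anything not explicitly tagged as masked.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Payload:
    text: str
    masked: bool = False   # only the redactor should ever set this to True


class UnmaskedInputError(Exception):
    """Raised when something other than redactor output reaches the relay."""


def relay_send(payload: object, transport: Callable[[str], str]) -> str:
    """Forward masked text to the model; reject everything else."""
    if not isinstance(payload, Payload) or payload.masked is not True:
        raise UnmaskedInputError("relay refuses input not tagged masked=True")
    return transport(payload.text)
```

Defaulting masked to False means a caller who forgets the redactor gets an exception, not a silent leak; passing a raw string fails the same way.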
The "strip first" rule can't be skipped — the build itself enforces it.
Redactor-before-engine is a code-enforced invariant, evidenced on every commit — a strong control narrative for a DPIA/audit.
The relay + model (AI substrate)
The relay — key holder + meter (in-tenant)
A small piece of our software, inside the firm, that talks to the AI on the firm's behalf and counts what's used.
LiteLLM proxy runs in-tenant; holds an n-law-issued, scoped, short-lived key (rotated via heartbeat); per-tenant spend caps. Customers can't bring their own key; the key is never loose in app code. The success callback writes usage_event (tokens, tier, job kind — no content).
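A numbers-only usage record and a spend-cap check can be sketched like this. It does not use LiteLLM's callback API; the field names and the shape of the incoming call record are assumptions. The technique shown is an allow-list: only named numeric and categorical fields are copied, so content cannot reach usage_event by accident.

```python
from typing import Any, Dict

# Hypothetical allow-list; anything not named here is dropped.
ALLOWED_FIELDS = ("tenant", "tier", "job_kind", "prompt_tokens", "completion_tokens")


def to_usage_event(call: Dict[str, Any]) -> Dict[str, Any]:
    """Build a content-free usage record from a completed model call."""
    return {name: call[name] for name in ALLOWED_FIELDS}


def within_cap(spent_pence: int, cost_pence: int, cap_pence: int) -> bool:
    """True if this call keeps the tenant at or under its spend cap."""
    return spent_pence + cost_pence <= cap_pence
```

An allow-list is safer than a deny-list here: a new field added upstream is excluded by default instead of leaking until someone notices.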
You never handle AI keys; usage is metered transparently and billed to you; a runaway can't blow the budget (per-tenant cap).
Only de-identified text + counts leave; the relay is in-tenant. Numbers-only telemetry is both the PII-free-telemetry control and the billing meter. Key rotation + caps limit blast radius if a key leaks.
The model — zero-retention, UK/EU, tiered
The actual AI brain — chosen so it keeps nothing and sits in the UK/EU; a cheaper one for simple jobs, a top one for hard drafting.
Live: Azure OpenAI UK South (DataZone EU, ZDR on approved access) for junior/paralegal; Anthropic ZDR for associate. Dev/synthetic only: OpenRouter (never live). Tier is chosen by the job (P3). Confirm ZDR eligibility + region pinning on the actual accounts before go-live.
Your (already de-identified) text goes to a named provider under a no-keep, no-train contract in the UK/EU — not a black box.
Gateway treated as a processor under a DPA: zero-retention, no-training, UK/EU residency, no-re-identification clause (D3). OpenRouter never on the live path. Provider swappable as law/guidance moves.
MCP tool layer + model-swap
The wiring that lets the AI use each part of the app, and lets us change AI provider without rebuilding.
Each surface exposes an MCP server (packages/mcp) with typed tools, e.g. matter.read, note.draft, time.log, conflict.check. Subagents act only through MCP — never raw DB. A thin askModel(tier, messages) interface makes the provider a one-line config switch.
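The "one-line config switch" can be illustrated with a small factory. The source names the interface askModel(tier, messages); everything else here (the Python spelling ask_model, the config shape, the provider names) is a hypothetical sketch with fake providers standing in for real clients.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]
Provider = Callable[[List[Message]], str]


def make_ask_model(
    tier_config: Dict[str, str], providers: Dict[str, Provider]
) -> Callable[[str, List[Message]], str]:
    """Bind tiers to providers so callers only ever name a tier."""

    def ask_model(tier: str, messages: List[Message]) -> str:
        provider_name = tier_config[tier]      # the line that changes on a swap
        return providers[provider_name](messages)

    return ask_model
```

Job code calls ask_model("associate", messages) and never names a vendor, so moving a tier to another provider is an edit to tier_config rather than to any job.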
The AI helps across the whole app, and you're never locked to one AI vendor.
Two hard chokepoints — MCP for tools, redaction→relay for models — make the data flow auditable and the provider replaceable.
The first job — attendance note (walking skeleton closes)
Job runner + the job contract
The machinery that runs one defined AI task and produces one finished thing.
Worker pulls from pg-boss → calls askModel via the relay on maskedText → validates output against the zod schema (retry on mismatch) → writes job_run + a draft note/time_entry (placeholders still in). Bounded jobs (not chat) keep the AI inside back-office support.
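The validate-and-retry step can be sketched without any queue or model. In this illustration a hand-written validator stands in for the zod schema, call_model stands in for askModel via the relay, and the note fields (summary, minutes) are hypothetical.

```python
import json
from typing import Callable, Dict, List


def validate_note(obj: Dict) -> List[str]:
    """Return a list of problems; an empty list means the draft is well-formed."""
    errors = []
    summary = obj.get("summary")
    minutes = obj.get("minutes")
    if not isinstance(summary, str) or not summary.strip():
        errors.append("summary must be a non-empty string")
    if isinstance(minutes, bool) or not isinstance(minutes, int) or minutes <= 0:
        errors.append("minutes must be a positive integer")
    return errors


def run_job(call_model: Callable[[str], str], masked_text: str,
            max_attempts: int = 3) -> Dict:
    """Call the model on masked text, retrying until the output validates."""
    errors: List[str] = []
    for attempt in range(1, max_attempts + 1):
        raw = call_model(masked_text)
        try:
            candidate = json.loads(raw)
        except json.JSONDecodeError:
            errors = ["output was not valid JSON"]
            continue
        if not isinstance(candidate, dict):
            errors = ["output was not a JSON object"]
            continue
        errors = validate_note(candidate)
        if not errors:
            # Placeholders are still in the output at this point.
            return {"state": "draft", "attempts": attempt, "output": candidate}
    raise RuntimeError(f"job failed after {max_attempts} attempts: {errors}")
```

Bounding the retries matters as much as having them: a job that cannot produce a valid draft fails visibly instead of looping and spending against the tenant's cap.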
Pick "draft attendance note" → get a finished draft + a time entry. Each job is a defined task, like giving work to a paralegal.
Bounded, named jobs (not freeform advice) keep the AI on the lawful side of the reserved-activity line (D4); each new job kind gets a regulatory check (Q1).
Re-hydrate · review · sign · edit-rate · audit
The placeholders are swapped back to real names (only on the firm's side), a person reads, edits and signs; nothing is final until they do; every step is recorded permanently.
State machine draft → review → signed. Edit-distance is captured per job kind (D10.4) — the running quality + honesty check on the time-equivalent. One write = provenance + audit + bill line.
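The state machine and the edit-rate measure can be sketched together. The transitions follow the draft → review → signed order given above; normalising Levenshtein distance by the longer text's length is an assumption about how "edit-distance" might be turned into a rate, not the project's stated formula.

```python
from typing import Dict, Set

TRANSITIONS: Dict[str, Set[str]] = {
    "draft": {"review"},
    "review": {"signed"},
    "signed": set(),          # terminal: a signed note is never reopened here
}


def advance(state: str, target: str) -> str:
    """Move to the target state, or raise if the step is not allowed."""
    if target not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {target}")
    return target


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def edit_rate(draft: str, signed: str) -> float:
    """Share of the text the reviewer changed: 0.0 untouched, 1.0 rewritten."""
    longest = max(len(draft), len(signed))
    return levenshtein(draft, signed) / longest if longest else 0.0
```

Tracked per job kind, a rising edit rate signals that drafts need more human work than the time-equivalent assumes.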
A fee-earner always has the final say; there's a clear, unchangeable record of who did what.
Human sign-off + append-only audit = accountability + tamper-evidence. Edit-rate evidences output quality and keeps billed time defensible.
Channels — the doors into the pipeline
Calls, recording & transcription
Built-in calls that record automatically (with a notice), then turn into text on the firm's own machine.
LiveKit (self-hosted WebRTC SFU) + egress recording → object store; a webhook fires recording.ended {uri, matterHint}. A pg-boss job runs faster-whisper (in-tenant, batch) → transcript. Audio never leaves the tenant. Legacy phone lines: ingest existing call-recording files (bridge), or managed SIP via Twilio if a number is needed. Edge cases: diarisation for who-said-what; partials discarded; failed transcribe → retry then flag.
Calls are captured without anyone remembering to record; even the audio stays inside your walls.
Recorded-call notice + lawful basis (legitimate interests + a documented LIA); recording + transcription stay in-tenant; ICO Jan-2026 — disclose AI-assisted analysis in the privacy notice; retention auto-delete on recording.retention_until.
Email, internal calls & matter-matching
Emails and internal staff calls flow the same way, and the system works out which client/case each one belongs to.
Microsoft Graph: a /subscriptions webhook on the mailbox (delta query for new mail) + Teams call recordings; validate the webhook token; emails skip transcription. Each door normalises to {text, source, matterId?, participants, ts}. Matter-matching: rules first (caller number → party → matter; email sender/thread-id → matter), then a small MCP subagent for fuzzy cases; the matter list comes from the firm's practice-management system (Clio/LEAP) via API. Unmatched → an intake queue.
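The normalised shape and the rules-first matching can be sketched as below. The Inbound fields mirror {text, source, matterId?, participants, ts} from the text; the lookup tables are hypothetical and collapse the number → party → matter chain into one step. The fuzzy subagent stage is omitted, so anything the rules miss goes straight to intake.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Inbound:
    text: str
    source: str                      # e.g. "call", "email", "teams"
    participants: List[str]
    ts: str                          # ISO 8601 timestamp
    matter_id: Optional[str] = None


def match_matter(
    item: Inbound,
    number_to_matter: Dict[str, str],
    thread_to_matter: Dict[str, str],
    thread_id: Optional[str] = None,
) -> Inbound:
    """Apply deterministic rules in order; leave matter_id unset if none fire."""
    if item.matter_id is None and thread_id is not None:
        item.matter_id = thread_to_matter.get(thread_id)
    if item.matter_id is None:
        for participant in item.participants:
            if participant in number_to_matter:
                item.matter_id = number_to_matter[participant]
                break
    return item


def route(item: Inbound) -> str:
    """Matched items enter the pipeline; the rest wait in the intake queue."""
    return "pipeline" if item.matter_id else "intake"
```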
Email and internal calls get the same notes + time automatically, filed to the right matter.
Email is already a record; internal calls need a staff notice. The same egress gate applies to every door. Correct matter attribution underpins confidentiality and accurate records.
Native surfaces — bring online in order
The surfaces
The everyday screens: the timesheet, each matter's story, the inbox, the task board, the firm's saved know-how, and the partner's overview.
- Time-ledger: query over time_entry grouped by actor/tier/day; the billing source.
- Matter timeline: projection over audit_log + events for one matter (event-sourced view).
- Inbox: Graph mirror; each message matched to a matter + an AI action chip.
- Tasks/workflow: board schema (status flow + approval gates); MCP tool task.create/move.
- Knowledge: embeddings in memory.embedding; ORDER BY embedding <=> query (pgvector) for semantic search of precedents.
- Dashboards: materialised views refreshed on a schedule (matters active, time logged, needs-you).
Canvas/co-edit on tldraw + Yjs. Every surface exposes an MCP interface so subagents act on it.
Your whole operation in one place — and timekeeping becomes automatic. You can keep using Outlook/your document system during the move; they import and run alongside, then retire.
One memory/audit/identity across surfaces = consistent retention, access and SAR handling. Bridges import-then-retire — no permanent third-party path in the steady state.
Billing — the virtual fee-earner
usage_event rolls up into AI-hours by tier, priced by the rate card, and produces a Stripe invoice that reads as a headcount line; the firm never sees tokens.
Meter → time → bill
The AI's work is turned into time (6-minute units) and billed like a member of staff.
Tokens stay internal cost + spend-cap; the firm's statement shows hours by tier. Stripe metered/invoice-items; monthly cron rollup.
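The rollup from jobs to billed time can be sketched with integer arithmetic. Only the 6-minute unit comes from the text; the minutes-per-job table, the rate card and the choice to round each job up separately are hypothetical figures and assumptions for illustration. Stripe is not involved at this stage.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

UNIT_MINUTES = 6  # one billing unit, as in the text

# Hypothetical figures for illustration only.
MINUTES_PER_JOB = {"attendance_note": 12, "engagement_letter": 20}
RATE_PENCE_PER_HOUR = {"junior": 3000, "associate": 6000}


def units_for(minutes: int) -> int:
    """Round up to whole 6-minute units using integer arithmetic."""
    return -(-minutes // UNIT_MINUTES)


def rollup(events: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Turn (tier, job_kind) events into billed units and pence per tier."""
    units: Dict[str, int] = defaultdict(int)
    for tier, job_kind in events:
        units[tier] += units_for(MINUTES_PER_JOB[job_kind])
    return {
        tier: {"units": n, "pence": n * RATE_PENCE_PER_HOUR[tier] * UNIT_MINUTES // 60}
        for tier, n in units.items()
    }
```

Here a 20-minute job bills as 4 units (24 minutes). Whether to round per job or per monthly total is exactly the kind of choice that has to be shown transparently on the statement.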
A bill that reads "AI fee-earners: N hours" — a headcount line, not a software bill, and far below a human rate.
Minutes-per-job must be defensible + shown transparently (or it reads as padding). Whether AI time is on-billable to the client is the firm's SRA/costs decision, not the vendor's.
Legal-gap tools (each a new billable job)
The legal-only surfaces
Tools a general office system doesn't have: checking new clients don't clash, tracking legal deadlines, onboarding + engagement letters, ID/AML checks, and a library of standard documents.
- Conflicts: embed the new party, ORDER BY embedding <=> new over party; threshold → clear/near/conflict; near → human.
- Key-dates: rules engine producing flags (never advice); writes reminders to tasks.
- Intake + engagement: form → create matter + run an engagement_letter job.
- AML/ID: call a checks provider (e.g. Onfido/ComplyAdvantage) API; store the result, not the raw documents beyond retention.
- Precedents: template store in memory (scope=firm), pgvector search.
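The conflicts threshold can be illustrated in plain Python. pgvector's <=> operator returns cosine distance, which the first function reproduces for two small vectors; the two cut-off values are hypothetical and would need tuning against real party data.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def classify(distance: float, conflict_at: float = 0.1, near_at: float = 0.3) -> str:
    """Map the closest match's distance to clear / near / conflict."""
    if distance <= conflict_at:
        return "conflict"
    if distance <= near_at:
        return "near"       # routed to a human, never auto-cleared
    return "clear"
```

The middle band exists so that the system never has to decide a borderline case itself: "near" always goes to a person.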
The legal-specific chores handled in the same place, each saving real time.
Key-dates flag (never advise) to stay back-office; AML supports the firm's own MLR obligations; conflicts support confidentiality. Each new job gets a regulatory check (Q1) — none may cross into reserved activity (D4).
Harden & ship (before the first real firm)
Tests, monitoring, backups, retention
Make sure it works end-to-end, watch for crashes, never lose data, and auto-delete old recordings on schedule.
Jest (unit) + Playwright E2E (call → transcript → redact → draft → sign → audit); a dedicated test asserts the relay received only masked text (the invariant, behaviourally). GlitchTip/Sentry for errors with a PII scrubber; OpenTelemetry counts only. Backups: pg_dump + object-store snapshot, encrypted to R2, restore-tested. Retention: nightly cron deletes where retention_until < now().
Reliable, with old recordings auto-deleted on a schedule you set, and backups you can restore.
Retention/auto-delete + PII-free telemetry are explicit DP controls; the E2E test is positive evidence of the redact-before-model path; backup encryption + restore-test for resilience.
Certs, image signing, first deploy
Get the basic UK security badge, lock the build so only approved code runs, then install for the first firm.
Cyber Essentials (self-assessment, ~£300–500, early). Cosign/Sigstore sign every image in CI; Watchtower applies only signed images (controlled promote). Deploy: docker compose pull && up -d on the firm's VM with their .env (§C). ISO 27001 deferred to P9.
Recognised security baseline; only approved versions can ever run; installed into your cloud.
Cyber Essentials is the entry baseline UK firms ask for; signed images support handover/escrow trust (only an approved build runs). ISO 27001 when enterprise demand justifies it.
Enterprise (only after a signed LOI)
Multi-tenancy + ISO 27001
Only once a big firm commits: support many offices in one install, and get the bigger security certificate.
Add a tenant_id to every table + a tenant RLS policy layered under the existing role/matter policies; per-tenant LiteLLM keys/quotas; per-tenant config/branding. Do not build before an LOI. Swap pg-boss → BullMQ/Valkey if throughput demands.
Scales to a large multi-office firm when needed.
Tenant isolation by RLS; ISO 27001 (~£10–40k, 6–12 mo) for £100k+ RFPs — start only when the pipeline justifies it.
§C · Environment variables (per tenant)
§D · Services & repos — master index
Canonical project/service links. Licences as commonly understood mid-2026 — confirm before commercial use (esp. flagged).
Source of record: task_plan.md (D1–D12) + STUDY.md + findings.md. Build doc v3 (full-detail) · 13 Jun 2026 · time-ordered P0–P9 · 4 reader angles, click any column to switch · code/schemas are reference sketches for build, not final source · links canonical, not re-checked this build · not legal advice — DP + regulatory posture needs lawyer sign-off before any real client data.