Nine Days, Four Prototypes, One AI Development Governance Framework

Four local-first apps. Nine days. Three bugs the AI wrote confidently, and I accepted. The case study and governance framework—what breaks when you vibe-code local-first, and what prevents it.

Nine days. Four working prototypes. Three failures that taught me how to govern this kind of work. The framework held, and this morning I submitted the talk.

What Happened After the Tracker Went Live

A local-first e-commerce store. A patient intake form that writes before it posts. A social network where the server forgets you on purpose. And an honest account of what broke, what the AI wrote that shouldn't have shipped, and what I learned about governing this kind of work.


The Governance Window Tracker launched on April 19. infinitydrive.net forwarded to it. The domain that had been held since the mid-2000s finally resolved to something: a live monitoring instrument tracking whether the window for binding democratic AI governance was narrowing or widening. That article closed there: personal arc resolved, analytical arc open.

What I didn't write about was what I was already building the day after.

Nine days later, today, April 28, four working prototypes are live, and I have a governance framework for AI-assisted development that I wish I'd had at the start. This piece covers everything: what was built, what broke, the AI's role in the failures, and what I've taken from it going forward.


The Series, and Why It Exists

The local-first prototype series started as a demonstration problem. The Governance Window Tracker is a read-only civic intelligence tool; it has no servers, databases, or backends. The user's browser is the application. That's an easy domain for local-first architecture. Nobody needs to pay for anything or submit a medical record.

The harder question, and the one the Tracker's architectural argument doesn't answer, is whether local-first works when something irreducibly server-dependent has to happen. When money has to move. When a clinical record has to reach a provider. When two people who don't share a server need to find each other and exchange data.

The seam is my name for that boundary: the minimum server-dependent surface in an otherwise local-first system. Identifying the seam, designing around it, and making it explicit rather than accidental; that's the architectural argument the series is trying to demonstrate, domain by domain.

Each prototype introduces a harder version of the seam problem. The Tracker has no seam at all. checkout-seam has one seam per transaction. fhir-seam has one seam per intake submission, with a harder failure taxonomy and higher stakes. Local-First Social (localfirst.social for the agnostic platform | socialpings.com for the branded product experience, perhaps someday...) has a seam that fires whenever a new connection is made. The social graph itself is a distributed seam.

The series is also, more honestly, a demonstration of how much working software two resources can produce in a short time when one of them is an AI, and the other is a single human directing the work with enough clarity about the architectural argument to know when the output is right and when it isn't.


checkout-seam: Local-First Commerce

Live: checkout-seam.vercel.app · Repo: github.com/jediwright/checkout-seam

The first prototype, after the Tracker, takes an obvious target: e-commerce, and explores the payment problem. Demo apps don't sell things. Production e-commerce platforms are entirely server-side because that's how Stripe, Shopify, and every adjacent tool assume you'll build. The local-first community has produced beautiful work on documents, collaboration, and knowledge management. Commerce is a gap, as far as I have seen.

checkout-seam closes that gap with a deliberate structural argument:

client (Y.js/IndexedDB) ──POST──▶ server (Stripe) ──response──▶ client (Y.js/IndexedDB)

The server is stateless. It processes the charge and returns. The client owns the order record, written to Y.js on success. If the POST fails, the cart is preserved. The server is never consulted again after order confirmation.
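The flow above can be sketched as a pure state transition. This is a minimal illustration under stated assumptions, not the code in the checkout-seam repo: ClientState, ChargeResult, and applyChargeResult are hypothetical names, and a plain object stands in for the Y.js document persisted to IndexedDB.

```typescript
// Hypothetical shapes; the real prototype keeps this state in a Y.js document.
interface CartItem { sku: string; yards: number; }
interface Order { id: string; items: CartItem[]; }
interface ClientState { cart: CartItem[]; orders: Order[]; }

// What the stateless server hands back after attempting the charge.
type ChargeResult =
  | { outcome: "paid"; orderId: string }
  | { outcome: "failed"; reason: string };

// Apply the server's response to client-owned state.
// Failure: return the state unchanged, so the cart survives.
// Success: record the order locally and empty the cart.
function applyChargeResult(state: ClientState, result: ChargeResult): ClientState {
  if (result.outcome === "failed") {
    return state;
  }
  const order: Order = { id: result.orderId, items: state.cart.slice() };
  return { cart: [], orders: state.orders.concat([order]) };
}
```

Nothing in this transition asks the server a second question: once the response arrives, the client has everything it needs.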

The feature set is loosely based on my old DistinctiveFabric.com startup, a specialty fabric store I helped design and run in 2004: virtual cutting table, color-aware search, volume discount tiers, and order history. All of it runs locally in the browser. IndexedDB holds the catalog, the cart, the customer profile, and the order history. There is no server-side session, no Redux, no React context. One Y.js document, persisted to IndexedDB, is the entire application state.

The Virtual Cutting Table, a drag-and-drop fabric layout tool where cart items are colored tiles on a cutting mat, explores Y.js CRDT positions for spatial layout. The positions persist across tab close and browser restart. The architecture is sync-ready without a sync layer; add a Y.js provider, and two users share a cutting session in real time without changing the component code.

The pattern this prototype contributes to the Pattern Commons: the checkout seam. Identify the minimum server-dependent surface. Scope it explicitly. Design the error state so the client loses nothing in the event of failure. Write the result back to the local state on success. The server remains stateless. This pattern applies anywhere a local-first application touches an irreducibly server-dependent operation: payment processing, identity verification, legal record creation, compliance logging, etc.


fhir-seam: When the Stakes Go Up

Live: fhir-seam.vercel.app · Repo: github.com/jediwright/fhir-seam

Commerce and healthcare have the same seam problem. They have different consequences when the seam fails.

In commerce, failure means try again. In healthcare, failure means a clinical record was not received by the provider.

fhir-seam is a local-first patient intake form with a FHIR R4 mock endpoint as the seam. FHIR (Fast Healthcare Interoperability Resources) is the standard format for exchanging clinical data. A real patient intake system would translate the form data into a FHIR bundle and POST it to the EHR system. fhir-seam does exactly that, against a mock endpoint, to demonstrate that the pattern holds in a regulated, high-stakes domain, not just a fabric store.
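A rough sketch of what that translation step might look like. The IntakeForm shape and the toFhirBundle name are assumptions for illustration, as is the choice of a transaction-type Bundle; the repo may do this differently. A fuller version would also link the QuestionnaireResponse to the Patient, which is left out here for brevity.

```typescript
// Hypothetical local form shape; the field names are assumptions for this sketch.
interface IntakeForm {
  familyName: string;
  givenName: string;
  birthDate: string; // YYYY-MM-DD
  answers: { linkId: string; text: string; value: string }[];
}

// Translate client-native form data into a FHIR R4 Bundle holding
// a Patient resource and a QuestionnaireResponse resource.
function toFhirBundle(form: IntakeForm) {
  const patient = {
    resourceType: "Patient",
    name: [{ family: form.familyName, given: [form.givenName] }],
    birthDate: form.birthDate,
  };
  const questionnaireResponse = {
    resourceType: "QuestionnaireResponse",
    status: "completed",
    item: form.answers.map((a) => ({
      linkId: a.linkId,
      text: a.text,
      answer: [{ valueString: a.value }],
    })),
  };
  return {
    resourceType: "Bundle",
    type: "transaction", // assumption: the mock endpoint accepts a transaction bundle
    entry: [
      { resource: patient, request: { method: "POST", url: "Patient" } },
      {
        resource: questionnaireResponse,
        request: { method: "POST", url: "QuestionnaireResponse" },
      },
    ],
  };
}
```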

Three things make this seam harder than checkout-seam:

Write-before-POST.
The FHIR bundle is written to IndexedDB before the network request fires. The patient cannot lose their intake data to a failed POST. This is a design discipline that commerce doesn't require, but healthcare does—a patient who loses their form to a network error and has to start over in a clinical context is a different problem than an abandoned cart.

Format translation.
Local state must be translated into FHIR R4 (Patient resource + QuestionnaireResponse) before crossing the seam. The client owns the native format; the server-side system speaks a standardized format. The translation happens at the seam boundary, not inside either system.

Richer failure taxonomy.
The checkout seam has two states: success and try again. The healthcare seam has four codes, each with a different clinical meaning:

  1. 200 (accepted)
  2. 422 (validation error—the bundle was malformed; retryable with correction)
  3. 503 (transient system error—retryable without change)
  4. 500 (permanent failure—contact the clinic directly)

The UI for each state uses clinical, not technical, language. A patient should not see a 503 status code.
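The write-before-POST ordering and the four-code taxonomy can be sketched together. Everything named here (SeamOutcome, LocalStore, submitIntake, the message wording) is hypothetical; the post function is synchronous only to keep the sketch short, where a real request would be asynchronous. Treating a thrown network error as retryable, and any unrecognized status as a permanent failure, are my assumptions.

```typescript
type SeamOutcome = "accepted" | "needs-correction" | "retry-later" | "contact-clinic";

// Map the four status codes listed above to an outcome.
function classifyStatus(status: number): SeamOutcome {
  switch (status) {
    case 200: return "accepted";
    case 422: return "needs-correction";
    case 503: return "retry-later";
    default: return "contact-clinic"; // 500, plus anything unrecognized (assumption)
  }
}

// Patient-facing copy in clinical language; the wording is illustrative only.
const patientMessage: Record<SeamOutcome, string> = {
  "accepted": "Your intake form was received by the clinic.",
  "needs-correction": "Some answers need another look before we can send your form.",
  "retry-later": "The clinic's system is busy. Your answers are saved and can be sent again.",
  "contact-clinic": "We could not deliver your form. Please contact the clinic directly.",
};

// Minimal stand-in for IndexedDB persistence.
interface LocalStore {
  save(id: string, bundle: unknown): void;
}

// Write first, then cross the seam.
function submitIntake(
  id: string,
  bundle: unknown,
  store: LocalStore,
  post: (bundle: unknown) => number // returns the HTTP status code
): SeamOutcome {
  store.save(id, bundle); // stored locally before any network call
  let status: number;
  try {
    status = post(bundle);
  } catch (e) {
    return "retry-later"; // network failure treated as transient (assumption)
  }
  return classifyStatus(status);
}
```

Because the save happens before the POST, every outcome other than "accepted" still leaves the patient's answers on the device.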


Local-First Social: The Hardest Version

Live: localfirst.social · Relay: local-first-social-relay.fly.dev · Repo: github.com/jediwright/local-first-social-network

The KGM v. Meta Platforms, Inc. verdict came down on March 25, 2026: the first jury finding of design-based liability for deliberately addictive platform features. The feed model, engagement optimization, and infinite scroll: the design of incumbent networks was ruled to have been optimized against its users.

localfirst.social is built for the constituency that now understands this.

The architectural argument:

You own your social graph. The network is the byproduct, not the product.

Most social networks are built server-first: your content, connections, and history live on their infrastructure, optimizable for their revenue model. Local-First Social inverts the architecture. All state (profiles, contacts, messages, and the trust graph) lives in IndexedDB on the user's device. A minimal WebSocket relay facilitates connection and then exits. After the handshake, the relay is no longer in the path.

The social primitive is the ping, a low-friction intentional signal, not a post for broadcast. Five types: here (presence), check-this (share), thinking-of-you (maintenance), let's-connect (invitation to escalate), status (current state). Pings are ephemeral by design. They expire. The only thing that persists is the pattern, stored locally.
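One way to model the ping primitive, as a sketch only. The five type names come from the list above; the Ping fields, the per-ping time-to-live, and the idea of representing "the pattern" as a count per sender and type are all assumptions made for illustration.

```typescript
type PingType = "here" | "check-this" | "thinking-of-you" | "lets-connect" | "status";

// Hypothetical ping record; field names and the TTL mechanism are assumptions.
interface Ping {
  from: string;   // bare handle of the sender
  type: PingType;
  sentAt: number; // epoch milliseconds
  ttlMs: number;  // how long the ping stays visible
}

// A ping is expired once its time-to-live has fully elapsed.
function isExpired(ping: Ping, now: number): boolean {
  return now >= ping.sentAt + ping.ttlMs;
}

// Drop expired pings; the ones still within their TTL are kept.
function pruneExpired(pings: Ping[], now: number): Ping[] {
  return pings.filter((p) => !isExpired(p, now));
}

// What persists after expiry: a local tally per sender and ping type.
// Returns a new object rather than mutating the input.
function recordPattern(
  pattern: Record<string, number>,
  ping: Ping
): Record<string, number> {
  const key = ping.from + ":" + ping.type;
  const next = { ...pattern };
  next[key] = (next[key] || 0) + 1;
  return next;
}
```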

The relay architecture:

client A ──handshake──▶ relay (stateless) ──handshake──▶ client B

The relay stores no content. It owns no relationships. It facilitates the CRDT merge on the first connection and exits. The social graph is built from the accumulation of these distributed seams. Each one fires once, and then the two clients communicate directly.

The trust graph (who can ping you, with which ping types, and who gets thread access) inherits from the infinityDrive permission architecture. The can_access() logic that Adam Wiggins and Orion Henry built in 2004 to control per-user, per-operation WebDAV access runs conceptually at the core of Local-First Social's permission model, translated twenty years forward into a social context.
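To make the per-user, per-operation idea concrete, here is a hypothetical sketch. It is not the original can_access() and not the code in the repo: the TrustGraph shape, the Operation names, and the deny-by-default rule for unknown handles are assumptions.

```typescript
// Operations a contact may be granted: the five ping types plus thread access.
type Operation =
  | "here"
  | "check-this"
  | "thinking-of-you"
  | "lets-connect"
  | "status"
  | "thread";

// Hypothetical local trust graph, keyed by bare handle (no leading @).
type TrustGraph = { [bareHandle: string]: { allowed: Operation[] } };

// Per-user, per-operation check. Unknown handles are denied (assumption).
function canAccess(graph: TrustGraph, bareHandle: string, op: Operation): boolean {
  if (!Object.prototype.hasOwnProperty.call(graph, bareHandle)) {
    return false;
  }
  return graph[bareHandle].allowed.indexOf(op) !== -1;
}
```

The check reads only local state, which fits the argument that no server owns the relationships.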

The relay exits. The trust graph lives in Y.js IndexedDB. That's excellent for this device. But change devices, and the data doesn't follow you, not without building the sync layer yourself. The durable, portable version of this argument has a name: the Solid Project. Tim Berners-Lee has been building it since 2016. The seam problem and the Pod problem are the same problem at different layers.

Localfirst.social's Phase 5 is functionally complete. Real-time bidirectional messaging between users confirmed working as of April 28, 2026.


What Broke, and the AI's Role In It

This is the part some case studies might skip. I'm not going to.

Three specific failures, all from the build sessions, all with AI-written code at their center.

The devDependencies error.
In the checkout-seam, the Stripe npm package was classified as a devDependency rather than a dependency. Local development worked fine: the local environment includes dev dependencies by default. The error surfaced only when deployed to Vercel, where the serverless function couldn't import Stripe, and the checkout broke completely. This was a basic packaging error. The AI wrote it. I accepted it. Neither of us flagged it before deploy because there was no pre-deploy checklist asking the question: do all packages required by production serverless functions appear in dependencies?

The stale-reference observer bug.
The cart badge didn't clear after a successful checkout. Thirteen minutes of diagnostic work later, the root cause: useCartItems had attached an array observer to the initial Y.Array reference, which became stale after IndexedDB persistence sync replaced the array on startup. This is a known Y.js gotcha. The fix, an attachArrayObserver() pattern that re-attaches whenever the parent map changes, is now documented and applied proactively in every subsequent prototype. But it wasn't applied proactively here because the build session didn't begin with a written schema for how Y.Arrays nested inside Y.Maps should be observed. The convention was established reactively, by debugging, not proactively, by design.
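A sketch of the re-attach idea, written against small structural interfaces rather than Y.js itself. The ObservableMap and ObservableArray interfaces are stand-ins modeled on the observe, unobserve, get, and toArray methods of Y.Map and Y.Array, and this body is my reconstruction of the pattern as described, not the implementation in the repos.

```typescript
// Structural stand-ins shaped after the Y.Map / Y.Array methods used here.
interface ObservableArray<T> {
  toArray(): T[];
  observe(fn: () => void): void;
  unobserve(fn: () => void): void;
}
interface ObservableMap<T> {
  get(key: string): ObservableArray<T> | undefined;
  observe(fn: () => void): void;
  unobserve(fn: () => void): void;
}

// Observe the array stored at `key`, re-attaching whenever the parent map
// holds a different array instance (for example after persistence sync
// replaces it). Returns a cleanup function.
function attachArrayObserver<T>(
  parent: ObservableMap<T>,
  key: string,
  onChange: (items: T[]) => void
): () => void {
  let current: ObservableArray<T> | undefined;

  const emit = (): void => {
    onChange(current ? current.toArray() : []);
  };

  const reattach = (): void => {
    const next = parent.get(key);
    if (next === current) return;         // same instance, nothing to do
    if (current) current.unobserve(emit); // drop the stale reference
    current = next;
    if (current) current.observe(emit);
    emit();                               // publish the replacement's contents
  };

  parent.observe(reattach);
  reattach(); // attach to whatever is there right now

  return (): void => {
    parent.unobserve(reattach);
    if (current) current.unobserve(emit);
  };
}
```

In a React hook, the returned cleanup function would be what the effect returns, so the observers are removed on unmount.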

The @ prefix normalization bug.
In Local-First Social Phase 5, a single inconsistency had accumulated across eight files over multiple sessions: relay routing messages needed @handle, while trust graph keys needed the bare handle without the @. The CRDT update handler was outside the switch statement and unreachable. A thread key was generated as @bob:jediwright:jediwright instead of @bob. A session that should have been polish and deploy became several hours of tracing message flow and applying normalization fixes across the codebase.
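The kind of normalization that resolves this class of bug can be very small. A minimal sketch, with hypothetical function names; the point is that both forms are derived in one place instead of being assembled by hand in each file.

```typescript
// Trust-graph keys use the bare handle: strip any leading "@".
function toBareHandle(handle: string): string {
  return handle.trim().replace(/^@+/, "");
}

// Relay routing uses "@handle": normalize first, then add exactly one "@".
function toRelayHandle(handle: string): string {
  return "@" + toBareHandle(handle);
}
```

Both functions are idempotent, so calling them on an already-normalized value is harmless.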

The honest account of why this happened: no authoritative convention document existed before the relay and CRDT code was written. Each session's AI instance wrote code consistent with the conventions visible in its own context. The conventions weren't consistent across sessions because nothing required them to be. Every inconsistent line of code was written by an AI acting in good faith on the information it had. The failure was structural, a missing spec, but the AI instances could have flagged the developing inconsistency if they'd been prompted to cross-check new code against a canonical document. They weren't, because the document didn't exist.

The pattern across all three failures is the same: the AI is a capable, fast implementer whose outputs require verification against specifications and conventions that the human is responsible for establishing. Where those specifications existed and were enforced, the AI's outputs were reliable. Where they didn't exist or weren't enforced, the AI filled the gap with plausible outputs that were sometimes wrong in ways that weren't visible until runtime.

This is not a case for reducing reliance on AI assistance. It is a case for being clear about the division of labor: the AI implements; the human specifies, verifies, and tests.


What I've Put In Place

The failures above produced a governance framework that now governs every build session in this series. I'm publishing it because I think it's more useful as a public document than as an internal checklist. It is also a useful reminder to think about first principles and practices before jumping too far into the deep end.

The core principle.
This should go without saying, but...write the specification before writing the damned code. (I should have known better as a long-time IA, Content Strategist, UX Designer, etc.) This applies at every level: the data convention document before the first hook, the acceptance criteria before the first build session, the failure taxonomy before the seam implementation, and the adversarial test plan before the deploy.

The state convention document.
Before writing any code that reads or writes Y.js state, a document must exist that specifies: map names and value types, key formats (including prefix conventions, @handle vs. bare handle), which mutations use doc.transact(), which hooks require attachArrayObserver(), which keys are relay-routing keys vs. local-graph keys. This document is a project artifact, not a session artifact. It carries forward into every subsequent session and is the first thing the Claude instance reads after the handoff.

The attachArrayObserver() rule.
Every hook that observes a Y.Array nested inside a Y.Map must apply this pattern. No exceptions. The stale-reference bug cost 13 minutes in checkout-seam. The pattern is now in the kickoff prompt for every prototype session: it gets applied from the start, not discovered in debugging.

The pre-deploy checklist.
Before every deploy: verify all packages used by serverless functions are in dependencies, not devDependencies. Verify all environment variables are set in the deployment target. Run the serverless endpoint directly via curl before declaring success.
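The first two checks lend themselves to automation. A minimal sketch, assuming you maintain a list of packages the serverless functions import and a list of required environment variable names; the function names and the STRIPE_SECRET_KEY variable in the usage note are hypothetical.

```typescript
interface PackageJson {
  dependencies?: { [name: string]: string };
  devDependencies?: { [name: string]: string };
}

// Packages imported by serverless functions that are NOT listed under
// "dependencies" (for example, ones that ended up in devDependencies).
function findMissingRuntimeDeps(pkg: PackageJson, runtimeImports: string[]): string[] {
  const deps = pkg.dependencies || {};
  return runtimeImports.filter(
    (name) => !Object.prototype.hasOwnProperty.call(deps, name)
  );
}

// Required environment variables that are unset or empty in the given environment.
function findMissingEnv(
  required: string[],
  env: { [name: string]: string | undefined }
): string[] {
  return required.filter((name) => !env[name]);
}
```

For the checkout-seam failure, findMissingRuntimeDeps(pkg, ["stripe"]) would have returned ["stripe"] before the deploy, and findMissingEnv(["STRIPE_SECRET_KEY"], env) covers the second item on the list. The curl check against the live endpoint still has to be run by hand or in a separate script.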

Acceptance criteria written before code generation.
Every session opens with testable conditions, "given X, when Y, then Z," not a feature list. The session is not done until all criteria are met and verified by the human, not by AI self-report.

The diagnostic protocol.
When a bug appears: write the hypothesis before generating a fix. Partition what is known from what is inferred from what is guessed. Make the minimum change that confirms or disconfirms the hypothesis. Document the root cause, not just the fix.

External user testing before phase closure.
Any phase that involves network behavior, relay, CRDT sync, or WebRTC requires confirmation with a user outside the local network before the phase is marked complete. Two local browsers in a Codespace are not the same test.

The full framework will be published as a standalone document alongside this article soon. What I want to say here is just the meta-observation: these disciplines are not new. They are standard engineering practice. The reason they require explicit articulation in AI-assisted development is that AI assistance creates a specific pressure against them: the pace is fast, the output looks correct, the confidence is consistent, and the temptation to accept working output without verifying it against a specification is constant. The governance framework is what counters that pressure.


The Pattern Commons

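Seven patterns are listed here, each designed as a reusable template for other builders. The first six have been documented from the working prototypes in this series; the seventh is forthcoming as a specification: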

1. The checkout seam.
Minimum server-dependent surface for a payment operation. Client preserves state on failure; writes order record on success. The server is stateless and never consulted again after confirmation.

2. The high-stakes seam.
Write-before-POST discipline for operations where data loss is clinically or legally consequential. Richer failure taxonomy. Format translation at the seam boundary.

3. The profile map as local CRM.
A Y.js document holds the user's full relationship with a service, including address, order history, intake history, and a trust graph, all local and sync-capable as an opt-in enhancement.

4. The attachArrayObserver() pattern.
How to correctly observe a Y.Array nested inside a Y.Map when the document hydrates from IndexedDB. Prevents the stale-reference bug. Applies to any hook in this pattern.

5. The distributed seam.
Where the server-dependent operation is a peer handshake rather than a server transaction. The relay facilitates connection and exits. The social graph is built from the accumulation of distributed seams, each of which fires once.

6. CRDT as trust graph.
Trust tier assignments, connection history, and sync status are stored as local-first Y.Map state, synchronized via the distributed seam. No server owns the relationships.

7. The employment seam.
The boundary event when a worker enters or exits an employer–worker relationship.

The architectural argument is that the worker owns a durable substrate that travels with them; the platform facilitates handoffs and exits; and the legal record produced at the seam (tamper-evident, contemporaneous, multi-perspective) is the irreducibly bilateral artifact that gives the pattern its evidentiary value. The failure taxonomy is broader than the prior seams (seven states, including the account-preempted state, which conventional HR can handle sub-optimally). The pattern is buyer-agnostic by design: neither worker-primary nor employer-primary, with funding mechanisms that bias the platform toward neither side. Unlike the first six, #7 will likely be published as a specification rather than a working prototype: the architectural reference, not the implementation. A separate, longer treatment explores the continuous-state framing (re-engagement and boomerang as architecturally privileged rather than edge cases), the layered participant model (unions, attorneys, regulators, deferred parties), and what it would mean to design post-employment infrastructure at the labor-system scale rather than the product scale. That work is forthcoming.

These patterns are domain-agnostic. The checkout seam applies to legal record creation and compliance logging, not just payment processing. The high-stakes seam applies to government benefit submissions and regulatory filings, not just healthcare. The distributed seam applies to any peer-to-peer application in which a minimal relay facilitates connections without accumulating relationship data. The employment seam applies wherever a relationship between parties has a legally consequential transition that produces records consulted by parties not present at the moment the seam fires, which is most consequential transitions in most domains, once you start looking.


The Thread That Runs Through All of It

The "From Skill to Instrument" article ended with this observation: a system built to hold data without you present, now running inside an instrument built to monitor governance without a governance body present. The continuity is not metaphorical.

The continuity has extended further than I expected in nine days. The permission architecture from 2004, Adam Wiggins and Orion Henry's can_access() function, is now running in a live social network. The Virtual Cutting Table from DistinctiveFabric.com, a 2004 fabric store, is now a pattern commons entry for Y.js spatial layout. The instinct that produced both (systems that hold without you, experiences that adapt to users who weren't in the room) is the same instinct my Agentic Accountability Playbook calls the absent-instructor problem.

The Local First Conference CFP closes May 1. The talk I'm submitting isn't a demo of Local-First Social. It's an account of what building all four of these in nine days with AI assistance actually looked like: what broke, who wrote the code that broke, and what a governance framework for this kind of work requires. The prototypes are the case study. The governance framework is the talk.

The Tracker continues to run. The April 5 assessment returned Narrowing, approaching Critical. The next quarterly assessment is due in early July. The window is still open. The clock is still running.


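Prototypes in the series:

Governance Window Tracker: infinitydrive.net
checkout-seam: checkout-seam.vercel.app · github.com/jediwright/checkout-seam
fhir-seam: fhir-seam.vercel.app · github.com/jediwright/fhir-seam
Local-First Social: localfirst.social · github.com/jediwright/local-first-social-network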

The governance framework and the Pattern Commons entries are documented at: github.com/jediwright/local-first-series


Systems of Thought is published by UX Minds, LLC. Methodology disclosure: this publication uses AI-collaborative methods consistent with the transparency standards it advocates. Intellectual direction and authorial responsibility are held by the human author.