The Watchlog

Notes from the watch.

Dispatches on running AI systems that answer to humans. Post-mortems, patterns, and the occasional obituary for a demo. One a month, no more.

Why demos die in Q1

The prototype that impressed the boardroom in November rarely survives contact with February. A post-mortem pattern from three years of watching other people's automation — and what it says about how this work is sold.

Every automation funeral we have attended followed the same liturgy. A demo in autumn, applause, a signed statement of work. Go-live before the holidays. And then, sometime in the first quarter, silence — not a crash, nothing so honest as a crash. The workflow simply stopped being trusted, then stopped being used, then stopped.

The cause is almost never the model. It is the gap between the conditions a demo is built for and the conditions a business runs on. The demo saw ten clean documents in a screen share. Production sent ten thousand messy ones at 2 a.m. — scanned sideways, half in French, with a supplier who changed their invoice template the week after handoff. A system optimised for the meeting was never going to survive the workload.

There is a structural reason this keeps happening. The seller's incentives end at delivery. Everything that determines whether the system lives — drift, edge cases, upstream API changes, the slow decay of assumptions — happens after the invoice clears. Nobody is paid to be present when reality files its objections, so nobody is.

The fix is not a better demo. It is a different definition of done. A system is not done when it works in the meeting; it is done when it has met a full quarter of reality with someone standing watch — measuring, correcting, and writing down every lesson. That is why every ward we ship carries a ninety-day watch, and why the watch is not an upsell. It is the part of the engagement where the system actually becomes real.

Demos die in Q1 because Q1 is the first time anyone tells them the truth. Build for the truth from day one, and February holds no surprises.

The handoff cliff

The most dangerous day in an automation's life is the day its builder gets paid. What falls off the cliff, in what order, and the paperwork that stops the fall.

Ask an owner what they received at handoff and you will usually hear: a login, a zip file, and a goodwill promise of "just email us if anything breaks." Ask them what the system was doing four months later and you will hear something quieter.

The fall has a sequence. First the small drifts — a model update changes tone, an API deprecates a field, a spreadsheet column moves. Each one is trivial; none is noticed. Then the workarounds: a staff member starts double-checking the system's output "just to be safe," which quietly doubles the cost of the automation. Then the distrust becomes policy. The system still runs, but nobody acts on it unattended. By the time someone asks whether it should be switched off, it effectively has been.

What makes the cliff a cliff is not incompetence. It is that the knowledge of how the system works left the building with the builder. There was no audit trail to reconstruct decisions, no executable gates to catch the drift, no written record of the edge cases already survived. The system's memory lived in a contractor's head, and the contractor moved on.
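One of the small drifts named above, a spreadsheet column that moves, is the kind of thing an executable gate can catch. What follows is a minimal sketch, not a description of any particular tooling: the names `GateResult` and `column_order_gate` and the example column names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    name: str
    green: bool
    detail: str


def column_order_gate(expected: list[str], actual: list[str]) -> GateResult:
    """Go red when the incoming header no longer matches the layout the
    system was built for (hypothetical gate, for illustration only)."""
    missing = [c for c in expected if c not in actual]
    moved = [
        c for i, c in enumerate(expected)
        if c in actual and actual.index(c) != i
    ]
    if missing:
        return GateResult("column_order", False, f"missing columns: {missing}")
    if moved:
        return GateResult("column_order", False, f"moved columns: {moved}")
    return GateResult("column_order", True, "header matches")


# Example: two columns swapped places upstream after handoff.
result = column_order_gate(
    ["invoice_id", "supplier", "amount"],
    ["invoice_id", "amount", "supplier"],
)
print(result.green, result.detail)
```

A gate like this does not fix the drift. It only makes the drift visible on the day it happens, and leaves a written record that outlasts whoever built the system.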

The remedy is unfashionable: paperwork, done properly. A signed document that states what was built, what it does, its limits, what it must never do, and who still answers for it — each claim pointing to evidence that survives staff turnover on both sides. We call ours the Warrant of Handoff. The name matters less than the discipline: a handoff is not a transfer of files. It is a transfer of accountability, and accountability needs a signature.

Autonomy is earned, not shipped

No one hands a new employee the company chequebook on day one. The case for treating machine autonomy the way we treat human authority — as a privilege with an evidence trail.

Read the dispatch

When a company hires a promising person, it does not grant them signing authority in week one. They draft, someone reviews. They propose, someone approves. Authority expands as the record justifies it — and everyone considers this ordinary prudence, not an insult to the new hire.

Then the same company buys an AI system and switches it on at full authority, because the demo was impressive. This is the single decision we spend the most time talking clients out of.

Autonomy is not a launch feature. It is a privilege earned in production, against evidence. In our engagements, every system begins fully supervised: it drafts, a human sends; it flags, a human decides. Its authority expands only when the gate record supports expansion — this class of decision, at this accuracy, over this many live runs, with these failure modes documented and gated against. The expansion is staged, measured, and written down. So is the retreat, if the record turns.

The objection is always speed: does supervision not defeat the point of automation? In practice, no. A system that drafts in seconds and waits for a human signature is already transformative — most of the value arrives before any autonomy is granted at all. What supervision costs in throughput it repays in something scarcer: the confidence to leave the system running when nobody is watching the watcher.

And for some work the answer is permanent. Money movement, in our practice, is advisory-only forever — the system surfaces, a human signs, and no accumulation of green gates changes that. Some authority should never be delegated, and knowing which is the whole discipline.
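The staging rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure used in any real engagement: the three stage names, the thresholds `MIN_LIVE_RUNS` and `MIN_ACCURACY`, and the `GateRecord` fields are all hypothetical. Only the shape comes from the text: authority moves one stage at a time, retreats when the record turns, and some decision classes stay advisory no matter what.

```python
from dataclasses import dataclass
from enum import IntEnum


class Authority(IntEnum):
    ADVISORY = 0      # system drafts or flags; a human decides and signs
    SPOT_CHECKED = 1  # hypothetical middle stage: humans sample the output
    UNATTENDED = 2    # system acts without a human in the loop


# Assumption: classes that never leave the advisory stage.
ADVISORY_ONLY = frozenset({"money_movement"})

# Assumption: illustrative thresholds, not figures from the dispatch.
MIN_LIVE_RUNS = 500
MIN_ACCURACY = 0.99


@dataclass
class GateRecord:
    decision_class: str
    live_runs: int
    correct_runs: int
    undocumented_failure_modes: int = 0

    @property
    def accuracy(self) -> float:
        return self.correct_runs / self.live_runs if self.live_runs else 0.0


def allowed_authority(record: GateRecord, current: Authority) -> Authority:
    """Return the stage the record supports, moving at most one step."""
    if record.decision_class in ADVISORY_ONLY:
        return Authority.ADVISORY
    if record.accuracy < MIN_ACCURACY or record.undocumented_failure_modes > 0:
        # The record has turned: retreat one stage.
        return Authority(max(current - 1, Authority.ADVISORY))
    if record.live_runs >= MIN_LIVE_RUNS:
        # Enough clean live runs: expand one stage.
        return Authority(min(current + 1, Authority.UNATTENDED))
    return current  # clean so far, but not enough evidence to expand
```

The deliberate choice here is that the advisory-only check comes first and ignores the record entirely, so no accumulation of green gates can promote a class such as money movement.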

What a governance report actually contains

Most "AI governance" documents are legal insulation. A working governance report is stranger and more useful: a monthly account of what your system did, refused to do, and learned. Here is the table of contents.

Ask a vendor for their AI governance documentation and you will typically receive a policy: principles, commitments, a diagram with the word "oversight" in it. Policies describe intentions. A governance report describes events. Owners need the second one, and almost nobody writes it.

Ours runs a few pages, monthly, in plain English. First, volumes: what the system handled this month (runs completed, documents processed, decisions drafted) against the previous month, so drift in workload is visible before it becomes drift in behaviour.

Second, and most telling, refusals: what the system declined to do. A healthy governed system says "I am not sure" regularly, and each refusal is listed with its reason. A report with zero refusals does not mean a perfect system; it usually means an overconfident one. This section is where owners learn to read their automation's character.

Third, escalations: every case handed to a human, how long the human took, and what they decided — because escalation paths that are never exercised are decorative, and ones that are exercised constantly are a design flaw wearing a safety costume.

Fourth, incidents and lessons: what surprised the system, what it cost, and what changed as a result, ideally a new gate, so the same surprise can never arrive twice.

Fifth, the gate record itself: how many checks ran, how many stayed green, and what any red one meant.
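For readers who think in data structures, the five sections can be sketched as a single record with a few reading aids. This is a hypothetical shape, not the format of any real report: the field names, the `reading_notes` helper, and the `max_escalation_rate` default of 0.2 are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Escalation:
    case_id: str
    hours_to_decision: float
    human_decision: str


@dataclass
class MonthlyReport:
    runs_completed: int                       # volumes, this month
    runs_previous_month: int                  # volumes, last month
    refusals: list[tuple[str, str]] = field(default_factory=list)  # (case, reason)
    escalations: list[Escalation] = field(default_factory=list)
    lessons: list[str] = field(default_factory=list)  # incidents and changes made
    gates_run: int = 0
    gates_red: int = 0


def reading_notes(report: MonthlyReport, max_escalation_rate: float = 0.2) -> list[str]:
    """Flag the patterns the dispatch warns about (thresholds are assumed)."""
    notes = []
    if report.runs_completed and not report.refusals:
        notes.append("zero refusals: check for overconfidence")
    if report.runs_completed and not report.escalations:
        notes.append("escalation path never exercised")
    elif (report.runs_completed
          and len(report.escalations) / report.runs_completed > max_escalation_rate):
        notes.append("escalation rate high: possible design flaw")
    if report.gates_red:
        notes.append(f"{report.gates_red} of {report.gates_run} gates went red")
    return notes


report = MonthlyReport(runs_completed=1000, runs_previous_month=900,
                       gates_run=40, gates_red=1)
for note in reading_notes(report):
    print(note)
```

The notes are prompts for a human reader, not verdicts. A month with no refusals may be fine, but it is the first thing worth asking about.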

The test of the document is simple. Hand it to a board member with no technical background and ask them two questions: do you understand what this system did last month, and do you know who answers for it? If both answers are yes, the system is governed. Everything else is stationery.

One dispatch a month. No more.

The Watchlog arrives monthly — post-mortems, patterns, and lessons from systems under ward. Subscribe in the footer below, or begin with an audit.