An Executable Playbook for Your AI COO
Two months ago I described what I’d torn out. This is what I built to replace it — and why I wrote the build doc for an agent to execute, not for a human to follow.
In March I wrote about ripping the agent middleware out of my consultancy and replacing it with Claude, MCP servers, and a couple of scheduled tasks. The economics flipped. The output got better. The post landed well enough that I’ve been answering the same question in DMs ever since: fine, but can you actually show me?
For a while my answer was a polite no. The setup worked, but it lived in my head and across a half-dozen markdown files I’d never bothered to organise. Each new client conversation meant reconstructing the build from memory. Worse, half my answers turned out to be wrong by the time I’d finished saying them — the tooling moves fast enough that “the way I did it last month” is already stale.
So I sat down to write it down properly. The output is two documents that, between them, do something I haven’t seen attempted often: one is a build doc for a human to read, the other is a build doc for an agent to execute. Same architecture, different readers, no compromise from trying to make one document serve both.
This post is the first one. The second is a GitHub Gist you can paste into a Claude Code session.
What it is, in human terms
The system has nine architectural components. None of them is novel; the value is in how they fit together and where the safety floors are.
A persona spec. Four markdown files: CLAUDE.md (the operating manual — tier definitions, what to do alone, what to escalate), IDENTITY.md (who the agent is), SOUL.md (tone and decision authority), USER.md (who the user is and how to be useful to them specifically). The split matters because each file changes on a different cadence — CLAUDE.md absorbs operational learnings weekly; SOUL.md is set once; IDENTITY.md is mostly stable; USER.md changes when the user’s situation changes. Loading them via @./ includes from CLAUDE.md means the agent sees them as one cohesive context.
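A sketch of how the includes wire the four files together. The tier names other than NEVER are illustrative, and the @-import paths are relative to CLAUDE.md:

```markdown
<!-- CLAUDE.md (excerpt) -->
@./IDENTITY.md
@./SOUL.md
@./USER.md

## Tiers
- ALONE: draft, research, file, schedule internally
- ASK FIRST: anything outbound to a client
- NEVER: publish, pay, delete
```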
A channel. Exactly one inbound/outbound surface. Resist the temptation to support multiple — it sounds flexible and turns into a routing nightmare. Mine is Telegram, with the cloud-storage and jurisdictional caveats named honestly in the build doc. For sensitive workflows you’d want a custom MCP for a channel you control.
A persistent runtime. Claude Code in a tmux session, kept up by a LaunchAgent. tmux survives terminal close, SSH disconnect, lid close. The LaunchAgent handles reboot. The launch script has one critical detail: it strips environment variables that Claude Desktop leaks into spawned shells, which is the kind of detail that costs you an evening when you don’t know to look for it.
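The environment-stripping detail, sketched. ELECTRON_RUN_AS_NODE is a known Electron leak; the session name and the rest of the variable list are assumptions, so inspect `env` in a Desktop-spawned shell to find what actually leaks on your machine:

```shell
#!/bin/sh
# launch.sh -- sketch: strip variables Claude Desktop leaks into spawned
# shells, then start the persistent session. Names here are illustrative.
STRIP="-u ELECTRON_RUN_AS_NODE"

# The real LaunchAgent payload would be something like:
#   tmux new-session -d -s coo "env $STRIP claude"
# Demonstration that the env wrapper actually removes a leaked variable:
export ELECTRON_RUN_AS_NODE=1
RESULT=$(env $STRIP sh -c 'echo "${ELECTRON_RUN_AS_NODE:-unset}"')
```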
Specialist subagents. Six markdown files that the main session delegates to — research, comms, calendar, pipeline, content, ops. They run in isolated context and return summaries. The main session stays clean; each domain gets the toolset it needs.
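A minimal sketch of one subagent file, assuming Claude Code's `.claude/agents/*.md` format with YAML frontmatter. The tool list and wording are illustrative, not mine verbatim:

```markdown
---
name: research
description: Deep-dive research on a topic; returns a sourced summary.
tools: WebSearch, WebFetch, Read
---
You are the research specialist. Work in your own context window and
return a concise summary with sources; never dump raw pages back to
the main session.
```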
Three hook surfaces. Project-scope deny-destructive (the safety floor — blocks deletes, dangerous Bash patterns, autonomous publishing actions); user-scope audit hooks with hash-chained tamper evidence (audit-grade with caveats, not turnkey compliance); user-scope context injection (current time, raw-memory triggers). The hooks are the strongest available enforcement under bypassPermissions mode — but they aren’t tamper-proof guarantees. They’re shell scripts with regex matchers. The build doc names exactly what they do and don’t catch, with worked examples of how to extend them.
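A sketch of the deny-destructive idea, under two assumptions: hooks receive the tool call as JSON on stdin, and exit code 2 means "block". The pattern list here is a small illustrative subset, not the build doc's full set:

```shell
#!/bin/sh
# deny-destructive.sh -- sketch of a project-scope PreToolUse hook.
# Returns 2 ("block") when a Bash command matches a destructive pattern.
deny_destructive() {
  case "$1" in
    *"rm -rf"*|*"git push --force"*|*"git reset --hard"*) return 2 ;;
    *"DROP TABLE"*) return 2 ;;
  esac
  return 0
}

# In the installed hook, the command comes from the stdin JSON, e.g.:
#   cmd=$(jq -r '.tool_input.command // empty')
#   deny_destructive "$cmd" || { echo "blocked: destructive pattern" >&2; exit 2; }
```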
Long-term memory as a markdown vault accessible via MCP. Mine is Basic Memory — purpose-built, Obsidian-compatible, MCP-native. Survives session restarts, reboots, model upgrades.
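Registering the vault is one MCP command. This invocation assumes Basic Memory's uvx entry point, so check its current docs before running:

```shell
claude mcp add basic-memory -- uvx basic-memory mcp
```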
A scheduled cadence. This is the most user-specific part. You’ll start with one or two scheduled tasks — a morning brief is the obvious first — and grow from there as friction surfaces. Mine has grown to about twenty over the months since. Each one was added because something specific went uncaught. None of them is universal.
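A concrete sketch of the obvious first task, assuming cron injects prompts into the running tmux session. The session name `coo`, the schedule, and the wording are all mine, for illustration only:

```shell
# crontab fragment: weekday morning brief at 07:00.
0 7 * * 1-5 /usr/bin/tmux send-keys -t coo "Run the morning brief per CLAUDE.md." Enter
```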
Plus two optional pieces — custom MCPs for tools the claude.ai connector list doesn’t cover, and remote access via cloudflare-tunnel for when you want scheduled tasks to reach memory off-Mac.
That’s the architecture. It runs in a tmux session, talks to a phone via Telegram, drafts the user’s work, maintains long-term memory across sessions, and refuses by hard policy to do the things they’ve told it never to do.
Why two documents
The natural first instinct is to write one playbook that works for both readers — explanatory enough that a human can follow it, structured enough that an agent can execute it. I tried that. It muddied both.
The marker conventions a human reader needs (clear “decide this first” callouts, narrative explanations of why each component exists) are noise to an agent that just needs the executable contract. The narrative voice that makes a human want to keep reading slows the agent down and creates space for misinterpretation. And the third-person framing that makes sense for the agent (“ask the user…”) reads as oddly distancing in prose meant for that user to read directly.
So I split them. This post is the human view: what the architecture is, why each piece exists, what it costs, what fails, and how to think about whether you should build it. The Gist is the agent’s contract: a sequence of phases with explicit interview prompts, verification steps, inline templates, and a hand-off checklist. You can read the Gist as a human if you want — it’s still markdown — but you’ll find it terser than this post and structured for execution rather than understanding.
The Gist’s contract uses four markers:
- 🛑 ASK USER — stop and get an input
- 🎯 DESIGN CHOICE — interview the user about a decision; the worked example shows what I did, but their version may differ
- ✅ VERIFY — run a command, confirm the output before moving on
- 🛑 SUBSTITUTE — replace a literal placeholder
The DESIGN CHOICE marker is the one that matters. The Gist is a pattern library plus a worked example, not a config to clone. Each phase teaches a piece of architecture, shows how I built mine (tagged [example, mine] so it’s clear), and tells the agent: ask the user how theirs should differ. The agent then implements the user’s version, not mine. This is the only honest shape, because my LinkedIn MCP and my twenty scheduled tasks and my Postiz workflow are mine because they fit my work. Your version of the same architecture will look different at almost every leaf.
What this costs and what fails
Money. Claude Max, lower tier — $100/month (around £80). I’ve never breached that tier’s rate budget with an always-on tmux session, ~20 scheduled tasks, and daily interactive use. Pro tier ($20/month) won’t have enough budget for an always-on agent; if you’re cost-sensitive, you can build the persona/subagents/hooks layer on Pro and add the always-on session when you upgrade.
Time. Budget two Saturdays, not one. There are eleven named traps in the Gist, and most readers lose an afternoon to one of them. The build is simple in principle; the surface area where the wrong CLI flag silently stays wrong is large.
Attention. The persona-spec interviews are the highest-leverage hour you’ll spend. If you skim them, the agent feels generic and you’ll abandon it. If you do them properly, the agent fits you specifically and the rest of the system pays back the time.
What fails after launch. The system has no built-in disaster recovery; the Gist’s Phase 10 covers backup and the recovery sequence. The most common operational failure I’ve had is silent channel delivery: on 2026-04-27 my Telegram outbound was broken for a day — scheduled tasks ran, wrote correctly to memory, but never reached my phone. The system reported success because the writes succeeded; the channel was the only thing wrong. The fix is a daily watchdog task that pings the channel and writes the result to memory; the diagnosis is to cross-check memory against your phone weekly. Other failure modes are named in the Gist and indexed by symptom in a troubleshooting table at the end.
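The watchdog can be sketched as below. The sender is passed in so the logic is testable; in the real task it would be a delivery attempt against your channel, for Telegram a curl to the Bot API's sendMessage endpoint (an assumption, since the channel is yours to choose):

```shell
#!/bin/sh
# watchdog.sh -- daily channel watchdog, sketched with an injected sender.
# "$1" is a command that attempts delivery (e.g. a curl to the Telegram
# Bot API); "$2" is the memory file the result is written to.
watchdog() {
  sender="$1"; log="$2"
  if "$sender" "watchdog ping"; then status="ok"; else status="FAILED"; fi
  # Record the outcome in memory either way, so a weekly cross-check of
  # the vault against your phone surfaces silent delivery failures.
  printf '%s telegram=%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$status" >> "$log"
  [ "$status" = "ok" ]
}
```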
Who this is for
You need three things: a Mac that stays on (laptop with the lid open and on power, or a Mac Mini), Claude Max ($100/month or higher), and a phone with a messaging app you’re willing to use as the channel. If you have all three, the build is in reach. If you don’t, nothing in the Gist will save you.
Beyond that, the audience is anyone who has been running an AI assistant out of half-organised conversations and wants to consolidate into something disciplined. The Gist assumes you’ve at least tried to do this and felt the friction — it isn’t an introduction to AI, and it isn’t a vendor pitch. It’s the recipe I’d hand a peer who already knows why they want this, with the agent-driven personalisation built in so it actually fits their work rather than copying mine.
The honest framing
I am not pretending to ship a turnkey clone of my system. I’m shipping the architecture (which is reusable), the worked example (which is mine), the interview structure that turns those into someone else’s working system (which the agent runs), and the operational realities that most playbooks omit (cost, failure modes, backup, recovery).
The Gist is opinionated and personal. That’s the point. A neutral-framework version of this would be less useful, not more — because the place this fails for people isn’t the architecture; it’s the design choices baked into the architecture being mine and not theirs. Naming that explicitly, and structuring the build around the agent asking the user where their version should differ, is the actual fix.
The Gist
build-nix.md on GitHub Gist — about 8,000 words, eleven phases plus pre-flight and verification, CC-BY-SA 4.0.
Paste it into a Claude Code session. The agent will read it, take you through pre-flight, then walk you through each phase with the design-choice interviews built in. Budget the first of your two Saturdays for the initial pass; keep a coffee in reach for whichever trap costs you the afternoon.
Want help building yours?
The Gist is designed to be runnable end-to-end by Claude Code with an interview-driven user. Some readers will want a human pair for the design decisions — especially the persona spec, the cadence design, and the NEVER tier. I take a small number of these as paid engagements each quarter; if it’s useful, get in touch via the contact section on the homepage.
If your build improves on the playbook in ways that matter, send the diff back. The improvements I learn from client builds fold into the next version of the Gist, and from there into the next version of this post.
Paste it in, let the agent build it with you, and tell me what broke.