The Core Flywheel
Three tools. One loop. Most of the value. Learn the beginner-friendly core of the Agentic Coding Flywheel: Agent Mail for coordination, br for task structure, and bv for intelligent routing.
Why a Simpler Starting Point
The full Flywheel system has grown large enough that many people find it overwhelming on first contact. That reaction makes sense. But there is a much smaller core that already captures most of what makes the approach powerful.
The larger system includes planning workflows, memory systems, prompt libraries, launch tooling, safety tooling, skills, and a lot of accumulated operational detail. You do not need to absorb all of that up front.
The core loop uses just three tools. If you understand those three and use them together correctly, you already have the heart of the system.
Separate the process into two layers: the planning substrate (frontier models used to create and refine the markdown plan) and the core operating loop (Agent Mail, br, and bv once the plan is ready to drive execution).
Who This Is For
This document is for a relatively smart software developer who is new to agentic coding and does not want to absorb the entire larger Flywheel guide up front. The goal is narrower: get you to the point where you can coordinate multiple agents without chaos, keep work organized as explicit tasks with dependencies, and keep agents working on the best next unblocked task instead of choosing randomly.
If that works well for you, the larger Flywheel stack becomes much easier to appreciate later.
Five Terms You Need
If these five terms stay clear in your head, most of the rest of the guide gets much easier to follow.
How the Tools Work Together (Behind the Scenes)
You do not need to manually manage the coordination between these tools. When your AGENTS.md file is set up correctly, the agents handle the integration automatically: they use bead IDs as thread identifiers in Agent Mail, they announce claims and reserve files before editing, and they update bead status as they work. You configure this once in AGENTS.md and then the agents just do it.
The Core Loop
The core loop is simple: generate a plan, encode it as beads, launch agents with marching orders, let them coordinate through Agent Mail while bv routes them toward the best next bead, and tend the swarm until the graph is done.
Six stages, one loop, compounding leverage
Each closed bead reshapes the graph. The next agent gets a better map.
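The six stages can be sketched as a single control loop. This is an illustrative sketch only: every function below is a hypothetical stand-in for the real tools (br, bv, Agent Mail), not their actual interfaces.

```shell
#!/usr/bin/env bash
# Sketch of the core loop. Every function here is a hypothetical
# stand-in for the real tooling, not an actual br/bv/Agent Mail call.

encode_plan_as_beads() { echo "beads created from plan.md"; }
launch_agents()        { echo "launched $1 agents with marching orders"; }
swarm_done()           { return 0; }  # pretend the bead graph is already done
tend_swarm()           { echo "ran triage, glanced at Agent Mail threads"; }

encode_plan_as_beads
launch_agents 3
until swarm_done; do    # in practice this loop runs for hours
  tend_swarm
  sleep 900             # roughly a 10-15 minute tending cadence
done
echo "graph complete"
```

Here `swarm_done` returns immediately so the sketch terminates; in a real run the loop continues until every bead in the graph is closed.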
Plan
Create & refine markdown plan with multiple frontier models
Markdown Plan
Ask 3+ frontier models, synthesize
Planning tokens are far cheaper than implementation tokens
Normal Chat Coding vs. The Core Loop
The core loop moves work out of ephemeral chat and into explicit, inspectable artifacts. That is the short answer to "why bother?"
Why the tools matter
Four agents, six tasks, side by side. One side uses the core loop, the other does not.
All agents boot up and read the codebase
Agents
Tasks
Four agents spin up in parallel and read the same codebase. Without coordination tools, they have no way to know what the others are doing.
The Three Tools Are a Single Machine
These three tools solve three different failure modes. Each helps on its own, but the value shows up most clearly when they form a stable loop together.
Agent Mail Solves Coordination
Without Agent Mail, multiple agents constantly collide: two agents edit the same files, nobody knows who is doing what, messages disappear into chat history, and work gets stranded when an agent crashes.
Agent Mail gives agents a shared coordination layer with identities, threads, inboxes, and file reservations. Agents announce what they are doing, reserve edit surfaces, and recover when another agent disappears. All of this happens automatically once your AGENTS.md tells agents to use Agent Mail.
br Solves Task Structure
Without br, work collapses into vague conversational intentions: "fix the auth stuff," "clean up the admin area," "someone should improve tests." That kind of tasking is too fuzzy for a swarm.
br turns work into explicit beads with status, priority, and dependencies. Once work is represented that way, multiple agents can make progress without constant human steering.
bv Solves Routing
Even with good beads, agents still need to know what to do next. Without bv, they choose work based on local convenience or whatever they most recently saw in context.
bv reads the bead graph and computes what is most worth doing next. That turns the swarm from "many agents doing work" into "many agents pushing the project forward efficiently."
Beads, Agent Mail, and bv are a single machine
The three pieces form the Coordination Triangle: each can be inspected on its own, but none can be removed without the system losing a capability it cannot replace.
The high-bandwidth negotiation layer.
The durable, localized issue state.
The graph-theory compass for triage.
The trio is not three nice-to-have tools. It is one operating system split into memory, communication, and leverage analysis. Remove any side of the triangle and the swarm loses determinism.
What Goes Wrong If You Skip One
The Artifact Ladder
One reason agentic coding feels confusing at first is that the active artifact keeps changing. The easiest way to stay oriented is to know what the current artifact means and what you do with it next.
What you produce at each stage
An idea becomes a plan, a plan becomes beads, beads become finished code.
Raw Idea
A rough description
You know the goal, not the system.
Turn it into a serious markdown plan.
Raw idea to finished code. Each stage adds precision.
Prose becomes executable memory. This is where agents stop guessing.
Each completed bead reshapes the graph and unblocks new work. Finished beads create ready beads.
Plan Space, Bead Space, and Code Space
Plan space is where you decide the workflows, constraints, architecture, and testing expectations. Bead space is where you transform that thinking into executable memory for agents. Code space is where agents implement the local task that a bead defines.
The general rule is simple. Debates belong in plan space. Translation and dependency shaping belong in bead space. Implementation belongs in code space.
What a Good Plan Looks Like
A strong plan lets a fresh reader answer five questions without guessing: what are the main workflows? What constraints matter? What architecture are we choosing? How will we know it works? What failure cases must not disappear into hand-waving?
```markdown
## Upload workflow
- Users drag Markdown files into the upload surface.
- The system parses frontmatter plus body text and stores a normalized note record.

## Constraints
- Unauthorized users must never see note content or note metadata.
- Failed ingestions must be preserved for operator review instead of discarded.

## Architecture choice
- Use a dedicated ingestion pipeline so parse failures can be persisted and retried.
- Keep search indexing separate from upload handling so indexing can be retried independently.

## Tests and failure handling
- Unit coverage for parsing and index mapping.
- E2E coverage for upload, failed-ingestion review, retry, search, and filtering.
```
It gives a fresh agent workflows, constraints, architecture, testing, and failure handling in one place. Before you turn the plan into beads, check that these five questions are answerable from the plan alone.
Escalation Ladder
When something feels wrong, use the smallest escalation that actually fits the problem:
- Local code confusion — stay in code space and resolve it there
- Weak or underspecified bead — step back into bead space and rewrite the bead
- Wrong graph — fix the dependencies or add the missing bead
- Missing plan work — step back into plan space and revise the markdown plan
- Degraded agent — restart it with a fresh session
A Concrete Example: Atlas Notes
A small project makes the workflow easier to picture. Imagine building an internal tool called Atlas Notes: team members upload Markdown notes, the system tags and indexes them, users can search them quickly, and admins can inspect failed ingestions.
If you gave four agents only that vague description, they would step on each other and make mismatched assumptions. The core loop instead looks like this:
1. You ask multiple frontier models to produce competing markdown plans, then synthesize them into one strong plan.
2. You tell an agent to convert that plan into beads (upload pipeline, indexing, admin screen, auth, and end-to-end tests) with explicit dependencies.
3. You launch 2-4 agents with marching orders. They read AGENTS.md, join Agent Mail, and start picking up beads using bv.
4. You tend the swarm: check progress every 10-15 minutes, rescue confused agents, and add missing beads when needed.
5. Agents implement, review their own work with fresh eyes, close beads, and move to the next one. You step in for strategic decisions.
What a Good Bead Looks Like
The bead is the unit of work agents actually execute. Weak beads force improvisation. Rich beads make execution mechanical. Here is a real bead from the ACFS project:
```
bd-01s: Add --deep flag to acfs doctor

Context:
Part of EPIC: Enhanced Doctor with Functional Tests.

What to Do:
Add --deep flag to doctor.sh that enables functional tests beyond
binary existence checks:
- Add DEEP_MODE=false global
- Parse --deep flag alongside existing --json
- --deep and --json can be combined

Acceptance Criteria:
- --deep flag parsed correctly
- Default doctor unchanged (fast, existence checks only)
- --deep runs additional functional tests
- Works with --json for structured output

Files to Modify:
- scripts/lib/doctor.sh: Argument parsing
```
The prose does not need polish. A fresh agent should be able to understand the task, the reason for it, and the acceptance criteria without reopening the whole markdown plan. You can browse real beads from actual Flywheel projects at FrankenEngine, FrankenTUI, and Asupersync.
Weak vs. Strong Artifacts
Quality thresholds get easier to feel when you compare weak and strong versions directly. The weak version names a topic. The strong version scopes the actual requirement, constraint, and testing obligation.
What the Agents Do Automatically
Once you launch agents with good marching orders, they automatically handle the coordination mechanics. A typical bead thread in Agent Mail looks like this — created entirely by agents, not by you:
```
[br-103] Start: Failed-ingestion admin screen
Claiming br-103. Reserving admin UI files plus retry handler path.
Will send update once list view is working and retry path is wired.

[br-103] Progress: Main path wired
List view and detail view working. Now handling edge cases and tests.

[br-103] Completed
Admin screen done. List view, detail view, and retry action wired.
Auth checks in place. E2E coverage for malformed upload → admin review → retry.
```
You do not write these messages. The agents create them because your AGENTS.md tells them to coordinate through Agent Mail and use bead IDs as thread anchors. Your job is to monitor these threads to see if work is flowing or stuck.
The Operating Rhythm
This section describes what you, the human, actually do. The agents handle the coordination plumbing (Agent Mail messages, file reservations, bead status updates). Your job is to create the conditions for them to succeed.
From plan to production in five steps
Steps 3-5 repeat for every bead. You stop thinking about the process after the second cycle.
Step 1: Create an Excellent Markdown Plan
Before beads or swarms or file reservations, create a serious markdown plan. Do not settle for one quick draft from one model.
GPT Pro web app with Extended Reasoning, or your strongest available model
Different frontier models have different blind spots. Competitive synthesis forces the model to admit where others are better and merge the strongest ideas.
At minimum, you want: the user-facing workflows, the important constraints, the major architectural decisions, and the testing expectations.
Step 2: Tell an Agent to Convert the Plan into Beads
You do not need to manually create every bead yourself. Tell a coding agent to do the conversion:
Claude Code with Opus
Beads become the active source of truth for execution. Once they're strong enough, you never look back at the markdown plan.
Then polish the beads 4-6 times with fresh review passes. Each round catches things the previous round missed. This is the "measure twice, cut once" of the methodology.
Step 3: Launch Agents with Marching Orders
Once beads are polished and your AGENTS.md is solid, start up a swarm of agents. Give each one these marching orders:
Every agent in the swarm gets this as their initial prompt
Every agent is fungible and a generalist. The specifics come from AGENTS.md and the beads, not from the prompt. This generic prompt works for every project.
Stagger agent starts by at least 30 seconds to avoid the "thundering herd" problem where all agents grab the same bead. Start smaller than your ego wants to: 1 agent to learn, 2 to feel coordination, 4 for real swarm behavior.
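The staggered start can be written as a simple launch loop. `launch_agent` is a hypothetical placeholder for however you actually start an agent session; the 30-second stagger is the guide's recommendation, shortened to 1 second here so the sketch runs quickly.

```shell
#!/usr/bin/env bash
# Staggered swarm launch. launch_agent is a placeholder; replace it
# with your real agent startup command.

AGENT_COUNT=2       # 1 to learn, 2 to feel coordination, 4 for real swarm behavior
STAGGER_SECONDS=1   # use 30 or more in practice to avoid the thundering herd

launch_agent() { echo "agent $1: read AGENTS.md, join Agent Mail, pick a bead via bv"; }

for i in $(seq 1 "$AGENT_COUNT"); do
  launch_agent "$i"
  if [ "$i" -lt "$AGENT_COUNT" ]; then
    sleep "$STAGGER_SECONDS"      # no sleep after the last agent
  fi
done
```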
Step 4: Tend the Swarm
Now you are the operator. On roughly a 10-15 minute cadence, check on the swarm:
1. Run bv --robot-triage and check whether the top recommendation still makes sense.
2. Glance through Agent Mail threads: are agents making progress or stuck?
3. Look for beads stuck in in_progress without movement.
4. If an agent seems confused after compaction, send: "Reread AGENTS.md so it's still fresh in your mind."
5. If an agent is truly degraded, kill it and start a fresh one.
That is usually enough to keep the loop healthy without turning the human into a full-time traffic cop.
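One tending pass can be scripted as a reminder checklist. The bv --robot-triage flag is the one this guide uses; the wrapper around it is a hypothetical sketch that degrades gracefully when bv is not installed.

```shell
#!/usr/bin/env bash
# One tending pass, run on a 10-15 minute cadence (cron, watch, or by hand).

tend_once() {
  # 1. Does the top recommendation still make sense?
  if command -v bv >/dev/null 2>&1; then
    bv --robot-triage
  else
    echo "bv not installed: skipping triage"
  fi
  # 2-3. Checks the script can only remind you about
  echo "check Agent Mail threads for stuck or silent agents"
  echo "look for beads sitting in in_progress without movement"
}

tend_once
```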
Step 5: Review, Close, Repeat
After agents finish each bead, have them review their own work:
After each bead is implemented — run until no more bugs found
Forces a mode switch from writing to adversarial reading while the code is still fresh. One of the cheapest quality multipliers in the whole method.
Then they move to the next bead using bv to find the most impactful one. The cycle repeats until the graph is done.
The Human's Job
The human is not supposed to micromanage every code edit or manually coordinate Agent Mail threads. The human is there to keep the structure clean enough that the agents can work effectively inside it.
Who Does What in the Core Loop
You design the system and tend the swarm. Agents do the coordination work.
Create the Plan
You
GPT Pro, Claude, Gemini web apps
Ask multiple frontier models for competing plans, then synthesize into one strong design document
This is where 85% of your thinking goes. No code yet.
What You Do
- Create the plan and beads — this is where most of your time and thinking goes
- Write a good AGENTS.md — this is the operating manual that makes everything else work
- Launch agents with marching orders — the same generic prompt every time
- Keep the bead graph honest — notice when a missing task or dependency must be added
- Restart or redirect agents when they drift, get loopy, or lose context
- Ask the hard question periodically (see below)
What the Agents Do (Not You)
- Register with Agent Mail and discover other active agents
- Claim beads and announce what they are working on
- Reserve files before editing to prevent conflicts
- Update bead status (in_progress, closed) as they work
- Use bv to find the next best bead when they finish one
- Send progress updates and completion messages in Agent Mail threads
All of this is configured once in your AGENTS.md. You do not need to manually invoke Agent Mail calls, update bead statuses, or thread bead IDs into messages. The agents do it because the operating manual tells them to.
The Reality Check
When the swarm looks active but you suspect it is not actually closing the real gap, stop and ask:
If the answer is "no," the fix is usually not more implementation effort. Revise the bead graph, add missing work, or step back into planning.
Minimum Viable AGENTS.md
Even in the smaller core-loop version, you still need a minimal AGENTS.md. It does not have to be a giant doctrine document, but it should say:
- What the repo is for
- What the stack is
- Any non-negotiable safety or style rules
- How to use Agent Mail, br, and bv (include the prepared blurbs from each tool's docs)
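A starting skeleton covering those four points can be scaffolded with a heredoc. Every section body below is an illustrative placeholder, and the coordination blurbs should come from each tool's own docs, not from this sketch.

```shell
#!/usr/bin/env bash
# Scaffold a minimal AGENTS.md. All section contents are placeholders.

TARGET="$(mktemp -d)/AGENTS.md"   # use your repo root in practice

cat > "$TARGET" <<'EOF'
# AGENTS.md

## What this repo is for
Atlas Notes: internal Markdown note upload, indexing, and search. (placeholder)

## Stack
(placeholder: list languages, frameworks, and databases here)

## Non-negotiable rules
- Never commit secrets.
- Run the test suite before closing a bead.

## Coordination (Agent Mail, br, bv)
- Register with Agent Mail; use bead IDs as thread identifiers.
- Reserve files before editing; announce claims, progress, and completion.
- Track work as beads with br; pick the next bead with bv.
- (Paste the prepared blurbs from each tool's docs here.)
EOF

echo "wrote $TARGET"
```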
Common Failure Modes
Agent Disappeared Mid-Bead
When an agent vanishes mid-bead, the recovery path should be boring:
1. Check the Agent Mail thread for the last meaningful progress update.
2. Launch a fresh agent with the standard marching orders.
3. The new agent will read AGENTS.md, discover the abandoned bead via bv, and pick it up.
4. If the bead was partially completed, the new agent can continue from the code state plus the thread history.
What It Feels Like Once It Clicks
At some point, the workflow stops feeling like extra ceremony and starts feeling like a calmer control surface:
- Less duplicated work, because agents manage ownership and reservations automatically
- Less "what should I do next?" drift, because bv keeps answering that question for the agents
- Easier restart after context loss, because the work lives in beads and threads instead of only in chat history
- Easier handoff, because any agent can read the bead, read the thread, and continue
That operator feeling is a good sign. It usually means the artifacts are carrying the work instead of your short-term memory.
Why This Captures Most of the Value
People often assume the magic of the Flywheel comes from the total number of tools. It does not. Most of the value comes from three things:
1. Work is explicit instead of implicit
2. Coordination is externalized instead of living in human memory
3. Task choice is graph-aware instead of random
Those three properties are already present in the core loop. That is why the smaller system gets you surprisingly far.
When Not to Use the Core Loop
You probably do not need it for a tiny one-file change with no real dependency structure, a purely local experiment, or a quick one-agent cleanup that does not need externalized coordination.
The loop earns its keep when work has enough structure, enough ambiguity, or enough parallelism that explicit planning, explicit tasks, and explicit coordination start paying for themselves.
Helper Utilities: DCG, UBS & CASS
Once the core loop is running smoothly, three helper utilities significantly improve safety, quality, and learning. They are multipliers on top of the core loop, not prerequisites.
DCG
A Claude Code hook that blocks dangerous git and filesystem commands before execution. Sub-millisecond latency, mechanical enforcement.
Works automatically. When a dangerous command is blocked, use safer alternatives or ask the user to run it manually.
dcg test 'rm -rf /' --explain
UBS
Multi-language bug scanner with guardrails. Run it on changed files before every commit to catch injection, unquoted variables, and other hazards.
ubs <changed-files> before every commit. Exit 0 = safe. Exit >0 = fix and re-run.
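The pre-commit habit can be mechanized as a git hook. The ubs invocation and its exit-code contract are taken from this guide; the hook is written to a temp path so the sketch is self-contained, but in a real repo it belongs at .git/hooks/pre-commit.

```shell
#!/usr/bin/env bash
# Write a pre-commit hook that runs ubs on staged files.
# In a real repo, write this to .git/hooks/pre-commit instead.

HOOK="$(mktemp -d)/pre-commit"

cat > "$HOOK" <<'EOF'
#!/usr/bin/env bash
files=$(git diff --name-only --cached)
[ -z "$files" ] && exit 0   # nothing staged, nothing to scan
ubs $files                  # exit 0 = safe; nonzero = fix and re-run
EOF

chmod +x "$HOOK"
echo "hook ready at $HOOK"
```

A nonzero exit from the hook aborts the commit, which is exactly the "fix and re-run" loop the guide describes.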
ubs $(git diff --name-only --cached)
CASS
Indexes prior agent conversations so solved problems can be reused. Finds patterns, decisions, and solutions across session history.
Search before reinventing a solution. If an agent solved a similar problem before, CASS will find it.
cass search 'auth middleware' --robot --limit 5
What You Can Ignore for Now
If you are just getting started, you do not need to master all of this immediately:
- Large-scale session memory systems like CASS and CM
- Big prompt libraries
- Advanced launch tooling like ntm
- The full exhaustive planning doctrine
- Every supporting tool in ACFS
Those things help. Some help a lot. But they are multipliers on top of the core loop, not prerequisites. You can run the core loop with separate terminal tabs and no special session manager.
What You Should Not Ignore
Even in the smaller version, a few principles still matter a lot:
- Do not start a swarm with only vague goals — make a real plan first
- Do not treat beads as tiny throwaway todo lines — they need rich context
- Do not skip the bead polishing rounds — single-pass beads are never optimal
- Do not rely on chat scrollback as your coordination system — that is what Agent Mail is for
If you violate those, the workflow quickly degrades back into ordinary multi-agent chaos.
Getting Started
The First 30 Minutes
1. Pick one real project, not a toy.
2. Ask multiple frontier models for competing markdown plans.
3. Synthesize them into one strong plan.
4. Tell an agent to create beads from the plan with dependencies.
5. Polish the beads 4-6 times with fresh review passes.
6. Run bv --robot-triage to verify the graph makes sense.
7. Launch 2-4 agents with the standard marching orders.
8. Tend the swarm. Check every 10-15 minutes.
Start smaller than your ego wants to: 1 agent to learn, 2 to feel coordination, 4 for real swarm behavior.
Try This Now
If you want to feel the method instead of only reading about it:
1. Pick one real repo.
2. Write one serious markdown plan.
3. Tell an agent to create two real beads with one dependency.
4. Run bv --robot-next and check that the recommendation makes sense.
5. Launch a second agent with the marching orders and watch them coordinate.
Those five steps are enough to make the core loop stop feeling theoretical.
The Cheat Card
If you want the loop on one screen, keep this:
1. Plan with multiple models
2. Synthesize into one markdown plan
3. Tell an agent to create beads
4. Polish beads 4-6 times
5. Write a good AGENTS.md
6. Launch agents with marching orders
7. Tend the swarm every 10-15 minutes
8. Have agents do fresh-eyes review after each bead
9. Repeat until the graph is done
When to Graduate to the Full Flywheel
Move up to the full guide when one or more of these becomes true:
- Your projects are large enough that you want much richer planning workflows
- You want stronger AGENTS.md operating manuals with comprehensive tool documentation
- You want repeatable prompt libraries and skills
- You want better recovery from compaction and session loss
- You want memory systems (CASS, CM) that improve the workflow over time
At that point, the bigger document stops feeling like overhead and starts feeling like leverage.
Graduate to the Full Flywheel
Once the core loop feels natural, the full methodology adds richer planning workflows, memory systems, prompt libraries, and the complete Dicklesworthstone stack.