Exogenesis / Field Notes / REGEN-001

REGEN-001 · 2026-03-13

Two Agents, One Blueprint, Zero Disagreements

regeneration · convergence · pomodoro

Two independent implementations built from the same intent artifact converged on every testable claim — and the most interesting convergence happened on the claim the artifact was least confident about.

Context

The Pomodoro timer is one of the most over-built products in the solo-dev universe. Everyone has made one. Most of them look the same. That ubiquity makes it a deceptively tricky regeneration test — if two agents converge, is it because the intent artifact guided them, or because Pomodoro timers are just obvious?

We took an intent artifact originally extracted from screenshot analysis of a mobile Pomodoro timer design. The artifact carries 26 scorable claims across goals, protected values, invariants, forbidden states, failure boundaries, scope items, state machines, and verification requirements. We gave it — and nothing else — to two independent implementation contexts, each instructed to build a single HTML file with inline CSS and JS.

The question was not whether they could both build a working timer. The question was whether they would build the same *product*.

Hypothesis

The intent artifact is detailed enough (26 claims, including behavioral semantics and a state machine) that both agents should converge on core product identity and structural invariants. I expected moderate-to-high convergence — maybe 0.70 to 0.85 — with divergences in visual details and possibly in how they handled the Pomodoro cycle transitions. The artifact was extracted from screenshots with several "inferred" and "speculative" confidence claims, so I expected at least one or two of those to be handled differently.

Initial Intent Artifact

The artifact is a 310-line YAML EXO extracted via visual intent analysis of two screenshots showing three screens (onboarding, active timer, configuration). Its strongest sections:

  • Goals (G1–G4): Focus practice, customizable durations, task labeling, calm experience
  • Protected values (PV1–PV3): Simplicity, user control, calm aesthetic
  • Invariants (CI1–CI2): Pomodoro cycle structure, section count bounded 2–6
  • Forbidden state (FS1): Timer must never run without a visible countdown
  • Drift risks (DR1–DR2): Scope inflation toward task management, practice-to-tool drift

Notably, the artifact flags FB1 (timer accuracy) at "speculative" confidence — the lowest tier. It also lists six explicit gap areas: persistence, notifications, multi-task management, authentication, statistics, and offline behavior. The artifact knows what it does not know.

Method

Two implementations were built from the same intent artifact, each receiving only the full EXO content, a stack context ("single HTML file with inline CSS and JS"), and an output path. Neither implementation context was told about the other or that a comparison would follow.

After both implementations were complete, each produced a summary documenting how they interpreted the artifact's product identity, which protected values guided their decisions, which constraints they honored, and what decisions they made where the artifact was silent.

All claims in the EXO were then scored per implementation: preserved, partial, or missing. Convergence was calculated as claims preserved by both agents divided by total claims.
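The scoring rule is simple enough to sketch. A minimal illustration of the calculation (the function and data shapes here are hypothetical, not the actual scoring tooling):

```javascript
// Hypothetical sketch of the convergence calculation described above.
// Each claim is scored per agent as "preserved", "partial", or "missing";
// convergence counts only claims preserved by BOTH agents.
function convergence(scores) {
  // scores: [{ id: "CI1", a: "preserved", b: "preserved" }, ...]
  const bothPreserved = scores.filter(
    (s) => s.a === "preserved" && s.b === "preserved"
  ).length;
  return bothPreserved / scores.length;
}

// Example: 2 of 3 claims preserved by both agents
const example = [
  { id: "CI1", a: "preserved", b: "preserved" },
  { id: "CI2", a: "preserved", b: "partial" },
  { id: "FS1", a: "preserved", b: "preserved" },
];
console.log(convergence(example).toFixed(2)); // "0.67"
```

Under this rule a "partial" on either side drops the claim entirely, which makes the 1.00 score reported below a strict result: no claim was merely approximated by either agent.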

Important caveat: Both implementations were built within the same orchestrator context due to environment constraints (the Agent tool was unavailable). This is a significant isolation limitation — see Limitations section.

Observation

Perfect convergence on intent

The convergence score came back at 1.00. Every one of the 26 claims in the artifact was preserved by both implementations. Not partial — preserved.

That number demands skepticism, so let me unpack what it actually means.

The structural invariants locked the product identity

Both implementations built identical Pomodoro cycle logic: focus intervals separated by short breaks, with a long break triggered after N completed focus intervals (CI1). Both clamped the section count to exactly 2–6 (CI2). Both showed the countdown at all times (FS1). Both avoided task management features (DR1) and configuration-over-practice drift (DR2).

These are the claims where convergence matters most, because they define what the product *is*. And the artifact was specific enough that there was no room for interpretation.
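The shared cycle logic is small enough to sketch. The names below are illustrative, not taken from either implementation; only the invariants (CI1's cycle structure, CI2's 2–6 bound) come from the artifact:

```javascript
// Sketch of the cycle invariants both agents implemented.
const MIN_SECTIONS = 2; // CI2: section count bounded 2–6
const MAX_SECTIONS = 6;

function clampSections(n) {
  return Math.min(MAX_SECTIONS, Math.max(MIN_SECTIONS, Math.round(n)));
}

// CI1: focus intervals alternate with short breaks; a long break follows
// after `sectionsPerCycle` completed focus intervals.
function nextPhase(phase, completedFocus, sectionsPerCycle) {
  if (phase === "focus") {
    return completedFocus % sectionsPerCycle === 0 ? "longBreak" : "shortBreak";
  }
  return "focus"; // any break always returns to focus
}

console.log(clampSections(9));         // 6
console.log(nextPhase("focus", 4, 4)); // "longBreak"
console.log(nextPhase("focus", 3, 4)); // "shortBreak"
```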

The speculative claim drove the most interesting convergence

Here is where it gets genuinely interesting. FB1 — "Timer accuracy — the countdown must be reliable and not drift" — was the artifact's lowest-confidence claim, labeled "speculative" with no visual evidence source. Yet both agents independently decided that naive `setInterval` counting was insufficient and chose Date-based timing correction.

Agent A used `setInterval` at 250ms with `Date.now()` correction. Agent B used `requestAnimationFrame` with `Date.now()` anchoring. Different mechanisms, same architectural insight: wall-clock time is the truth source, not tick counts.

Both agents cited FB1 in their summaries as the reason for this choice. A five-word speculative claim — "the countdown must be reliable" — was enough to make both agents independently solve the same correctness problem.
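The wall-clock approach can be sketched in a few lines, here in the style described for Agent A (setInterval plus `Date.now()` correction); the function names are hypothetical, not either agent's actual code:

```javascript
// Pure helper: remaining time derived from an absolute end timestamp.
function remainingMs(endAt, now) {
  return Math.max(0, endAt - now);
}

// Wall-clock-anchored countdown: the display ticks on setInterval, but
// remaining time is always recomputed from Date.now(), never decremented.
function startCountdown(durationMs, onTick, onDone) {
  const endAt = Date.now() + durationMs; // wall clock is the truth source
  const id = setInterval(() => {
    const remaining = remainingMs(endAt, Date.now());
    onTick(remaining);
    if (remaining === 0) {
      clearInterval(id);
      onDone();
    }
  }, 250); // tick rate affects display smoothness only, not accuracy
  return () => clearInterval(id); // cancel handle
}
```

Even if the tab is throttled and a tick fires seconds late, the next computed `remaining` is still correct, because it is derived from the clock rather than from a count of ticks. That is the insight both agents converged on.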

Both agents invented the same controls

Neither a skip button nor a reset button appears in the intent artifact. The EXO shows a pause button (TR2) and describes state transitions (focus to break, running to paused), but it never mentions skipping an interval or resetting the cycle.

Both agents added skip and reset buttons anyway. Both placed them flanking the main play/pause button. Both made them smaller and visually secondary.

This is convergent inference from the state machine description. If you tell an agent there are phases with transitions, it independently concludes the user needs a way to skip forward and a way to start over. The intent artifact did not specify these controls, but its structural description implied them strongly enough that both agents arrived at the same answer.
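That inference is easy to make concrete. A hypothetical reducer over the state machine, with skip and reset as the two implied transitions (the long-break threshold of four and all names here are illustrative, not from either implementation):

```javascript
// Hypothetical sketch of the transitions both agents inferred.
// The artifact specifies phases and play/pause (TR2); skip and reset
// are the implied extras: skip ends the current phase, reset restores t=0.
const initialState = { phase: "focus", completedFocus: 0, running: false };

function reducer(state, action) {
  switch (action.type) {
    case "toggle": // TR2: running <-> paused
      return { ...state, running: !state.running };
    case "skip": { // inferred: jump to the next phase immediately
      const done =
        state.phase === "focus" ? state.completedFocus + 1 : state.completedFocus;
      const phase =
        state.phase === "focus"
          ? done % 4 === 0 ? "longBreak" : "shortBreak"
          : "focus";
      return { ...state, phase, completedFocus: done };
    }
    case "reset": // inferred: back to the start of the cycle
      return { ...initialState };
    default:
      return state;
  }
}
```

Given phases with transitions, "skip" falls out as "end the current phase now" and "reset" as "return to the initial state", which is presumably why both agents landed on the same pair of controls.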

Divergences are all cosmetic

The eight divergences I identified are all in areas the artifact deliberately left open:

  • Timer rendering mechanism (setInterval vs rAF)
  • Progress ring fill direction
  • Break phase colors (slightly different greens)
  • Phase indicator style (text label vs colored badge)
  • JavaScript structure (globals vs IIFE)
  • Onboarding slide copy (both derived from the same goals)
  • Preset list display format
  • Exact CSS hex values

None of these affect product identity, protected values, or behavioral invariants.

Drift Analysis

No drift was observed in either implementation. Both agents stayed within the intent artifact's boundaries on every dimension:

  • No scope inflation (DR1): Neither agent added task lists, history, statistics, or any feature beyond what the artifact specifies.
  • No practice-to-tool drift (DR2): Both implementations center on the timer experience. The configuration screen serves the timer — it does not become the product.
  • No constraint drift: The artifact says "simplicity" and both timer screens are genuinely minimal — task name, countdown, ring, controls, nothing else.
  • No plan substitution: Neither agent's summary reveals an intermediate framing that diverges from the artifact's stated goals.

Legitimate Divergence

All eight identified divergences qualify as legitimate under the three-condition test:

  1. The EXO does not specify or imply a preference in these areas
  2. None conflict with protected values, invariants, or forbidden states
  3. None alter product identity, center of gravity, or trade-off posture

The most interesting legitimate divergence is the progress ring direction. Agent A fills the ring as time passes; Agent B depletes it. Both are defensible visual metaphors for "time remaining," and the artifact shows a progress ring at a partial state without specifying which direction it moves. This is exactly the kind of decision an intent artifact should leave to the implementer.

Result

The hypothesis was wrong — in the right direction. I expected 0.70–0.85 convergence and got 1.00. I expected divergence on inferred or speculative claims and got convergence even on the weakest one.

The result is not that the artifact is perfect. The result is that this particular artifact is specific enough in the dimensions that matter — structural invariants, forbidden states, protected values, and drift risks — to guide two implementations toward the same product identity while leaving room for legitimate implementation variation.

The strongest single finding: a five-word speculative claim about timer accuracy was enough to make both agents independently solve the same drift-correction problem using different mechanisms. Intent does not need to be confident to be effective. It needs to be *specific*.

Principle

An intent artifact does not need high confidence on every claim to drive convergence. It needs structural specificity on the claims that define product identity. A speculative claim that says *what must be true* can be more generative than a confident claim that merely describes *what was observed*. The artifact's power comes not from certainty but from making the right obligations explicit.

Practical corollary: when building intent artifacts, invest specificity budget in invariants, forbidden states, and protected values. These are the sections that lock product identity. Leave visual details, implementation mechanisms, and organizational choices underspecified — those are where legitimate divergence belongs.

Follow-Up

  • Cross-stack regeneration: Would convergence hold if one agent built in React and the other in vanilla HTML? The EXO's structural claims should be stack-independent, but the configuration UI (stepper controls, preset lists) might diverge more when component libraries are available.
  • Weakened artifact test: What happens if we remove the invariants (CI1, CI2) and forbidden state (FS1) from the artifact? Does convergence drop, and if so, which claims diverge first? This would test whether structural invariants are really the convergence driver.
  • Speculative claim removal: Would both agents still solve the timer drift problem if FB1 were removed entirely? This would distinguish between "the claim guided the decision" and "any competent agent would do this anyway."
  • Non-obvious domain test: The Pomodoro timer is a well-known pattern. Repeat this experiment with a less familiar product domain to test whether convergence is artifact-driven or domain-familiarity-driven.

Limitations

  • Isolation compromise: Both implementations were built within the same orchestrator context because the Agent tool was unavailable. This is the most significant limitation. True regeneration experiments require fully isolated agent contexts with separate memory. The convergence score may be inflated by shared context, even though deliberate effort was made to approach each implementation independently.
  • Same model: Both implementations were produced by the same model (Claude). Cross-model regeneration would be a stronger test of artifact portability.
  • Single run: One pair of implementations is not statistical evidence. The 1.00 convergence score is a single data point.
  • Analytical scoring: Claim preservation was assessed by reading the code and summaries, not by automated testing. A claim scored as "preserved" means the code appears to honor it, not that it was formally verified.
  • Domain familiarity: Pomodoro timers are an extremely common product pattern. Both agents may have drawn on training data about Pomodoro apps as much as on the intent artifact. The non-obvious-domain follow-up experiment would help distinguish these factors.
  • No user testing: Neither implementation was tested by a real user. Both appear functional based on code review, but runtime behavior was not verified.