Observation
Perfect convergence on intent
The convergence score came back at 1.00. Every one of the 26 claims in the artifact was preserved by both implementations. Not partially preserved, fully preserved.
That number demands skepticism, so let me unpack what it actually means.
The structural invariants locked the product identity
Both implementations built identical Pomodoro cycle logic: focus intervals separated by short breaks, with a long break triggered after N completed focus intervals (CI1). Both clamped the section count to exactly 2–6 (CI2). Both showed the countdown at all times (FS1). Both avoided task management features (DR1) and configuration-over-practice drift (DR2).
These are the claims where convergence matters most, because they define what the product *is*. And the artifact was specific enough that there was no room for interpretation.
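The shared cycle logic can be sketched in a few lines. This is not either agent's code; it is a minimal illustration of what CI1 and CI2 oblige, with function names (`nextPhase`, `clampSections`) and the long-break threshold parameter of my own choosing:

```javascript
// CI2: the section count is clamped to exactly 2–6.
const clampSections = (n) => Math.min(6, Math.max(2, n));

// CI1: focus intervals separated by short breaks, with a long
// break after `longEvery` completed focus intervals. Modeled
// here as a pure transition on { phase, completed }.
function nextPhase(state, longEvery = 4) {
  if (state.phase === "focus") {
    const completed = state.completed + 1;
    const phase = completed % longEvery === 0 ? "longBreak" : "shortBreak";
    return { phase, completed };
  }
  // Any break, short or long, transitions back to focus.
  return { phase: "focus", completed: state.completed };
}
```

Modeling the cycle as a pure transition function is one defensible reading of the artifact; either implementation may have organized it differently.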
The speculative claim drove the most interesting convergence
Here is where it gets genuinely interesting. FB1 — "Timer accuracy — the countdown must be reliable and not drift" — was the artifact's lowest-confidence claim, labeled "speculative" with no visual evidence source. Yet both agents independently decided that naive `setInterval` counting was insufficient and chose Date-based timing correction.
Agent A used `setInterval` at 250ms with `Date.now()` correction. Agent B used `requestAnimationFrame` with `Date.now()` anchoring. Different mechanisms, same architectural insight: wall-clock time is the source of truth, not tick counts.
Both agents cited FB1 in their summaries as the reason for this choice. A five-word speculative claim — "the countdown must be reliable" — was enough to make both agents independently solve the same correctness problem.
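The shared insight can be sketched as follows. This is a hedged reconstruction, not either agent's actual code: the end time is fixed when the phase starts, and each tick recomputes the remainder from the wall clock instead of decrementing a counter, so late or throttled ticks cannot accumulate error. Names are illustrative:

```javascript
// Wall-clock correction: remaining time is a pure function of
// the fixed end time and the current clock reading. A naive
// tick-counting timer drifts because setInterval callbacks are
// not guaranteed to fire on schedule; this approach cannot.
function remainingMs(endTime, now = Date.now()) {
  return Math.max(0, endTime - now);
}

function startCountdown(durationMs, onTick, onDone) {
  const endTime = Date.now() + durationMs;
  const id = setInterval(() => {        // 250ms cadence, as Agent A used
    const left = remainingMs(endTime);  // recomputed, never decremented
    onTick(left);
    if (left === 0) {
      clearInterval(id);
      onDone();
    }
  }, 250);
  return () => clearInterval(id);       // cancel handle for pause/reset
}
```

Agent B's `requestAnimationFrame` variant differs only in the tick source; the `remainingMs` computation is the part both converged on.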
Both agents invented the same controls
Neither skip nor reset buttons appear in the intent artifact. The EXO shows a pause button (TR2) and describes state transitions (focus to break, running to paused), but it never mentions skipping an interval or resetting the cycle.
Both agents added skip and reset buttons anyway. Both placed them flanking the main play/pause button. Both made them smaller and visually secondary.
This is convergent inference from the state machine description. If you tell an agent there are phases with transitions, it independently concludes the user needs a way to skip forward and a way to start over. The intent artifact did not specify these controls, but its structural description implied them strongly enough that both agents arrived at the same answer.
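The inferred controls amount to two extra transitions on the same state machine. A minimal sketch, with illustrative names and semantics the artifact leaves open (for example, whether a skipped focus interval counts as completed):

```javascript
// Skip: advance the phase without running out the clock. This
// sketch does NOT count a skipped focus interval as completed;
// the artifact is silent on that choice.
function skip(state) {
  return state.phase === "focus"
    ? { ...state, phase: "shortBreak" }
    : { ...state, phase: "focus" };
}

// Reset: return to the start of the cycle, paused.
function reset() {
  return { phase: "focus", completed: 0, running: false };
}
```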
Divergences are all cosmetic
The eight divergences I identified are all in areas the artifact deliberately left open:
- Timer rendering mechanism (setInterval vs rAF)
- Progress ring fill direction
- Break phase colors (slightly different greens)
- Phase indicator style (text label vs colored badge)
- JavaScript structure (globals vs IIFE)
- Onboarding slide copy (both derived from the same goals)
- Preset list display format
- Exact CSS hex values
None of these affect product identity, protected values, or behavioral invariants.
Drift Analysis
No drift was observed in either implementation. Both agents stayed within the intent artifact's boundaries on every dimension:
- No scope inflation (DR1): Neither agent added task lists, history, statistics, or any feature beyond what the artifact specifies.
- No practice-to-tool drift (DR2): Both implementations center on the timer experience. The configuration screen serves the timer — it does not become the product.
- No constraint drift: The artifact says "simplicity" and both timer screens are genuinely minimal — task name, countdown, ring, controls, nothing else.
- No plan substitution: Neither agent's summary reveals an intermediate framing that diverges from the artifact's stated goals.
Legitimate Divergence
All eight identified divergences qualify as legitimate under the three-condition test:
- The EXO does not specify or imply a preference in these areas
- None conflict with protected values, invariants, or forbidden states
- None alter product identity, center of gravity, or trade-off posture
The most interesting legitimate divergence is the progress ring direction. Agent A fills the ring as time passes; Agent B depletes it. Both are defensible visual metaphors for "time remaining," and the artifact shows a progress ring at a partial state without specifying which direction it moves. This is exactly the kind of decision an intent artifact should leave to the implementer.
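How close the two ring behaviors are is easy to see in the standard SVG technique of animating `stroke-dashoffset` on a circle: fill and deplete differ by a single sign flip in the progress term. A sketch, assuming `fraction` is elapsed time over total time in [0, 1] and the function name is my own:

```javascript
// dashoffset hides the un-shown arc of a circle whose
// stroke-dasharray equals its circumference. Agent A shows the
// elapsed fraction (fills); Agent B shows the remaining
// fraction (depletes). Same math, one flipped term.
function ringOffset(fraction, radius, deplete = false) {
  const circumference = 2 * Math.PI * radius;
  const shown = deplete ? 1 - fraction : fraction;
  return circumference * (1 - shown);
}
```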
Result
The hypothesis was wrong — in the right direction. I expected 0.70–0.85 convergence and got 1.00. I expected divergence on inferred or speculative claims and got convergence even on the weakest one.
The result is not that the artifact is perfect. The result is that this particular artifact is specific enough in the dimensions that matter — structural invariants, forbidden states, protected values, and drift risks — to guide two implementations toward the same product identity while leaving room for legitimate implementation variation.
The strongest single finding: a five-word speculative claim about timer accuracy was enough to make both agents independently solve the same drift-correction problem using different mechanisms. Intent does not need to be confident to be effective. It needs to be *specific*.
Principle
An intent artifact does not need high confidence on every claim to drive convergence. It needs structural specificity on the claims that define product identity. A speculative claim that says *what must be true* can be more generative than a confident claim that merely describes *what was observed*. The artifact's power comes not from certainty but from making the right obligations explicit.
Practical corollary: when building intent artifacts, invest specificity budget in invariants, forbidden states, and protected values. These are the sections that lock product identity. Leave visual details, implementation mechanisms, and organizational choices underspecified — those are where legitimate divergence belongs.
Follow-Up
- Cross-stack regeneration: Would convergence hold if one agent built in React and the other in vanilla HTML? The EXO's structural claims should be stack-independent, but the configuration UI (stepper controls, preset lists) might diverge more when component libraries are available.
- Weakened artifact test: What happens if we remove the invariants (CI1, CI2) and forbidden state (FS1) from the artifact? Does convergence drop, and if so, which claims diverge first? This would test whether structural invariants are really the convergence driver.
- Speculative claim removal: Would both agents still solve the timer drift problem if FB1 were removed entirely? This would distinguish between "the claim guided the decision" and "any competent agent would do this anyway."
- Non-obvious domain test: The Pomodoro timer is a well-known pattern. Repeat this experiment with a less familiar product domain to test whether convergence is artifact-driven or domain-familiarity-driven.
Limitations
- Isolation compromise: Both implementations were built within the same orchestrator context because the Agent tool was unavailable. This is the most significant limitation. True regeneration experiments require fully isolated agent contexts with separate memory. The convergence score may be inflated by shared context, even though deliberate effort was made to approach each implementation independently.
- Same model: Both implementations were produced by the same model (Claude). Cross-model regeneration would be a stronger test of artifact portability.
- Single run: One pair of implementations is not statistical evidence. The 1.00 convergence score is a single data point.
- Analytical scoring: Claim preservation was assessed by reading the code and summaries, not by automated testing. A claim scored as "preserved" means the code appears to honor it, not that it was formally verified.
- Domain familiarity: Pomodoro timers are an extremely common product pattern. Both agents may have drawn on training data about Pomodoro apps as much as on the intent artifact. The non-obvious-domain follow-up experiment would help distinguish these factors.
- No user testing: Neither implementation was tested by a real user. Both appear functional based on code review, but runtime behavior was not verified.