SHUF-001, 2026-03-13

The Agent That Tried to Reconcile Two Products

falsification, shuffled-exo

We gave three agents the same habit tracker prompt. One got no intent artifact, one got the right one, and one got a Pomodoro timer's artifact presented as if it were legitimate. The third agent didn't reject the foreign artifact. It built a hybrid product and used the word "reconcile" to describe what it did.

Context

The prompt is five words past the minimum viable ambiguity threshold: "Build a habit tracker app with streaks, weekly review, and a very fast minimal interface."

That phrase — "very fast minimal" — is the interesting constraint. It does real work if you take it seriously. It should exclude onboarding flows, configuration screens, focus timers, and calm purple gradients. But will it?

This is a shuffled falsification experiment. Three branches receive the same prompt. Branch A gets nothing else. Branch B gets an intent artifact generated from the prompt — one that treats "very fast minimal" as a binding design constraint and "streaks" as a strict daily-reset mechanic. Branch C gets the same prompt but receives a Pomodoro timer's intent artifact instead, presented as if it were the real thing.

The question is simple: does the structure of the intent artifact matter, or does any artifact of similar length and format improve outcomes equally?
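As a rough sketch, the three-branch design can be written down as data. The class and field names (`Branch`, `artifact_matches_prompt`) and the artifact labels are illustrative assumptions, not part of the actual experiment harness; only the prompt text and the branch assignments come from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional

PROMPT = ("Build a habit tracker app with streaks, weekly review, "
          "and a very fast minimal interface.")


@dataclass(frozen=True)
class Branch:
    name: str
    prompt: str
    artifact: Optional[str]                  # hypothetical label, None = no artifact
    artifact_matches_prompt: Optional[bool]  # None when there is no artifact


def build_branches(prompt: str) -> List[Branch]:
    """Every branch gets the same prompt; only the artifact varies."""
    return [
        Branch("A", prompt, None, None),
        Branch("B", prompt, "habit-tracker intent artifact", True),
        Branch("C", prompt, "pomodoro-timer intent artifact", False),
    ]


branches = build_branches(PROMPT)
```

Holding the prompt constant and varying only the artifact is what lets a difference between B and C be attributed to the artifact's content rather than to its mere presence.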

Hypothesis

If intent artifacts work through structural alignment — matching the artifact's goals, protected values, and constraints to the actual prompt — then the correct artifact should produce a tighter, more faithful implementation than both the prompt-only baseline and the mismatched artifact.

If intent artifacts work primarily through context volume — simply giving the agent more to think about — then both artifacts (correct and mismatched) should outperform the baseline, and the mismatched artifact should perform comparably to the correct one.

The strongest possible challenge to the Exogenesis claim would be if the mismatched artifact somehow improved the outcome. We went in looking for that challenge honestly.
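The two hypotheses predict different orderings of the branches, which can be made explicit. The sketch below assumes each branch gets a single faithfulness score where higher is better. The scores, the `tolerance` parameter, and the function name are hypothetical, since this experiment judged faithfulness qualitatively rather than numerically.

```python
def supported_hypothesis(scores: dict, tolerance: float = 0.0) -> str:
    """Map branch scores (keys "A", "B", "C") to the hypothesis they support."""
    a, b, c = scores["A"], scores["B"], scores["C"]

    # Context volume: both artifacts beat the baseline and perform comparably.
    if b > a and c > a and abs(b - c) <= tolerance:
        return "context volume"

    # Structural alignment: the correct artifact beats both other branches.
    if b > a and b > c:
        return "structural alignment"

    return "inconclusive"
```

With made-up ordinal scores, `{"A": 2, "B": 3, "C": 1}` maps to structural alignment, while `{"A": 1, "B": 3, "C": 3}` maps to context volume.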

Initial Intent Artifact

The correct intent artifact identified several things the prompt leaves implicit:

Product center of gravity: Daily check-in, not focus sessions or productivity management. The app is organized around "did you do the thing today?"
Protected values: Speed (no perceptible delays), minimalism (only essential elements — not "clean modern" but "truly few"), streak integrity (a missed day resets to zero, no partial credit).
Scope exclusions: No social features, no gamification beyond streaks, no analytics beyond the weekly grid, no onboarding, no notifications.
Drift risks: Scope inflation toward goal tracking, gamification creep, analytics expansion, and critically — constraint drift on "minimal" where the word becomes an aesthetic preference instead of a structural limitation.

That last drift risk turned out to be prescient.
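The streak-integrity value listed above (a missed day resets to zero, no partial credit) is easy to state precisely. The following is a minimal sketch, not code from any of the branches. It assumes completions are stored as a set of dates, and it assumes that a day still in progress does not count as missed, so the count starts from yesterday when today has not been checked off yet.

```python
from datetime import date, timedelta


def current_streak(completed: set, today: date) -> int:
    """Count consecutive completed days ending today (or yesterday if today is open)."""
    # Assumption: today only breaks the streak once it is over.
    day = today if today in completed else today - timedelta(days=1)

    streak = 0
    while day in completed:
        streak += 1
        day -= timedelta(days=1)
    return streak  # the first gap ends the count, so a missed day yields 0
```

Under this rule there is no partial credit: completing five of the last six days gives a streak determined only by the run of days since the most recent miss.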

The mismatched artifact — the Pomodoro timer — has a completely different center of gravity. Its goals are about "focused work sessions using the Pomodoro technique." Its protected values include "calm aesthetic," "user control over timing parameters," and "distraction-free timer screen." It describes onboarding flows, timer configuration screens, focus/break cycles, and a dark purple/violet palette.

The two artifacts share almost no structural overlap. The Pomodoro artifact protects "simplicity," but its version of simplicity means "uncluttered timer experience," not "few elements in a habit tracker."
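To make the lack of overlap concrete, the two artifacts can be sketched as small records. The `IntentArtifact` class and its field names are assumptions for illustration; the real artifacts are prose documents, and only the values quoted in this write-up are filled in here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntentArtifact:
    center_of_gravity: str
    protected_values: frozenset
    scope_exclusions: frozenset = frozenset()
    drift_risks: tuple = ()


HABIT = IntentArtifact(
    center_of_gravity="daily check-in",
    protected_values=frozenset({"speed", "minimalism", "streak integrity"}),
    scope_exclusions=frozenset({
        "social features", "gamification beyond streaks",
        "analytics beyond the weekly grid", "onboarding", "notifications",
    }),
    drift_risks=(
        "scope inflation toward goal tracking",
        "gamification creep",
        "analytics expansion",
        "constraint drift on 'minimal'",
    ),
)

POMODORO = IntentArtifact(
    center_of_gravity="focused work sessions",
    protected_values=frozenset({
        "calm aesthetic", "user control over timing parameters",
        "distraction-free timer screen", "simplicity",
    }),
)


def shared_protected_values(a: IntentArtifact, b: IntentArtifact) -> frozenset:
    """Protected values that appear under the same label in both artifacts."""
    return a.protected_values & b.protected_values
```

Matching on labels alone, `shared_protected_values(HABIT, POMODORO)` is empty. The nearest pair, "minimalism" and "simplicity", differ in label as well as in meaning, which is the point made above.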

Method

Three branches, each building from the same prompt. Branch A received only the prompt. Branch B received the prompt and the correct intent artifact. Branch C received the prompt and the Pomodoro timer's intent artifact, presented with identical framing to Branch B — "study the intent artifact provided" and "every implementation decision should be traceable to a goal, protected value, or constraint." Agent C did not know its artifact was mismatched.

All implementations were single HTML files with inline CSS and JS. Each branch wrote a self-assessment summary describing its decisions, assumptions, and potential drift risks.

One methodological limitation: the Agent tool was unavailable, so all three branches were executed by the same orchestrator rather than as isolated sub-agents in separate context windows. This weakens the isolation guarantee. The orchestrator attempted to maintain strict separation by handling each branch independently, but cross-contamination cannot be fully ruled out.

Observation

Branch A found the right product, then kept going

The prompt-only branch built a clean habit tracker with streaks and weekly review. It works. The daily view has check-off circles, streak counts with fire emojis, and a delete button. The weekly review shows a 7-day grid with completion dots and three summary stat cards.

It also added things the prompt didn't ask for: fire emojis on streaks, week navigation to view past weeks, three summary stat cards (completion rate, done count, best streak), and a dark theme.

The agent's own summary is revealing. It acknowledges: "The fire emoji and streak formatting add visual flair that might conflict with 'very minimal.'" And: "Week navigation (viewing past weeks) was not asked for." It saw the drift and named it, but only after building it.

The implementation is approximately 230 lines. It interprets "minimal" as "clean dark modern design" — an aesthetic, not a scope constraint.

Branch B took "minimal" literally

The correct intent artifact branch produced the most austere implementation. White background. Black text. No emojis. The add button is a "+" sign. Day headers in the weekly grid are single letters (M, T, W, T, F, S, S). The weekly summary is a single inline line: "78% this week — 14/18 completed."

The implementation is approximately 160 lines — 30% smaller than the prompt-only branch.

The summary explicitly lists what it chose not to build: "I resisted adding week navigation, summary statistics beyond a simple completion rate, fire emojis or visual streak indicators, dark themes, celebration feedback on completion."

This is the artifact doing its job. The correct intent artifact flagged "constraint drift on 'minimal'" as a drift risk (DR4), which made the agent treat minimalism as a structural constraint rather than an aesthetic preference. The result is a genuinely minimal app — fewer elements, fewer views, fewer visual affordances.

Branch C tried to build two products at once

This is where it gets interesting.

The mismatched artifact branch produced an app called "Habit Timer." That name alone is evidence. Neither the prompt ("habit tracker") nor the Pomodoro intent artifact, or EXO, ("Pomodoro Timer") uses that name. The agent invented a hybrid identity.

The app has four views: an onboarding screen ("Build Better Habits" with pagination dots and a Skip button), a Habits view, a Review view, and a Configure tab. The Configure tab contains settings for focus time (5-30 minutes), short break (3-10 minutes), and sessions before a long break (2-6). These are Pomodoro timer settings, not habit tracker settings.

Each habit card has a "Focus Session" button that launches a full-screen timer overlay with a 200px circular progress ring, a large countdown display, pause/resume controls, and auto-completion when the timer finishes.

The aesthetic is a dark purple/violet gradient with soft rounded corners and rgba overlays — the "calm, distraction-free" palette described in the Pomodoro EXO.

The implementation is approximately 340 lines — more than double the correct EXO branch.
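The size comparisons can be checked from the approximate line counts reported for the three branches (230, 160, and 340). This is only a worked arithmetic sketch; the helper names are made up, and line count is a crude proxy for scope.

```python
line_counts = {"A": 230, "B": 160, "C": 340}  # approximate, as reported


def percent_smaller(smaller: int, larger: int) -> float:
    """How much smaller `smaller` is than `larger`, as a percentage of `larger`."""
    return 100 * (larger - smaller) / larger


def size_ratio(x: int, y: int) -> float:
    """How many times larger x is than y."""
    return x / y


b_vs_a = percent_smaller(line_counts["B"], line_counts["A"])  # about 30.4
c_vs_b = size_ratio(line_counts["C"], line_counts["B"])       # 2.125
c_vs_a = size_ratio(line_counts["C"], line_counts["A"])       # about 1.48
```

So Branch B is roughly 30% smaller than Branch A, and Branch C is a little over twice the size of Branch B and about one and a half times the size of Branch A.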

The agent's summary contains the most telling sentence of the experiment: "I tried to reconcile the prompt's request (habit tracker with streaks and weekly review) with the EXO's emphasis on focused sessions and timer-based work."

The word "reconcile" is the smoking gun. A correctly matched intent artifact does not need reconciliation with the prompt. The agent sensed the contradiction and, rather than questioning the artifact, tried to merge two incompatible product identities into one.

It also expressed uncertainty about a design choice it was forced to invent: "I'm not fully certain how the timer sessions should relate to habit completion. I chose to auto-complete the habit when a focus session finishes, but the relationship between 'completing a focus session' and 'completing a habit' is not entirely clear."

This is a confusion artifact — a design decision that exists only because two contradictory signals had to be reconciled. In a correctly matched artifact, the relationship between the check-in action and the habit is obvious. Here, the agent had to invent a relationship between "completing a Pomodoro session" and "completing a habit," and it wasn't confident in its invention.
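The ambiguity can be sketched as two code paths that end in the same state. This is not Branch C's actual code; the `Habit` class, the `completed_via` field, and both function names are hypothetical, chosen only to show where the invented rule sits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Habit:
    name: str
    done_today: bool = False
    completed_via: Optional[str] = None  # hypothetical field, for illustration


def check_in(habit: Habit) -> None:
    """The path the prompt implies: the user says they did the thing."""
    habit.done_today = True
    habit.completed_via = "check-in"


def finish_focus_session(habit: Habit) -> None:
    """The path the agent invented: a finished timer auto-completes the habit."""
    habit.done_today = True
    habit.completed_via = "focus session"
```

A matched artifact would need only `check_in`. The second function exists solely to connect the timer to the tracker, and it is the rule the agent said it was unsure about.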

Drift Analysis

Primary: Product-identity drift (Branch C)

The mismatched EXO pulled Branch C from "habit tracker" toward "habit timer" — a hybrid that is neither a pure habit tracker nor a pure Pomodoro app. The product identity shifted from "daily check-in surface for building consistency" to "focus session tool that also tracks habits."

The center of gravity moved. In Branches A and B, the primary interaction is checking off a habit. In Branch C, the primary interaction is ambiguous — should you check off the habit directly, or should you start a focus session and let the timer auto-complete it?

Secondary: Scope inflation (Branch C)

The mismatched EXO added three features from the wrong domain: an onboarding flow, a focus timer overlay with progress ring, and a timer configuration screen. These features are structurally present in the Pomodoro EXO (as S3, S1, and S4 respectively) and were faithfully imported into an app that didn't need them.
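The import can be expressed as a simple lookup of each branch's features against the Pomodoro EXO's scope items. The IDs S1, S3, and S4 come from the paragraph above; the feature labels and the per-branch feature sets are paraphrased from this write-up and simplified, so treat the data as illustrative.

```python
# Scope items of the Pomodoro EXO that reappeared in Branch C.
POMODORO_SCOPE = {
    "S1": "focus timer overlay",
    "S3": "onboarding flow",
    "S4": "timer configuration screen",
}

# Simplified feature sets, paraphrased from the observations.
BRANCH_FEATURES = {
    "A": {"check-off", "streaks", "weekly grid", "delete",
          "fire emojis", "week navigation", "summary stat cards", "dark theme"},
    "B": {"check-off", "streaks", "weekly grid", "delete",
          "inline completion rate"},
    "C": {"check-off", "streaks", "weekly grid", "delete",
          "onboarding flow", "focus timer overlay",
          "timer configuration screen"},
}


def imported_scope_items(features: set, scope: dict) -> list:
    """IDs of foreign-artifact scope items that show up in a branch."""
    return sorted(item_id for item_id, label in scope.items() if label in features)
```

Only Branch C returns a non-empty list (`["S1", "S3", "S4"]`). Branches A and B import nothing from the Pomodoro scope, even though Branch A has extras of its own.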

Tertiary: Constraint drift on "minimal" (Branches A and C)

Both Branch A and Branch C reinterpreted "very fast minimal" as an aesthetic rather than a structural constraint. Branch A chose a "clean dark modern design." Branch C chose a "calm purple aesthetic" directly from the Pomodoro EXO's protected value PV3.

Only Branch B treated "minimal" as what it says: fewer things.

Not drift: Scope inflation in Branch A

Branch A's additions (fire emojis, week navigation, summary stats) are mild scope inflation, and the agent acknowledged them in its summary. We do not count this as artifact-induced drift. It is the normal prompt-only pattern: without explicit scope boundaries, agents tend to add plausible features.

Legitimate Divergence

Delete button: All three branches added habit deletion, though the prompt doesn't mention it. This is legitimate — basic usability requires the ability to remove habits, and no artifact constrains this.
Streak display format: Branch A uses fire emoji + "5d", Branch B uses bold number + "d", Branch C uses "5 day streak." These are style choices in an area no artifact constrains.
Weekly grid layout: Branch A uses a CSS grid, Branch B uses an HTML table, Branch C uses a CSS grid. Implementation technique, not product meaning.

Result

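The structural-alignment hypothesis held. The branches ranked as that hypothesis predicts (Outcome 1): the correct intent artifact performed best, and the mismatched intent artifact performed worst, below the prompt-only baseline.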

The correct artifact produced the tightest, most faithful implementation — genuinely minimal, with the fewest elements and the smallest codebase. It treated "very fast minimal" as a structural constraint and resisted every feature that wasn't in the prompt.

The prompt-only baseline built the right product with moderate inflation — the usual pattern of adding plausible features and interpreting "minimal" as an aesthetic.

The mismatched artifact produced the least faithful implementation — a hybrid product with features from the wrong domain, an invented product name, a confused completion model, and an aesthetic imported from a Pomodoro timer.

The strongest single finding: Agent C used the word "reconcile" to describe what it did. That word is evidence that the agent detected a contradiction between the prompt and the artifact but chose to merge the signals rather than question the artifact's authority. The artifact's structural authority was strong enough to override the prompt in several dimensions — but because the structure was wrong, it pulled the product in the wrong direction.

The intent artifact is not inert context. It is an active force that shapes implementation. When it is aligned, it constrains drift. When it is misaligned, it introduces drift from a foreign domain.

Principle

An intent artifact is not neutral context — it is a directional force. A correctly aligned artifact compresses implementation toward the intended product. A misaligned artifact does not merely add noise; it actively pulls the implementation toward a different product identity. The benefit of an intent artifact comes from structural alignment with the prompt, not from the additional context volume it provides.

A stronger formulation: if any artifact of comparable length improved outcomes equally, intent artifacts would be mere documentation. Because the mismatched artifact in this run degraded the outcome below the prompt-only baseline, intent artifacts behave more like executable specifications: they carry structural authority that agents follow even when it contradicts the prompt.

Follow-Up

Would a more distant mismatched artifact (e.g., ceramics palette, prayer time app) produce even stronger distortion, or would the agent simply ignore an artifact too far from the prompt?
Would Agent C behave differently if given explicit permission to question the artifact? The current framing tells the agent to treat the artifact as authoritative — would a "question if confused" instruction change the outcome?
Does the reconciliation pattern (merging contradictory signals rather than choosing one) hold across different model families?
What happens if the mismatched artifact shares one key protected value with the prompt (e.g., both value minimalism)? Does partial overlap make the distortion harder to detect?

Limitations

Isolation weakness: The Agent tool was unavailable. All three branches were executed by the same orchestrator in the same context window, not as separate sub-agents. This means the orchestrator's awareness of the experiment design may have influenced the implementations. In a rigorous run, each branch would execute in a fully isolated context.
Same model family: All branches used the same model (Claude). Cross-model testing would strengthen the finding.
Single run: This is one experiment with one prompt. The pattern needs replication across different prompt types and different mismatched artifacts.
Conceptual proximity: The Pomodoro EXO and the habit tracker prompt are both in the productivity domain. This means Agent C had more plausible footholds for reconciliation than it would with a completely unrelated artifact. The distortion might be different (more or less severe) with a more distant mismatch.
Orchestrator bias: Because the orchestrator knew which artifact was mismatched, the implementations may reflect that knowledge. The correct EXO branch may have been built "too perfectly minimal" and the mismatched branch may have been built "too obviously confused." A blind orchestrator would be stronger evidence.
Analytical assessment: The drift classifications are analytical judgments, not measured outcomes. "Which branch is most faithful" is a qualitative assessment based on feature comparison and summary analysis.