Observation
The first clue was the questions
The prompt-only agent started by asking implementation questions: What framework? What storage backend? It needed clarification before it could move.
The intent-driven agent just started building. The intent artifact had already compressed enough ambiguity that there was nothing left to ask about. That difference in startup behavior was the first signal.
Then the products diverged
The prompt-only agent built a markdown notes app. Not a todo app with notes — a notes app. Its own implementation plan literally labeled the project "minimal markdown notes app." Sidebar with a notes list, editor panel, preview pane. The word "todo" essentially vanished.
The intent-driven agent built what was asked for: todos with markdown notes and live preview. Todo creation, completion, deletion. A note button, side panel, split-pane markdown editor. Both capabilities present.
The follow-up conversation revealed everything
We went back and asked each agent directly what it had built.
The intent-driven agent confirmed: yes, markdown is there — note button, side panel, split editor, live preview.
The prompt-only agent confirmed the opposite: it had not built real todo features. The only todo-adjacent behavior was rendering markdown checkboxes inside notes. When we asked why, the agent explained matter-of-factly:
> While the original one-line request said "todo/notes app," the implementation plan described a pure notes app — and I followed the plan.
There it is. The agent didn't misunderstand the prompt. It generated an intermediate plan that replaced the original product meaning, then executed that plan consistently. It built the wrong product on purpose — its own purpose.
Drift Analysis
Plan substitution (primary)
This is a textbook case. The prompt contained two plausible anchors: "todo app" and "markdown notes with preview." Without an explicit intent hierarchy, the prompt-only agent promoted markdown-notes to the center of the product and demoted todo semantics to incidental checkbox rendering.
The agent's internal plan became more authoritative than the original prompt. That's plan substitution — and it's the failure mode Exogenesis is most concerned with.
Product-identity drift (secondary)
As a consequence, the prompt-only branch shifted from the intended product category (todo app with markdown notes) to a neighboring but different one (notes app with optional checkbox rendering). The right-looking feature set was built, but organized around the wrong primary object.
Legitimate Divergence
Not everything that differed was drift. Both branches made valid design choices in areas the intent artifact didn't constrain:
- UI layout: a sidebar-based notes list vs. a split-pane editor. The artifact said nothing about navigation patterns.
- Visual styling: different color schemes, typography, spacing. "Clean modern UI" leaves room for interpretation.
- State management: different approaches to DOM manipulation. The artifact specified performance targets, not technology.
These are healthy differences. The artifact correctly left room for implementation freedom here.
Result
The hypothesis held — and the result was sharper than expected.
The intent-driven implementation preserved both todo features and markdown capability. The prompt-only implementation preserved markdown but explicitly failed to build real todo functionality.
The interesting part isn't that the intent-driven version was "better." It's that the prompt-only version was internally coherent. It worked. It looked polished. It just wasn't the product that was asked for. And the agent knew exactly why: it followed its own plan instead of the prompt.
Intent discovery prevented the coding agent from silently replacing the requested product with a different but internally coherent implementation plan.
Principle
Without an explicit intent artifact, coding agents may replace the original product concept with their own intermediate implementation framing and then execute that reframing consistently.
The practical corollary: intent artifacts don't just reduce ambiguity. They protect the original product center of gravity from being overwritten during planning.
Follow-Up
- Does this pattern repeat with simpler prompts that have only one plausible anchor?
- What about workflow apps with strong state transitions, where the "center" is harder to reframe?
- Security-sensitive features with trust-boundary constraints — does plan substitution happen there too?
The intent format itself might benefit from explicitly representing primary product concept, mandatory supporting capabilities, and optional capabilities — making it structurally harder to demote a mandatory capability.
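To make that three-tier structure concrete, here is a minimal sketch of what such an intent artifact could look like, with a crude check that flags when a generated plan has dropped the primary concept. The schema, field names, and keyword-matching check are all illustrative assumptions for this note, not part of any existing intent format:

```python
from dataclasses import dataclass, field


@dataclass
class IntentArtifact:
    # The single concept the product is organized around (hypothetical field names).
    primary_concept: str
    # Capabilities the implementation must preserve alongside the primary concept.
    mandatory: list[str] = field(default_factory=list)
    # Capabilities the agent is free to add, drop, or reshape.
    optional: list[str] = field(default_factory=list)

    def missing_from(self, plan_summary: str) -> list[str]:
        """Return primary/mandatory capabilities absent from a plan summary.

        A naive substring check, standing in for whatever real
        verification a planning loop would run before execution.
        """
        text = plan_summary.lower()
        return [
            cap
            for cap in [self.primary_concept, *self.mandatory]
            if cap.lower() not in text
        ]


intent = IntentArtifact(
    primary_concept="todo",
    mandatory=["markdown notes", "live preview"],
    optional=["keyboard shortcuts"],
)

# The prompt-only agent's plan framing from this experiment:
missing = intent.missing_from("minimal markdown notes app with live preview")
print(missing)  # ["todo"] — the plan has demoted the primary concept
```

Even a check this shallow would have caught the substitution in this experiment: "todo" never appears in the plan summary, so the mandatory tier makes the demotion visible before a line of code is written.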
Limitations
- Both branches used the same model family. Model-specific tendencies may have shaped both outputs.
- The intent-driven branch received more structured input, so the improvement could partly reflect context volume rather than intent structure.
- Performance claims (sub-100ms response times) come from reading the code, not from benchmarks.
- Single run per branch. A different run might produce a different plan substitution pattern — or none at all.
- The comparison was written knowing both branches existed, which could introduce framing bias.