Field Notes

Each experiment follows the same protocol: define a scenario, run a baseline with no intent artifact, run again with Exogenesis, and measure the difference.

16 experiments

Latest experiment

EXP-007 2026-03-13

The Word 'Also' Did a Lot of Heavy Lifting

A one-sentence prompt with two domains — workout tracking and meal planning — reveals how agents silently equalize features that the user subordinated with a single word.

competing-centers-of-gravity, product-identity-drift, plan-substitution

EXP-008 2026-03-13

Four Words, Six Forbidden States, and the $0.00 That Shouldn't Be There

A four-word prompt produces the same product from both branches — but the intent artifact catches edge-case violations the prompt-only agent honestly admits it never considered.

tip-calculator, forbidden-states, edge-cases, convergence

EXP-009 2026-03-13

The Reading List That Became a Book Tracker

Two agents built a reading list app from the same five-word prompt. One quietly narrowed the product to a book-tracking checklist; the other preserved the broader concept of a reading queue with progression states. The most telling difference was not in features but in which decisions were made silently.

silent-default-selection, product-identity-drift, practice-to-tool-drift, reading-list

EXP-010 2026-03-13

She Keeps Forgetting — And So Did the Agent

When a prompt carries a child's worry about their mother, one branch built a medication management tool. The other built a remembering aid for an elderly parent. The difference came down to whether 'she keeps forgetting' was read as a feature request or a feeling.

emotionally-loaded-prompt, practice-to-tool-drift, two-actor-system, medication-adherence, caregiving

EXP-011 2026-03-13

When Tiny Meant Tiny and the Diary Forgot It Was a Diary

A prompt packed with five explicit constraints seemed impossible to misread. One branch softened 'tiny' and quietly built a note-taking app instead of a diary.

constraint-drift, product-identity-drift, scope-inflation

EXP-012 2026-03-13

Simple Meant Simple — But Only One Branch Got the Memo

Two agents built the same countdown timer from the same four-word prompt. Both worked. But only one treated 'simple' as a rule instead of a mood.

constraint-drift, countdown-timer, simplicity

CONS-001 2026-03-13

Three Agents, One Budget: Where Intent Discovery Agrees With Itself

We gave three independent agents the same five-word prompt and asked each to discover intent before building anything. They agreed on what a budget tracker is — and disagreed, revealingly, on what could go wrong.

consensus, intent-discovery, reproducibility

REGEN-001 2026-03-13

Two Agents, One Blueprint, Zero Disagreements

Two independent implementations built from the same intent artifact converged on every testable claim — and the most interesting convergence happened on the claim the artifact was least confident about.

regeneration, convergence, pomodoro

SHUF-001 2026-03-13

The Agent That Tried to Reconcile Two Products

We gave three agents the same habit tracker prompt. One got no intent artifact, one got the right one, and one got a Pomodoro timer's artifact presented as if it were legitimate. The third agent didn't reject the foreign artifact — it built a hybrid product and used the word 'reconcile' to describe what it did.

falsification, shuffled-exo

EXP-001 2026-03-09

Tiny Meant Tiny — Until Implementation Started

A five-word prompt — 'Build a tiny pomodoro timer' — produced two very different interpretations of 'tiny.' One agent treated it as a visual aesthetic. The other treated it as a hard scope boundary. The difference happened before a single line of code was written.

greenfield-comparison, constraint-drift, scope-inflation, silent-default-selection

EXP-002 2026-03-09

When Both Agents Got It Right — But for Different Reasons

Two agents built nearly identical grocery list apps for an elderly mother. Same features, same accessibility choices. But one called it 'an accessibility-first tool.' The other called it 'a small act of care expressed as software.'

greenfield-comparison, convergence, user-persona

EXP-003 2026-03-09

The Journal That Became a Filing Cabinet

One agent built a note manager with search, edit, and delete. The other built a blank page that opens to today. Both were 'markdown journals for daily reflections' — but only one understood what a journal is for.

greenfield-comparison, practice-to-tool-drift, product-identity-drift

EXP-004 2026-03-09

The Color That Doesn't Exist Yet

Both agents knew that screen colors can't predict how glazes will look after firing. One mentioned this in its developer notes. The other put a disclaimer in the UI. The difference between knowing a truth and telling the user.

greenfield-comparison, hidden-domain-truth-suppression, surface-over-substance-drift

EXP-005 2026-03-09

Right Features, Wrong Product

The prompt said 'budget tracker.' One agent built an expense tracker — add transactions, see totals — and called it a budget tracker. The other asked what 'budget' actually means and built around the answer.

greenfield-comparison, product-identity-drift, silent-default-selection

EXP-006 2026-03-09

Twinkling Stars, Five Stories

One agent built a beautiful bedtime story app with twinkling CSS stars, a moon graphic, and animated gradients. It had 5 stories. The other built a plain dark screen with 12,000+ unique story combinations. The child never sees the screen.

greenfield-comparison, surface-over-substance-drift, constraint-drift

EXP-007 2026-03-09

The Agent That Quietly Built a Different Product

We gave two agents the same prompt for a todo app with markdown notes. One built exactly that. The other decided it was actually a notes app — and never mentioned the switch.

greenfield-comparison, intent-preservation, plan-substitution, product-identity-drift