Exogenesis / Field Notes
Field Notes
Each experiment follows the same protocol: define a scenario, run a baseline with no intent artifact, run again with Exogenesis, and measure the difference.
16 experiments
Latest experiment
The Word 'Also' Did a Lot of Heavy Lifting
A one-sentence prompt with two domains — workout tracking and meal planning — reveals how agents silently equalize features that the user subordinated with a single word.
Four Words, Six Forbidden States, and the $0.00 That Shouldn't Be There
A four-word prompt produces the same product from both branches — but the intent artifact catches edge-case violations the prompt-only agent honestly admits it never considered.
The Reading List That Became a Book Tracker
Two agents built a reading list app from the same five-word prompt. One quietly narrowed the product to a book-tracking checklist; the other preserved the broader concept of a reading queue with progression states. The most telling difference was not in features but in which decisions were made silently.
She Keeps Forgetting — And So Did the Agent
When a prompt carries a child's worry about their mother, one branch built a medication management tool. The other built a remembering aid for an elderly parent. The difference came down to whether 'she keeps forgetting' was read as a feature request or a feeling.
When Tiny Meant Tiny and the Diary Forgot It Was a Diary
A prompt packed with five explicit constraints seemed impossible to misread. One branch softened 'tiny' and quietly built a note-taking app instead of a diary.
Simple Meant Simple — But Only One Branch Got the Memo
Two agents built the same countdown timer from the same four-word prompt. Both worked. But only one treated 'simple' as a rule instead of a mood.
Three Agents, One Budget: Where Intent Discovery Agrees With Itself
We gave three independent agents the same five-word prompt and asked each to discover intent before building anything. They agreed on what a budget tracker is — and disagreed, revealingly, on what could go wrong.
Two Agents, One Blueprint, Zero Disagreements
Two independent implementations built from the same intent artifact converged on every testable claim — and the most interesting convergence happened on the claim the artifact was least confident about.
The Agent That Tried to Reconcile Two Products
We gave three agents the same habit tracker prompt. One got no intent artifact, one got the right one, and one got a Pomodoro timer's artifact presented as if it were legitimate. The third agent didn't reject the foreign artifact — it built a hybrid product and used the word 'reconcile' to describe what it did.
Tiny Meant Tiny — Until Implementation Started
A five-word prompt — 'Build a tiny pomodoro timer' — produced two very different interpretations of 'tiny.' One agent treated it as a visual aesthetic. The other treated it as a hard scope boundary. The difference happened before a single line of code was written.
When Both Agents Got It Right — But for Different Reasons
Two agents built nearly identical grocery list apps for an elderly mother. Same features, same accessibility choices. But one called it 'an accessibility-first tool.' The other called it 'a small act of care expressed as software.'
The Journal That Became a Filing Cabinet
One agent built a note manager with search, edit, and delete. The other built a blank page that opens to today. Both were 'markdown journals for daily reflections' — but only one understood what a journal is for.
The Color That Doesn't Exist Yet
Both agents knew that screen colors can't predict how glazes will look after firing. One mentioned this in its developer notes. The other put a disclaimer in the UI. The difference between knowing a truth and telling the user.
Right Features, Wrong Product
The prompt said 'budget tracker.' One agent built an expense tracker — add transactions, see totals — and called it a budget tracker. The other asked what 'budget' actually means and built around the answer.
Twinkling Stars, Five Stories
One agent built a beautiful bedtime story app with twinkling CSS stars, a moon graphic, and animated gradients. It had 5 stories. The other built a plain dark screen with 12,000+ unique story combinations. The child never sees the screen.
The Agent That Quietly Built a Different Product
We gave two agents the same prompt for a todo app with markdown notes. One built exactly that. The other decided it was actually a notes app — and never mentioned the switch.