EXP-001: Tiny Meant Tiny — Until Implementation Started

Context

Five words: "Build a tiny pomodoro timer."

That's it. We wanted to see what happens when a prompt's most important word is also its most ambiguous one. "Tiny" could mean visually small, functionally minimal, low complexity, or all of the above. It's the kind of word developers use all the time without realizing how much interpretive room it leaves.

Two isolated agents got the same five-word prompt. One went through intent discovery first. The other just built.

Hypothesis

Our bet: the prompt-only agent would treat "tiny" as an aesthetic — a small, clean-looking app that's still feature-complete. The intent-driven agent, having formalized "tiny" as a protected value before coding, would treat it as a hard constraint on the feature surface.

In other words: "tiny" means different things depending on when you think about it.

Initial Intent Artifact

The intent artifact defined the product as a minimal countdown timer: 25-minute work sessions, 5-minute breaks. Nothing else.

It identified "tiny" as constraining not just the UI but the entire feature surface — the app should do one thing and stop. Protected values: minimalism, correctness, speed. Explicit exclusions: no long breaks, no settings panel, no task tracking, no notifications, no file splitting.

It even flagged the specific drift risk we were testing: "scope inflation through individually reasonable additions" and "reinterpreting tiny as visual rather than functional."

The artifact also included a formal state machine for timer behavior. That turned out to matter.
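
The artifact's state machine is not reproduced in this write-up. As a rough illustration of what a formal state machine for a timer like this could look like, here is a minimal sketch. The state names (`idle`, `running`, `paused`), the event names, and the `transition` helper are all assumptions made for the example, not the artifact's wording.

```javascript
// Hypothetical transition table. State and event names are assumptions
// for illustration; the artifact's real state machine is not shown here.
const TRANSITIONS = {
  idle:    { start: "running" },
  running: { pause: "paused", reset: "idle", finish: "idle" },
  paused:  { start: "running", reset: "idle" },
};

// Returns the next state, or throws when the current state does not
// allow the event, so an illegal transition cannot pass silently.
function transition(state, event) {
  const row = Object.hasOwn(TRANSITIONS, state) ? TRANSITIONS[state] : {};
  if (!Object.hasOwn(row, event)) {
    throw new Error(`illegal transition: "${event}" from "${state}"`);
  }
  return row[event];
}
```

Writing the transitions as a table makes the feature surface explicit: anything that needs a new state or event (a skip action, a long-break phase) has to show up as a visible change to the table rather than as an incidental branch in the code.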

Method

Both agents got the raw prompt and nothing else. The intent-driven agent generated a structured intent artifact first, then implemented. Both produced single-file HTML apps and written summaries of their decisions.

Observation

What the prompt-only agent built

Long break cycles (15 minutes every 4th session). A skip button. Session progress dots. Browser notifications. A tab title countdown. Its own summary acknowledged these were "over-featured for tiny" — but the code shipped them anyway.

The implementation didn't resist its own instincts. The agent recognized the tension, noted it in writing, and shipped the features regardless.

What the intent-driven agent built

A 25-minute timer. A 5-minute break. Start, pause, reset. That's it.

But here's the part that surprised us: the intent-driven agent also caught a technical correctness issue that the prompt-only agent missed entirely. Browsers throttle timers in backgrounded tabs, so a countdown that subtracts a fixed amount on every `setInterval` callback silently falls behind real time. The intent-driven agent identified this during intent discovery, traced it to a failure boundary in the artifact ("timer must not silently miscount"), and implemented wall-clock-based timing instead.

The prompt-only agent used naive `setInterval`. In a backgrounded tab, its "tiny" timer would quietly lose time.
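
The difference between the two approaches can be sketched in a few lines. This is an illustration of the general technique, not either agent's actual code. The factory names, the injectable `now` parameter (added so the example runs outside a browser), and the simulated throttling numbers are assumptions.

```javascript
// Wall-clock countdown: remaining time is derived from a fixed deadline,
// so late or skipped callbacks cannot change the answer.
// `now` is injectable for testing (an assumption); it defaults to Date.now.
function createWallClockTimer(durationMs, now = Date.now) {
  const deadline = now() + durationMs;
  return {
    remainingMs: () => Math.max(0, deadline - now()),
  };
}

// Tick-counting countdown: assumes each callback arrives exactly
// `intervalMs` after the previous one, which throttling breaks.
function createTickTimer(durationMs, intervalMs = 1000) {
  let remaining = durationMs;
  return {
    tick: () => { remaining = Math.max(0, remaining - intervalMs); },
    remainingMs: () => remaining,
  };
}

// Simulated throttled tab: 60 s of real time pass, but only 10 callbacks fire.
let fakeNow = 0;
const wall = createWallClockTimer(25 * 60 * 1000, () => fakeNow);
const ticks = createTickTimer(25 * 60 * 1000);
for (let i = 0; i < 10; i++) {
  fakeNow += 6000; // each callback arrives 6 s after the previous one
  ticks.tick();
}

// Positive drift means the tick counter believes more time remains
// than the wall clock says.
const driftMs = ticks.remainingMs() - wall.remainingMs();
```

In a real page, `setInterval` can still drive the display refresh; the point is that each callback reads `remainingMs()` from the clock instead of decrementing a counter. Pause support would additionally need to store the remaining time on pause and compute a new deadline on resume, which this sketch omits.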

Where they converged

Both chose dark themes. Both centered the timer with large typography. Both used 25/5 durations. Some choices are so strongly implied by the product type that constraint level doesn't matter — both agents arrived at the same answer independently.

Drift Analysis

Constraint drift (primary)

This is the cleanest example of constraint drift we've seen. The prompt-only agent preserved the *aesthetic* of "tiny" — the app looks minimal, clean, compact. But functionally, it built a full pomodoro cycle manager with five features beyond what the prompt requested.

"Tiny" slid from a scope constraint to a style preference. The word was honored in appearance and violated in substance.

Scope inflation (secondary)

Each added feature — long breaks, skip button, progress dots, notifications, tab title — is individually small and defensible. Any one of them is a reasonable "nice to have." But collectively they transform the product. You asked for a tiny timer. You got a full-featured pomodoro app that happens to look small.

This is the insidious part of scope inflation: no single addition feels wrong. It's the accumulation that changes the product.
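
One way to make the accumulation visible is to audit the shipped feature list against the artifact. The sketch below does that for the five additions named above. The feature labels and the `auditFeatures` helper are hypothetical, and the `specified` set is taken from what the intent-driven agent built, not from a list in the artifact itself.

```javascript
// Labels are hypothetical identifiers for features named in this write-up.
// Baseline surface: 25/5 countdown with start, pause, reset.
const specified = new Set(["work-session", "short-break", "start", "pause", "reset"]);

// Feature-level exclusions from the artifact. "No file splitting" is an
// implementation constraint rather than a feature, so it is left out.
const excluded = new Set(["long-breaks", "settings-panel", "task-tracking", "notifications"]);

// Classify each shipped feature as in scope, explicitly excluded, or
// unrequested (not excluded by name, but outside the specified surface).
function auditFeatures(shipped) {
  const report = { inScope: [], excluded: [], unrequested: [] };
  for (const feature of shipped) {
    if (specified.has(feature)) report.inScope.push(feature);
    else if (excluded.has(feature)) report.excluded.push(feature);
    else report.unrequested.push(feature);
  }
  return report;
}

// The five additions the prompt-only agent shipped.
const promptOnlyAdditions = auditFeatures([
  "long-breaks",
  "skip-button",
  "progress-dots",
  "notifications",
  "tab-title-countdown",
]);
```

Under these labels, two of the five additions (long breaks, notifications) hit explicit exclusions, while the other three are not named anywhere in the artifact. Those three are the "individually reasonable" kind: a denylist alone would not have caught them, which is why the artifact also framed "tiny" as a limit on the whole feature surface.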

Silent default selection (tertiary)

The 4-session long-break cycle is a real convention from the Pomodoro Technique. But the prompt didn't ask for long breaks at all. The prompt-only agent selected a domain-specific default and implemented it as though it had been specified, without surfacing that a choice had been made.

Legitimate Divergence

Not every similarity or difference here is meaningful:

Color scheme: both chose dark themes independently. The artifact didn't specify visual appearance beyond "minimal." Valid design freedom.
Layout approach: both centered the timer with large type. Strongly implied by the product type, not by the artifact.
Font choices and spacing: aesthetic decisions within the minimal constraint. Neither conflicts with any protected value.

The convergence on aesthetics is actually interesting. It suggests that some implementation choices are so product-implied that intent artifacts don't need to constrain them — and shouldn't.

Result

The hypothesis held. "Tiny" meant two different things to the two agents, and the split traces back to a step taken before any code was written: intent discovery.

The prompt-only agent treated "tiny" as an aesthetic constraint. It built a feature-rich app that looked small. The intent-driven agent treated "tiny" as a scope boundary. It built a genuinely minimal app and caught a real correctness bug along the way.

That last part is worth sitting with. The intent-driven agent didn't just build less; it built better. The backgrounded-tab issue surfaced during intent discovery, where "timer must not silently miscount" was written down as a failure boundary, and the smaller feature surface plausibly left room to act on it. The prompt-only agent, occupied with long breaks and notification permissions, never raised the question.

Principle

Constraint words are interpreted differently depending on when they're analyzed. During implementation, "tiny" slides toward "minimal-looking but full-featured" — the agent's instinct is to add, and the word doesn't stop it. During intent discovery, "tiny" becomes a scope boundary that actively prevents additions.

The earlier a constraint is formalized, the more effectively it resists drift.

Follow-Up

Would real users consider long breaks essential to a "pomodoro timer," or is that scope inflation? User testing would settle this.
Does the same pattern hold for "lightweight," "basic," or "simple"? Each constraint word probably has its own drift profile.
Does the wall-clock vs `setInterval` difference produce observable behavior in real browser usage? Worth benchmarking.

Limitations

Both agents used the same model family. The scope-inflation tendency might be model-specific rather than universal.
"Tiny" is genuinely ambiguous. Reasonable people could argue that long breaks belong in a pomodoro timer, even a tiny one. The drift classification assumes "tiny" means feature-minimal — that's the intent-driven interpretation, but it's not the only valid one.
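The intent-driven branch worked from more structured context: both agents received only the raw prompt, but the intent-driven agent implemented against the artifact it had generated. The improvement could partly reflect context volume rather than structural alignment.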
Single run per branch. The prompt-only agent might resist scope inflation on a different day.
The aesthetic convergence (dark theme, centered layout) could reflect model training bias rather than genuine product-type reasoning.