EXP-011: When Tiny Meant Tiny and the Diary Forgot It Was a Diary

Context

We gave two agents the same prompt: "Build a tiny, offline-only, single-file, no-dependencies personal diary that works on a Raspberry Pi Zero."

Five constraint words. One product identity term. No ambiguity about what "offline" means, no wiggle room on "single-file," no debate about "no-dependencies." If any prompt should survive implementation intact, it is this one. The constraint density is so high that there is almost nothing left for the agent to decide — or so we thought.

The question we wanted to answer: when a prompt is this explicit, does intent discovery still do anything useful? Or does the prompt itself function as a sufficient intent artifact?

Hypothesis

We expected both branches to converge. The prompt is so heavily constrained that there is little room for interpretation. If the branches diverge meaningfully, it would suggest that even explicit constraint words are not self-enforcing — that agents can soften or reinterpret them even when the words are unambiguous.

Our bet: the intent-driven branch would be marginally tighter, but the prompt-only branch would get the fundamentals right. The constraint density should act as its own guardrail.

Initial Intent Artifact

The intent artifact treated every constraint word as an invariant, not a preference. It set specific, measurable targets: file size under 10KB, DOM under 30 elements, time-to-interactive under 500ms on Pi Zero hardware.

More interestingly, it made a sharp product-identity claim: a diary is "chronological, date-anchored personal reflection tool" with "one entry per date." The date is the primary organizing dimension. It explicitly called out that drift toward a note-taking app or journaling platform would violate the prompt's meaning.

It also flagged six drift risks, including scope inflation ("mood tracking, word count statistics, streak counters, search bars") and constraint drift ("'tiny' is reinterpreted as 'small but feature-rich'"). Both of these turned out to be prescient.

One detail worth noting: the artifact set a trade-off posture that allowed visual minimalism and narrow feature sets. It gave the implementation explicit permission to be plain and spartan. This matters because it removes the pressure an agent might feel to "deliver value" through additional features.

Method

Two branches ran in separate sessions from the same prompt. The prompt-only branch received the raw prompt and built directly. The intent-driven branch ran intent discovery first, generated an intent artifact, then implemented from the artifact.

Both branches produced a single `index.html` file and a summary report. Comparison was analytical: we examined the code, the data model, the feature set, and each agent's self-assessment. File sizes and line counts were measured directly; Pi Zero performance claims are inferred from DOM structure and JS complexity, not measured on actual hardware.

Observation

Both branches got the easy constraints right

"Offline-only," "single-file," and "no-dependencies" survived in both branches without any drift. Both produced a single HTML file with inline CSS and JS, zero external resources, and localStorage for persistence. Both avoided CDN fonts, external scripts, and service workers.

This is expected. These constraints are binary — either the file makes a network request or it does not, either there is one file or there are multiple. There is no spectrum to drift along.

"Tiny" did not mean the same thing to both branches

The prompt-only branch produced a 257-line, roughly 6.5KB file. The intent-driven branch produced a 94-line, roughly 2.2KB file. The prompt-only branch is three times larger.

Both are small by any normal standard. But the prompt did not say "small." It said "tiny." The prompt-only agent interpreted "tiny" as a general posture — keep things minimal, avoid bloat — and then added features it judged to be low-cost: search, delete, entry count display, a Ctrl+Enter shortcut. Each addition is individually reasonable. Together they tripled the file size.

The intent-driven branch treated "tiny" as an invariant with a measurable target. The intent artifact set a 10KB ceiling and a 30-element DOM limit. The implementation came in at 2.2KB and 15 elements. There was no pressure to add features because the artifact explicitly permitted a narrow feature set.

The prompt-only agent was aware of this tension. In its own summary, it wrote: "Search feature is arguably scope inflation. The prompt said 'tiny' and 'diary.' A search box is useful but was not requested." It added the search anyway, reasoning that it was "nearly zero cost to implement." That reasoning is correct in isolation but misses the cumulative effect: each "nearly zero cost" addition moves the product further from "tiny."

The diary quietly became a note-taking app

This was the finding we did not expect.

The prompt-only branch organizes entries by timestamp. Each save creates a new entry keyed by `Date.now()`. You can write multiple entries per day. Entries appear in a scrolling list, newest first, with a search box to filter them.

The intent-driven branch organizes entries by date. Each day has exactly one entry, keyed by `diary-YYYY-MM-DD`. You navigate between days with previous/next buttons. Saving overwrites the current day's entry.

These are two different products. The first is a timestamped entry list with search — structurally, a note-taking app. The second is a date-anchored diary — one page per day, just like a physical diary.

The prompt said "personal diary." A diary is organized by date, not by save event. You do not get multiple pages for the same day in a paper diary. The prompt-only branch did not deliberately choose to build a different product; it simply defaulted to a familiar pattern (timestamped entries in a list) without asking whether that pattern matches what "diary" means.

The intent artifact caught this before implementation began. It defined a diary entry as "a single piece of personal reflective text anchored to a specific date" and called out that "entries organized by title instead of date, tagging system, folder structure, search-first navigation" would be signs of product-identity drift.

Error handling diverged on a critical dimension

The intent artifact specified two failure boundaries: show a clear message if localStorage is unavailable (FB1), and inform the user if storage quota is exceeded (FB2).

The intent-driven branch implements both. If localStorage is blocked, the user sees "localStorage is not available. Entries cannot be saved." If a save fails due to quota, the error is surfaced.

The prompt-only branch wraps localStorage access in try/catch blocks that silently return empty arrays on failure. If localStorage is unavailable, the diary appears to work — you can write entries, click save — but nothing persists. The user would not know their diary entries are being discarded.

For a product called "personal diary," silent data loss is arguably the worst possible failure mode. The prompt-only agent did not address this, not because it was unaware of the risk (it mentions localStorage limits in its summary), but because the failure boundary was never made explicit as a design obligation.

Drift Analysis

Primary: Scope inflation

The prompt-only branch adds search, delete, entry count display, and a keyboard shortcut. None were requested. The agent's own summary identifies search as "arguably scope inflation" but includes it anyway. Each feature is individually small, but their cumulative effect moves the product from "tiny" to "small-with-features."

Secondary: Constraint drift

"Tiny" is softened from an invariant to a general posture. The prompt-only branch is three times larger than the intent-driven branch. The constraint word survives in spirit but not in its strongest reading.

Tertiary: Product-identity drift

The timestamp-keyed, multi-entry-per-day data model is structurally closer to a note-taking app than to a diary. This drift is subtle — the agent uses diary language throughout, and the UI looks diary-like — but the underlying product model does not match diary semantics.

Legitimate Divergence

Several differences between the branches are valid design choices in areas the prompt does not constrain:

Font family. The prompt-only branch chose Georgia (serif) for a "diary aesthetic." The intent-driven branch chose system-ui (sans-serif). Both are reasonable; the prompt says nothing about typography, and neither choice conflicts with any constraint.

Background color. Warm off-white vs plain white. Pure aesthetic preference.

Save shortcut. Ctrl+Enter vs Ctrl/Cmd+S. Both are conventional; neither adds meaningful complexity.

Date display format. Human-readable ("Sun, 13 Mar 2026") vs ISO ("2026-03-13"). The intent artifact did not specify a format.

Convergence finding. Both branches independently chose localStorage, explicit save (not auto-save), the same page title ("My Diary"), and no rich text. This convergence suggests these are natural defaults for the prompt rather than artifacts of shared framing.

Result

The hypothesis was wrong. We expected convergence on a constraint-heavy prompt, and instead found meaningful divergence on every soft dimension: file size, feature set, data model, and error handling.

The hard constraints — offline-only, single-file, no-dependencies — survived intact in both branches. These are binary constraints with no spectrum to drift along. But the soft constraints — "tiny" and "personal diary" — were reinterpreted by the prompt-only branch even though their meaning seems obvious.

The most telling detail is that the prompt-only agent knew it was drifting. It acknowledged search as "arguably scope inflation" and delete as "an assumption." It was not unaware of the risks; it simply had no structural reason to resist them. The intent artifact provided that structural resistance by treating each constraint as an invariant and giving explicit permission for a narrow feature set.

The strongest single-sentence finding: constraint words do not enforce themselves, even when the prompt is unambiguous — an agent needs a reason not to add things, and the prompt alone does not provide one.

Principle

Constraint words are descriptions, not enforcement mechanisms. A prompt can say "tiny" and mean it, but without a structural artifact that treats "tiny" as an invariant with measurable boundaries, the word becomes a suggestion. Intent discovery converts constraint words from descriptions into obligations.

A stronger formulation: the more constrained the prompt appears, the more likely it is that the constraints will be softened during implementation, because the agent compensates for the apparent narrowness by adding features it considers low-cost. Intent discovery interrupts this compensation pattern by making the narrowness explicit and permissible.

Follow-Up

Would a file-size linter catch the divergence? If we added a post-implementation check ("is the file under 10KB?"), the prompt-only branch would pass. The intent artifact's tighter target (under 10KB actual, 2.2KB achieved) suggests that automated constraint checking needs artifact-level targets, not just prompt-level keywords.

Does the data-model drift matter in practice? The timestamp-keyed model works fine for casual use. Does the date-anchored model actually produce a better diary experience, or is it an arbitrary design choice? Testing with real users on a Pi Zero would answer this.

What happens with 500 entries? The prompt-only branch renders all entries as DOM nodes and filters on every keystroke. On a Pi Zero with hundreds of entries, this would be the first real performance test of the two approaches.

Is this a constraint-density effect? The prompt has five constraints. Would a prompt with two constraints show less drift, or does the number of constraints not matter?

Limitations

Single run per branch. Each branch was implemented once. A different agent run could produce different results. The prompt-only branch might produce a date-anchored diary on a second attempt.

Same model family. Both branches used the same underlying model, which means shared training biases could explain convergence patterns.

Analytical comparison, not measured. Pi Zero performance claims are inferred from DOM structure and code complexity, not measured on actual hardware. The prompt-only branch might perform adequately in practice despite the unbounded DOM.

File size is approximate. Sizes were estimated from line counts and code density, not measured with `wc -c`. The 3x ratio is directionally correct but not precise.

The "diary vs notes" classification is debatable. A reasonable person could argue that a timestamped entry list is still a diary — some people do write multiple diary entries per day. The intent artifact's "one entry per date" model is one valid interpretation, not the only one. We classified this as drift because the prompt said "diary," which conventionally implies date-anchored entries, but we acknowledge this is an interpretive judgment.

Context volume confound. The intent-driven branch received more context (the intent artifact) than the prompt-only branch. Some of its tighter constraint adherence could be attributed to having more text to align to, not specifically to the structural properties of that text.