
2026-03-26 by Rachid — Kaiari Labs

Your AI Can't Code — And It's Your Fault

exogenesis · intent-drift · opinion

You typed five words into Claude, Cursor, or Copilot and got back a working app. Congratulations — it's the wrong app. Here's why, and what 16 experiments taught us about the gap between what you asked for and what you got.

The app works. The app is wrong.

You typed "Build a tiny pomodoro timer" into your favorite coding agent. Thirty seconds later you have a working app. Dark theme. Centered clock. Start button. Clean code.

You also have long break cycles, a skip button, session progress dots, browser notifications, and a tab title countdown. You asked for tiny. You got a full-featured pomodoro app that *happens to look small*.

And here's the part that should bother you: the agent knew. In its own summary, it acknowledged the result was "over-featured for tiny." It wrote that down. Then it shipped the code anyway.

This is not a hallucination. This is not a bug. This is the normal behavior of a coding agent doing exactly what it was designed to do — fill in the blanks you left empty.

The problem is: you left *all* the blanks empty.

You are the bottleneck

The discourse around AI coding tools follows a predictable script. "AI can't code," say the skeptics. "AI is amazing," say the enthusiasts. Both are wrong in the same way — they're treating the AI as the protagonist.

It's not. You are.

When you type a five-word prompt and hit enter, you're not giving the agent an instruction. You're giving it a *void*. The agent has to decide what "tiny" means, what "simple" means, what "budget tracker" means, what "for my elderly mother" implies. And it will decide. Confidently. Silently. Without telling you it made a choice.

We ran 16 controlled experiments to prove this. Same prompt, two branches: one where the agent just builds, one where intent is made explicit before coding starts. The results were not subtle.
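
To make the comparison concrete, here is a hypothetical sketch of how drift between the two branches could be measured, assuming each branch reports the feature list it actually shipped. The `scope_drift` function and the feature names are illustrative, not the experiment harness itself.

```python
def scope_drift(requested: set[str], shipped: set[str]) -> set[str]:
    """Features the agent shipped that were never asked for."""
    return shipped - requested

# "Build a tiny pomodoro timer" — what the prompt actually names:
requested = {"timer", "start button", "dark theme"}

# What the prompt-only branch shipped (the pomodoro example above):
prompt_only = requested | {"long breaks", "skip button", "progress dots",
                           "notifications", "tab countdown"}
# What the intent-first branch shipped:
intent_first = {"timer", "start button", "dark theme"}

print(sorted(scope_drift(requested, prompt_only)))   # the unrequested extras
print(sorted(scope_drift(requested, intent_first)))  # []
```

The point of the asymmetry: a non-empty drift set isn't a bug report, it's a list of decisions the agent made for you.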

The taxonomy of silent decisions

Here's what we found agents doing when you give them room to improvise:

They redefine your words. You said "tiny." They heard "minimal-looking." You said "simple." They heard "clean UI." The constraint word that was supposed to limit scope became an aesthetic preference instead. We saw this in experiment after experiment — a prompt packed with five explicit constraints, and the agent still softened "tiny" from a rule to a mood.

They build the wrong product and name it correctly. A prompt that says "budget tracker" gets you an expense tracker. The feature set looks right. The data model looks right. But the product is conceptually organized around the wrong primary object. There's no budget ceiling, no spending target, no planning — just a ledger. Right features, wrong product.

They flatten your priorities. You wrote "workout tracker that also tracks meals." The word "also" is doing all the work in that sentence — it's subordinating meal planning to workout tracking. The agent saw two domains and gave them equal weight. Your hierarchy evaporated.

They optimize what they can see. A bedtime story generator prompt produced two apps. One had twinkling CSS stars, animated moon gradients, and five hardcoded stories. The other had a plain dark screen and 12,000+ unique story combinations. Remember: the child is listening, not looking at the screen. One agent optimized for the demo. The other optimized for the child.

They turn practices into tools. A "daily reflection journal" became a note manager with search, edit, and delete. A journal is a blank page that opens to today. A note manager is a filing cabinet. Both are "markdown journals for daily reflections." Only one understands what reflection is.

They know things they don't tell you. Both agents building a ceramics studio palette knew that screen colors can't predict how glazes look after firing. One mentioned this in developer notes. The other put a disclaimer in the UI. The difference between knowing a domain truth and actually telling the user matters — and the agent won't surface it unless something forces it to.

They lose emotional context. The prompt said "she keeps forgetting her medications." One agent built a medication management tool — dosage schedules, refill reminders, compliance tracking. The other heard a child worrying about their mother and built a remembering aid with warm language and large, simple buttons. Same prompt. Completely different products. The first one is correct. The second one is *right*.

The most dangerous failure mode

All of the above are variations of one deeper problem: plan substitution.

Here's how it works. You give the agent a prompt. The agent interprets the prompt and generates an internal plan. The plan is slightly different from what you meant. Then the agent executes the plan faithfully, producing a coherent, polished, working app that is *not the product you asked for*.

We caught this most clearly in an experiment where the prompt said "todo app with markdown notes." One agent silently decided it was actually a notes app — and reframed todo items as "checkbox rendering in markdown." The todo functionality became a feature of the notes app instead of the other way around. The agent never mentioned the switch. The app worked perfectly. It was just the wrong product.

This is why "it works" is not a sufficient test. A working app that has been silently reframed through plan substitution is harder to catch than a broken one, because everything you see is internally consistent. The code is clean. The features hang together. The only thing missing is the thing you actually wanted.

It's not just vibes — the numbers

In 16 experiments, the prompt-only branch exhibited measurable drift in the majority of cases. But more interesting than the raw failure rate is what two follow-up experiments revealed.

We also ran a consensus experiment — three independent agents doing intent discovery on the same prompt. They agreed 100% on what the product is. They agreed only 11% on what could go wrong. Core meaning is stable. Failure imagination diverges.
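
That kind of agreement score can be sketched as a mean pairwise Jaccard overlap across the agents' answer sets. This is an illustrative reconstruction of the metric, not the authors' actual scoring code; the answer sets below are made up to show the shape of the result.

```python
from itertools import combinations

def mean_pairwise_jaccard(answer_sets: list[set[str]]) -> float:
    """Average Jaccard overlap across all pairs of agents' answers."""
    def jaccard(a: set[str], b: set[str]) -> float:
        return len(a & b) / len(a | b) if a | b else 1.0
    pairs = list(combinations(answer_sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# "What is the product?" — all three agents give the same answer:
product = [{"pomodoro timer"}, {"pomodoro timer"}, {"pomodoro timer"}]
# "What could go wrong?" — answers barely overlap:
failures = [{"a", "b", "c"}, {"c", "d", "e"}, {"f", "g", "h"}]

print(mean_pairwise_jaccard(product))   # 1.0 — core meaning is stable
print(mean_pairwise_jaccard(failures))  # near zero — failure imagination diverges
```

Whatever the exact metric used, the pattern holds: identity converges, risk enumeration doesn't.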

And when we gave an agent the wrong intent artifact on purpose — a pomodoro timer's blueprint for a habit tracker prompt — it didn't reject it. It built a hybrid product and used the word "reconcile" to describe what it was doing. The agent tried to make two contradictory inputs coexist rather than flag the contradiction.

So what do you actually do about it?

Stop blaming the model. Start blaming the input.

The fix is not better prompting. "Prompt engineering" is a symptom patch — it optimizes the words you use without addressing the fact that most of your intent was never stated.

The fix is intent manifestation before implementation. Before the agent writes a line of code, the following should be explicit:

  • What the product is — not what it does, but what it *is*
  • What must not be sacrificed — the protected values that trade-offs can't touch
  • What failures are unacceptable — not bugs, but conceptual failures
  • What words are load-bearing — "tiny," "simple," "also" — and what they actually mean in this context
  • What the agent must not decide silently — the choices that need to be surfaced, not inferred
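
Concretely, the five items above could be captured in a small structured artifact before any code is generated. A minimal sketch in Python, assuming a dataclass representation — every field name here is illustrative, not part of the Exogenesis method itself:

```python
from dataclasses import dataclass

@dataclass
class IntentArtifact:
    identity: str                       # what the product *is*, not what it does
    protected_values: list[str]         # what trade-offs can't touch
    unacceptable_failures: list[str]    # conceptual failures, not bugs
    load_bearing_words: dict[str, str]  # word -> what it means in this context
    must_surface: list[str]             # decisions the agent can't make silently

pomodoro = IntentArtifact(
    identity="a tiny pomodoro timer, nothing more",
    protected_values=["smallness is a scope rule, not an aesthetic"],
    unacceptable_failures=["feature creep that contradicts 'tiny'"],
    load_bearing_words={"tiny": "minimal feature set, not minimal-looking UI"},
    must_surface=["any feature not named in the prompt"],
)
```

Notice that this artifact would have caught the opening example directly: "over-featured for tiny" is an unacceptable failure here, not a footnote in the agent's summary.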

This is what we call Exogenesis — an intent-first method for software generation. It's not a framework. It's not a plugin. It's a way of structuring the conversation between you and the agent so that the agent's plan starts from your meaning instead of its own inference.

We've shown that this works. In experiment after experiment, the intent-driven branch preserved product identity, caught edge cases the prompt-only branch missed, and treated constraint words as rules instead of moods. A regeneration experiment showed that two independent agents given the same intent artifact converge perfectly — even on claims the artifact was uncertain about.

The uncomfortable conclusion

Your AI coding tool is not broken. It's doing exactly what you asked — which is almost nothing.

Five words is not an instruction. It's an invitation for the agent to build whatever it thinks you might have meant. Sometimes it guesses right. Sometimes it builds a filing cabinet when you wanted a journal, an expense tracker when you wanted a budget planner, a medication compliance system when a child just wanted their mom to remember.

The gap between what you typed and what you meant is where every silent decision lives. Every product reframing. Every constraint reinterpretation. Every feature that seemed reasonable individually but collectively turned your tiny timer into a feature-complete pomodoro app.

You can close that gap. But it requires you to do something most developers resist: *think about what you want before you ask for it.*

The AI can code. It's waiting for you to tell it what to build.