Why you need architectural fitness functions when vibe coding

If you've spent any time vibe coding, or working with AI-assisted development more broadly, the architectural picture probably feels different than it used to. Code arrives faster than it can be reviewed. Modules grow, boundaries shift, dependencies appear, and most of it happens between prompts rather than between meetings.
Is this bad? Not necessarily — a different engineer would also implement things differently. The problem isn't that the AI makes bad choices. It's that choices are being made at a cadence the usual feedback loops weren't designed to handle. Pull request review, design discussions, and shared mental models all assume a slower rate of change.
Architectural fitness functions are one of the few mechanisms that operate at the right speed. They are small, automated, executable checks that verify whether the system still respects the properties the team cares about — coupling, layering, dependency direction, performance budgets, security boundaries. They run on every commit, and they don't depend on anyone remembering why a rule exists.
There are three reasons this matters more than it used to.
Architectural decisions are being made without being recognized as decisions
Every prompt produces code, and most of that code makes architectural choices along the way. Where does this function live? What does it depend on? Does it cross a boundary that was supposed to stay closed? Historically, these decisions triggered a moment of friction: a pull request, a reviewer, a memory of why the boundary existed.
In a vibe-coding workflow, that friction is gone. The code appears, the tests pass, and the choice is locked in before anyone identifies it as a choice. None of these decisions feel architectural in isolation. Together, they shape the system as much as any design document.
This is how the most expensive architectural debt is created. Not from bad decisions, but from decisions that were never recognized as decisions in the first place.
Fitness functions surface them. A check that fails when the order module imports from billing isn't a rule for the sake of rules — it's the moment of friction the AI doesn't provide, made automatic.
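As a minimal sketch of such a check, the following uses Python's standard `ast` module to find imports that cross a forbidden boundary. The package names `orders` and `billing` and the `FORBIDDEN` table are assumptions made for illustration, and the functions take source text directly so the example stays self-contained.

```python
import ast

# Hypothetical rule table: code in "orders" must not import "billing".
FORBIDDEN = {"orders": {"billing"}}


def imported_modules(source: str) -> set[str]:
    """Return the top-level package names imported by a piece of Python source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports stay in-package.
            found.add(node.module.split(".")[0])
    return found


def boundary_violations(package: str, source: str) -> set[str]:
    """Return the forbidden packages that source code living in `package` imports."""
    return imported_modules(source) & FORBIDDEN.get(package, set())
```

In a real repository, a test would read each file under the package and fail the build whenever `boundary_violations` returns a non-empty set.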
The AI architects against no horizon at all
Architectural decisions don't require seeing the whole future. They require seeing the next future — the work already on the board.
A human developer doesn't need a long-term vision to make a good local choice. They need to know that next sprint adds multi-tenancy, that the payments module is being extracted in two weeks, that the team is migrating off the current ORM by the end of the quarter. With that horizon, the local decision tends to fall out on its own.
The AI has none of that. Its planning horizon is exactly one prompt long. It optimizes against a generic system at a single point in time, and the result is an answer that looks plausible in isolation but conflicts with what's shipping next week.
Fitness functions encode the horizon the AI doesn't have. The constraint that the order module must not depend on infrastructure, or that p95 latency on the checkout path must stay under 200 ms, becomes a property the build enforces: a change that violates it fails the build. The model doesn't need to know the roadmap; it only needs to encounter the rules that exist because of it.
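The latency budget can be expressed the same way. The sketch below assumes latency samples in milliseconds have already been collected somewhere (a load test, for instance) and uses the nearest-rank definition of the 95th percentile. The function names are illustrative; only the 200 ms figure comes from the example above.

```python
P95_BUDGET_MS = 200.0  # budget from the checkout example in the text


def p95(samples_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # Integer form of ceil(0.95 * n), which avoids floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def within_budget(samples_ms: list[float], budget_ms: float = P95_BUDGET_MS) -> bool:
    """True when the p95 latency is strictly under the budget."""
    return p95(samples_ms) < budget_ms
```

A build step could call `within_budget` on the samples from a checkout load test and fail when it returns `False`.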
Drift now happens at the cadence of prompts, not releases
Architectural drift used to be measured in months. Vibe coding compresses that timeline. A single afternoon can introduce more architectural change than a full sprint of manual coding, and each change feels small enough to pass without comment. The forces driving drift are unchanged — time pressure, evolving environments, knowledge gaps. What's new is that they operate per prompt rather than per sprint.
The mechanism that used to absorb some of this — human code review — degrades precisely when it's needed most. AI-assisted merge requests tend to be larger, and large merge requests don't get reviewed; they get skimmed. A reviewer who would have caught a boundary violation in a fifty-line change won't catch the same violation in six hundred lines.
Fitness functions turn drift into a signal the team can act on. The breach is discovered in the commit that caused it, not six months later when fourteen places need fixing. That's the difference between debt managed deliberately and debt discovered accidentally.
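One way to get that per-commit signal in a codebase that already has violations is to compare each commit against an accepted baseline and report only what is new. This is a sketch under assumptions: violations are represented as hypothetical (importer, imported) pairs, and the baseline is whatever set the team has agreed to tolerate for now.

```python
# Hypothetical representation: each violation is an (importer, imported) pair.
Violation = tuple[str, str]


def new_violations(current: set[Violation], baseline: set[Violation]) -> set[Violation]:
    """Violations present in this commit that the accepted baseline lacks."""
    return current - baseline


def resolved_violations(current: set[Violation], baseline: set[Violation]) -> set[Violation]:
    """Baseline entries that no longer occur and can be dropped from the baseline."""
    return baseline - current


def drift_report(current: set[Violation], baseline: set[Violation]) -> list[str]:
    """One message per newly introduced violation, sorted for stable output."""
    return [
        f"new dependency: {src} -> {dst}"
        for src, dst in sorted(new_violations(current, baseline))
    ]
```

An empty report means the commit introduced no new drift. A non-empty one names the dependency in the same commit that added it.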
From rules to executable intent
It is tempting to think of fitness functions as guardrails for the AI. They are something more useful: the executable form of architectural intent.
The model that generates the code won't be the developer who reviews it, won't be the team that maintains it, won't be the operator who runs it in production. The only thing that persists across all of those is the system's own checks. If architectural intent doesn't live in something executable, it tends not to live anywhere at all.
This is why fitness functions pair so naturally with ADRs. ADRs capture the reasoning. Fitness functions enforce the consequence.
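A small sketch of that pairing: each check carries the identifier of the ADR that explains it, so a failure points back to the reasoning. The `FitnessFunction` class, the `run_all` helper, and the ADR identifiers are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FitnessFunction:
    name: str
    adr: str                    # hypothetical ADR identifier, e.g. "ADR-0007"
    check: Callable[[], bool]   # returns True when the property still holds


def run_all(functions: list[FitnessFunction]) -> list[str]:
    """Run every check and return one message per failure, citing its ADR."""
    return [
        f"{f.name} failed; see {f.adr} for the reasoning"
        for f in functions
        if not f.check()
    ]


# Illustrative usage with stubbed checks.
checks = [
    FitnessFunction("orders-do-not-import-billing", "ADR-0007", lambda: False),
    FitnessFunction("checkout-p95-under-200ms", "ADR-0012", lambda: True),
]
failures = run_all(checks)
```

The design choice here is only that the link to the ADR lives next to the check, so whoever hits the failure does not have to remember why the rule exists.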
Conclusion
Vibe coding isn't the problem. The practices that used to keep architecture coherent were calibrated for a world where humans wrote every line, and that world is gone. The feedback loops have to catch up.
Architectural fitness functions are one of those feedback loops. They make silent decisions visible, encode the horizon the model doesn't have, and turn drift into a signal the team can act on. They aren't a brake on AI-assisted development — they are what makes it safe to keep going at the speed it offers.