The Coupling Rate Problem
The previous post argued that implicit coupling is mostly a maintenance problem, not a generation problem. Two agents on the same codebase found almost nothing when the task was additive, and surfaced most of the coupling when the task was structural. Both performed well on the harder task. The difference was not model quality. It was whether the work forced reconciliation.
That changes the next question. If refactoring is the kind of work that forces coupling into view, how often are teams still doing that work?
GitClear's data suggests: less often. Code cloning is rising. Refactoring is falling. That means coupling is being created faster than teams are being forced to confront it.
The shift in work mix
GitClear analyzed over 211 million changed lines of code between 2020 and 2024, across real repositories. Two numbers from that analysis:
- Cloned code blocks: 8.3% → 12.3% (+48%)
- Refactored code: 25% → under 10% (roughly −60%)
Those numbers matter together. Code cloning produces coupling directly: the same logic in two places, with no shared abstraction, no contract, nothing a compiler can follow. Refactoring is the activity that discovers that coupling: when you consolidate four deletion paths into one, you are forced to reconcile every difference between them. That's when the hidden inconsistencies surface. That's what the previous post's experiment showed.
When code cloning rises and refactoring falls simultaneously, coupling is being created faster and discovered less often.
What this looks like in a real codebase
I've been doing architectural analysis on production systems, large codebases with years of active development and real teams. Two patterns from that work illustrate what the GitClear data looks like up close.
The first: a central function, the kind every backend has, one that creates an entity, triggers side effects, and dispatches events. Its first line:
/* eslint-disable complexity */
Below that, it imports directly from five distinct layers: persistence, domain services, event infrastructure, integration, and the framework. Nine positional parameters, including two adjacent booleans that are indistinguishable at the call site. Orchestrator, repository, event dispatcher, and validator in one function.
The eslint-disable on line 1 is a fossil. It marks the moment someone silenced the only tool that would have triggered a conversation about restructuring. After that, TypeScript sees valid types, the tests pass, and nothing blocks what accumulates next. The coupling grew one PR at a time.
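The shape described above might look something like this. This is a hypothetical reconstruction, not the real code: every name, parameter, and return value here is invented to illustrate the pattern of nine positional parameters with two adjacent, indistinguishable booleans.

```typescript
// Hypothetical reconstruction -- all names and parameters are illustrative.
/* eslint-disable complexity */

interface CreatedEntity {
  id: string;
  validated: boolean;
  eventsDispatched: boolean;
}

function createEntity(
  payload: Record<string, unknown>,
  ownerId: string,
  workspaceId: string,
  source: string,
  locale: string,
  correlationId: string,
  parentId: string | null,
  skipValidation: boolean, // two adjacent booleans:
  suppressEvents: boolean, // indistinguishable at the call site
): CreatedEntity {
  // In the real function this span would touch persistence, domain
  // services, event infrastructure, integration, and the framework.
  return {
    id: String(payload.id ?? correlationId),
    validated: !skipValidation,
    eventsDispatched: !suppressEvents,
  };
}

// Both calls type-check. Only one does what the author intended.
const intended = createEntity({}, "u1", "w1", "api", "en", "c1", null, true, false);
const transposed = createEntity({}, "u1", "w1", "api", "en", "c1", null, false, true);
```

Transposing the two flags changes behavior silently, which is exactly the kind of call-site ambiguity the complexity rule existed to start a conversation about.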
The second pattern is subtler. The codebase has a hook system for dispatching events and transforming data, with registrations spread across many files. The dispatch key is a string, no import, no static reference, no type the compiler can trace. A rename in the producer doesn't surface as an error in the consumer at compile time. A developer working across three open files might miss it. An agent with the full codebase in context is better positioned to catch it, but that's the smaller risk. The deeper issue is architectural: there's no alarm when a handler silently stops running.
The metrics report every handler as executed. The system keeps working. If a handler stops running because a hook key was renamed, or a module was loaded in a different order, the data slowly becomes wrong. Nothing flags it.
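A minimal sketch of that failure mode, assuming a registry-style hook system like the one described; the registry shape and the key names are invented for illustration:

```typescript
// Minimal sketch of a string-keyed hook system (illustrative, not the real code).
type Handler = (payload: unknown) => void;

const registry = new Map<string, Handler[]>();

function on(key: string, handler: Handler): void {
  const list = registry.get(key) ?? [];
  list.push(handler);
  registry.set(key, list);
}

function dispatch(key: string, payload: unknown): number {
  // An unregistered key is not an error: the dispatch simply finds nothing.
  const list = registry.get(key) ?? [];
  for (const handler of list) handler(payload);
  return list.length; // every handler found for this key ran -- the metric looks healthy
}

let auditLogEntries = 0;
on("entity.deleted", () => { auditLogEntries += 1; });

// The producer renames its key. No import breaks, no type error fires.
const executed = dispatch("entity.removed", { id: "42" });
// executed is 0, auditLogEntries is still 0, and nothing flagged it.
```

The compiler has no edge to follow between `"entity.deleted"` and `"entity.removed"`, so the rename is invisible to every static check, and the dispatch treats an unknown key as a normal, empty result rather than a fault.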
These patterns weren't introduced in a single large PR. Each addition was localized, sensible in context, and passed every automated check. The coupling is the sum of decisions that individually looked fine.
Why the task shape matters here
In that experiment, two agents worked on the same brownfield codebase with two different tasks. The additive task, adding a hook across four deletion paths, produced zero bug discoveries. The structural task, consolidating those four paths into a shared pipeline, surfaced most of them, and both agents performed well. The difference wasn't model quality. To consolidate four paths into one, you have to understand what makes each path different. The task left no way around it.
Refactoring is that kind of task. It forces reconciliation. When you pull scattered logic into an abstraction, you discover that the scattered copies weren't actually identical, that some paths had a check others didn't, that a field was being set in one place and left unset in another. That discovery is a side effect of the work, not an explicit goal. It happens because the task has no way to proceed without it.
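A sketch of what that forced discovery looks like, with invented code standing in for two of the deletion paths; the lock check and function names are assumptions, not the actual codebase:

```typescript
// Illustrative only: two near-duplicate deletion paths, then the
// consolidation that forces their difference into the open.
interface Doc { id: string; locked: boolean }

const store = new Map<string, Doc>();

// Path written for the admin UI: respects the lock.
function deleteFromAdmin(doc: Doc): boolean {
  if (doc.locked) return false;
  return store.delete(doc.id);
}

// Path written later for a cleanup job: no lock check. Nobody noticed,
// because nothing ever put the two paths side by side.
function deleteFromCleanupJob(doc: Doc): boolean {
  return store.delete(doc.id);
}

// The consolidated path cannot be written without answering the question
// the additive work never asked: is the lock check policy, or an accident?
function deleteDocument(doc: Doc, enforceLock: boolean): boolean {
  if (enforceLock && doc.locked) return false;
  return store.delete(doc.id);
}
```

Writing `deleteDocument` forces a decision about the lock check that both original paths were able to avoid, which is the reconciliation the surrounding text describes.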
When refactoring is down 60%, that reconciliation is happening 60% less often. The coupling doesn't disappear. It accumulates until a task that can't avoid it comes along.
The compounding factor
Both agents in the greenfield experiment built freely, with no architecture instructions. Both landed on the same macro pattern: one large service class, types in a separate file, no real separation of concerns. That wasn't a failure of capability. With explicit architectural guidance, the result might have been different. But neither had it.
Code generated without architectural constraints tends toward concrete, flat designs. Not because the agent is making a bad decision, but because flat is always the simpler immediate answer. When another agent extends that code, it extends the pattern it finds. Claude's nested MonetaryAmount absorbed a new field without new code. Codex's flattened version forced two parallel changes and produced exchangeRateRate. The design quality at the start determined the cost of every extension after.
This is what the GitClear data looks like from the inside. Code cloning is up 48%. Keeping architectural context accurate across a growing codebase is hard. Even when decisions are documented from the start, the system outpaces the documentation. New patterns emerge, old decisions stop reflecting what the code has become. The context degrades, and agents increasingly operate without it. Each cloned block tends toward the flat design. Each subsequent extension deepens whatever pattern is already there. The surface area that a future structural task will have to reconcile grows faster than it used to.
Code cloning accelerates the creation of coupling. Skipping refactoring defers the discovery. The gap between the two is what accumulates silently.
What the data doesn't say
The GitClear data doesn't say codebases are broken today. It describes a trajectory: more code is being generated, more of that generation results in duplicate blocks, and less of the total work is the structural kind that forces teams to confront what accumulated. The coupling grows. The reconciliation is deferred.
The coupling isn't harder to find when someone looks. That experiment showed agents can surface most of it when the task forces them to. The question is how often that task appears, and whether, by the time it does, the surface area has grown to a point where the cost of reconciliation has changed.
Refactoring was never the popular choice. It competes with feature work, rarely appears on a roadmap, and its benefit is invisible until something breaks. But it was doing something structural: forcing the reconciliation that reveals what accumulated silently. The tools got faster. The task that found the coupling got rarer. And each cycle without it leaves more for the next one to inherit.