
Architecture

The Architecture Rules a Linter Can't Check

Every large codebase has rules that no linter knows about. This package must not import from that one. This business rule lives in exactly one place. The domain layer never touches the database directly. These rules live in code review comments, onboarding chats, and a few people's heads.

And they slip away quietly, one reasonable-looking PR at a time. Every single change looks fine on its own. The problem only exists in the relationship between things — an import that reaches somewhere it shouldn't, a rule that grew a second copy in a second place, a value that quietly took on a second meaning — and nobody reviews relationships. A reviewer sees one change, not the whole shape it's part of. I've written before about why code review can't close that gap on its own.

The usual answer to that is a fitness function: an automated check that guards an architectural property, so the rule fails a build instead of failing a memory. It's a good answer, and most teams stop well short of where it can actually reach. So the question I kept circling is how far it goes: which of these agreements you can really turn into a check, and which ones quietly resist it. That's what this post is about. The examples below come from running real rules against real code. They start with the kind of rule a machine obviously handles and end with the kind you'd assume it couldn't.

Part 1: the rule a linter could almost express

Cal.com is a large TypeScript codebase, and to their credit, they wrote their architecture rules down. Not in a wiki nobody reads, but in the agents/rules/ files they feed to their AI coding agents. Two of them matter here. The first, architecture-circular-dependencies.md, sets a strict order for which package can use which, and says, as rule 6, word for word:

No files in packages/app-store import from @calcom/features or ../features/**

The second, architecture-feature-boundaries.md, adds: "All cross-feature dependencies must go through the feature's public API." Both are good rules. app-store sits below features in their layers, so reaching up into it is backwards. The only question left is whether the code still follows them.

I turned rule 6 into a rule for maat, the architecture checker I'm building. Same boundary, written as code:

layer('@calcom/app-store')
  .forbids(/@calcom\/features\/[^/]+\/(lib|repositories|services)\//)
  .build()

And ran the check:

FINDINGS (16)
────────────
  [layer-imports]: 16 findings

  "@calcom/app-store" imports feature internals
  (forbidden by the rule). One example:

    ↳ app-store/_utils/payments/
        handlePaymentSuccess.ts
      → @calcom/features/bookings/lib/EventManager

Sixteen places where app-store reaches into the inner parts of features, against their own written rule. One file on its own, handlePaymentSuccess.ts, imports from twelve paths inside features, across four different feature areas (bookings, webhooks, platform-oauth-client, tasker).

Here's the part that makes Cal.com a good example and not a target: they don't just write the rule down, they enforce it. The boundaries doc even says so: "Domain boundaries are enforced automatically through linting." And it's true. Their Biome config has a rule for exactly this:

// biome.json, packages/app-store override
"noRestrictedImports": {
  "level": "error",
  "options": {
    "patterns": [
      { "group": ["../features/**"],
        "message": "app-store package should not import from features ..." }
    ]
  }
}

So how did sixteen imports get past a rule set to error? Look closely at what it matches: ../features/**, the relative path. The imports that slipped through use the package alias instead: @calcom/features/bookings/lib/.... Same boundary, written a different way, and the linter only knew about one way. Here's the telling part. Rule 6, as they wrote it, names both forms (@calcom/features and ../features/**), and they clearly know how to block both, because in their testing config they list exactly that pair. The Biome rule for app-store just didn't carry over the full rule they had written. It's a gap, not a choice.

That's the whole problem in a nutshell, and it isn't really about any one tool. A text-matching check sees the string you typed; the boundary you actually care about is the dependency underneath it. Resolve the import to where it truly points and the alias and the relative path collapse into the same edge — but if you're matching strings, they don't, and one form sails through. The rule was written down, even partly enforced, and it still slipped through the gap between the two spellings.
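To make the difference concrete, here's a minimal TypeScript sketch. It is not how maat or Biome actually work. It assumes a single hypothetical alias table (in a real repo this would come from tsconfig paths or the workspace setup) and compares a check that matches the import string with one that resolves the import first. Only the resolving check sees that both spellings land on the same file.

```typescript
import * as path from "node:path";

// Hypothetical alias table: package alias -> directory it lives in.
const aliases: Record<string, string> = {
  "@calcom/features": "packages/features",
};

// Text-matching check: a simplified stand-in for a glob like "../features/**".
function matchesByString(specifier: string): boolean {
  return specifier.startsWith("../features/");
}

// Resolve an import specifier to a repo-relative path.
function resolveImport(fromFile: string, specifier: string): string {
  for (const [alias, dir] of Object.entries(aliases)) {
    // Match the alias exactly or as a path prefix (not "@calcom/features-x").
    if (specifier === alias || specifier.startsWith(alias + "/")) {
      return path.posix.join(dir, specifier.slice(alias.length));
    }
  }
  if (specifier.startsWith(".")) {
    // Relative imports resolve against the importing file's directory.
    return path.posix.join(path.posix.dirname(fromFile), specifier);
  }
  return specifier; // bare third-party package: leave untouched
}

// Resolution-based check: does the import actually point into features?
function crossesIntoFeatures(fromFile: string, specifier: string): boolean {
  const target = resolveImport(fromFile, specifier);
  return target === "packages/features" || target.startsWith("packages/features/");
}
```

The trade-off is that the checker has to know the alias table. In return, it checks the edge rather than the spelling, so a new way of writing the same import can't slip past it.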

This is the familiar end of the spectrum, though — the kind of rule you'd expect to be checkable, and roughly half a linter already gets there. I start here precisely because it's unremarkable: concrete, mechanical, and something anyone can verify for themselves. It's also why this is the part I ran on Cal.com — it's their rule, broken in their code, checkable against their own config. That's fair game. The next part I ran only on my own code, and I'll explain why.

Part 2: the rule a linter can't express

Here is a kind of agreement a normal checker can't catch, because nothing about the code looks wrong on the surface. A function returns the same value to mean two completely different things:

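// Stand-in declarations so the example type-checks.
type Row = { id: string; name: string }
type User = { id: string; name: string }
declare const db: { query(id: string): Promise<Row | undefined> }
declare function toUser(row: Row): User

async function findUser(id: string): Promise<User | null> {
  try {
    const row = await db.query(id)
    if (!row) return null   // means: this user does not exist
    return toUser(row)
  } catch {
    return null             // means: the database call failed
  }
}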

Both branches return null. But one means "no such user" and the other means "the database is down." Every caller has to just know which is which, and sooner or later one of them guesses wrong. No linter flags this. Every line is fine. The problem is the meaning, and meaning isn't something you can read off the syntax.
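A common fix, sketched here as a general TypeScript pattern rather than anything the tool suggests, is to give each outcome its own tagged variant instead of sharing null. The Row and User shapes are stand-ins, and the query function is passed in only so the example runs on its own.

```typescript
// Hypothetical stand-ins for the example's data shapes.
type User = { id: string; name: string };
type Row = { id: string; name: string };

// Each outcome gets its own tag, so "missing" and "failed" can't be confused.
type FindUserResult =
  | { kind: "found"; user: User }
  | { kind: "not-found" }
  | { kind: "error"; cause: unknown };

async function findUser(
  id: string,
  query: (id: string) => Promise<Row | undefined>,
): Promise<FindUserResult> {
  try {
    const row = await query(id);
    if (!row) return { kind: "not-found" };
    return { kind: "found", user: { id: row.id, name: row.name } };
  } catch (cause) {
    return { kind: "error", cause };
  }
}

// Callers must handle every case; the compiler checks the switch is exhaustive.
function describe(result: FindUserResult): string {
  switch (result.kind) {
    case "found":
      return `found ${result.user.name}`;
    case "not-found":
      return "no such user";
    case "error":
      return "lookup failed";
  }
}
```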

So the tool reads it instead. I pointed this check at my own codebase, and it found exactly that pattern:

[com-semantic] [Verify]
  "null" returned by resolvePackageInformation
  in collector-ts/src/dependencies.ts
  might mean two different things.

  Reason: returns null both when package.json
  does not exist (missing), and when reading or
  parsing it fails (operation failed). The caller
  cannot tell a missing package from a corrupted
  one. (confidence: 0.95)

That's a real case of confusing code I had written without noticing.

Now the honest part, because this is where most "AI for code" tools lose me. The second you let an LLM read your code, the tempting next step is to let it judge too — and the moment it judges, your check is probabilistic. It can flag something today and shrug at the same code tomorrow. A check that gives a different answer on Tuesday isn't a check; it's a mood. So the whole point of this part is the line drawn through it: the LLM is never allowed to decide. It only reads the code and reports what it saw. A plain, deterministic rule — one that gives the same answer every time for the same input — decides whether that's actually a problem. The reading can be fuzzy; the verdict never is.
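Here's a minimal sketch of that split, with hypothetical types, field names, and threshold rather than maat's real schema. The model's output is treated as an observation (which value is returned and what it seems to mean at each return site), and a plain function with a fixed threshold turns observations into findings. Given the same observations, it always returns the same findings; any fuzziness stays on the reading side.

```typescript
// What the LLM is allowed to produce: an observation, never a verdict.
interface Observation {
  symbol: string; // e.g. "resolvePackageInformation"
  file: string;
  returnedValue: string; // e.g. "null"
  meanings: string[]; // what each return site appears to mean
  confidence: number; // model's self-reported confidence, 0..1
}

interface Finding {
  rule: string;
  message: string;
  needsVerification: boolean; // true for anything an LLM touched
}

// Hypothetical fixed threshold: part of the rule, not up to the model.
const MIN_CONFIDENCE = 0.8;

// Deterministic judge: same observations in, same findings out.
function judgeOverloadedReturns(observations: Observation[]): Finding[] {
  const findings: Finding[] = [];
  for (const obs of observations) {
    const distinct = new Set(obs.meanings.map((m) => m.trim().toLowerCase()));
    if (distinct.size >= 2 && obs.confidence >= MIN_CONFIDENCE) {
      findings.push({
        rule: "com-semantic",
        message:
          `"${obs.returnedValue}" returned by ${obs.symbol} in ${obs.file} ` +
          `might mean ${distinct.size} different things.`,
        needsVerification: true,
      });
    }
  }
  return findings;
}
```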

And every finding that touched the AI is tagged [Verify]: it shows up for a human to confirm, it goes into the record as unverified, and it never fails your build on its own. The machine widens what you can see; it never gets a vote on what's true.

The machine reads. The rule judges. You verify. Same input, same output, every time.
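The gating side can be just as small. A sketch, again with hypothetical names: findings from plain structural rules decide the exit code, while findings an LLM was involved in are collected for a human to review and never change it.

```typescript
// "structural" findings come from plain rules; "verify" findings had an LLM involved.
type Finding =
  | { kind: "structural"; rule: string; message: string }
  | { kind: "verify"; rule: string; message: string };

interface Report {
  exitCode: 0 | 1;
  failing: Finding[]; // these block the merge
  toReview: Finding[]; // shown to a human, recorded as unverified
}

// Only structural findings can fail the build.
function gate(findings: Finding[]): Report {
  const failing = findings.filter((f) => f.kind === "structural");
  const toReview = findings.filter((f) => f.kind === "verify");
  return { exitCode: failing.length > 0 ? 1 : 0, failing, toReview };
}
```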

It isn't expensive, either. That run checked the whole codebase and cost less than a cent. It caches its reading of code that hasn't changed, so the cost follows how much you changed, not how often you run it.
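A content-addressed cache is one straightforward way to get that behavior. In this hypothetical sketch the expensive step is a callback, and results are keyed by a SHA-256 hash of the file path and contents, so rerunning on unchanged code costs nothing. A real tool would persist the cache between runs; this one only lives in memory.

```typescript
import { createHash } from "node:crypto";

// Hypothetical per-file result of the expensive (LLM) step.
type Analysis = { file: string; observations: string[] };

class ContentCache {
  private readonly entries = new Map<string, Analysis>();
  private readonly analyze: (file: string, source: string) => Analysis;
  misses = 0; // each miss is a paid call; hits are free

  constructor(analyze: (file: string, source: string) => Analysis) {
    this.analyze = analyze;
  }

  get(file: string, source: string): Analysis {
    // Key on path + content: any edit to the file produces a new key.
    const key = createHash("sha256").update(file).update("\0").update(source).digest("hex");
    const hit = this.entries.get(key);
    if (hit) return hit;
    this.misses += 1;
    const result = this.analyze(file, source);
    this.entries.set(key, result);
    return result;
  }
}
```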

This is also the part I chose not to run on Cal.com. Publishing an AI's guess about what someone else's code was meant to do is a cheap shot. Their structural rule is theirs, broken in their code, and that's fair. "An AI thinks your function is confusing" is a different thing, and I'm only comfortable saying that about my own work.

Part 3: what happens to a finding after you find it

Finding problems is the easy half. Every team already has a long list of known issues. The real question is what stops that list from quietly rotting, the same way documentation rots once nothing connects it to the code.

In most setups, you accept a problem with an ignore comment, and that comment lives forever. Nobody ever looks at it again. The exception itself is how things slip.

The fix isn't clever, it just has to be deliberate: an accepted exception should carry an expiry date. You accept a finding "for now," and "for now" is a real duration, not a figure of speech. When it runs out, the check fails again and someone has to look once more. The decision — who accepted it, when, and for how long — gets written to a plain file kept next to the code, so six months later nobody has to remember why an exception exists. The history says so. There is no permanent ignore, because "permanent" was never something anyone actually decided; it was just the default a silenced comment drifts into.
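A minimal sketch of an expiring exception, with a hypothetical record shape: every acceptance carries an expiry date, a finding is suppressed only while an unexpired acceptance covers it, and each decision is serialized as one JSON line, ready to be appended to a plain file in the repo. The finding id format is made up for the example.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// One accepted exception. There is deliberately no "never expires" option.
interface Acceptance {
  findingId: string;
  acceptedBy: string;
  acceptedAt: string; // ISO timestamp
  expiresAt: string; // ISO timestamp
  reason: string;
}

function accept(findingId: string, acceptedBy: string, reason: string, days: number, now: Date): Acceptance {
  const expires = new Date(now.getTime() + days * MS_PER_DAY);
  return { findingId, acceptedBy, acceptedAt: now.toISOString(), expiresAt: expires.toISOString(), reason };
}

// Suppressed only while an acceptance exists and has not yet expired.
function isSuppressed(findingId: string, ledger: Acceptance[], now: Date): boolean {
  return ledger.some((a) => a.findingId === findingId && new Date(a.expiresAt).getTime() > now.getTime());
}

// Findings that still fail the check once expired acceptances stop counting.
function stillFailing(findingIds: string[], ledger: Acceptance[], now: Date): string[] {
  return findingIds.filter((id) => !isSuppressed(id, ledger, now));
}

// One line per decision, so the history is a plain, diffable file.
function toLedgerLine(a: Acceptance): string {
  return JSON.stringify(a) + "\n";
}
```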

A linter tells you something is broken right now. The harder job is keeping the agreements and the decisions around them — and making sure the "we'll deal with it later" ones actually come back.

I run it on its own code

None of this is staged for a blog post. It runs in CI as a required check, next to the linter and the tests, and it has blocked my own pull requests. Here's one, from a few days ago:

Architecture check  ✗

FINDINGS (1)
  [cop-args] "BatchLLMRequest.executeBatch"
    has 4 params, but the limit is 3
    ↳ packages/utils/src/llm/batch.ts:36

Error: Process completed with exit code 1.

I'll say it before you do: that rule is trivial. Any linter with a max-parameters rule catches the same thing, and you don't need a history file to know four arguments is one too many. But that's exactly the honest point. The value isn't this rule. It's that the same machine flagging four parameters is the one running the boundary check and the meaning check, plugged into CI the same way, blocking merges the same way. It annoys me in my own pipeline as much as it would in yours — which is mostly why I trust the rest of what it says.

Why I built this

I've spent a career watching this exact pattern play out. A change that's correct in isolation, in a file that looks self-contained, quietly depended on by code somewhere else that never announced the connection — a dependency that isn't written down in any file, any type, or any test, only carried in whoever happened to be around when it was built. I've debugged it, reviewed it into existence, and inherited it more times than I can count.

And it isn't just unpleasant — though it is that. From a developer's chair it's some of the worst developer experience there is: you touch something reasonable and lose a day to a break three directories away, and after enough of those you stop trusting the code. Every change starts to feel like it might detonate something you can't see, so you make them smaller, slower, more defensively than the work actually needs — you dread changing code that, on paper, you own. From a step back it's money. Martin Fowler makes the case better than I can in Is High Quality Software Worth the Cost?: the cruft you can't see compounds quietly, every new feature ends up costing more than the last, and the bill lands on whoever pays for the time — not only the person writing the code. The frustration is the part the team feels; the slowdown is the part that shows up in the budget. That's most of why this exists.

The trap I set for myself

And then, while building the very tool meant to catch it, I walked straight into it.

maat had a bug. One file called process.chdir() to set the working directory. A dozen others (the cache, the git reader, the TypeScript reader, the ledger) quietly read process.cwd() and relied on whatever that first file had set. Of course removing the chdir line broke them — that part isn't interesting. Anything breaks if you delete what it secretly leans on.

The unsettling part is the other direction. Before I touched it, that connection existed in exactly zero places you could point to. No import tied those files together. No type encoded the dependency. No test asserted "this only works if someone set the cwd first." Search the codebase for the relationship and you find nothing, because it was never written down anywhere — it lived entirely in the gap between the files, and in my head. On a team of one that's a bad afternoon. On a team of thirty it's a fact that maybe one or two people happen to hold, until they leave, and then it's a fact nobody holds at all. The code keeps working and no one knows why, which is the same as no one knowing why when it stops.

I fixed it and added tests. But the irony stuck with me: I was building the thing meant to surface hidden connections like this, and I'd just strung one across my own codebase without noticing. Every file looked fine on its own. That's the whole point. The damage is never in the file you're looking at.
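The shape of that fix, sketched with hypothetical class names standing in for the cache and the git reader: the working directory becomes an explicit input, decided once at the entry point, instead of a global every reader assumes someone else has set.

```typescript
import * as path from "node:path";

// Before (the hidden coupling): silently relies on an earlier process.chdir().
//   const cacheDir = path.join(process.cwd(), ".cache");

// After: the working directory travels as an explicit value.
interface Context {
  readonly cwd: string; // absolute project root
}

class CacheStore {
  private readonly dir: string;
  constructor(ctx: Context) {
    this.dir = path.join(ctx.cwd, ".cache");
  }
  location(): string {
    return this.dir;
  }
}

class GitReader {
  private readonly root: string;
  constructor(ctx: Context) {
    this.root = ctx.cwd;
  }
  resolve(relative: string): string {
    return path.resolve(this.root, relative);
  }
}

// Entry point: the only place that decides what the working directory is.
function createContext(root: string): Context {
  return { cwd: path.resolve(root) };
}
```

Every dependency on the working directory is now visible in a constructor signature, which is exactly the connection the original code never wrote down.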

And to be honest, process.chdir is a soft target. A hidden global is the obvious kind of coupling — low-level, a bit of a smell, the sort of thing a careful reviewer might catch. I led with it because it's easy to picture. The expensive version lives higher up, in the business rules: a policy two services silently have to agree on, an amount calculated in one place and assumed in three, an invariant that only holds because a couple of people remember to keep it holding. Same shape — a connection nothing in the code admits to — just dressed as ordinary domain logic, where it's far harder to see and happens far more often.

The same trap, at machine speed

There's also a newer reason it matters more than it used to. For most of my career this kind of knowledge eroded slowly, at the speed people joined and left a team. AI changed the rate. A model can write and reshape more code in an afternoon than anyone on the team will read that week — and it does it without leaving behind the one thing that was never in the code to begin with: why. The implicit connection still gets made; it just gets made faster now, by something that won't be around to explain it later.

Margaret-Anne Storey has a sharp framing for this in From Technical Debt to Cognitive and Intent Debt. Technical debt lives in the code, and AI is genuinely getting good at paying that down. But two quieter debts grow in its place: cognitive debt, as the team's shared understanding of the system erodes, and intent debt, when the goals and rationale behind the code were never written down anywhere a human — or an agent — can consult. The code runs, nobody remembers why it's shaped that way, and the next change is a guess. I've written more about the AI-assisted version of this; the takeaway is the same one I keep landing on. The fix isn't smarter code, it's making the implicit explicit, on purpose, while someone still remembers it.

You could push back here: maybe none of this matters anymore, because the AI reads the code, not us. I don't buy it, and not only because of intent. The harder reason is that handing a rule to an agent is not the same as the rule holding. An agent can read your boundary, agree with it, and cross it anyway — and that isn't hypothetical, it's the first half of this post. Cal.com wrote their architecture rules into the very files they feed their coding agents, and sixteen imports still went the wrong way. Telling the machine the rule is a hope; a deterministic check is the proof. You don't trust that the agent followed the rule you defined — you verify it did, the same answer every time. The more of the code an agent writes, the more that gap matters, not less.

I refactored that process.chdir tangle away by hand, which closed that one instance. But the harder part, catching that kind of hidden connection before it bites, is something I still can't do automatically, and it's what I'm chipping away at. What you can do today, even without that, is the unglamorous thing: write the agreement down and keep it next to the code, so the next person (or the next agent) trips over it instead of rediscovering it through an outage. For this one, the record would look like:

maat axiom declare \
  --id "no-global-cwd" \
  --claim "Working directory is passed explicitly, never read from a global." \
  --note "Was an implicit process.chdir once. Removing it broke six files."

That's the whole idea. The rules your team agreed on shouldn't live only in someone's memory, waiting to be relearned through an incident. Write them down. Check the ones you can. Keep the receipts for the rest.

maat is open source and pre-1.0. If your codebase is big enough that not everyone holds all the rules in their head anymore, it has something to say. The code, and how to try it, are on GitHub.
