Surviving a Growing Agentic Codebase

My vibe coding is getting out of hand, and my current project’s codebase has grown large enough that I’ve started running into growing pains. Here are a few insights that help when you get to that point using Claude Code.

Most teams don’t notice the drift until session twelve. The code compiles. The tests pass. But the thing being built has quietly diverged from what was specified — because the AI that wrote it had no idea what the last three sessions decided.

That’s not a model failure. It’s a memory architecture failure. There are a number of tactics you can apply to counter it effectively, but don’t expect them to eliminate drift completely. You still have to check your code!

Here’s what actually happens: you open a new Claude session, paste in some context, and start building. The model is sharp, responsive, confident. It generates solid code. You close the session. The next day, you open a new one. You paste some context — maybe slightly different, slightly less complete. The model makes a few reasonable assumptions to fill the gaps. The code still works. The direction is now 4 degrees off from where you were.

Do this for six weeks and you’ve built something technically coherent that is strategically wrong.

The good news: this is a solved problem. The discipline is documentation-as-infrastructure. Not documentation as afterthought.

The Core Insight Most Builders Miss

The agent doesn’t have memory. Your file system does.

That’s the reframe everything else depends on. The model is not a colleague who remembers Tuesday’s conversation. It’s an extremely capable contractor who starts fresh each morning and needs explicit, structured briefing before touching anything. Your job is to make that briefing instantaneous, accurate, and tamper-resistant.

This means treating your docs directory as the brain — and the AI as the hands.

The Four-Layer Requirements Stack

Stop using a single README to carry all the context. Sprawling markdown files are context noise at scale. The alternative is a deliberate stack, with each layer serving a different time horizon.

CLAUDE.md — Layered Context

This isn’t a single file; it’s a hierarchy. Claude Code reads CLAUDE.md files in sequence: first from your home directory (~/.claude, global defaults that apply across every project), then from your project root (project-wide architecture rules and mandates), then from individual subdirectories (module-specific constraints that override or extend the layers above).

That’s more powerful than a single constitution file, and more nuanced. Your home-level CLAUDE.md might encode how you want the model to handle ambiguity, request clarification, and structure commits — behaviors you want everywhere. Your project-level CLAUDE.md encodes the system’s actual constraints: security mandates, off-limits patterns, non-negotiable architecture decisions. A subdirectory CLAUDE.md for your payments module might add compliance-specific rules that shouldn’t burden the rest of the codebase with noise.

The practical implication: design the hierarchy intentionally. What’s truly global goes up top. What’s project-specific stays in the root. What’s module-specific stays scoped. And across all three layers — keep it short. A bloated CLAUDE.md is context noise masquerading as guardrails.
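To make the layering concrete, here is a minimal Python sketch that lists the CLAUDE.md files applying to a given working directory, from most general to most specific. It only illustrates the ordering described above; it is not Claude Code's actual loader. The function name claude_md_chain is hypothetical, and the home-level location ~/.claude/CLAUDE.md is an assumption worth checking against the current Claude Code docs.

```python
from pathlib import Path


def claude_md_chain(home: Path, project_root: Path, workdir: Path) -> list[Path]:
    """Return existing CLAUDE.md files, most general first.

    Illustrative only: this mirrors the home -> project root -> subdirectory
    ordering described in the text, not Claude Code's real implementation.
    """
    project_root = project_root.resolve()
    workdir = workdir.resolve()
    if workdir != project_root and project_root not in workdir.parents:
        raise ValueError("workdir must be inside project_root")

    # Directories from the project root down to the working directory.
    dirs = [workdir, *workdir.parents]
    dirs = dirs[: dirs.index(project_root) + 1]
    dirs.reverse()

    # Assumed home-level location: ~/.claude/CLAUDE.md
    candidates = [home / ".claude" / "CLAUDE.md"]
    candidates += [d / "CLAUDE.md" for d in dirs]
    return [p for p in candidates if p.is_file()]
```

Running this against a payments module is a quick way to see how much guidance a session in that directory actually inherits, and whether any layer has grown too long.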

PRD.md — The Product Contract

Full requirements: goals, user personas, feature list, non-goals, constraints, success criteria. This document is append-only. Requirements are never deleted — they’re deprecated, with a comment explaining why and when. That discipline sounds bureaucratic until the fourth time a model reinvents something you already decided against.

The deprecation log is institutional memory. It tells future-Claude (and future-you) not just what the system does, but what it was explicitly decided not to do. That’s the high-value information that normally lives in someone’s head and disappears when the context window closes.
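The append-only rule is easy to check mechanically if requirements carry stable IDs. The sketch below assumes a hypothetical REQ-123 style ID scheme (nothing in this setup mandates one) and reports any ID that existed in the previous version of the PRD but is missing from the new one.

```python
import re

# Hypothetical ID scheme: every requirement is tagged like "REQ-12".
REQ_ID = re.compile(r"\bREQ-\d+\b")


def removed_requirements(old_prd: str, new_prd: str) -> set[str]:
    """IDs present in the old PRD text but missing from the new one.

    A deprecated requirement keeps its ID in the file, so it is not reported;
    only a requirement that was deleted outright shows up here.
    """
    return set(REQ_ID.findall(old_prd)) - set(REQ_ID.findall(new_prd))
```

A check like this could run before each commit, comparing the committed PRD.md against the working copy and failing when the returned set is not empty.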

PLAN.md — The Phased Build Map

Breaks the PRD into numbered phases and tasks. Each task has three things: a status (pending / in-progress / done / blocked), acceptance criteria, and explicit dependencies. Claude updates this file at the end of every session.

This is the handoff document between sessions. The goal is that any new session can read PLAN.md in 60 seconds and know exactly where the project is, what was just completed, and what needs to happen next — without requiring any additional verbal context from you.
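The "what needs to happen next" question is really a small dependency lookup. Here is a rough sketch of it, assuming tasks have already been parsed out of PLAN.md into records; the Task class and next_task function are hypothetical names, and the parsing itself is left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    id: str
    status: str  # "pending", "in-progress", "done", or "blocked"
    acceptance: str
    depends_on: list[str] = field(default_factory=list)


def next_task(tasks: list[Task]) -> Optional[Task]:
    """First pending task, in plan order, whose dependencies are all done."""
    done = {t.id for t in tasks if t.status == "done"}
    for t in tasks:
        if t.status == "pending" and all(d in done for d in t.depends_on):
            return t
    return None
```

If next_task returns None while pending tasks remain, every remaining task is waiting on something unfinished, which is worth surfacing at the start of the session rather than halfway through it.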

MEMORY.md — Living State Snapshot

What was built. What changed. What was decided, and why. What’s deferred and for what reason. Claude rewrites a section of this at task completion. It’s the engineering decision log plus status board.

The “why” entries are the most important. When a model encounters a pattern that looks improvable, the MEMORY.md entry explaining why that pattern exists is the artifact that surfaces the constraint.

One honest caveat: none of these documents are hard guardrails. Models can misread them, deprioritize them under heavy context load, or simply fail to connect a documented constraint to the specific decision in front of them. MEMORY.md and CLAUDE.md shift the odds significantly — they don’t eliminate the failure mode. You still need to spot-check outputs against your documented intent, especially on complex multi-step tasks. Treat these files as high-quality briefing documents, not as enforcement mechanisms.

The Session Protocol That Eliminates Ghost Context

Ghost context is what I call the failure mode where a model invents assumptions about project state because it wasn’t given enough real context. The invented assumptions are usually plausible. They’re also usually wrong.

The fix is a mandatory session-open prompt structure:

Read CLAUDE.md, PRD.md, PLAN.md, and MEMORY.md in that order.
Summarize the current state of the project and identify the next pending task.
Ask any clarifying questions before proceeding.

This takes 60 seconds. It eliminates an entire class of drift failure. The model is forced to surface its assumptions before it writes a line of code rather than discovering them during a debugging session two weeks later.

If you’re using Claude Code, this becomes a project-level command. Run it at the top of every session without exception.
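One way to wire this up is a small script that writes the prompt into a project command file. This sketch assumes Claude Code's convention of project-level custom commands stored as Markdown files under .claude/commands/; the command name session-open and the function name are hypothetical, so check the current Claude Code documentation before relying on the path.

```python
from pathlib import Path

SESSION_OPEN_PROMPT = """\
Read CLAUDE.md, PRD.md, PLAN.md, and MEMORY.md in that order.
Summarize the current state of the project and identify the next pending task.
Ask any clarifying questions before proceeding.
"""


def install_session_open(project_root: Path, name: str = "session-open") -> Path:
    """Write the session-open prompt as a project command file.

    Assumption: project commands live in <root>/.claude/commands/<name>.md.
    """
    commands_dir = project_root / ".claude" / "commands"
    commands_dir.mkdir(parents=True, exist_ok=True)
    path = commands_dir / f"{name}.md"
    path.write_text(SESSION_OPEN_PROMPT, encoding="utf-8")
    return path
```

Committing the resulting file means everyone on the project opens sessions with the same briefing instead of a hand-pasted approximation of it.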

Preventing Requirement Drift: The Three Guards

Drift isn’t a single event — it’s a slow accumulation of small divergences, each individually defensible, collectively fatal to coherent product design.

Scope gates in PLAN.md

Every phase has an explicit scope boundary section listing what is not in scope. The model reads this before implementing anything in that phase. The negative space is as important as the positive. If you don’t specify what’s excluded, a capable model will include it because it seems helpful and adjacent.

Change control in PRD.md

When a requirement changes, it’s never silently edited. Old text is struck through with a comment recording why and when it changed, and the new requirement follows. The model can see the evolution of thinking. It knows this isn’t the first time this question was considered. That context changes how it implements.

Silent edits to requirements documents are how you build a system that makes internal sense but contradicts a decision you made six weeks ago and can’t remember making.
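A small helper can make the non-silent edit the path of least resistance. This is a sketch under assumptions: it uses Markdown-style ~~strikethrough~~ and an HTML comment for the note, and the exact comment wording is made up for illustration. Adapt both to whatever convention your PRD already uses.

```python
from datetime import date


def deprecate(prd_text: str, old: str, new: str, reason: str, when: date) -> str:
    """Strike through `old`, record why and when, and place `new` after it.

    The old wording stays in the document, so the history remains visible.
    """
    if old not in prd_text:
        raise ValueError("requirement text not found in PRD")
    replacement = (
        f"~~{old}~~\n"
        f"<!-- deprecated {when.isoformat()}: {reason} -->\n"
        f"{new}"
    )
    # Replace only the first occurrence to avoid touching unrelated text.
    return prd_text.replace(old, replacement, 1)
```

For example, deprecate(text, "Users log in with a password.", "Users log in with a magic link.", "support load from resets", date(2025, 1, 15)) leaves the old sentence struck through, followed by the dated reason and the new requirement.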

Acceptance criteria as test specs

Every requirement gets a corresponding acceptance criterion in testable form: “Given X, when Y, then Z.” The model’s test-generation sub-agent derives test cases directly from these. If the test fails, the requirement wasn’t met. No interpretation required.

This converts requirements from prose that can be read multiple ways into specifications that either pass or don’t.
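To show what "testable form" buys you, here is a minimal sketch that splits a criterion into its three parts and rejects anything that doesn't fit the pattern. The Criterion type and parse_criterion function are hypothetical names; the strict comma-separated "Given ..., when ..., then ..." shape is an assumption, and real criteria may need a more forgiving parser.

```python
import re
from typing import NamedTuple, Optional


class Criterion(NamedTuple):
    given: str
    when: str
    then: str


# Assumes a single-line "Given X, when Y, then Z." sentence.
PATTERN = re.compile(
    r"^\s*given\s+(?P<given>.+?),\s*when\s+(?P<when>.+?),\s*then\s+(?P<then>.+?)\.?\s*$",
    re.IGNORECASE,
)


def parse_criterion(text: str) -> Optional[Criterion]:
    """Return the three parts of a criterion, or None if it is not testable."""
    m = PATTERN.match(text)
    return Criterion(m["given"], m["when"], m["then"]) if m else None
```

A None result is the useful signal here: a criterion like "Billing should feel fast" fails to parse, which flags it as prose to be rewritten before any test is derived from it.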

Forcing Ambiguity to the Surface Before It Becomes Code

The most expensive drift problem isn’t a model that ignores requirements. It’s a model that encounters an underspecified requirement and makes a reasonable architectural decision to fill the gap — silently, mid-task, while you’re doing something else.

You discover the decision three sessions later when you’re trying to build something that contradicts it. The system is internally consistent. It’s just wrong.

The fix is a mandatory clarification checkpoint baked into your task-start workflow. Before any implementation begins:

Before writing any code, list every assumption you are making about this requirement. Flag any ambiguity in the acceptance criteria. Do not proceed until ambiguities are resolved.

This surfaces uncertainty as conversation. It costs two minutes. It prevents hours of refactoring from a silent architectural choice that turned out to be the wrong one.

Scaling Past Twenty Features: The specs/ Directory

A flat PRD becomes unwieldy once the feature set is substantial. When that happens, move to a structured directory:

specs/
  PRD.md               ← top-level goals, non-goals, constraints
  features/
    auth.md
    billing.md
    notifications.md
  decisions/
    ADR-001-db-choice.md
    ADR-002-auth-pattern.md
  PLAN.md
  MEMORY.md

Each feature spec is self-contained: background, requirements, acceptance criteria, dependencies, status. A model agent can be scoped to a single feature spec without loading the entire product context. For large codebases, this is the difference between a focused, effective session and a context-overloaded one.

The decisions/ folder deserves special attention. Architecture Decision Records — ADRs — capture why a decision was made, not just what was decided. When a new session encounters an existing pattern and considers "improving" it, the ADR is the artifact that explains the constraint that drove the original choice. This prevents an entire category of regressions: the kind where the code is technically better and functionally breaks something the original design was solving for.
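If you'd rather script the layout than create it by hand, a sketch like the following would do it. The function names are hypothetical, and the ADR stub fields (Status, Context, Decision, Consequences) are an assumption borrowed from common ADR practice rather than something this setup requires.

```python
from pathlib import Path

# Assumed ADR stub; adjust the fields to your own convention.
ADR_TEMPLATE = """\
{title}

Status: proposed
Context: (what constraint or problem forced a decision)
Decision: (what was chosen)
Consequences: (what this rules out, and what must not be "improved" away)
"""


def scaffold_specs(root: Path, features: list[str]) -> Path:
    """Create the specs/ layout with empty placeholder files."""
    specs = root / "specs"
    (specs / "features").mkdir(parents=True, exist_ok=True)
    (specs / "decisions").mkdir(exist_ok=True)
    names = ["PRD.md", "PLAN.md", "MEMORY.md"]
    names += [f"features/{f}.md" for f in features]
    for name in names:
        (specs / name).touch(exist_ok=True)  # never overwrites existing content
    return specs


def new_adr(specs: Path, number: int, slug: str, title: str) -> Path:
    """Write an ADR stub named like ADR-001-db-choice.md."""
    path = specs / "decisions" / f"ADR-{number:03d}-{slug}.md"
    path.write_text(ADR_TEMPLATE.format(title=title), encoding="utf-8")
    return path
```

The Consequences line is where the "why" belongs: it is the part a future session needs to read before it decides an existing pattern looks improvable.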

Claude Code can set all this up for you; just ask it.

Context Hygiene: When to Clear and When to Carry Forward

Long sessions accumulate noise. A model that has been working through a complex feature for two hours is operating on a context window full of intermediate reasoning, discarded approaches, and conversational scaffolding that no longer reflects the current state of the work. That noise degrades output quality — subtly at first, then noticeably.

The discipline is knowing when to compact and when to start clean.

Use /compact within a session when the conversation has gotten long but you're still in the same logical work unit. This compresses the session history into a summary of key decisions while preserving the thread. Follow it with a MEMORY.md update to capture anything that needs to survive beyond the session.

Start a completely fresh session when you’re crossing a feature boundary. This is the most important context hygiene practice and the most commonly skipped. When you finish one feature and start another, the accumulated context from the first actively works against you — it biases the model toward approaches that made sense in the previous context, and it eats context window that should be available for the new work.

The transition protocol: before ending the old session, update PLAN.md and MEMORY.md to capture exactly what was completed and what comes next. Then open a new session, run your standard session-open prompt, and let the documentation layer carry the handoff. Don’t try to bridge features by keeping the conversation going.

Start a fresh session when something breaks in an unexpected way. Debugging in a long, noisy context is one of the worst ways to use an AI model. The model’s prior context includes all the decisions that led to the broken state — which means it’s primed to defend them. A fresh session with a clean reproduction of the problem and the relevant specs loaded often finds the issue faster.

The mental model: context is working memory. It’s valuable but finite. Manage it like memory, not like a log.
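The rules in this section reduce to a short decision procedure. The sketch below simply restates them as code; the function name and the three boolean inputs are hypothetical, and deciding when a conversation counts as "long" is still your call.

```python
def context_action(crossing_feature_boundary: bool,
                   unexpected_breakage: bool,
                   conversation_is_long: bool) -> list[str]:
    """Return the steps to take, following the hygiene rules described above."""
    if crossing_feature_boundary or unexpected_breakage:
        # Hand off through the documentation layer, not the conversation.
        return [
            "update PLAN.md and MEMORY.md",
            "start a fresh session",
            "run the session-open prompt",
        ]
    if conversation_is_long:
        # Same logical work unit: compress, then persist what must survive.
        return ["/compact", "update MEMORY.md"]
    return ["continue"]
```

Note that a feature boundary or an unexpected break wins over session length: in both cases the answer is a fresh session, however short the current conversation is.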

Enterprise AI transformation fails in two ways. The first is visible: the initiative launches, generates buzz, produces nothing defensible, and gets quietly defunded. The second is subtle: the initiative ships working software that gradually becomes incoherent because no one built the memory infrastructure to keep it oriented.

The second failure is more dangerous because it takes longer to notice. By the time the incoherence is obvious, there are months of technical debt tangled with months of product decisions, and untangling them costs more than starting over.

The teams that avoid this aren’t ignoring model quality — it matters, and better models do handle ambiguity and context more reliably. But model capability has a ceiling on what it can do with bad context. The documentation layer is what raises the floor. A capable model operating on clean, structured, current context dramatically outperforms the same model working from a stale README and a vague prompt. Process and capability compound. Neither substitutes for the other.

The model is the hands. The docs are the brain.