David Monnerat

Product + AI | Systems Thinker | Enterprise Reality

    The Code Nobody Understands: The Hidden Risk of AI-Generated Code

    The error message appeared at 2:17 a.m. Something about a null reference in a service that hadn’t been touched in weeks. The developer on call pulled up the relevant file, read through it, and felt a familiar but newly uncomfortable sensation.

    She had written this code. Her name was in the commit history. But she hadn’t really written it. She’d accepted it. Reviewed it in the way you review something when you’re moving fast, and the output looks right, and the tests pass. The AI had generated the logic. She had approved the shape of it. And now, at 2:17 a.m., she needed to understand not just what it did but why it did it that way: what assumption it was built on, what edge case it was avoiding, what the author had been thinking.

    There was no author. Not in the sense that mattered.


    Debugging is not reading code. That’s the thing people miss when they think about what AI-generated code changes.

    Reading code tells you what a program does. Debugging tells you why it broke, which requires understanding what the code assumed. Every piece of code is a record of decisions: why this approach and not another, what inputs were considered normal, where the edge of the design was drawn. Those decisions live in the mind of the person who made them. When you write code yourself, you carry that context invisibly. It’s not in the comments. It’s not in the variable names. It’s in your memory of the afternoon you spent figuring out why the simpler approach didn’t work.

    When something fails, you don’t just read the broken code. You reconstruct the intent. You ask: what was this trying to do, and where did reality diverge from the assumption? That reconstruction depends on having built the mental model in the first place.

    AI-generated code arrives without that history. It is the output of a process you did not participate in, produced by a system that has no memory of producing it and no stake in what happens when it runs. The code may be correct. It may even be elegant. But the reasoning that produced it is not available to you.

    When it breaks, you are not reconstructing intent. You are reverse-engineering a decision process that was never explained and no longer exists.

    This is happening one pull request at a time.

A developer uses an AI tool to scaffold a new service. The output is good enough. She reviews it, adjusts a few things, and ships it. Six months later, a different developer, or the same one with no memory of the details, has to modify that service. He reads through it. The code is coherent but opaque in the way that inherited code is always opaque: it works, and you can see that it works, but you cannot see why it was built this way and not another way. Usually, that’s because the original developer made tradeoffs they never documented. Now it’s because there was no original developer in the relevant sense.

    Neither of them is doing anything wrong. The tool is working as intended. The code is functional. The problem is quieter than that.

    The codebase is accumulating decisions that nobody made.

    There is a craft dimension to this that is easy to undervalue until it is missing.

Experienced developers carry something that is hard to name but easy to recognize: a feel for systems. The ability to look at a piece of code and sense where it will be fragile. To read a stack trace and know immediately which layer to look at. To hear a description of a failure mode and think: I’ve seen that pattern before; it usually means this.

    That intuition is not innate. It’s built from years of writing code that broke, figuring out why, and carrying the lesson forward. It’s the accumulated residue of debugging. Every production incident is a deposit into that account. Every hour spent reconstructing intent from someone else’s code is a deposit.

    If you outsource the writing, you also reduce the debugging. If you reduce the debugging, you slow down the accumulation. The intuition that makes senior engineers valuable is built precisely from the friction that AI tools are designed to remove.

    The tools make you faster. They may also make you shallower. Not immediately, but incrementally.


    A single developer using AI to move faster is fine. The tradeoffs are local and manageable. But consider the codebase of a growing company where the majority of code was generated by AI tools over a period of several years. Where the team that shipped the original services has turned over. Where nobody alive in the organization has a mental model of why certain architectural decisions were made, because those decisions were not made by a person in the way that creates mental models.

    That system will fail eventually. All systems do. And when it fails in a way that matters, under load, in a corner case, in a security context, the organization will face a debugging problem that is categorically different from the ones it has faced before. Not harder in complexity, necessarily. Harder in a different way: the knowledge required to fix it was never created.

You cannot interview the AI that wrote it. You cannot ask it what it was thinking. You cannot look at its previous projects and recognize a pattern. Whatever reasoning produced the code existed only in the moment of generation, and then it was gone.


    None of this is an argument against using AI coding tools. They are genuinely useful, and the productivity gains are real.

    This is an argument for being honest about the trade-off.

    We tend to talk about AI-generated code in terms of output: does it work, does it pass the tests, does it meet the spec. These are the right questions for shipping. They are incomplete questions for operating.

    Operating a system over time requires more than knowing that it works. It requires knowing how it works, and why, and where it will break, and what to do when it does. That knowledge has to live somewhere. In a person, in documentation detailed enough to reconstruct intent, or in a team with enough collective history to fill the gaps.

    Right now, we are generating code faster than we are generating that knowledge. The gap between the two is not visible in the metrics that matter day to day. It will become visible under pressure.


    The developer on call at 2:17 a.m. will figure it out. She always does. She will read the code carefully, form a hypothesis, test it, and find the assumption that broke. She is good at this.

    But she will spend longer than she should, because the code she is debugging does not remember being written.

    And the next time will take a little longer still.