When Code Review Isn’t About Trusting the Author

When people started using AI to write code, the first problem they ran into was trust. Traditionally, code review was a kind of safety net. You’d review code to catch bugs, keep things consistent, and transfer some knowledge. But that worked because you knew who wrote the code. You could imagine what they were thinking and why they did things a certain way. With AI, you lose that. Now you’re reviewing code that may be correct by accident, or broken in ways that are almost invisible.

Code review evolved for human mistakes—typos, off-by-one errors, logic that doesn’t quite add up. The reviewer and the author shared a mental model. You could ask questions, spot patterns, even argue about style. But AI doesn’t have a mental model. It just stitches together plausible code from millions of examples, sometimes blending styles or introducing assumptions that make sense to no one in particular. You might get a function that looks fine but is insecure, because the AI doesn’t really understand what “secure” means. Or you get code that’s technically correct but fragile, because it copied patterns from the wrong context.
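To make that concrete, here is a minimal sketch of the kind of code that slips through. The function, table, and column names are invented for illustration; the point is that the code reads cleanly and passes every happy-path test, yet builds its SQL query by string interpolation.

```python
import sqlite3

def get_user_by_email(conn: sqlite3.Connection, email: str):
    # Looks tidy on review: clear name, type hints, one short query.
    # But the email is interpolated straight into the SQL string, so an
    # input like "' OR '1'='1" matches every row in the table.
    query = f"SELECT id, email, role FROM users WHERE email = '{email}'"
    return conn.execute(query).fetchone()

def get_user_by_email_safe(conn: sqlite3.Connection, email: str):
    # The fix is one line: let the driver bind the parameter instead.
    query = "SELECT id, email, role FROM users WHERE email = ?"
    return conn.execute(query, (email,)).fetchone()
```

A human author who wrote the first version would usually know they took a shortcut; a model that copied the pattern from the wrong context has no such awareness, and nothing in the diff signals the difference.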

That means reviewing AI code is sometimes harder, not easier. The code might look cleaner, but you can’t trust it the way you’d trust a good human engineer. You have to assume the AI doesn’t know what it’s doing, and check every assumption for yourself. You end up doing archaeology instead of collaboration—digging for the intent behind the code, even though there wasn’t any.

Some teams try to patch this by making new checklists. Did the AI sanitize inputs? Did it handle edge cases? Did it respect privacy rules? Others add more automation: static analyzers, fuzzers, anything to catch what humans and AIs both miss. The process becomes layered: the AI writes, tools check, humans review. But this creates another problem. If the AI does most of the work and the tools catch most of the mistakes, is the human reviewer still necessary, or are they just the last box to tick?
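One way to picture that layering is a small gate script that runs the automated checks and only then hands the change to a person. This is a sketch, not a prescription: the specific tools (ruff, bandit, pytest) and the `src/` target directory are assumptions, and any real pipeline would live in CI rather than a standalone script.

```python
import subprocess
import sys

# The "AI writes, tools check, humans review" pipeline, reduced to its gates.
# Tool choices and paths are illustrative assumptions.
CHECKS = [
    ("lint", ["ruff", "check", "src/"]),
    ("security", ["bandit", "-r", "src/"]),
    ("tests", ["pytest", "-q"]),
]

def run_gates() -> bool:
    """Run each automated gate; return True only if all of them pass."""
    all_passed = True
    for name, cmd in CHECKS:
        result = subprocess.run(cmd, capture_output=True, text=True)
        status = "ok" if result.returncode == 0 else "FAILED"
        print(f"[{name}] {status}")
        if result.returncode != 0:
            print(result.stdout or result.stderr)
            all_passed = False
    return all_passed

if __name__ == "__main__":
    # The script never merges anything; it only decides whether the change
    # is ready for a human to look at. Human review stays as the final gate.
    if run_gates():
        print("Automated gates passed. Hand off to a human reviewer.")
        sys.exit(0)
    print("Automated gates failed. Send back before anyone reviews it.")
    sys.exit(1)
```

The awkward part is what the script makes explicit: the human is last in line, after the tools have already caught everything they can.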

The biggest shift is this: with human code, you assume competence unless you see a reason not to. With AI code, you assume the opposite. You start from skepticism. That’s tiring, but it might make you more careful, because you’re forced to justify every step. Over time, the skill of reviewing code will change. You’ll need to know how AIs fail, not just how code fails.

The far end of this shift is skipping review altogether. Andrej Karpathy described it in the post that gave "vibe coding" its name:

There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like "decrease the padding on the sidebar by half" because I'm too lazy to find it. I "Accept All" always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.

Andrej Karpathy (@karpathy), February 2, 2025

Some people say this is temporary—that AI will get so good these problems will disappear. Maybe. But even if that happens, the real question remains: what’s the reviewer’s job? If the AI writes almost perfect code, the human’s job shifts to deciding whether the outputs fit bigger goals—security, ethics, business needs. At that point, reviewing code might mean reviewing the AI’s training data and objectives, not just its code.

Right now, reviewing AI code feels like using a familiar tool in a world that’s slightly off. The process is the same, but the risks are different and the old instincts don’t always work. What’s really changing isn’t just how we check code, but how we think about authorship and responsibility, now that the author is something that doesn’t think at all. That shift might change software engineering more than any single tool or language ever has.
