Psst. If your boss won’t invest in training you in Test-Driven Development, I’m running out-of-hours workshops on April 7 and 11 specifically for self-funding learners. £99 + UK VAT.

I see a lot of wrongheaded takes on “AI” coding assistants and agents online, but one of the most misguided goes something along the lines of “Code doesn’t need to be easy for humans to understand anymore”.

Presumably, these people have never stopped to ask themselves why they’re called “language models”, but let’s entertain their chain of thought for a moment.

The theory is that humans won’t need to understand the code generated by LLMs (wrong!) because they won’t need to edit the code themselves (wrong!), and so code should be optimised for things like token efficiency rather than readability (WRONG!).

Some even go so far as to propose token-optimised programming languages designed by and especially for LLMs.

Working backwards through this sequence of compounding errors quickly unpicks the logic.

First of all, even if LLMs could design a general-purpose, Turing-complete programming language from scratch – which they can’t – it would require huge numbers of examples to train them to work with such a language. It might work for small DSLs, but languages like Java and Go are a whole other ball game.

Without that large corpus of training data – showing examples of every aspect of the language many, many times – every request to generate code in it would be out-of-distribution. There’s a good reason why Claude, GPT, Gemini, etc. are better at generating code in popular languages. So, are we going to write those millions of examples? In a non-human-readable programming language?

Secondly, when you look at the tokens in code, how much of it is language syntax – reserved words, semicolons and so on – and how much of it is names that we have chosen for the things in our code to make it more self-descriptive?
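As a rough illustration, we can count this with Python’s standard-library tokenizer. This is a stand-in for an LLM’s actual tokenizer (which splits text quite differently), so the exact proportions are only indicative – but it makes the split between fixed syntax and names we chose visible:

```python
# Rough sketch: in a small snippet, how many tokens are names we chose,
# and how many are fixed language syntax (keywords and operators)?
# Uses Python's stdlib tokenizer, not an LLM tokenizer, so treat the
# numbers as indicative only.
import io
import keyword
import tokenize

snippet = """
def calculate_sales_tax(net_amount, tax_rate):
    return net_amount * tax_rate
"""

chosen_names = 0
fixed_syntax = 0
for tok in tokenize.generate_tokens(io.StringIO(snippet).readline):
    if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
        chosen_names += 1  # identifiers we picked: function and parameter names
    elif tok.type in (tokenize.NAME, tokenize.OP):
        fixed_syntax += 1  # keywords, brackets, commas, operators

print(f"chosen names: {chosen_names}, fixed syntax: {fixed_syntax}")
```

Even in this tiny function, the names we chose account for a large share of the tokens – and they’re the tokens that carry all the meaning.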

“Ah, but Jason, AI-generated code doesn’t need to be self-descriptive, because we don’t need to understand it.”

I have bad news for you on that front, I’m afraid. Research – both my experiments and larger-scale studies – has found that the less clear code is to human readers, the less clear it is to LLMs. When code is obfuscated, models make more mistakes interpreting it.

This should come as no surprise. Language Models – the clue is in the name. What did folks think they were matching patterns in?

Human-readable code is model-optimised code.

LLMs don’t “understand” code the way compilers do; they pattern-match on language.

So:

Just as it’s incredibly helpful on a software team if everybody’s speaking the same shared language to describe the problem they’re setting out to solve – including (especially) in the code itself – it’s also essential that we use consistent shared language in our prompts and in the code. If I ask Claude to change the “sales tax” calculation logic, and in the code the function’s called ‘v_tx’, we’re probably not going to get a glowing result.
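To make that concrete, here’s a hypothetical pair of behaviourally identical functions (both names are invented for illustration). A prompt that says “change the sales tax calculation” gives an agent an obvious anchor in the first version and nothing at all in the second:

```python
# Two behaviourally identical functions. The first shares its language
# with the prompt ("sales tax"); the second gives a model nothing to
# pattern-match on. Both names are hypothetical examples.

def calculate_sales_tax(net_amount: float, tax_rate: float) -> float:
    """Sales tax due on a net amount at the given rate."""
    return net_amount * tax_rate

def v_tx(a: float, b: float) -> float:
    return a * b
```

The compiler treats them identically; a language model doesn’t, because the only thing it has to connect your prompt to the code is the language in both.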

Confusion of Tongues: The Construction of the Tower of Babel, Lucas van Valckenborch, 1594

And then there’s the take that humans won’t need to edit the code. Some even go so far as to say that LLMs will be generating machine code directly, bypassing the human-readable source.

The fact of the matter is that LLMs are unreliable. Very unreliable. I’ve documented the common experience of agents like Claude Code getting stuck in “doom loops” when a task or problem falls outside their training distribution, and they simply can’t complete it. This is a fact of life when we’re working with the technology, and the general consensus – even among the hyperscalers – is that it’s an unfixable problem at any achievable scale.

There will always be a need for the human in the loop, and there will always be a need for that human to understand the code.

If it’s token efficiency you’re after, the smart thing to do is to focus on reducing the probability of mistakes and unnecessary rework. Deliberately obfuscating our code is going to work against us in that respect.