Bad Language Models (Explicit)

A leaked Claude Code regex for detecting user swearing is a small, vivid example of how massive AI-built codebases accumulate flabby, brittle features — and a reminder that the software Claude writes is only ever as good as the spec you gave it.

[Image: two tangled ropes, one representing a regular expression]

LLM hallucinations are frustrating, particularly when they are manifestly stupid, but they are unsurprising. An LLM is a bit like a ginormous roulette wheel that has been extremely well tampered with to almost always land on the 'right' next word, but a win is never guaranteed. And once they've gone off piste, they can generate an entire fairy tale. In other words, unexpected (a.k.a. wrong) output is an inevitable consequence of the probabilistic nature of next-token prediction in language models. In contrast, when deterministic 'regular' software like Windows, Gmail or the Post Office accounting system misbehaves, that's not bad luck; it's bad specification or bad testing.

The granddaddy of all coding agents, Claude Code, is not a Large Language Model, not a Small Language Model and not even a Bad Language Model. It is itself just a regular software harness, wrapped around Claude, the mysterious language model it invokes on the user's behalf. Unfortunately, it is surprisingly difficult, but often critically important, to specify unambiguously what you want regular software to actually do and actually not do. That difficulty applies as much to Claude Code as it does to the Post Office accounting system.
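The roulette-wheel analogy can be made concrete with a toy sketch. To be clear, this is an invented illustration of weighted next-word sampling, not a real language model; the candidate words and their probabilities are made up for the example:

```python
import random

# Toy illustration of next-token prediction: a heavily biased roulette wheel.
# Imagine the prompt is "The capital of France is ..." and these are the
# model's (invented) probabilities for the next word.
next_word_probs = {"Paris": 0.95, "London": 0.03, "a fairy tale": 0.02}

random.seed(1)  # fixed seed so the demonstration is repeatable
words = list(next_word_probs)
weights = list(next_word_probs.values())

# Spin the wheel 1,000 times.
samples = [random.choices(words, weights=weights)[0] for _ in range(1000)]

# The wheel almost always lands on the 'right' word, but never with certainty:
assert samples.count("Paris") > 900   # overwhelmingly "Paris"...
assert samples.count("Paris") < 1000  # ...yet occasionally something else
```

And once an unlikely word is chosen, it becomes part of the context for the next spin, which is how a single off-piste token can snowball into an entire fairy tale.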

Claude Code was absolutely not vibe coded, but it was written extremely fast, largely by Claude itself, under the guidance of the very talented Boris Cherny. So it's unsurprising that it is distinctly on the flabby side. It runs to over half a million lines of source code, which for context is several times larger than Shakespeare's entire corpus, and larger even than Charles Dickens's absolutely enormous body of work. This matters because once a piece of software gets as big as Claude Code, it's impossible for any single developer to keep the entire corpus in their head. There ought to be a Dunbar's Number for software: the equivalent of the 150 people in your own village whom you could truly know.

The combination of flab and breakneck development speed is a double whammy. Poorly specified features get buried out of sight, or wilfully disregarded under the commercial pressure to ship function before your competitors do.

Here is one example of prima facie loose specification from Claude Code itself. I will describe it without using very technical language, but I will necessarily have to use very bad language, so be warned. To be clear, I have no insight into the provenance of this tiny piece of code or its purpose; I'm simply using it to explore the challenge of tightly specifying what you want code to do. Following a recent source code leak, the internet developer community very quickly found a pattern-matching string, known as a regular expression, in Claude Code, which matches any of the following exact words or phrases in the user input:

  • wtf, what the fuck, wth, what the hell, ffs or omfg
  • shit, shitty, shittiest, piece of shit, piece of crap or piece of junk
  • dumbass, horrible or awful
  • fucking broken, fucking useless, fucking terrible, fucking awful or fucking horrible (with or without the 'g' on the end of 'fucking')
  • piss off, pissed off or pissing off
  • fuck you, screw this or screw you
  • so frustrating, this sucks or damn it

This is very basic, and extremely partial, sentiment analysis, perhaps to measure whether a Claude update has backfired, or (less likely) for the agent to help the Claude LLM by prepending the user input with something like "HINT: assume things aren't going well and your customer satisfaction score is going down the pan, so best put on your smartest thinking cap for the next answer".

Regular expressions are powerful but very dense sequences of characters wrapped in brackets, dots, dashes and other punctuation marks. Ironically, they are often so incomprehensible and unmaintainable as to elicit from a seasoned developer one or more of the very expletives Claude Code is looking for with the regular expression mentioned above. Sort of a bit meta.

Zawinski's Famous Aphorism Against Regular Expressions

The extreme difficulty humans have diagnosing misbehaving regular expressions led the computer programmer and early blogger Jamie Zawinski to quip nearly 30 years ago: "Some people, when confronted with a problem, think, 'I know, I'll use regular expressions.' Now they have two problems."

As an aside, Large Language Models are hugely better than humans at writing and reading regular expressions because to them an impenetrably dense sequence of punctuation marks is no more or less understandable than a nursery rhyme.

If anyone's interested, the regular expression in Claude Code that started me down this track is:

/\b(wtf|wth|ffs|omfg|shit(ty|tiest)?|dumbass|horrible|awful|piss(ed|ing)?\s+off|piece\s+of\s+(shit|crap|junk)|what\s+the\s+(fuck|hell)|fucking?\s+(broken|useless|terrible|awful|horrible)|fuck\s+you|screw\s+(this|you)|so\s+frustrating|this\s+sucks|damn\s+it)\b/i
This is too simple to garner Zawinski's unalloyed scorn, but it is at face value rather an imperfect piece of code. It would, for example, not catch "For fuck's sake", "This code is absolute shite", or "This code is abso-fucking-lutely terrible". But on the other hand it would inappropriately catch "OMFG, this is totally amazing", or "This is one of the most shit hot algorithms I've ever seen".
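For the curious, the hits and misses described above are easy to check. Here is a quick sketch in Python (the leaked pattern is JavaScript-flavoured, but the syntax is identical for this expression), with its closing parenthesis restored so it compiles; the `sounds_frustrated` wrapper is my own invention, not a name from the Claude Code source:

```python
import re

# The leaked Claude Code pattern, transcribed into a Python raw string.
FRUSTRATION_RE = re.compile(
    r"\b(wtf|wth|ffs|omfg|shit(ty|tiest)?|dumbass|horrible|awful"
    r"|piss(ed|ing)?\s+off|piece\s+of\s+(shit|crap|junk)|what\s+the\s+(fuck|hell)"
    r"|fucking?\s+(broken|useless|terrible|awful|horrible)|fuck\s+you"
    r"|screw\s+(this|you)|so\s+frustrating|this\s+sucks|damn\s+it)\b",
    re.IGNORECASE,
)

def sounds_frustrated(text: str) -> bool:
    """Hypothetical wrapper: True if the text contains one of the listed phrases."""
    return FRUSTRATION_RE.search(text) is not None

# Phrases it catches, including an enthusiastic false positive:
assert sounds_frustrated("WTF, this is fucking broken")
assert sounds_frustrated("OMFG, this is totally amazing")

# Expressions of genuine frustration it misses:
assert not sounds_frustrated("For fuck's sake")
assert not sounds_frustrated("This code is absolute shite")
assert not sounds_frustrated("This code is abso-fucking-lutely terrible")
```

The misses all stem from the pattern's insistence on exact words separated by plain whitespace: an apostrophe, an extra letter or an infix derails it completely.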

A Small Lexical Digression

The English swearing vocabulary, probably the most lexically diverse in the world, contains more than a thousand expletives derived purely from the single root 'fuck'. I've no idea if ChatGPT is normally squeamish about swearing, so I introduced my research into this diversionary topic with "Role: You are an expert linguist with a deep understanding of the evolution of expletives, curses and other aspects of colloquial predominantly verbal language". I was rewarded with a conversation worthy of Grok Unhinged Mode, albeit interspersed with lots of serious linguistic details. If, like me, you are a connoisseur of colourful language, this fuck-strewn academic treatise extracted verbatim from that conversation with ChatGPT really is great value. I particularly like the concluding line of that treatise: A linguist could plausibly argue that modern colloquial English has evolved "fuck" into something approaching an emotional auxiliary system rather than merely a swear word.

Hats off to anyone who has got this far.

It's not that this pattern matcher is an important flaw in Claude Code; it may even be 100% fit for purpose as a general measure of relative audience satisfaction. If the overall amount of swearing goes up after a model upgrade (as indicated by the representative sample caught by this pattern), then it's probably best to notify someone to take a closer look. But it seems unlikely that a thoughtful developer starting from a clear specification would have come up with this particular ragbag of phrases. And this trivial example does illustrate the difficulty of converting a hand-waving specification like "log bad language indicative of frustration" into code that robustly captures most expressions of dissatisfaction without any false positives. Just look at the huge variety of meanings of 'fuck' in The Remarkable Linguistics of English Swearing.
More generally, any software like Claude Code that is built at breakneck speed on brittle foundations will inevitably contain some inadequately specified features and lurking bugs. And sometimes that will really, really matter.

Moreover, this brittleness will be buried in millions of lines of flab that no-one has the time or will to seriously review, which is why many experienced software developers worry about the tsunami of AI-generated code created over the past year or so. The global volume of AI coding activity is hard to comprehend. GitHub is the largest repository of code in the world, already used by 180 million developers and most major enterprises, including BA, Microsoft and the NHS. But despite the platform's enormous scale, it is seriously buckling under the AI load. It recently suffered a massively disruptive outage that impacted the entire software industry, and it is now being redesigned to support not 3 times, but 30 times, more activity from AI agents than from human developers.

The Takeaway

Experience and expertise in clear specification is what separates:

  • professional developers massively leveraging their own productivity with coding agents, from
  • unaware vibe coders generating an absolute morass of badly behaving AI slop.

The 10,000 hour rule is just as applicable to really professional software development now as it ever was. But the good news is that you don't need a degree, an A Level or even a GCSE to get started today. The technical barrier to creating new applications for yourself or your own small organisation has almost completely vanished in the last 12 months.

So do use Claude Code: it's abso-fucking-lutely brilliant. But just remember that the software it produces, and which you will probably never read, will only be as good as the specification you gave it in the first place. Before dispatching your 'make me something amazing' prompt with a satisfied smack on the Enter key, ask yourself what is missing. What is woolly or ambiguous? What should be explicitly stated as out of scope? What are the 'non-functional' requirements: which browsers should your shiny new website be tested against, what are the performance requirements, how will the software be maintained, what future extensions must be possible? That is of course very far from an exhaustive list.

And be very, very cautious about foisting unreviewed software on the world at large. Think carefully about the worst possible consequences of a buried bug or misinterpreted intent. The Post Office debacle is a salutary reminder of just how seriously people's lives can be ruined by massive software failures, which sometimes stem from tiny software bugs.