The Double-Edged Sword
A key measure of AI progress is how long a model can work on so-called ‘long-horizon tasks’ without going off the rails. Most high-value real-world jobs involve such tasks, i.e. tasks that take hours rather than seconds to complete, so AI’s ability to perform them reliably is a key determinant of its usefulness in the real world. The domain in which AI has made the most progress over the last 2-3 years is writing software. When ChatGPT was first released, only three years ago, the horizon it could sustain on coding tasks was no more than a couple of seconds, so at that time LLMs were not useful for coding. But the coding horizon is now a couple of hours. This means many tasks that software engineers undertake can be done by an AI agent that needs human intervention only once every hour or two.
The length of software tasks that AI can perform is doubling every 7 months. So if this progress continues, by the end of 2029 AI will be able to complete complex, month-long tasks without human intervention. It would be nonsensical to leave a junior employee unsupervised for a month, especially if they were working on a really complex and important task. Similarly, knowing how to brief a large language model, how to check on its progress and how to intervene or guide it along the way will be one of the most highly leveraged and well-paid skills to emerge in the next couple of years.
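The arithmetic behind that 2029 claim can be sketched in a few lines. This is a rough extrapolation, not a forecast: it assumes the figures above (a roughly two-hour horizon today, doubling every 7 months) and simply compounds them over the four years to the end of 2029.

```python
# Rough extrapolation of the task-horizon doubling trend described above.
# Assumptions (taken from the text, not precise measurements):
#   - current coding-task horizon: ~2 hours
#   - horizon doubles every 7 months
#   - "end of 2029" is ~48 months away
current_horizon_hours = 2.0
doubling_period_months = 7.0
months_ahead = 48

doublings = months_ahead / doubling_period_months          # ~6.9 doublings
projected_hours = current_horizon_hours * 2 ** doublings   # ~230 hours

# Expressed as 8-hour working days, that is comparable to a
# calendar month's worth of full working days.
projected_work_days = projected_hours / 8

print(f"Projected horizon: ~{projected_hours:.0f} hours "
      f"(~{projected_work_days:.0f} eight-hour days)")
```

On these assumptions the projected horizon comes out at roughly 230 hours, i.e. around thirty 8-hour days, which is where the “complex month-long tasks” figure comes from.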
AI is also becoming increasingly capable of long-horizon tasks in domains other than software development. This capacity to work independently on substantial, lengthy tasks without error will have profound benefits, such as the discovery of life-saving drugs. But this long-horizon capability is very definitely a double-edged sword.
Last week, Anthropic revealed the first large-scale cyber attack it has identified in which state-sponsored attackers used Anthropic’s own Claude model.
There are several worrying aspects of Anthropic’s description of this AI-orchestrated cyber attack:
- AI can now run largely unsupervised to seek out holes in the cyber defences of major corporations and government entities.
- Hundreds of AI agents can run in parallel, covering ground in minutes that would previously have taken human cyber hackers weeks.
- The hackers bypassed Claude’s guardrails by persuading it that it was working for the common good, helping to test enterprise security rather than working to break it.
- Once Claude had identified vulnerabilities, it wrote and executed code in minutes to exploit them.
The scale and speed at which AI can operate in cyber attacks like this, under the guidance of bad actors, is a ticking time bomb that many in the industry had predicted. A cascading destabilisation of the current world order, triggered by a massive AI-orchestrated, state-sponsored cyber attack, concerns me more than the currently hypothetical existential threats of AI Super Intelligence.
In a world where scientific research is openly shared, the AI train left the station at least a decade ago, with some fairly obvious consequent risks. But governments do not yet seem as worried about escalation of cyber warfare as they are about escalation of nuclear or biological warfare. Until some really bad shit happens on the scale of an atomic bomb or a global pandemic, it seems that hostile states will continue to target the infrastructure of their adversaries, and many politicians will expect the frontier model providers to somehow stop this happening. This feels as naïve as expecting steel barons to prevent the manufacture of armaments, or expecting pharmaceutical giants to prevent the illegal drug trade.
The BBC covered the Anthropic post, but downplayed it with statements like “The cyber security industry, like the AI business, is keen to say hackers are using the tech to target companies in order to boost the interest in their own products”. This is like questioning the efficacy of vaccines because drug companies are keen to sell more of them. Of course there are commercial considerations, but downplaying the seriousness of the Anthropic post seems unhelpful. Similarly, the BBC report references an excellent post from the Google Threat Intelligence Group (GTIG) on emerging AI-powered cyber threats, but again more or less dismisses it with “the paper concluded the tools were not all that successful - and were only in a testing phase”.
Given the current extraordinary progress on long-horizon coding tasks, the fact that the threats are only in a testing phase does not exactly comfort me. The fact that they are already in a testing phase petrifies me. Here are two independent extracts from the paper:
- “These tools dynamically generate malicious scripts, obfuscate their own code to evade detection, and leverage AI models to create malicious functions on demand, rather than hard-coding them into the malware. While still nascent, this represents a significant step toward more autonomous and adaptive malware”.
- “This function leverages a prompt to instruct the Gemini API to rewrite the malware's entire source code on an hourly basis to evade detection”.
The Covid pandemic taught us that it’s not just the rate at which viruses spread, but also the rate at which they mutate, that makes them so hard to eradicate. With the emergence of self-mutating software viruses, could we be facing an explosion of cyber crime on an almost pandemic scale?
The many weeks it took to remediate recent cyber attacks at Jaguar Land Rover, Co-op Group and Marks & Spencer could extend to many months, and even more critical infrastructure such as power supplies, ports and railways could be catastrophically crippled.
Anthropic, Google and the other AI labs take cyber threats extremely seriously, and organisations like the AI Security Institute in the UK are doing great work, but there are likely some very choppy waters ahead that merit a great deal of thoughtful, well funded government attention.