Part 1 of an ongoing series on where AI lets me down.
I was having a chat with a friend today (Hi Harvey! We’re on the internets!) about how loud he’d have to yell, and how much energy and heat that would take, for him to yell “HI TP” from Queensland and be heard over in Ashburton, New Zealand. I wanted to make a joke about him burning himself to cinders in the attempt, but I didn’t know the physics well enough to do the math.
My first instinct was to ask Claude. Then I realised: if Claude forgot to factor in Doctor Physic’s Nineteenth Law of Thermodynamics and threw the answer off by a billion degrees (Celsius, because I’m not a monster), I’d have literally no way of knowing.
And that’s the kicker. AI gets things wrong, sure, everyone knows that. But the problem isn’t the wrong answer, it’s that catching the wrong answer requires you to already know enough to spot it. If I don’t know thermodynamics, Claude can lie to me about thermodynamics all day and I’ll thank it for the answer.
In low-stakes situations like working out how big of a crater Harvey would make, this is fine. The trouble is when AI starts answering questions that actually matter, and the person asking doesn’t have the foundation to know when the answer is wrong. I’m seeing this in my job. Analysts are asking Charlotte AI (built into CrowdStrike), for example, what this PowerShell snippet does:
[PowerShell snippet not preserved]
Charlotte confidently answered:
[...] is decoded to echo 'GoRMqxWj';. The -InputFormat None parameter specifies that no input format is expected, which is useful for commands that do not require input.
If you’ve spent any time looking at PowerShell attacks, the shape of this snippet is immediately familiar: it’s a classic obfuscation chain, and it’s probably dropping an implant in memory.[1] What Charlotte did was find the innermost string at the end of the chain (the harmless echo) and report that as the answer, ignoring the four layers of Invoke-Expression-style wrapping around it. Worse, it did so without any hint of uncertainty.
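To make the layering concrete, here is a minimal Python sketch of peeling a chain like this one layer at a time, so that every layer gets looked at and not just the last one. It is not the snippet from the alert and says nothing about how Charlotte works internally. The sample is a harmless stand-in built for illustration, the PowerShell text inside it is deliberately simplified, and the names (`peel`, `decode_layer`, `to_text`, `build_sample`) and the 20-character threshold are all assumptions.

```python
import base64
import gzip
import re
import zlib
from typing import List, Optional, Tuple

# Runs of base64 characters long enough to be a payload rather than a keyword.
# The threshold of 20 is an arbitrary assumption for this sketch.
B64_RUN = re.compile(r"[A-Za-z0-9+/]{20,}={0,2}")


def to_text(raw: bytes) -> Optional[str]:
    """Guess whether decoded bytes are UTF-16LE or UTF-8 text."""
    # ASCII-range text stored as UTF-16LE has a zero in every second byte.
    if len(raw) % 2 == 0 and not any(raw[1::2]):
        return raw.decode("utf-16-le")
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


def decode_layer(blob: str) -> Optional[str]:
    """Decode one base64 layer, decompressing it first if it is GZip."""
    try:
        raw = base64.b64decode(blob, validate=True)
    except ValueError:  # binascii.Error is a ValueError subclass
        return None
    if raw[:2] == b"\x1f\x8b":  # GZip magic bytes
        try:
            raw = gzip.decompress(raw)
        except (OSError, EOFError, zlib.error):
            return None
    return to_text(raw)


def peel(script: str, max_depth: int = 10) -> List[str]:
    """Return every layer of the chain, outermost first."""
    layers = [script]
    for _ in range(max_depth):
        # Try the longest candidates first; the payload is usually the longest run.
        candidates = sorted(B64_RUN.findall(layers[-1]), key=len, reverse=True)
        decoded = next(
            (text for text in map(decode_layer, candidates) if text is not None),
            None,
        )
        if decoded is None:
            break  # nothing left to unwrap
        layers.append(decoded)
    return layers


def b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


def build_sample() -> Tuple[str, str]:
    """Build a harmless, hypothetical chain with the same layering."""
    final = "Write-Output 'harmless stand-in payload'"
    inner = b64(final.encode("utf-8"))
    stage2 = "iex ([Text.Encoding]::UTF8.GetString([Convert]::FromBase64String('" + inner + "')))"
    # Simplified placeholder for the decompress-and-run wrapper.
    stage1 = "# gzip+base64 wrapper (simplified): '" + b64(gzip.compress(stage2.encode("utf-8"))) + "'"
    outer = "powershell -EncodedCommand " + b64(stage1.encode("utf-16-le"))
    return outer, final


if __name__ == "__main__":
    outer, _ = build_sample()
    for depth, layer in enumerate(peel(outer)):
        print(f"--- layer {depth} ---")
        print(layer[:120])
```

The point of returning every layer is that the wrapping is the signal. A summary that reports only the last element of that list describes the harmless-looking tail and skips everything that made the chain suspicious.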
By design, AI is trained to sound 100% confident in its output; that’s how the system works. But come on. In this case, the analyst took Charlotte’s answer at face value and closed the alert as a false positive. We were lucky: this particular PowerShell was part of a red team exercise, so we caught it on review. If it had been a real attacker, we’d have had a foothold sitting unaddressed in the environment because the analyst who caught the case that day didn’t have the foundational PowerShell knowledge to look at the base64 and feel their stomach drop.
This isn’t new. Lawyers are being sanctioned for submitting briefs to court with hallucinated precedents. Big-4 consulting firms are refunding hundreds of thousands of dollars for reports padded with hallucinated references. Academia is rife with the same. These are mistakes humans could just as easily make on their own, except the AI’s volume and confidence mean they get made faster and are harder to catch.
For me, in cyber security, the stakes are the same shape. We miss a system during a compromise, miss data leaving the environment, close a true positive as a false positive. These are resume-generating events. The thing that protects us is the analyst who knows what bad PowerShell looks like, the responder who knows what a normal day on the network feels like, the engineer who knows what their detection rules can and can’t see. Foundational knowledge. Pattern recognition built up over years of being wrong, being corrected, and trying again.
And here’s what worries me. The narrative right now is “lay off your junior developers or SOC analysts, you don’t need them, AI can do the work for them”. But the work juniors do isn’t just output, it’s how they build the foundational knowledge that lets them catch AI being wrong five years from now. If we replace that supervised, on-the-job, getting-things-wrong-and-being-corrected practice with an AI that confidently produces the answer, we don’t get the same seniors faster. We get no seniors at all. The people protecting the business from AI, the ones who can look at a base64 string and know to escalate, just don’t exist anymore. I don’t know how we fix this, and it scares me.
For those of you who are curious: Harvey didn’t spontaneously combust. He decided not to call out in the end. It was late, and it might have been inconsiderate to the neighbours.
[1] Specifically: UTF-16 base64 wrapping a GZip-compressed payload, which decompresses to more PowerShell, which decodes a base64 string, which finally executes.