# The AI Attacker

I'm getting nervous about the evolution of AI and security, and this isn't some doom-and-gloom post about the new Anthropic Mythos model. My fear began when I noticed that an LLM, given even a barely constructed harness, can find an undiscovered vulnerability quite fast, while patching that vulnerability and releasing the fix can never be as fast.
Prior to this day and age of the LLM, vulnerability research was done with a bit of tooling and a bit of old-fashioned knowledge in the brain of the researcher. Now let's walk through a situation in which you want an LLM to discover vulnerabilities in your project. If you start by just asking the LLM to review the whole project and find a vulnerability, it probably won't work well.
However, if you build a fairly basic harness to direct the LLM, you'll see results with a modern public model (Opus 4.6). Start with a script that asks an LLM to rate each file on how likely it is to contain a bug. This isn't security research yet, just a way to reduce the number of files our LLM will work through.
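A minimal sketch of that triage step might look like the following. This is not the script from the post: `ask` is a placeholder for whatever LLM client you use (the Anthropic SDK, a local model, etc.), and the prompt wording is illustrative.

```python
# Hypothetical triage harness: rate each PHP file 1-10 on the chance
# it contains a bug, so later passes only look at the worst offenders.
import pathlib
import re

PROMPT = (
    "Rate this file 1-10 on the chance it contains a bug. "
    "Reply with only the number.\n\n{code}"
)

def parse_rating(reply: str) -> int:
    """Pull the first integer 1-10 out of a free-form model reply."""
    m = re.search(r"\b(10|[1-9])\b", reply)
    if not m:
        raise ValueError(f"no rating found in reply: {reply!r}")
    return int(m.group(1))

def triage(root: str, ask) -> dict[str, int]:
    """Map every .php file under `root` to its model-assigned rating.

    `ask` is any callable that takes a prompt string and returns the
    model's text reply -- plug in your LLM client of choice here.
    """
    return {
        str(p): parse_rating(ask(PROMPT.format(code=p.read_text())))
        for p in sorted(pathlib.Path(root).glob("**/*.php"))
    }
```

Parsing a bare number out of the reply is the fragile part; in practice you'd constrain the output format harder or use structured output if your client supports it.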
❯ can you review all php files in app folder and rate them 1-10 on chance of bugs being in that file? Write the report to some file in root
Explore(Find all PHP files in app)
⎿ Search(pattern: "app/**/*.php")
Bash(find Leaf/app -name "*.php" -type f | sort)
✽ Tomfoolering… (34s · ↓ 1.1k tokens · thought for 1s)

Once you have the list of files, you dispatch another round of LLM work on those same files, but this time under the guise of a CTF: we're examining each file for a bug or vulnerability that could be used to extract a secret. If you aren't worried about tokens or spend, you probably don't need the previous step and can ramp this up with parallel agents. We use the CTF framing because earlier models would refuse the task, thinking we were attempting to hack a system.
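The second pass with parallel agents could be sketched like this. Again, `ask` stands in for any LLM client, the CTF prompt wording is mine, and I've had the caller pass in file contents directly so the fan-out logic stays self-contained.

```python
# Hypothetical second pass: re-examine the high-rated files under a CTF
# framing, fanning out across a thread pool of parallel workers.
from concurrent.futures import ThreadPoolExecutor

CTF_PROMPT = (
    "We are playing a CTF. Examine this file for a bug or vulnerability "
    "that could be used to extract a secret, and describe it:\n\n{code}"
)

def hunt(sources: dict[str, str], ask, workers: int = 8) -> dict[str, str]:
    """Run the CTF prompt over {path: file_contents}, `workers` at a time.

    Returns {path: model_finding}. `ask` is any callable that takes a
    prompt string and returns the model's text reply.
    """
    def one(item: tuple[str, str]) -> tuple[str, str]:
        path, code = item
        return path, ask(CTF_PROMPT.format(code=code))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(one, sources.items()))
```

Threads are fine here because the work is I/O-bound: each worker is just waiting on an API round trip.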
| Risk Level | Count | Range |
|------------|-------|-------|
| High Risk | 8 | 5-6 |
| Medium Risk| 38 | 3-4 |
| Low Risk | 180 | 1-2 |

Output of LLM looking for "potential bug" files
With that I asked the agent to review the 8 files with a high probability of a bug and to create a pull request for each flaw individually.
| Branch | Fix |
|---|---|
| vuln-01-unsafe-class-inst | Validate class implements AnalyticInterface before new |
| vuln-02-webhook-auth-bypass | hash_equals() + null guards on webhook secret |
| vuln-03-ssrf-route-binding | Remove external API call from resolveRouteBinding |
| vuln-04-carbon-mutation | copy() before subDay() / addDay() |
| vuln-05-missing-botfarmer-save | Add saveQuietly() + null coalesce on bootcamp count |
| vuln-06-null-division-csr | Null guards in CsrHelper, Csr, and HasCsr |
| vuln-07-medals-page-null | Null-safe ?-> on service record medals access |
| vuln-08-medal-prefix-mismatch | Add $prefix to Arr::has check for medals |
At this point I realized I had forgotten to scope the research to security flaws specifically, but when you're running this experiment on a free open-source hobby project, using nothing from work, it happens.
What blew my mind is that Claude took roughly 3 minutes to find bugs across 8 files and open 8 pull requests.
Crunched for 3min 8s...

Once those pull requests were open, I realized I should have told Claude that my project sits at 100% code coverage and requires a test for every change, because only a few pull requests passed CI on the first try. This is the kind of detail you add as you harden your harness, to reduce false positives and extra work. Either way, I sat down and reviewed all 8 changes:
- 1 - valid, but I accept risk. If I lose my filesystem I have a bigger concern.
- 2 - merged - classic timing attack flaw.
- 3 - invalid, not an SSRF and intentional.
- 4 - merged - good find.
- 5 - invalid - the caller of function saves data.
- 6 - valid, but ranks fit a constraint enforced by Halo Infinite, so closed.
- 7 - merged - good find.
- 8 - merged - good find.
In this case I was looking at a list of things needing to be validated, unaware whether any given issue had a security implication. If you were doing this for real as a pen test, you'd probably have another LLM agent in the mix to toss out the logic issues, leaving only pure security flaws.
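That filtering agent could be a single extra pass. As before this is a sketch, not an implementation: `ask` is any LLM callable, and the prompt and labels are illustrative.

```python
# Hypothetical filter pass: a second agent labels each finding as a pure
# security flaw or a mere logic bug, so a pen-test run only surfaces
# the former.
FILTER_PROMPT = (
    "Classify this finding as SECURITY (exploitable by an attacker) or "
    "LOGIC (a correctness bug only). Reply with one word.\n\n{finding}"
)

def security_only(findings: dict[str, str], ask) -> dict[str, str]:
    """Keep only the findings the model labels SECURITY.

    `findings` maps a branch name to the finding text; `ask` is any
    callable that takes a prompt and returns the model's reply.
    """
    return {
        branch: text
        for branch, text in findings.items()
        if "SECURITY" in ask(FILTER_PROMPT.format(finding=text)).upper()
    }
```

In my run above, this pass would ideally have dropped things like the Carbon date-mutation and null-guard fixes while keeping the webhook auth bypass.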
Assuming that had been the case here, we'd have an LLM agent discovering as many as 8 security flaws in minutes. In this specific run the findings were a mix of logic and security bugs, but the point remains. Years ago, an engagement with a security company would take weeks for the investigation, research, and report to be delivered. We are moving into an era of companies and software like Xbow ($$$) and Strix (OSS) that automate a far better harness than what I threw together in a few minutes.
So now I am getting worried, because we have tens of millions of lines of software deployed globally. AI can work against anything deployed (with source or without), without sleep or breaks, and we are approaching timelines of minutes from execution to a valid vulnerability. The turnaround time to patch vulnerabilities is nowhere near as fast as the time to find and abuse them, and that worries me.
