If you’ve been following along, you might remember my previous writeup on CVE-2023-2868, where we crafted one of the first public proof of concepts for command injection in Barracuda’s Email Security Gateway.
Since then, we’ve taken the research much deeper, combining autonomous AI-driven analysis with traditional fuzzing and manual code review. The results? A handful of findings queued for human validation. AI analysis excels at setting up test pipelines, triggering faults, and reconnaissance; exploitability testing and exploit development still require significant human operator intervention.
Our target:
A quick enumeration revealed 6 exposed network services:
| Port | Service | Notes |
|---|---|---|
| 443 | nginx + FastCGI | Primary web interface - big target |
| 80 | Apache/nginx | HTTP redirect |
| 22 | sshd | ssh, locked down to support by default |
| 25 | artful_dice + bsmtpd | Custom SMTP daemons |
| 161/UDP | snmpd | Monitoring |
| 3306 | MySQL | Backend DB (localhost only, but accessible from web services) |
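The enumeration above can be reproduced with any port scanner; as a rough sketch, a minimal TCP connect check in Python looks like the following (the target address is a placeholder, and UDP services such as snmpd on 161 would need a different probe):

```python
import socket

def check_port(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical usage against a lab appliance (TCP services only):
# for port in (22, 25, 80, 443, 3306):
#     print(port, check_port("192.0.2.10", port))
```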
artful_dice, an 18MB Go binary, is the largest in the firmware. Statically linked, stripped, and its exact purpose wasn’t immediately clear from static analysis. It handles SMTP traffic, so we threw AFL++ at it.
Current status: 7.7M+ executions at ~281 execs/sec. Zero crashes so far. Either the Go code is surprisingly robust, or we need to rethink our approach to trigger the interesting code paths. The investigation continues.
Now THIS is what we’re excited about. bsmtpd is the custom Barracuda SMTP daemon:
Debug symbols + custom code + complex protocol handling = high vulnerability probability.
Our fuzzing setup for bsmtpd:
bsmtpd_fuzz/
├── input/ # Attack-specific seed inputs
│ ├── attack_template_injection_sender.txt
│ ├── attack_template_injection_domain.txt
│ ├── attack_redos_subject.txt
│ ├── attack_archive_zipbomb.txt
│ ├── attack_archive_nested.txt
│ ├── attack_archive_traversal.txt
│ └── attack_archive_longname.txt
├── config/
│ ├── minimal.conf # Lightweight for fuzzing
│ └── maximum.conf # All 50+ modules enabled
└── intelligent_fuzzer.py # Custom harness
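A sketch of how a seed corpus like the one above might be generated; the payload strings here are illustrative stand-ins, not the full seeds used in the campaign:

```python
from pathlib import Path

# Illustrative attack seeds keyed by file names from the corpus layout.
SEEDS = {
    "attack_template_injection_sender.txt": b"' .. os.execute('id') .. '@evil.com",
    "attack_redos_subject.txt": b"Subject: " + b"a" * 64 + b"!\r\n",
    "attack_archive_traversal.txt": b"../../../etc/passwd",
    "attack_archive_longname.txt": b"A" * 4096 + b".zip",
}

def write_corpus(root: str) -> list:
    """Write each seed payload into <root>/input/ and return the paths."""
    input_dir = Path(root) / "input"
    input_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, payload in SEEDS.items():
        p = input_dir / name
        p.write_bytes(payload)
        paths.append(p)
    return paths
```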
We’re running AFL++ in multiple configurations:
# Dumb mode for stripped Go binaries
afl-fuzz -n -i input/ -o output_dumb/ -m 8000 -- ./artful_dice @@
# Instrumented mode for debug binaries
afl-fuzz -i input/ -o output/ -- ./bsmtpd -c config/minimal.conf @@
For bsmtpd, we built a network harness using preeny for socket-to-stdin conversion. This lets AFL++ fuzz the live daemon without needing network I/O.
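In sketch form, the harness preloads preeny’s desock.so so the daemon reads its “network” traffic from stdin. The driver below is an assumption-laden simplification of that idea (paths, flags, and the helper name are illustrative, not the exact harness):

```python
import os
import subprocess

def run_case(binary: str, data: bytes, desock_path: str = None,
             timeout: float = 5.0) -> subprocess.CompletedProcess:
    """Feed one test case to the target over stdin.

    With desock.so preloaded, a daemon that normally accept()s a socket
    instead talks to stdin/stdout, so AFL++ (or this driver) can feed it
    file-based test cases directly.
    """
    env = os.environ.copy()
    if desock_path:  # e.g. "./preeny/.../desock.so" (path is hypothetical)
        env["LD_PRELOAD"] = desock_path
    return subprocess.run([binary], input=data, env=env,
                          capture_output=True, timeout=timeout)

# Hypothetical invocation against the target daemon:
# r = run_case("./bsmtpd", b"MAIL FROM:<a@b>\r\n", desock_path="./desock.so")
```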
We’re not just throwing random bytes: our seed corpus includes targeted attack payloads. The following are basic examples; many more are used in practice:
Template Injection:
' .. os.execute('id') .. '@evil.com
Archive Extraction Attacks:
../../../etc/passwd
ReDoS (Regular Expression DoS):
(a+)+b
(a|a)*
(a*)*
These target regex compilation in select Lua modules.
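Why these patterns hurt: a backtracking regex engine matching `(a+)+b` against a run of a’s with no trailing `b` must explore exponentially many ways to split the run before failing. A quick demonstration using Python’s backtracking `re` engine (absolute timings vary by machine):

```python
import re
import time

def match_time(pattern: str, text: str) -> float:
    """Time a single (failing) match attempt, in seconds."""
    start = time.perf_counter()
    assert re.match(pattern, text) is None  # no 'b', so the match must fail
    return time.perf_counter() - start

# Each extra 'a' roughly doubles the work for '(a+)+b'.
fast = match_time(r"(a+)+b", "a" * 10)
slow = match_time(r"(a+)+b", "a" * 22)
# 'slow' should be orders of magnitude larger than 'fast'.
```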
We built a dashboard to monitor fuzzing campaigns. It tracks:
Alongside AFL++, we also built a custom grammar fuzzer tailored to bsmtpd. It generates 666 (nice!) test cases based on our attack templates and runs continuously.
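A simplified sketch of that grammar-driven generation; the grammar fragments and helper names below are illustrative, and the real fuzzer’s templates and mutations are richer:

```python
import random

# Illustrative grammar fragments for SMTP transactions.
SENDERS = ["test@example.com", "' .. os.execute('id') .. '@evil.com"]
SUBJECTS = ["hello", "a" * 64 + "!"]
BODIES = ["plain body\r\n", "../../../etc/passwd\r\n"]

def generate_cases(n: int, seed: int = 1337) -> list:
    """Generate n SMTP test cases by sampling the grammar."""
    rng = random.Random(seed)  # fixed seed for reproducible corpora
    cases = []
    for _ in range(n):
        msg = (
            f"MAIL FROM:<{rng.choice(SENDERS)}>\r\n"
            f"RCPT TO:<victim@target.test>\r\n"
            "DATA\r\n"
            f"Subject: {rng.choice(SUBJECTS)}\r\n\r\n"
            f"{rng.choice(BODIES)}.\r\n"
        )
        cases.append(msg.encode())
    return cases

cases = generate_cases(666)  # 666 (nice!) cases, matching the campaign size
```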
Here’s where things get interesting. We’ve been using the RAPTOR Autonomous Offensive/Defensive Security Framework to accelerate our research.
RAPTOR conducts autonomous vulnerability scanning across the entire firmware. Point it at a directory, it analyzes everything - binaries, scripts, configs, web apps. Output comes in SARIF format for tool integration plus human-readable markdown.
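Because SARIF is just JSON, downstream triage tooling can aggregate findings per rule with very little code. A minimal counter sketch (the rule IDs in the sample document are made up for the example):

```python
import json
from collections import Counter

def count_findings(sarif_text: str) -> Counter:
    """Tally SARIF results by ruleId across all runs."""
    doc = json.loads(sarif_text)
    tally = Counter()
    for run in doc.get("runs", []):
        for result in run.get("results", []):
            tally[result.get("ruleId", "unknown")] += 1
    return tally

# Hypothetical minimal SARIF document, shaped like a scanner's output:
example = json.dumps({
    "version": "2.1.0",
    "runs": [{"results": [
        {"ruleId": "xss-unquoted-attribute"},
        {"ruleId": "xss-unquoted-attribute"},
        {"ruleId": "host-header-injection"},
    ]}],
})
```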
Example scan results:
Web Directory Scan: 141.7 seconds
├── 263 files analyzed
├── 239 findings
│ ├── 212 XSS (unquoted attributes)
│ ├── 21 injection vectors
│ ├── 5 direct script injections
│ └── 1 protocol downgrade
Perl Modules Scan: 5.2 seconds
├── 46 modules analyzed
└── 17 host header injection findings
In under 3 minutes, RAPTOR identified 256+ vulnerabilities across 300+ files. The volume of findings is impressive, but it carries the usual false positives, since many of these results come from CodeQL and Semgrep.
Errors and hallucinations are still very common, even with premier models. Human supervision remains necessary for most tasks beyond basic scripting, automation, and test case generation. Determining exploitability and developing exploits still requires the kind of creative reasoning that AI does not yet handle well.
AI is great at finding the puzzle pieces, but humans still need to assemble the damn thing.
High level workflow:
This combination has made us significantly more effective. Claude x RAPTOR handles the grunt work. We focus on the creative exploitation.
We’re continuing to fuzz bsmtpd and the other custom binaries. A few faults triggered by the fuzzer confirm that the approach works. The entire pipeline for this project was built within a few hours; faults were triggered within a few days of fuzzing.
Stay tuned for the full technical report and additional PoCs if we confirm exploitability. All findings will be responsibly disclosed to the vendors.
Greetz to everyone pushing the boundaries of autonomous security research. The machines aren’t replacing us yet - but they’re making us faster.