Joost Grunwald - Beyond the Prompt: Architecting Agentic LLMs for Autonomous Security Testing

Abstract

Can large language models move beyond theory and conversation into the domain of autonomous penetration testing? In this talk, we present the results of a series of cutting-edge experiments aimed at evaluating and enhancing the real-world offensive capabilities of LLMs through agentic architectures.

We explore whether automating reconnaissance improves LLM performance, how model alloys inspired by XBOW research can outperform individual SOTA models, and why, in many cases, less is more. Our work integrates practical tools such as Burp Suite and custom MCP pipelines, tests multi-agent group-chat architectures with specialized roles (planner, recon, exploit), and ultimately evolves into swarm-based methodologies that unlock collaborative intelligence.
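To make the group-chat idea concrete, here is a minimal, framework-agnostic sketch of a round-robin group chat with planner, recon, and exploit roles. It is illustrative only: the call_llm stub, the role prompts, and the fixed turn order are assumptions made for this example, not the exact setup used in our experiments.

# Minimal sketch of a round-robin multi-agent "group chat" with
# specialized roles (planner, recon, exploit). Illustrative only:
# call_llm() is a stub standing in for any chat-completion backend.

from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    system_prompt: str

def call_llm(system_prompt: str, transcript: list[str]) -> str:
    # Stub: in a real setup this would call a chat-completion API with
    # the agent's system prompt plus the shared transcript as context.
    role = system_prompt.split(":")[0]
    return f"[{role}] next step based on {len(transcript)} prior messages"

@dataclass
class GroupChat:
    agents: list[Agent]
    transcript: list[str] = field(default_factory=list)

    def run(self, task: str, rounds: int = 3) -> list[str]:
        self.transcript.append(f"[task] {task}")
        for _ in range(rounds):
            for agent in self.agents:  # simple round-robin speaker selection
                reply = call_llm(agent.system_prompt, self.transcript)
                self.transcript.append(f"{agent.name}: {reply}")
        return self.transcript

agents = [
    Agent("planner", "planner: break the objective into ordered test steps"),
    Agent("recon",   "recon: enumerate hosts, endpoints and technologies"),
    Agent("exploit", "exploit: propose and verify proof-of-concept payloads"),
]

if __name__ == "__main__":
    for line in GroupChat(agents).run("assess https://target.example for web vulnerabilities"):
        print(line)

In a real system the stub would be replaced by a chat-completion backend and the fixed turn order by an orchestrator or swarm policy, but the shared transcript and the role separation are the essential ingredients.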

Despite these advances, we found that most high-performing tool calls still default to simple Python or curl, raising questions about how mature the tool integration really is. Our architecture demonstrates how a well-orchestrated LLM-agent system can rival traditional pentesting frameworks on specific tasks, pass certain benchmarks, and uncover real-world vulnerabilities.
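As an illustration of what "defaulting to simple Python or curl" looks like in practice, the sketch below shows the kind of bare-bones HTTP tool the models tended to reach for instead of richer Burp Suite or MCP integrations. The function name and the JSON tool schema are assumptions made for this example, not our exact definitions.

# Sketch of the kind of bare-bones tool agents keep falling back to:
# a thin wrapper that shells out to curl for a single HTTP request.

import json
import subprocess

def curl_request(url: str, method: str = "GET", data: str | None = None) -> str:
    # Build and run a single curl invocation; -s silences progress, -k skips TLS verification.
    cmd = ["curl", "-sk", "-X", method, url]
    if data is not None:
        cmd += ["--data", data]
    return subprocess.run(cmd, capture_output=True, text=True, timeout=30).stdout

# Hypothetical tool schema an LLM would be offered alongside richer integrations.
CURL_TOOL = {
    "name": "curl_request",
    "description": "Perform a single HTTP request and return the raw response body",
    "parameters": {
        "type": "object",
        "properties": {
            "url": {"type": "string"},
            "method": {"type": "string", "enum": ["GET", "POST", "PUT", "DELETE"]},
            "data": {"type": "string"},
        },
        "required": ["url"],
    },
}

if __name__ == "__main__":
    print(json.dumps(CURL_TOOL, indent=2))
    print(curl_request("https://example.com")[:200])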

This talk shares technical insights, lessons learned, and the future potential for AI-augmented security testing.

Biography

After studying at Radboud University and completing internships at SURF, I started my own company building GRC and cybersecurity software.

Besides that, a team of students and I are building an open-source vulnerability management stack co-funded by SIDN and NLNET; the first major components are finished and open-sourced.

I like cybersecurity and programming, have an interest in AI (I did a minor on the subject), and in my free time I like to watch cycling and cook.

Speaker

Joost Grunwald
