Meet Clementine
Announcing Project Clementine — an open-source, LLM-powered web-app pentesting orchestrator that correlates application findings with AWS misconfigurations through a knowledge graph. Named after a very small bird with very large opinions.
My cockatiel Clementine weighs about ninety grams. He is, by any reasonable metric, the smallest person in the house. This has never once occurred to him.
If you leave a pen on the desk, Clementine will investigate the pen. If the pen is uncooperative, he will try the cap. If the cap also resists, he will reassess from a different angle — sidestep along the edge of the laptop, crane his neck, nibble the corner of a sticky note he hadn’t previously considered, and then return to the pen with what can only be described as a theory. A cockatiel doesn’t politely ask whether a thing is chewable. He finds out. And the findings-out chain together: this led to this led to this, and now he’s on top of the monitor holding a paperclip and looking pleased with himself.
What I find most interesting about watching him, though, is that he isn’t really doing depth-first search on a flat list of objects. He’s building a map. This perch connects to that shelf connects to the top of the picture frame. The route from the cage to the forbidden cable runs through three legal surfaces and one slightly illegal one. He doesn’t explore the room — he navigates it.
I’ve been building a tool that works the same way. It’s called Project Clementine, and I’m announcing it today.
What Clementine is
Project Clementine is an automated web-application pentesting orchestrator. It coordinates six security MCP servers to assess both application-layer vulnerabilities (the full OWASP WSTG suite) and AWS infrastructure misconfigurations — and then it does the thing I actually care about, which is correlate them.
A medium-severity SSRF is, on its own, a medium-severity SSRF. IMDSv1 being enabled is a checkbox on an infra report. An overprivileged IAM role is a line item your cloud team has been meaning to get to. But:
SSRF (medium) + IMDSv1 enabled + overprivileged IAM role = full account takeover (critical)
Neither your app scanner nor your cloud scanner will catch this, because neither of them is looking at the other’s evidence. The chain lives in the seam between tools. That seam is where Clementine hunts.
The pipeline runs in six phases — recon, AWS audit, app test, AI triage, correlation, reporting — and exports findings in HTML, JSON, SARIF, and Markdown. It ships with forty-six built-in compound attack patterns covering injection, authentication, authorization, AWS infrastructure, privilege escalation, supply chain, and client-side attack classes. You can drop your own pattern into the patterns/ directory as YAML and it gets picked up on the next run — no code changes required.
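To give a feel for what a contributed pattern might look like, here's a hypothetical YAML sketch built from the fields described later in this post (entry finding, pivot relationship, impact, remediation priority). The field names and file name are illustrative, not the real schema — check the README before writing one.

```yaml
# patterns/ssrf-imds-takeover.yaml  (hypothetical example, field names illustrative)
id: ssrf-imds-takeover
entry_finding: ssrf              # the application-layer foothold
pivot: imdsv1_enabled            # the infrastructure condition that turns it into a chain
impact: >
  SSRF against the IMDSv1 endpoint leaks instance-role credentials;
  an overprivileged role escalates this to account takeover.
severity: critical
remediation_priority: 1
```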
The knowledge graph
Here’s the part I’m most interested in right now.
During the AWS audit phase, Clementine builds a directed graph of your cloud environment. The nodes are the obvious things — IAM users and roles, EC2 instances, EKS pods, Lambda functions, S3 buckets, RDS instances, secrets, security groups, VPC endpoints, the IMDS endpoint at 169.254.169.254, and the web endpoints the application scanner found. The edges are the relationships — who can assume whom, who can pass which role, what’s routable to what, what has permission to read what, which pod is bound to which IRSA service account.
Then — and this is the part that matters — when the correlation engine runs, it bridges findings from the application layer directly into the graph. An SSRF finding on a web endpoint becomes an SSRF_REACHABLE edge from that endpoint to the IMDS node. And suddenly the question the correlation engine is asking isn’t “do I have a rule that matches these three findings” anymore. It’s “is there a path — up to four hops — from this vulnerable web endpoint, through IMDS, through a role, to something I shouldn’t be able to touch.”
That’s a different question. The old correlation engine could tell you “SSRF + IMDSv1 + overprivileged role = takeover” because it had a rule that said so. The graph engine can tell you “SSRF on /upload → IMDS → ecs-task-role → iam:PassRole on lambda-admin-role → Lambda code execution → S3 bucket full of customer data,” because the path exists in the graph, not because somebody wrote that specific chain down in advance.
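The up-to-four-hops question is just bounded path search over a directed graph. Here’s a minimal sketch of how that traversal can work — the node names, edge labels, and function are all illustrative, not Clementine’s actual schema:

```python
from collections import deque

# Toy attack graph: node -> list of (edge_label, neighbour).
# Names mirror the example chain above; they are not Clementine's real node IDs.
GRAPH = {
    "web:/upload":            [("SSRF_REACHABLE", "imds")],
    "imds":                   [("EXPOSES_CREDS", "role:ecs-task-role")],
    "role:ecs-task-role":     [("CAN_PASS_ROLE", "role:lambda-admin-role")],
    "role:lambda-admin-role": [("CAN_READ", "s3:customer-data")],
}

def find_paths(graph, start, targets, max_hops=4):
    """Breadth-first search for exploit paths of at most max_hops edges."""
    paths = []
    queue = deque([(start, [start])])
    while queue:
        node, path = queue.popleft()
        if node in targets and len(path) > 1:
            paths.append(path)
        if len(path) - 1 >= max_hops:   # hop budget spent; stop extending
            continue
        for _, nxt in graph.get(node, []):
            if nxt not in path:         # avoid cycles
                queue.append((nxt, path + [nxt]))
    return paths

paths = find_paths(GRAPH, "web:/upload", {"s3:customer-data"})
# -> one four-hop path: web:/upload -> imds -> ecs-task-role -> lambda-admin-role -> s3
```

The interesting property is the one the post leans on: nothing in `find_paths` knows about SSRF or IMDS specifically. The chain falls out of the edges, not out of a rule.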
The HTML report now includes an interactive Attack Graph tab — a force-directed canvas where IAM principals are purple, compute is blue, storage is green, Lambda is amber, web endpoints and IMDS are red, and exploit paths show up as dashed red edges. You can drag to pan, and nodes are colour-bordered by the severity of their linked findings. It is, I will admit, extremely fun to look at.
Why a cockatiel
I want to be precise about the metaphor, because I think the shape of the bird actually matters.
Clementine (the cockatiel) is not a raptor. He is not patient. He is not built for the long stoop from a high thermal onto a distant rabbit. He is also not a corvid, running long-horizon plans and remembering where he cached things last Tuesday. He’s something else: a small, curious, socially motivated animal who finds the interesting path through an environment by trying things, noticing what gives, and chaining the givings-way together into something larger than any single nudge.
This is, I think, the honest shape of good pentesting. The romance of the field is the elegant zero-day — the raptor’s stoop — but the actual craft, most days, is the cockatiel’s walk across the desk. You poke the login flow. The login flow is boring. You poke the password reset. The password reset leaks a username. You poke the user-enumeration endpoint you just found. It responds differently to valid and invalid users. You combine that with the rate limit you noticed wasn’t enforced two endpoints over. And now you have something.
Nothing in that chain is a zero-day. Every individual finding is “medium” at worst. But the walk across the desk, the chaining of small curiosities, is the exploit. That’s what Clementine-the-tool is trying to automate — not the flash of brilliance, but the patient-but-feisty walk, and specifically the part where the walk crosses the seam between the application layer and the cloud layer.
And this is where the metaphor actually started doing more work than I expected. When I was correlating through rules, the bird was chaining curiosities. Now that I’m correlating through a graph, the bird is doing what he actually does in the house — navigating. The four-hop traversal from web endpoint to S3 bucket is the cockatiel walking from the back of the couch to the top of the bookshelf, and every perch on the way is a surface he tested to see if it would hold.
There’s a second thing the cockatiel metaphor does for me, and it’s about scope. A cockatiel is small. He is emphatically not every bird. Clementine (the tool) is not trying to replace Burp Suite, or your red team, or the human pentester who will do the creative work the tool can’t. It’s trying to do the specific thing a curious small bird does well: cover the whole environment with attention, notice what’s out of place, and find paths between rooms.
What it means that it’s LLM-powered
I want to be careful here, because “LLM-powered security tool” is a phrase currently doing a lot of work for a lot of products that should probably be doing more work for themselves.
In Clementine, the LLM has two specific jobs.
The first is triage. After the app-test and infra-audit phases dump several hundred findings into the database, Claude scores each one for confidence and flags likely false positives with a written rationale. This is the boring application of LLMs to security: reading context, weighing signal against noise, explaining the reasoning. It’s also the one that saves the most human time. If you’ve ever stared down a 400-line ZAP report at 4pm on a Friday, you know exactly what I mean.
The second is novel-chain discovery. The pattern library handles the known chains, and the graph engine handles the structural ones — but there’s a third category of chain that’s neither “matches a rule I wrote” nor “is a path through the graph I built,” which is “three findings that a thoughtful human would look at together and say, wait a second.” The LLM layer looks at the full corpus of findings for a given assessment and proposes chains in that third category, with written reasoning about why they matter or don’t.
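As a rough illustration of the shape of that job — not Clementine’s actual code; the function, field names, and prompt wording below are all hypothetical — the chain-discovery step amounts to handing the model the whole findings corpus at once and asking for cross-finding reasoning:

```python
def build_chain_prompt(findings):
    """Assemble one prompt over the full findings corpus.

    Hypothetical helper: the field names ('id', 'severity', 'title',
    'detail') and the wording are illustrative, not the real schema.
    """
    lines = [
        "You are reviewing the complete findings from one assessment.",
        "Propose compound attack chains that combine two or more findings,",
        "and explain for each chain why it matters or why it is a false lead.",
        "",
        "Findings:",
    ]
    for f in findings:
        lines.append(f"- [{f['id']}] ({f['severity']}) {f['title']}: {f['detail']}")
    return "\n".join(lines)

prompt = build_chain_prompt([
    {"id": "F-12", "severity": "medium", "title": "SSRF",
     "detail": "/upload fetches attacker-supplied URLs"},
    {"id": "F-31", "severity": "low", "title": "IMDSv1 enabled",
     "detail": "instance metadata v1 reachable from the app subnet"},
])
```

The design point is that the model sees everything at once, which is exactly what neither the pattern matcher nor the graph traversal does.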
This is the part of the pipeline I’m least done thinking about. Observability is still weaker than I’d like — when the LLM proposes a chain, I want to be able to see exactly why, and the tracing isn’t where it needs to be yet. Cost per scan is higher than it should be. Speed likewise. These are all on the list.
If you don’t set ANTHROPIC_API_KEY, both layers are skipped and Clementine still runs — pattern-based correlation, graph traversal, full reports, the whole pipeline. The LLM is load-bearing for a specific kind of finding, not for the tool existing.
What I’m asking for
This post is a companion to the README, but it’s also, quietly, a request.
Clementine is out there. The codebase is real. The forty-six patterns are real. The knowledge graph is real and sometimes works beautifully and sometimes doesn’t. The pipeline runs end-to-end against real targets and produces real reports that I’ve used on real engagements. What I want now is the thing a small, curious bird actually needs, which is more of the house to explore — and some help figuring out which surfaces will hold.
- More patterns. If you’ve seen a compound chain in the wild that isn’t in the library, I want the YAML. The format is dead simple — entry finding, pivot relationship, impact, remediation priority — and every pattern someone else contributes is a pattern I don’t have to write myself. The privilege escalation category is especially thin right now.
- Graph feedback. I’m genuinely uncertain how well the four-hop traversal performs against environments I didn’t build. If you run Clementine against a real AWS account and the Attack Graph finds something the patterns missed — or surfaces a path that looks interesting but is actually a false lead — I want the details. This is where the tool is most likely to improve fastest with outside eyes on it.
- MCP server integrations. The six current servers cover OWASP WSTG, AWS config, compliance frameworks, and DOM-level validation. Azure, GCP, Kubernetes-specific, IaC (Terraform/CDK/CloudFormation) scanning, and GraphQL tooling are all on the roadmap. If you maintain a security MCP server that would slot in, I want to talk.
- False-positive reports. The triage layer gets better when it sees the things it got wrong. If you run Clementine and it flags something that isn’t real, I want to know.
- Honest pushback. I’m especially interested in hearing from people who think the correlation-across-seams framing is wrong, or that the graph is an over-engineered solution to what the pattern library already did, or that forty-six patterns is either too many or too few. The cockatiel walks better on a desk where somebody occasionally moves the furniture.
You can find the docs and the getting-started guide in the repo’s README. Issues and PRs are both welcome; a thoughtful issue is often more valuable to me than a PR, so don’t feel you need to arrive with code.
One last thing about the bird
Clementine — the real one, the one currently sidestepping along the edge of my keyboard as I write this — has no idea any of this is named after him. He has, in his ninety grams of opinion, more important things to attend to. There is a cable he has not yet fully investigated. There is a corner of this document he would like to nibble. There is, as there always is, a next thing.
I’d like the tool to have some of that. Not the flash, but the next-thing energy — the sense that every finding is a small nudge toward another finding, that the environment always has more seams in it than you’ve checked, that the walk across the desk is never actually finished.
Come walk with me.
Project Clementine is open source. If you want to follow along as it develops, the best places are dylanshroll.com for the long-form write-ups and the repo for the code itself. If you want to talk, my DMs on LinkedIn are open.