I've been using Gemini CLI, Claude Code and similar agents a lot lately, even for tasks like downloading a video I found on social media: instead of googling for a tool, I simply fire up one of these coding agents and let it figure out how to use yt-dlp.
Another example is bypassing the password protection of a PDF. A bank had mailed me a PDF saying the password is your customer ID, 3XXXX721, and for the life of me I couldn't remember or find my customer ID anywhere. So, instead of uploading a potentially sensitive document to an online service, I asked Claude Code to brute-force the password, since only 4 digits were unknown (10,000 combinations). It wrote a Python script that did the job locally on my Mac.
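A minimal sketch of that kind of script is shown below. It is not the script Claude Code actually wrote. It assumes the third-party `pypdf` package (`pip install pypdf`; AES-encrypted files may also need `pip install pypdf[crypto]`). The `X` wildcard template and the helper names `candidates`, `brute_force` and `pdf_checker` are made up for illustration. Four unknown digits give 10,000 candidates, which a local loop gets through quickly.

```python
import itertools
from typing import Callable, Iterator, Optional


def candidates(template: str, wildcard: str = "X") -> Iterator[str]:
    """Yield every password obtained by replacing each wildcard with a digit."""
    parts = template.split(wildcard)  # "3XXXX721" -> ["3", "", "", "", "721"]
    slots = len(parts) - 1            # number of unknown digits
    for digits in itertools.product("0123456789", repeat=slots):
        pieces = [parts[0]]
        for digit, rest in zip(digits, parts[1:]):
            pieces.append(digit + rest)
        yield "".join(pieces)


def brute_force(template: str, is_correct: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate accepted by is_correct, or None if none match."""
    for password in candidates(template):
        if is_correct(password):
            return password
    return None


def pdf_checker(path: str) -> Callable[[str], bool]:
    """Build a password check for a local PDF (requires the pypdf package)."""
    from pypdf import PdfReader  # imported lazily so the helpers above work without it

    reader = PdfReader(path)
    # decrypt() returns PasswordType.NOT_DECRYPTED (== 0) when the password is wrong
    return lambda password: bool(reader.decrypt(password))
```

Usage would look like `brute_force("3XXXX721", pdf_checker("statement.pdf"))`, where `statement.pdf` is a placeholder file name. Keeping the password check as a separate callable makes the candidate loop testable without a real PDF.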
How I brute-forced the password of a PDF
From PDFs to API Hacking
After seeing it crack a PDF in seconds for $0.27, my next question as a security researcher was obvious: can I teach it to hack APIs? All these tools can already run curl or grep, which covers the tedious parts of API auditing. With the added ability to execute code (preferably in a sandbox), they should be able to do the job.
Naive Approach
Though these LLMs have some capability to do security checks, I thought they would surely benefit from some additional real-world examples of security issues right in their context. So I put the descriptions and reproduction steps of security bugs I had found in my career directly into their system instructions (via files such as GEMINI.md or CLAUDE.md).
A sample description of a security vulnerability looks like this:
1. A short summary of the findings (5-10 lines)
2. Detailed `STEPS TO REPRODUCE THE BUG` (20-100 lines)
3. Finally, the impact and suggestions on how to fix it (5-10 lines)
With descriptions of around 150 security issues, the system instructions become a little too big for these LLMs' comfort, and they start to complain. Not only is this bad for latency, it also means every small "Hello" message carries the whole context, increasing the cost unnecessarily.
System instructions become too big with all the descriptions
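For a sense of scale, the per-report line counts listed above put 150 reports at somewhere between roughly 4,500 and 18,000 lines of system instructions:

$$
150 \times (5 + 20 + 5) = 4{,}500 \;\le\; \text{lines} \;\le\; 150 \times (10 + 100 + 10) = 18{,}000
$$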
Along with the above, I needed to ask the Gemini CLI (or Claude Code) tool to run mitmdump in the background on port 8088, and I proxied my Chrome traffic through it. Since the CLI tool had started mitmdump, it could read the logs from mitmdump's stdout.
```
mitmdump -p 8088 --set flow_detail=3
```
A Better Approach
Well, a quick and massive upgrade was to stop logging everything from `mitmdump` to stdout, since that fills up the context quickly. Instead, I logged everything to a `log.txt` file and asked the CLI tool to search that file with regex, `grep`, etc. for whatever it was looking for. This way the CLI tool only sees the lines it needs, which also saves cost.

I also decided to turn "start mitmdump and direct its log to a file" into a command (a feature available in both Gemini CLI and Claude Code), `/start-mitm`, so that I don't have to type out "please start mitmdump on 8088 and don't log anything on stdout, rather log it in log.txt" every time.
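What the agent does with `grep` on `log.txt` can be sketched in Python. It shows why the context stays small: only matching lines are returned, and there is a cap on how many. `search_lines`, `search_log` and the `max_hits` cap are hypothetical helpers for illustration. In the post, the agent simply calls `grep` itself.

```python
import re
from typing import Iterable, List, Tuple


def search_lines(lines: Iterable[str], pattern: str, max_hits: int = 50) -> List[Tuple[int, str]]:
    """Return up to max_hits (line_number, line) pairs matching pattern (case-insensitive)."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits: List[Tuple[int, str]] = []
    for number, line in enumerate(lines, start=1):
        if regex.search(line):
            hits.append((number, line.rstrip("\n")))
            if len(hits) >= max_hits:  # cap what ends up in the agent's context
                break
    return hits


def search_log(path: str, pattern: str, max_hits: int = 50) -> List[Tuple[int, str]]:
    """Roughly `grep -inE pattern path | head -n max_hits`, using Python's regex syntax."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return search_lines(log, pattern, max_hits)
```

For example, `search_log("log.txt", r"(user|order)[-_]?id")` would return only the request/response lines that mention those parameters, rather than the whole capture.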
```
~/Work/mitmclaude$ cat .claude/commands/start-mitm.md
# Start mitmproxy capture
Start mitmproxy to capture HTTP/HTTPS traffic with full request and response bodies.
## Instructions
Run the following command to start mitmdump in the background:
mitmdump -w traffic.mitm --set flow_detail=3 2>&1 | tee log.txt &
```

The huge system instructions carrying every bug's details were the other issue. Borrowing the idea behind the `/start-mitm` command above, I thought: why not put the descriptions of one category of bugs, and how to find them, into one command? Say, `/check-for-idor` would refer to a markdown file `CHECK-FOR-IDOR.md` containing only IDOR-related vulnerability details.

And rather than dumping the bug reports as they are (the raw dump from Hugging Face), let's keep only the learnings from them and how to look for such bugs. I used one of the top LLMs to convert the security reports into "how to look for these issues" guidance.
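Everything except the LLM step is plain bookkeeping: group the reports by category and write one command file per category. Here is a minimal sketch, assuming each report is a dict with a hypothetical `category` field. `distill` stands in for the LLM call that turns a batch of raw reports into "how to look for these issues" guidance. The `CHECK-FOR-<CATEGORY>.md` naming follows the post. The function names are illustrative.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def command_filename(category: str) -> str:
    """'IDOR' -> 'CHECK-FOR-IDOR.md', 'open redirect' -> 'CHECK-FOR-OPEN-REDIRECT.md'."""
    slug = "-".join(category.upper().split())
    return f"CHECK-FOR-{slug}.md"


def build_command_files(
    reports: Iterable[dict],
    distill: Callable[[str, List[dict]], str],
) -> Dict[str, str]:
    """Group reports by their (hypothetical) 'category' field, one file per category.

    `distill(category, reports)` is a placeholder for the LLM call that rewrites
    the raw reports into detection guidance.
    """
    by_category: Dict[str, List[dict]] = defaultdict(list)
    for report in reports:
        by_category[report["category"]].append(report)
    return {
        command_filename(category): distill(category, group)
        for category, group in sorted(by_category.items())
    }
```

Keeping the LLM call behind a callable makes it easy to rerun only the distillation when more reports are added, as happened when the dataset grew from 150 to 4,000 reports.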
From 150 security reports to 4000
With the architecture finalized, it was time to widen the tool's scope. My own security bugs were a good starting point, but I wanted a world-class security researcher, so I went to HackerOne's public disclosures. There is a dump of them available on Hugging Face: https://huggingface.co/datasets/Hacker0x01/hackerone_disclosed_reports?viewer_api=true
There are roughly 10,000 reports in there, and I kept about 4,000 of them that had received a bounty payment, which I assumed to be a proxy for a high-impact bug.
```python
# Logic: only keep bugs that paid out
df = df[
    (df['bounty_amount'] > 0) &
    (df['vulnerability_type'].isin(['IDOR', 'SSRF', 'RCE']))
]
```

```
⎿ === REPORTS WITH BOUN===
  Total: 5383 out of 10094 (53.3%)
⏺ 4,186 high-value bounty reports filtered.
```

With the data from these ~4,000 security reports, all the commands' markdown files were updated with the new learnings and new ways to find each issue, which in theory should make every command more robust.
A bug found where the /avatar?u=USERNAME endpoint was enumerable
Putting it into action, I quickly found some low-hanging fruit in vercel.com's /avatar?u=USERNAME endpoint. The tool told me the endpoint takes just a username as the parameter, which in my case was the USERNAME part of my USERNAME@domain.com email address. So I asked it to try the Vercel CEO's Twitter username, guessing he would use the same one here. It worked, confirming the bug. If it hadn't, I would have asked it to try some other famous usernames.
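Below is a minimal sketch of the check the agent effectively ran. It assumes the endpoint answers differently for existing and non-existing usernames, and here "differently" means a different HTTP status. The base URL `https://target.example/avatar`, the made-up "missing" username and the helper names are placeholders. They do not describe Vercel's actual responses.

```python
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, Dict, Hashable, Iterable, Tuple

Response = Tuple[int, bytes]  # (HTTP status, body)


def fetch(url: str, timeout: float = 10.0) -> Response:
    """GET url; HTTP error statuses are returned instead of raised."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()


def enumerate_users(
    base_url: str,
    usernames: Iterable[str],
    get: Callable[[str], Response] = fetch,
    signature: Callable[[Response], Hashable] = lambda r: r[0],
    missing: str = "no-such-user-7f3a9c",
) -> Dict[str, bool]:
    """Mark each username whose response signature differs from a surely-missing user's."""

    def url_for(name: str) -> str:
        return base_url + "?" + urllib.parse.urlencode({"u": name})

    baseline = signature(get(url_for(missing)))  # what "user does not exist" looks like
    return {name: signature(get(url_for(name))) != baseline for name in usernames}
```

Comparing only the status code is the simplest choice. If an endpoint returns a default image with 200 for everyone, you would pass a different `signature`, such as the body length or a hash of the body. Note that `urlopen` follows redirects, so a difference that only shows up as a redirect would need a custom opener.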
Transition From Researcher-Controlled Tools To An Agentic Tool
Commands like /find-idor let the researcher control what to look for, but they still fall short of "agentic" behaviour: I wanted the LLM to figure out which commands are available and invoke them on its own. You can achieve this in three ways:
- Expose each of these "commands" as an individual MCP tool, so that when the client calls `tools/list` it sees each command as a separate tool.
- Or convert them into Skills, if you are using Claude Code.
- Or convert them into Skills and use coderunner to bridge the Skill-to-MCP gap for you. Then you can use any tool you like: Gemini CLI, Qwen CLI, OpenAI Codex, etc.
The content of a Skill is exactly the same as the command's: the same markdown file can be used as a Skill just by putting it in the appropriate folder.
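Below is a toy sketch of what "seeing the commands as tools" boils down to. It reads each skill file's front-matter `description` (the field shown in the sample skill) and publishes name/description pairs, similar in spirit to the tool entries an MCP `tools/list` response contains. The helper names and the naive line-based parsing (no YAML library) are assumptions. This is not how Claude Code or coderunner implement it.

```python
from pathlib import Path
from typing import Dict, List


def read_description(markdown: str) -> str:
    """Return the `description:` value from a leading '---' front-matter block, or ''."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return ""
    for line in lines[1:]:
        if line.strip() == "---":  # end of front matter
            break
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip() == "description":
            return value.strip()
    return ""


def list_tools(skills_dir: str) -> List[Dict[str, str]]:
    """Build a name + description listing from every *.md skill file in skills_dir."""
    tools = []
    for path in sorted(Path(skills_dir).glob("*.md")):
        text = path.read_text(encoding="utf-8")
        tools.append({"name": path.stem, "description": read_description(text)})
    return tools
```

The description is what lets the model pick a skill on its own, which is why the sample skill spells out "Use when user asks about ...".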
A sample skill for finding IDORs
~/Work/mitmclaude$ cat .claude/skills/mitm-find-idor.md
---
description: Find IDOR (Insecure Direct Object Reference) vulnerabilities in captured traffic. Use when user asks about authorization issues, sequential IDs, or accessing other users' data.
---
# Find IDOR Vulnerabilities
Analyze the mitmproxy dump (log.txt) for IDOR vulnerabilities for: $ARGUMENTS
## High-Value IDOR Patterns (from 132 real HackerOne bounty reports)
### 1. User/Account Object References
```
user_id, userId, user-id, uid, account_id, accountId
customer_id, customerId, member_id, memberId
profile_id, owner_id, creator_id, author_id
```
**Real example**: `https://zomato.com/gold/payment-success?subscription_id=XXX&user_id=YYY`
### 2. Resource Object References
```
order_id, orderId, booking_id, bookingId, reservation_id
transaction_id, txn_id, payment_id, invoice_id
document_id, doc_id, file_id, attachment_id
report_id, ticket_id, case_id, issue_id
```
**Real example**: `/api/shopify/orders/{order_id}` - change order_id to access other orders
### 3. Organizational Object References
```
project_id, projectId, team_id, teamId, group_id, groupId
workspace_id, org_id, organization_id, company_id
board_id, channel_id, room_id, space_id
```
**Real example**: `PUT /boards/{board_id}.json` - GitLab private project label access
### 4. Content Object References
```
media_code, media_id, image_id, video_id, asset_id
post_id, postId, comment_id, message_id, thread_id
article_id, content_id, item_id, entry_id
```
**Real example**: `media_code=2013124` - sequential IDs expose other users' media
### 5. Session/Token References (High Impact)
```
session_id, sessionId, subscription_id, subscriptionId
card_id, cardId, fuel_card_id, membership_id
api_key_id, token_id, credential_id
```
**Real example**: `activateFuelCard?id=XXX` - Uber driver UUID enumeration
## ID Encoding Patterns to Decode
| Pattern | Example | Decode Method |
|---------|---------|---------------|
| Base64 numeric | `MTIzNDU2` | `echo MTIzNDU2 \| base64 -d` → 123456 |
| Hex | `0x1E240` | Convert to decimal → 123456 |
| UUID v1 | Contains timestamp | Extract timestamp component |
| Short hash | `a1b2c3` | May be truncated MD5 of sequential |
| Padded | `000012345` | Strip padding, increment |
## Where to Find IDORs
### URL Path Parameters (Most Common)
```
/api/v1/users/{id}/profile
/api/v1/orders/{id}/details
/api/v1/documents/{id}/download
/campaign-manager-api/accounts/{id}
```
### Query Parameters
```
?user_id=12345&action=view
?subscription_id=XXX&user_id=YYY
?media_code=2013124
```
### Request Body (JSON/Form)
```json
{"user_id": 12345, "action": "delete"}
{"board": {"id": 857058, "labels": [{"id": 123}]}}
```
### Headers (Rare but High Impact)
```
X-User-Id: 12345
X-Account-Id: 67890
```
## Severity Rating
| Access Type | Severity | Example |
|-------------|----------|---------|
| Read other users' PII | **CRITICAL** | View email, phone, address |
| Modify other users' data | **HIGH** | Edit profile, delete content |
| Access other users' orders/transactions | **HIGH** | View order history, payment info |
| Read other users' private content | **MEDIUM** | View private posts, documents |
| Enumerate user existence | **LOW** | Confirm if user_id exists |
| Access public-ish data | **INFO** | View subscription dates |
## Testing Methodology
### Step 1: Identify Candidate Parameters
Search for ID patterns in traffic:
```bash
grep -iE '(user|account|order|session|subscription|member|card|document|file|project|team|group)[-_]?id' log.txt
```
### Step 2: Check for Sequential/Predictable IDs
```bash
# Extract numeric IDs and check if sequential
grep -oE 'id[=:]["'\'']?[0-9]+' log.txt | sort -u
```
### Step 3: Test Authorization
```bash
# Test with ID ± 1 using YOUR OWN session (the attacker account)
curl -H "Cookie: attacker_session" "https://target.com/api/resource/12345"  # your own resource
curl -H "Cookie: attacker_session" "https://target.com/api/resource/12344"  # another user's
```
### Step 4: Verify Impact
- Does response contain different user's data?
- Can you perform actions (edit/delete) on other user's resources?
- What sensitive fields are exposed?
## Output Format
For each finding report:
```
## IDOR Finding: [Brief Description]
**Endpoint**: `METHOD https://target.com/path`
**Parameter**: `param_name` in [path|query|body]
**ID Type**: [Sequential|Base64|UUID|Hash]
**Current Value**: `12345`
**Severity**: [CRITICAL|HIGH|MEDIUM|LOW]
**Evidence**:
[Show request/response snippets]
**Impact**:
- What data is exposed
- What actions can be performed
**Test Command**:
curl -X METHOD 'https://target.com/...' -H 'Cookie: ...'
**Remediation**:
- Implement proper authorization checks
- Use indirect references (mapping table)
- Validate user owns the resource
```
## False Positives to Ignore
- Analytics/tracking endpoints (write-only, no data returned)
- Public content IDs (movie IDs, product catalog)
- Resource IDs that return same data regardless of auth
- IDs that require valid session AND return 403 for wrong user

```
~/Work/mitmclaude$ tree .claude/skills
.claude/skills
├── mitm-find-auth.md
├── mitm-find-bizlogic.md
├── mitm-find-callback.md
├── mitm-find-checksum.md
├── mitm-find-enumerable.md
├── mitm-find-idor.md
├── mitm-find-insecure.md
├── mitm-find-otp.md
├── mitm-find-pii.md
├── mitm-find-referer.md
├── mitm-find-secrets.md
├── mitm-find-sqli.md
├── mitm-find-ssrf.md
├── mitm-list-apis.md
├── mitm-report.md
├── mitm-security-audit.md
└── mitm-subdomains.md

1 directory, 17 files
```

After that, we can either let the CLI tool decide which Skill to use, or ask it in plain English to use a particular set of skills, as follows:
Find security issues in example.com
or
Check for `idor` and `auth` issues in example.com
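In Claude Code (or via coderunner), the model itself matches a request against the skill descriptions, and there is no keyword router. The toy matcher below is only an illustration of how the second prompt maps onto the skill files listed above. `pick_skills` and the fallback to `mitm-security-audit` are my assumptions.

```python
import re
from typing import List

# Skill names from the `tree .claude/skills` listing above (without ".md")
SKILLS = [
    "mitm-find-auth", "mitm-find-bizlogic", "mitm-find-callback", "mitm-find-checksum",
    "mitm-find-enumerable", "mitm-find-idor", "mitm-find-insecure", "mitm-find-otp",
    "mitm-find-pii", "mitm-find-referer", "mitm-find-secrets", "mitm-find-sqli",
    "mitm-find-ssrf", "mitm-list-apis", "mitm-report", "mitm-security-audit",
    "mitm-subdomains",
]


def pick_skills(prompt: str, skills: List[str]) -> List[str]:
    """Pick skills whose last name segment ('idor', 'auth', ...) appears as a word in prompt."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    picked = [skill for skill in skills if skill.rsplit("-", 1)[-1] in words]
    # No explicit category mentioned: fall back to the broad audit skill
    return picked or ["mitm-security-audit"]
```

With this matcher, "Check for `idor` and `auth` issues in example.com" selects `mitm-find-auth` and `mitm-find-idor`, while "Find security issues in example.com" falls back to the full audit.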
Disclaimer: Before using this tool on any domain, make sure you have permission to do so.
Resources
- coderunner - Universal code execution sandbox with MCP support for AI agents
- security-skills - Security testing skills derived from 4000+ HackerOne bounty reports
- HackerOne Disclosed Reports Dataset - 10,000+ public bug bounty reports
- mitmproxy - Interactive HTTPS proxy for intercepting traffic
- Gemini CLI - Google's AI-powered CLI tool
- Claude Code - Anthropic's agentic coding tool