First Results: AI Agent JA4 Fingerprint Discovery
Capture period: 3–4 April 2026 (31 hours) | Honeypot domains: counteragent.io, gptplugins.io, projectlanterna.com
Key Finding
ClaudeBot (Anthropic) and GPTBot (OpenAI) share the same JA4 TLS fingerprint:
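Both crawlers use HTTP/2 and produce identical TLS ClientHello parameters as far as JA4 captures them: the same cipher suites, the same extensions, and the same signature algorithms. JA4 sorts cipher suites and extensions before hashing, so a matching JA4 alone does not establish identical ordering. The match strongly suggests they share the same underlying HTTP client library or TLS stack. Despite being competitors, their crawlers are indistinguishable at the TLS layer.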
OAI-SearchBot (ChatGPT Search) uses different JA4 fingerprints (two observed, neither matching GPTBot), which indicates a separate client implementation within OpenAI.
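What a matching JA4 does and does not establish can be illustrated with a minimal sketch of the cipher-suite component. It assumes the published JA4 description (GREASE values dropped, four-character lowercase hex codes sorted, comma-joined, SHA-256 truncated to 12 characters). The function names and the sample cipher lists are illustrative and are not taken from the capture.

```python
import hashlib


def is_grease(value: int) -> bool:
    """GREASE values (RFC 8701) have the form 0x0a0a, 0x1a1a, ... 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja4_cipher_hash(cipher_suites: list[int]) -> str:
    """Truncated SHA-256 over the sorted, GREASE-free cipher list."""
    kept = sorted(f"{c:04x}" for c in cipher_suites if not is_grease(c))
    if not kept:
        return "000000000000"
    return hashlib.sha256(",".join(kept).encode("ascii")).hexdigest()[:12]


# Illustrative cipher lists: same suites, different order, one GREASE value added.
offered = [0x1301, 0x1302, 0x1303, 0xC02B, 0xC02F]
reordered = [0xC02F, 0x1303, 0x0A0A, 0x1301, 0xC02B, 0x1302]

print(ja4_cipher_hash(offered) == ja4_cipher_hash(reordered))  # True
```

Two clients that offer the same suites in a different order therefore collapse to the same value, which is why a shared JA4 points to a shared cipher set rather than to byte-identical ClientHello messages.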
AI Bot Fingerprint Catalogue
| Bot | Operator | Category | JA4s | HTTP | Key Insight |
|---|---|---|---|---|---|
| ClaudeBot | Anthropic | AI Crawler | 1 | H2 | Single consistent JA4 across all IPs |
| GPTBot | OpenAI | AI Crawler | 1 | H2 | Same JA4 as ClaudeBot; likely shared TLS library |
| OAI-SearchBot | OpenAI | AI Search | 2 | H2 | Different JA4 from GPTBot, separate client |
| CensysInspect | Censys/Google | Scanner | 8 | H1 | Cycles TLS 1.0–1.3 + QUIC |
| okhttp | Square | HTTP Library | 16 | H2 | 16 TLS variants, 22 distributed IPs |
| HeadlessChrome | Chromium | Automation | 9 | H1/H2 | Each version (138–145) has distinct JA4 |
| LeakIX l9scan | LeakIX | Vuln Scanner | 16 | H1 | Probes .env, trace.axd, GraphQL |
| curl | curl project | CLI Tool | 14 | H1/H2 | Shares some JA4s with l9scan |
| Python aiohttp | aiohttp | HTTP Library | 5 | H1 | Credential scanner traffic (.env probes) |
| Go-http-client | Go stdlib | HTTP Library | 4 | H1/H2 | Often paired with HeadlessChrome |
| InternetMeasurement | Academic | Research | 12 | H1 | 12 variants from 1 IP, tests every TLS config |
| MJ12bot | Majestic SEO | SEO Crawler | 2 | H1 | Reads robots.txt then .well-known |
| Python-urllib | Python stdlib | HTTP Library | 1 | H1 | Clean single fingerprint |
| Dalvik | Android | Runtime | 1 | H1 | ZTE device, TLS 1.2 only |
Scanner Behaviour Patterns
TLS Version Cycling
CensysInspect, LeakIX, okhttp, and InternetMeasurement produce multiple JA4 variants by connecting with different TLS versions (1.0, 1.1, 1.2, 1.3) and, in some cases, over QUIC. A single scanner can produce 8–16 distinct fingerprints from one IP.
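A minimal sketch of how this cycling shows up in the data: group fingerprints by source and read the transport and TLS version from the first three characters of each JA4. The record layout, the source labels, and the fingerprint values below are placeholders, not values from the capture.

```python
from collections import defaultdict

# Hypothetical records: (hashed source IP, JA4). All values are placeholders.
records = [
    ("src-a", "t10d150700_000000000001_000000000001"),
    ("src-a", "t12d150900_000000000002_000000000002"),
    ("src-a", "t13d151000_000000000003_000000000003"),
    ("src-a", "q13d0310h3_000000000004_000000000004"),
    ("src-b", "t13d1516h2_000000000005_000000000005"),
]


def variants_by_source(rows):
    """Map each source to its distinct JA4s and (transport, TLS version) pairs."""
    seen = defaultdict(lambda: {"ja4": set(), "modes": set()})
    for source, ja4 in rows:
        seen[source]["ja4"].add(ja4)
        # A JA4 starts with the transport (t = TCP, q = QUIC)
        # followed by a two-character TLS version such as "12" or "13".
        seen[source]["modes"].add((ja4[0], ja4[1:3]))
    return dict(seen)


summary = variants_by_source(records)
for source, info in sorted(summary.items()):
    print(source, len(info["ja4"]), sorted(info["modes"]))
```

A source whose fingerprints span several versions and both transports is more likely a scanner walking through configurations than a single client build.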
UA Rotation Defeated by JA4
The most active novel fingerprint rotates between 3 different Chrome User-Agent strings across 10 IPs, yet its JA4 remains constant. This shows that JA4 fingerprinting sees through UA spoofing.
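One way to surface this pattern is to pivot on the JA4 and count the distinct User-Agent strings and sources behind it. The sketch below assumes each correlated record carries a JA4, a hashed source, and a User-Agent. The function name, threshold, and sample rows are hypothetical.

```python
from collections import defaultdict


def find_ua_rotation(rows, min_user_agents=2):
    """Return JA4s seen with at least `min_user_agents` distinct User-Agents."""
    agents = defaultdict(set)
    sources = defaultdict(set)
    for ja4, source, user_agent in rows:
        agents[ja4].add(user_agent)
        sources[ja4].add(source)
    return {
        ja4: {"user_agents": len(uas), "sources": len(sources[ja4])}
        for ja4, uas in agents.items()
        if len(uas) >= min_user_agents
    }


# Placeholder rows: (JA4, hashed source, User-Agent).
rows = [
    ("ja4-rotating", "src-1", "Chrome UA 1"),
    ("ja4-rotating", "src-2", "Chrome UA 2"),
    ("ja4-rotating", "src-2", "Chrome UA 3"),
    ("ja4-stable", "src-3", "Chrome UA 1"),
]

rotating = find_ua_rotation(rows)
print(rotating)
```

A JA4 that maps to several User-Agent strings is not proof of spoofing on its own, since shared TLS libraries produce the same effect, so the result is best read as a shortlist for manual review.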
Credential Scanning
Python aiohttp traffic probes /.env, /.env.local, /.env.prod, /config/aws.yml — hunting for exposed credentials. JA4: t12d120700_d34a8e72043a_036209cd1ead
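The first section of that JA4 is human-readable. The sketch below splits it according to the published JA4 field layout (transport, TLS version, SNI flag, cipher count, extension count, ALPN). The class and function names are illustrative.

```python
from dataclasses import dataclass

TLS_VERSIONS = {"10": "1.0", "11": "1.1", "12": "1.2", "13": "1.3"}


@dataclass
class Ja4Summary:
    transport: str        # "TCP" or "QUIC"
    tls_version: str      # e.g. "1.2"
    sni_present: bool     # True when the ClientHello carried a domain in SNI
    cipher_count: int
    extension_count: int
    alpn: str             # first and last character of the first ALPN value, "00" if none
    cipher_hash: str
    extension_hash: str


def parse_ja4(fingerprint: str) -> Ja4Summary:
    """Split a JA4 string into its readable prefix and its two hashes."""
    prefix, cipher_hash, extension_hash = fingerprint.split("_")
    if len(prefix) != 10:
        raise ValueError(f"unexpected JA4 prefix length: {prefix!r}")
    return Ja4Summary(
        transport={"t": "TCP", "q": "QUIC"}.get(prefix[0], prefix[0]),
        tls_version=TLS_VERSIONS.get(prefix[1:3], prefix[1:3]),
        sni_present=prefix[3] == "d",
        cipher_count=int(prefix[4:6]),
        extension_count=int(prefix[6:8]),
        alpn=prefix[8:10],
        cipher_hash=cipher_hash,
        extension_hash=extension_hash,
    )


summary = parse_ja4("t12d120700_d34a8e72043a_036209cd1ead")
print(summary)
```

For this fingerprint the prefix reads as TLS 1.2 over TCP with SNI present, 12 cipher suites, 7 extensions, and no ALPN, which is consistent with the HTTP/1.1-only behaviour listed for aiohttp in the catalogue.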
Active Exploitation
One novel fingerprint probed /@fs/etc/passwd and /.git/config. The /@fs/ path targets the Vite dev server file-read flaw (CVE-2025-30208), and /.git/config checks for an exposed repository. The client presented 3 different browser UAs behind a single TLS fingerprint.
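The probe paths named in this section and the previous one can be bucketed with a few prefix checks. This is a rough sketch: the category labels and the function name are hypothetical, and the path lists contain only the paths reported above.

```python
from urllib.parse import urlsplit

# Prefixes taken from the probes described in the text; labels are illustrative.
EXPLOIT_PREFIXES = ("/@fs/",)
REPO_PREFIXES = ("/.git/",)
CREDENTIAL_PREFIXES = ("/.env", "/config/aws.yml")


def classify_probe(target: str) -> str:
    """Assign a coarse category to a request target based on its path."""
    path = urlsplit(target).path
    if path.startswith(EXPLOIT_PREFIXES):
        return "vite-fs-read"
    if path.startswith(REPO_PREFIXES):
        return "repo-exposure"
    if path.startswith(CREDENTIAL_PREFIXES):
        return "credential-file"
    return "other"


for target in ["/@fs/etc/passwd", "/.git/config", "/.env.prod", "/robots.txt"]:
    print(target, "->", classify_probe(target))
```

Tagging requests this way makes it straightforward to list which JA4s are responsible for each probe category.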
Traffic Summary
| Domain | Requests | Share | Strategy |
|---|---|---|---|
| gptplugins.io | ~3,800 | 49% | Graveyard reclaim (dead ChatGPT plugin directory) |
| projectlanterna.com | ~2,300 | 29% | Project homepage |
| counteragent.io | ~1,700 | 22% | Hallucination trap (LLM token compound) |
The graveyard domain attracted about 2.2 times the traffic of the hallucination trap, which suggests that domains already present in LLM training data attract more organic agent traffic than novel domains.
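The shares and the 2.2 ratio follow directly from the approximate request counts in the table. The short check below uses only those rounded figures, and the variable names are illustrative.

```python
# Approximate request counts from the traffic summary table.
approx_requests = {
    "gptplugins.io": 3800,
    "projectlanterna.com": 2300,
    "counteragent.io": 1700,
}

total = sum(approx_requests.values())
shares = {domain: round(100 * n / total) for domain, n in approx_requests.items()}
ratio = approx_requests["gptplugins.io"] / approx_requests["counteragent.io"]

print(total)            # 7800
print(shares)           # {'gptplugins.io': 49, 'projectlanterna.com': 29, 'counteragent.io': 22}
print(round(ratio, 1))  # 2.2
```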
Methodology
- Capture: tshark on EC2 (t4g.medium ARM64, ap-southeast-2), TCP+UDP port 443
- TLS termination: Caddy with Let’s Encrypt, direct on instance (no CDN/proxy)
- Extraction: FoxIO JA4+ suite processing hourly pcap rotations
- Correlation: Source IP matching between pcap and access logs
- Baseline: FoxIO ja4db (68,297 entries) for known/novel classification
- Privacy: All source IPs SHA-256 hashed
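The correlation, baseline, and privacy steps can be sketched together as follows. This assumes a plain unsalted SHA-256 of the textual IP and a simple join on the hashed source, since the write-up does not state whether a salt or a time window was used. The function names, row layouts, and sample values are hypothetical, and the addresses come from documentation ranges.

```python
import hashlib


def hash_ip(ip: str) -> str:
    """SHA-256 of the textual IP address (no salt; an assumption of this sketch)."""
    return hashlib.sha256(ip.encode("utf-8")).hexdigest()


def correlate(pcap_rows, access_rows, known_ja4):
    """Join JA4s from pcap to User-Agents from access logs by hashed source IP."""
    agents_by_source = {}
    for ip, user_agent in access_rows:
        agents_by_source.setdefault(hash_ip(ip), set()).add(user_agent)

    results = []
    for ip, ja4 in pcap_rows:
        source = hash_ip(ip)
        results.append({
            "source": source,
            "ja4": ja4,
            "user_agents": sorted(agents_by_source.get(source, ())),
            # "known" means the JA4 appears in the baseline set.
            "status": "known" if ja4 in known_ja4 else "novel",
        })
    return results


# Placeholder inputs; the JA4 labels stand in for real fingerprints.
pcap_rows = [("192.0.2.1", "ja4-in-baseline"), ("198.51.100.7", "ja4-unseen")]
access_rows = [("192.0.2.1", "ExampleBot/1.0"), ("198.51.100.7", "Mozilla/5.0")]
baseline = {"ja4-in-baseline"}

for row in correlate(pcap_rows, access_rows, baseline):
    print(row["status"], row["ja4"], row["user_agents"], row["source"][:12])
```

Matching on source IP alone can merge unrelated clients behind a shared address, so a production pipeline would likely also match on timestamps.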
Honeypot Design
Two of the three domains are lures with distinct strategies (projectlanterna.com is the project homepage):
- counteragent.io — Hallucination trap. Name engineered from high-frequency LLM tokens. Poses as AI agent detection platform.
- gptplugins.io — Graveyard reclaim. OpenAI deprecated ChatGPT Plugins in April 2024. This domain appears in LLM training data.
Lure endpoints: ai-plugin.json, agent.json, openid-configuration, OpenAPI spec, fake REST API, robots.txt, sitemap.xml. A client-side JS beacon collects 40+ browser signals.
Dataset
92 JA4-to-application mappings across 14 identified bots. Available for download: