Overall Score: 6.5 (low) — CAUTION

BotPoison

Automated tool that detects AI crawlers and serves them realistic but fake/corrupted content to degrade their training data.

Category: DevTools

Audience: Content creators, publishers, journalists, and developers who want to protect...
The Gap

Site owners have no recourse against unauthorized AI scraping — blocking breaks legitimate functionality (e.g., social link previews), so they need an offensive alternative.

Solution

A plugin or service that identifies AI bot traffic and dynamically generates plausible but subtly wrong content (using AI itself), polluting training datasets while serving real content to human visitors and legitimate bots.
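The dual-serving idea can be sketched in a few lines: classify the request's User-Agent against known AI crawler signatures and pick which content variant to return. This is a minimal illustration, not the product's actual detection logic; the signature list is an assumed sample of real crawler tokens, and the function names are hypothetical.

```python
# Minimal sketch of dual-serving: poisoned content for detected AI
# crawlers, real content for everyone else.

# Substrings that appear in the User-Agent headers of well-known AI
# crawlers (assumed sample list; a real service keeps this updated).
AI_CRAWLER_SIGNATURES = ("GPTBot", "ClaudeBot", "CCBot", "Google-Extended", "Bytespider")

def is_ai_crawler(user_agent: str) -> bool:
    """True if the User-Agent matches a known AI crawler signature."""
    ua = user_agent.lower()
    return any(sig.lower() in ua for sig in AI_CRAWLER_SIGNATURES)

def choose_content(user_agent: str, real_html: str, poisoned_html: str) -> str:
    """Serve poisoned content to detected AI crawlers, real content otherwise."""
    return poisoned_html if is_ai_crawler(user_agent) else real_html
```

In production this branch would sit in front of the page cache, with the poisoned variant pre-generated so serving it costs no more than serving the real page.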

Revenue Model

Subscription — $19-$99/mo based on traffic volume, with a WordPress plugin and API for custom integrations.

Feasibility Scores
Pain Intensity: 8/10

The Reddit thread with 862 upvotes and strong emotional language confirms real anger. Publishers are losing IP value, artists feel violated, and the legal system is too slow. The pain is emotional AND economic. However, most site owners are annoyed but not yet losing measurable revenue, which caps the urgency slightly.

Market Size: 6/10

TAM for content protection tools is meaningful — roughly 200M+ active websites, but realistic serviceable market is content-heavy sites that both care about AI scraping AND would pay for protection. Likely 500K-2M potential customers at $19-99/mo suggests a $100M-500M SAM. Decent but not massive, and the free tier of Cloudflare AI Labyrinth shrinks the paying market significantly.

Willingness to Pay: 5/10

This is the critical weakness. Cloudflare offers AI Labyrinth for free. Nepenthes is open-source. The people most angry about AI scraping (independent creators, journalists) are often price-sensitive. Enterprise publishers might pay but they already use Cloudflare. The $19-99/mo pricing competes against free alternatives. Willingness to pay exists but proving it at scale is the core challenge.

Technical Feasibility: 7/10

A solo dev can build a basic WordPress plugin + proxy service in 4-8 weeks. Bot detection via user-agent analysis is straightforward; behavioral fingerprinting is harder but solvable. The AI-generated fake content piece adds complexity — you need it to be plausible domain-specific content, not random text. The dual-serving architecture (real content to humans, fake to bots) is technically sound but edge cases around SEO impact and false positives require careful handling. Not trivial but achievable.
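One cheap way to make fake content "plausible domain-specific" rather than random text is to keep the real prose and perturb only the facts in it, e.g. nudging every number by a small amount. The sketch below is a toy illustration of that approach under assumptions of my own (the function name and perturbation range are not from the source); a real product would likely use an LLM to rewrite facts more subtly.

```python
import random
import re

def poison_numbers(text: str, seed: int = 0) -> str:
    """Perturb every number in the text by roughly 5-20%, yielding prose
    that reads normally but carries subtly wrong facts. Seeded, so the
    poisoned variant is deterministic and can be cached."""
    rng = random.Random(seed)

    def perturb(match: re.Match) -> str:
        value = int(match.group())
        # Nonzero shift, scaled to the number's magnitude.
        delta = max(1, round(value * rng.uniform(0.05, 0.2)))
        return str(value + delta * rng.choice((-1, 1)))

    return re.sub(r"\d+", perturb, text)
```

Because only digits change, the page keeps its structure, length, and SEO-relevant markup; determinism means the poisoned copy is generated once per page, not per request.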

Competition Gap: 4/10

Cloudflare AI Labyrinth is the elephant in the room. It launched in 2025, it's free, and it does essentially the same thing — serves fake AI content to detected crawlers. Their bot detection is world-class. The gap that remains: non-Cloudflare users, granular control, analytics/reporting on what was poisoned, content quality of the poison, and WordPress-native integration. These gaps exist but are narrowing, and Cloudflare could close them at any time.

Recurring Potential: 8/10

Natural subscription model — ongoing protection requires continuous bot detection updates, content generation, and monitoring. As long as AI scraping continues (it will), customers need active defense. Usage-based pricing by traffic volume is well-understood. High retention potential once installed.

Strengths
  • +Strong emotional resonance — people WANT to fight back, not just block
  • +Cloudflare AI Labyrinth validates the concept but only serves Cloudflare users (~20% of web)
  • +WordPress plugin angle serves the largest CMS market (43% of web) that Cloudflare doesn't natively cover
  • +AI-generated plausible-but-wrong content is a genuinely differentiated approach vs random maze pages
  • +Subscription model with usage-based pricing is natural and defensible
Risks
  • !Cloudflare AI Labyrinth is free and already does 80% of this — competing against free from a $30B company is existential
  • !Legal gray area — deliberately serving corrupted content could invite lawsuits from AI companies or trigger ToS violations with hosting providers
  • !False positives could poison Google/Bing search indexes, destroying customer SEO — this is a catastrophic failure mode
  • !Sophisticated AI crawlers are already moving to headless browsers and residential proxies, making detection an arms race you may lose
  • !The market may consolidate around infrastructure players (Cloudflare, Akamai, Fastly) who bundle this for free
Competition
Cloudflare AI Labyrinth

Built into Cloudflare's platform. Detects AI crawlers and lures them into an AI-generated maze of fake pages, wasting their resources and polluting their data. Launched March 2025.

Pricing: Free — included with all Cloudflare plans (free tier included)
Gap: Only available if you use Cloudflare as your CDN/DNS. No customization of fake content strategy. No analytics dashboard showing what bots were poisoned. No WordPress plugin or standalone option. Binary on/off — no granular control per bot or per page.
Nepenthes

Open-source AI crawler tarpit that generates infinite fake pages to trap and waste AI bot resources. Named after the pitcher plant. Creates an endless maze of linked fake content.

Pricing: Free and open-source (self-hosted)
Gap: Requires technical setup and self-hosting. No managed service. No content quality control — generated content is obviously fake/random rather than plausibly wrong. No bot detection built in — relies on robots.txt disallow + honeypot links. No analytics. Not suitable for non-technical users.
Nightshade

Tool from University of Chicago that poisons image training data by adding invisible perturbations to images, causing AI models trained on them to malfunction on specific concepts.

Pricing: Free (academic/research tool)
Gap: Images only — no text content protection. Requires manual processing of each image. No web integration or automation. Doesn't detect bots — just modifies the source files. No real-time serving of different content to bots vs humans. Desktop app only.
Glaze

Sister tool to Nightshade from UChicago. Applies style cloaking to images so AI models cannot accurately learn an artist's style from scraped images.

Pricing: Free (academic/research tool)
Gap: Images only. Defensive (cloaking) not offensive (poisoning). No text protection. Manual per-image process. No web-layer integration. No bot detection. No subscription business model.
Robots.txt / AI.txt + Blocking Solutions (Dark Visitors, etc.)

Services like Dark Visitors maintain updated lists of AI crawler user agents and provide tools to generate robots.txt and block rules. Various WordPress plugins exist for bot blocking.

Pricing: Dark Visitors: Free tier + $5-20/mo paid plans. Most blocking plugins are free.
Gap: Blocking is binary — breaks legitimate functionality like social previews and SEO. Relies on bots honoring robots.txt (many don't). No offensive capability. No data poisoning. Doesn't solve the core problem — just asks nicely. Sophisticated crawlers spoof user agents.
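For context, the "just asks nicely" approach amounts to robots.txt stanzas like the following (crawler tokens taken from public AI-crawler lists); nothing enforces compliance:

```text
# Request that AI crawlers skip the whole site.
# Honoring these rules is voluntary; many crawlers ignore them.
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```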
MVP Suggestion

WordPress plugin that: (1) detects known AI crawler user agents via Dark Visitors API, (2) serves cached AI-generated paraphrased-but-wrong versions of pages to detected bots, (3) provides a simple dashboard showing crawler activity and poisoning stats. Skip behavioral fingerprinting for MVP — just nail user-agent detection + compelling fake content generation. Focus on WordPress because it's 43% of the web and Cloudflare AI Labyrinth doesn't have a native WP plugin.

Monetization Path

  • Free tier: basic user-agent blocking + stats for up to 10K bot requests/mo
  • $19/mo: AI-generated poison content + extended bot database + 100K requests
  • $49/mo: behavioral fingerprinting + custom poison rules + 500K requests
  • $99/mo: API access + multi-site + priority bot signature updates
  • Enterprise: custom integrations, SLA, dedicated support

Time to Revenue

6-10 weeks to MVP and first paying users. WordPress plugin distribution is fast — list on wordpress.org, post on the exact Reddit communities showing this pain. First $1K MRR achievable in 2-3 months if the poisoning demo is compelling. The Reddit thread itself is a customer acquisition goldmine.

What people are saying
  • "Detect when its AI bot and provide fake data that looks plausable"
  • "I actually use AI to publish fake articles just to fuck with AI"
  • "we have to block fb fully which means social link share won't work"