Internal tools silently decay — no team owns them, contractors patch them, and nobody notices until they're critical and broken.
Lightweight agent that monitors internal services for uptime, secret exposure, dependency staleness, ownership gaps (bus factor), and UX degradation signals, alerting platform teams before tools become emergencies.
SaaS subscription per monitored service, freemium for up to 3 services
The pain is real and widespread — every mid-to-large engineering org has rotting internal tools. The Reddit thread with 83 upvotes and 95 comments confirms this resonates. However, it's a slow-burn pain, not a hair-on-fire emergency. Teams tolerate rotten tools for years. The urgency spike only comes during incidents, which makes consistent buyer motivation tricky.
TAM is narrower than it appears. Target is platform engineering teams at companies with 100+ engineers (where internal tool sprawl begins). Estimated ~15,000 such companies globally. At $500/month average (10 services × $50), that's ~$90M TAM. Serviceable market is probably $20-30M. Not a venture-scale market on its own, but a solid bootstrapped SaaS opportunity.
This is the weakest link. Platform teams have budget, but 'tool rot prevention' competes with 'we could just assign an engineer to fix it when it breaks.' The ROI story requires quantifying the cost of incidents caused by unmaintained tools, which is real but hard to measure upfront. Existing players (Cortex, OpsLevel) have validated willingness to pay for service catalogs, but those sell to CTOs on compliance/standards — 'rot detection' is a harder sell without an incident to point at.
A solo dev can absolutely build an MVP in 4-8 weeks. Core signals — git commit frequency, dependency age, contributor count/turnover, uptime pings, secret scanning via git hooks — are all available via existing APIs (GitHub, GitLab, PagerDuty). The 'lightweight agent' approach is smart. UX degradation signals are harder (need synthetic monitoring or user feedback loops) but can be deferred to v2. Main risk is integration breadth — each org's tool stack is different.
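To make the feasibility claim concrete, here is a minimal sketch of how two of those signals (contributor bus factor and commit velocity) could be derived from commit history alone. The data shape and function names are illustrative assumptions; in practice the commits would come from the GitHub or GitLab API.

```python
from collections import Counter
from datetime import date

# Illustrative shape: each commit is (author_login, commit_date).
# In practice this would be fetched via e.g. the GitHub REST API's
# commit-listing endpoint for each monitored repository.
Commit = tuple[str, date]

def bus_factor(commits: list[Commit], threshold: float = 0.5) -> int:
    """Smallest number of authors responsible for >= `threshold` of all
    commits. A result of 1 flags a single-maintainer tool."""
    counts = Counter(author for author, _ in commits)
    total = sum(counts.values())
    covered, factor = 0, 0
    for _, n in counts.most_common():
        covered += n
        factor += 1
        if covered / total >= threshold:
            break
    return factor

def monthly_commit_counts(commits: list[Commit]) -> list[int]:
    """Commit count per (year, month), oldest month first, as the raw
    input for a commit-velocity trend."""
    by_month = Counter((d.year, d.month) for _, d in commits)
    return [by_month[k] for k in sorted(by_month)]
```

Both signals need nothing beyond read access to the repository, which supports the point that the MVP is API-driven rather than infrastructure-heavy.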
No one does exactly this. Cortex/OpsLevel are the closest but approach it from a 'catalog and score' angle, not a 'detect decay trajectory and alert before crisis' angle. The key differentiation is temporal analysis — not 'what's your score today' but 'this service has been declining for 6 months and will become a liability.' That said, Cortex could build this feature in a quarter, so the moat is thin.
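The temporal-analysis differentiation can be sketched in a few lines: fit a least-squares slope to monthly activity and flag sustained decline. The window length and cutoff below are hypothetical parameters, not figures from the source.

```python
def decay_slope(monthly_counts: list[float]) -> float:
    """Least-squares slope of activity over time (change in commits/month
    per month). A persistently negative slope signals decline."""
    n = len(monthly_counts)
    mean_x = (n - 1) / 2
    mean_y = sum(monthly_counts) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(monthly_counts))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

def is_declining(monthly_counts: list[float], months: int = 6,
                 slope_cutoff: float = -0.5) -> bool:
    """True if the last `months` of activity show a sustained downward
    trend, i.e. the 'declining for 6 months' case rather than a blip."""
    window = monthly_counts[-months:]
    return len(window) >= months and decay_slope(window) < slope_cutoff
```

A score-based competitor answers "what is this service's grade today"; the slope answers "where is it headed," which is the claimed wedge.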
Strong subscription fit. Tool rot is a continuous problem — new tools get built, people leave, dependencies age. This naturally requires ongoing monitoring. Per-service pricing scales with the customer. Churn risk: if a customer fixes their rotten tools, do they cancel? Mitigation: position as continuous hygiene, not one-time cleanup.
- +Genuine, widely-felt pain point validated by organic community discussion — engineers viscerally relate to 'the tool nobody owns'
- +Clear gap in existing market — competitors catalog services but don't detect decay trajectories or alert on slow rot
- +Technically feasible MVP using existing APIs (GitHub, CI/CD, monitoring) — no novel infrastructure needed
- +Natural per-service SaaS pricing with built-in expansion revenue as customers onboard more services
- +Platform engineering teams are well-funded buyers with growing budgets and organizational mandate
- !Cortex or OpsLevel could ship a 'health trends' feature and eliminate the differentiation overnight — the moat is insight, not technology
- !Willingness to pay for prevention is historically weak — teams buy after incidents, not before them. Marketing must overcome the 'we'll deal with it when it breaks' inertia
- !Integration surface area is massive — every org uses different VCS, CI, monitoring, and identity providers. Supporting enough combinations for product-market fit is a long tail problem
- !The 'bus factor' and ownership signals require HR/identity data (who left the company) that's sensitive and hard to access programmatically
- !Risk of being perceived as 'just dashboards' — must deliver actionable remediation paths, not just red/yellow/green scores
Service catalog with scorecards that track service maturity across ownership, documentation, security, and operational readiness. Teams define standards and Cortex scores each service against them.
Service ownership and maturity platform. Provides a service catalog with maturity rubrics and checks services against best practices.
Open-source developer portal framework. Provides a service catalog, TechDocs, and a plugin ecosystem. Teams build their own internal developer portal on top of it.
Extension of Datadog's observability platform that lets teams register services with ownership, metadata, and link to existing Datadog monitors, SLOs, and dashboards.
Dependency vulnerability scanning
GitHub/GitLab app that scans connected repositories and produces a 'Tool Health Report' per service: commit velocity trend (declining/flatlined), dependency staleness score (days behind latest), contributor bus factor (single-maintainer flag), last CI run status, and open/stale PR count. Alert via Slack when any service crosses a decay threshold. No agent install needed for v1 — pure API-driven. Dashboard shows a ranked list of 'most at risk' internal tools. Ship with 3-service free tier.
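One way the per-service Tool Health Report could roll up into a single decay score and a ranked "most at risk" list is sketched below. The field names, weights, and ceilings are illustrative assumptions, not a specification of the product.

```python
from dataclasses import dataclass

@dataclass
class ServiceSignals:
    # Field names, weights, and ceilings below are illustrative only.
    name: str
    days_since_last_commit: int
    dependency_staleness_days: int   # average days behind latest release
    bus_factor: int                  # authors covering 50% of commits
    stale_open_prs: int

def _clamp(value: float, ceiling: float) -> float:
    """Normalize a raw signal to [0, 1] against a 'fully decayed' ceiling."""
    return min(value / ceiling, 1.0)

def decay_score(s: ServiceSignals) -> float:
    """0 = healthy, 100 = fully rotted. Weighted sum of normalized signals."""
    parts = [
        (0.35, _clamp(s.days_since_last_commit, 180)),
        (0.25, _clamp(s.dependency_staleness_days, 365)),
        (0.25, 1.0 / max(s.bus_factor, 1)),   # bus factor 1 contributes fully
        (0.15, _clamp(s.stale_open_prs, 10)),
    ]
    return round(100 * sum(w * v for w, v in parts), 1)

def most_at_risk(services: list[ServiceSignals],
                 threshold: float = 60.0) -> list[tuple[str, float]]:
    """Ranked list of services whose score crosses the alert threshold,
    e.g. to drive the Slack alert and dashboard ordering."""
    scored = sorted(services, key=decay_score, reverse=True)
    return [(s.name, decay_score(s)) for s in scored if decay_score(s) >= threshold]
```

Keeping the score a transparent weighted sum (rather than a model) also addresses the "just dashboards" risk: each alert can name the exact signal that tripped it.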
Free tier (3 services, weekly digest email) -> Team plan $49/month for 10 services (real-time Slack alerts, historical trends, ownership mapping) -> Business plan $149/month for 50 services (SSO, custom decay rules, incident correlation, Jira/Linear ticket auto-creation) -> Enterprise custom (unlimited services, on-prem agent option, compliance reporting)
8-12 weeks to first paying customer. Weeks 1-4: build MVP GitHub app with decay scoring. Weeks 5-6: dogfood on open-source projects and share reports on Reddit/HN to validate messaging. Weeks 7-8: private beta with 5-10 platform engineering teams from the Reddit thread audience. Weeks 9-12: convert beta users to paid Team plan. First dollar likely around week 10.
- “bandaid project for contractors in recent years”
- “The engineer who originally built it is long gone”
- “infrastructure goes down all the time”
- “left to rot with occasional patches by contractors because it's stable”
- “zero changes when environment or requirements change”