Overall: 7.4 · Difficulty: medium · Verdict: CONDITIONAL GO

RequestTrace

A single-pane tool that answers 'what happened to this request' across services without grepping logs.

Category: DevTools
Audience: On-call engineers and SREs at companies running 10+ microservices
The Gap

During incidents, engineers waste time opening 6 dashboards and grepping logs to trace a single request through over-architected service meshes.

Solution

Lightweight agent that auto-instruments request flows and provides a single search interface: paste a request/correlation ID, get the full write path, every service hop, state mutations, and queue transitions in one timeline view. Optimized for incident response, not observability dashboards.
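To make the "single search interface" concrete, here is a minimal sketch of the internal model it implies, assuming a hypothetical normalized event record keyed by correlation ID (all type and field names are illustrative, not from the source):

```typescript
// Hypothetical normalized model: every service hop, DB write, and queue
// transition becomes one record keyed by the request's correlation ID.
type EventKind = "service_hop" | "db_write" | "queue_publish" | "queue_consume";

interface TimelineEvent {
  correlationId: string; // the ID the engineer pastes into the search box
  kind: EventKind;
  service: string;       // emitting service
  tsMs: number;          // epoch milliseconds, used for ordering
  detail: string;        // e.g. the SQL statement or topic name
}

// The core query is then just a filter plus a chronological sort.
function buildTimeline(
  events: TimelineEvent[],
  correlationId: string
): TimelineEvent[] {
  return events
    .filter((e) => e.correlationId === correlationId)
    .sort((a, b) => a.tsMs - b.tsMs);
}
```

The hard part is not this query; it is getting every hop, write, and queue transition into that one normalized stream in the first place.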

Revenue Model

Subscription — usage-based pricing starting at $99/mo for small teams, scaling with ingested traces

Feasibility Scores
Pain Intensity: 9/10

This is a top-3 pain point for every on-call engineer at microservices shops. The Reddit thread signals confirm it. During incidents, the cost of slow resolution is measured in revenue loss ($thousands to $millions per hour depending on company size). 'What happened to this request' is literally the first question asked in every incident, and the existing answer is always 'open 6 tabs and start grepping.' This pain is acute, frequent, and costly.

Market Size: 7/10

TAM for observability is massive ($40B+), but the addressable slice for an incident-response-specific tracing tool is narrower. Target is companies with 10+ microservices — roughly 50,000-200,000 companies globally. At $99-500/mo average, that's $60M-$1.2B SAM. Realistic early market is mid-market teams (50-500 engineers) underserved by enterprise tools but outgrowing open-source. Not a tiny market, but you're carving a niche within a niche initially.

Willingness to Pay: 7/10

Engineers already pay $31-200+/host/month for observability (Datadog, New Relic). The budget line item exists. However, this positions as complementary to existing tools, not a replacement — which means additive budget approval. $99/mo for small teams is an easy credit-card purchase. Risk: some teams will say 'we already pay for Datadog, why do we need this too?' Need to clearly position as incident-response layer, not another observability tool.

Technical Feasibility: 5/10

This is the hardest part. 'Auto-instruments request flows' is doing enormous heavy lifting. Supporting diverse tech stacks (Node, Go, Java, Python, Ruby), message queues (Kafka, RabbitMQ, SQS), databases, and service meshes requires significant instrumentation work. OpenTelemetry helps but doesn't cover state mutations or queue transitions out of the box. A solo dev can build an MVP that works for ONE stack (e.g., Node + Kafka + PostgreSQL) in 6-8 weeks, but broad coverage is a multi-year effort. The 'lightweight agent' claim will be tested hard by reality.
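To make the queue-transition gap concrete, here is a hedged sketch of the header propagation a collector would have to add around every publish/consume path (the message shape and helper names are assumptions for illustration, not a real broker or library API):

```typescript
// Hypothetical message envelope; real brokers (Kafka, SQS, RabbitMQ)
// expose headers or message attributes that can carry the same string.
interface QueueMessage {
  headers: Record<string, string>;
  body: string;
}

const CORRELATION_HEADER = "x-correlation-id";

// On publish: stamp the outgoing message with the current request's ID.
function injectCorrelation(msg: QueueMessage, correlationId: string): QueueMessage {
  return { ...msg, headers: { ...msg.headers, [CORRELATION_HEADER]: correlationId } };
}

// On consume: recover the ID so downstream spans join the same timeline.
function extractCorrelation(msg: QueueMessage): string | undefined {
  return msg.headers[CORRELATION_HEADER];
}
```

OpenTelemetry's context propagation handles the HTTP side of this out of the box; the point of the sketch is that every queue client in every supported stack needs an equivalent hook, which is exactly where the multi-year scope lives.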

Competition Gap: 7/10

Every existing tool optimizes for dashboards and proactive monitoring. None nail the 'paste an ID, get the full story in 10 seconds' incident workflow. Honeycomb comes closest but still requires query expertise. The gap is real: incident-response-first UX, state mutation tracking, queue transition visibility, and zero-config setup. However, every major player (Datadog, Grafana, Honeycomb) could build this as a feature in a quarter if it gains traction. Defensibility comes from execution speed and depth of the incident workflow, not technology.

Recurring Potential: 9/10

Classic infrastructure SaaS — once instrumented, switching cost is high. Usage-based pricing scales naturally with team and service growth. Incidents are ongoing (not seasonal), so value is continuous. Teams won't rip out tracing mid-incident. Expansion revenue is natural: more services = more traces = higher bill. Net revenue retention in observability companies typically exceeds 120%.

Strengths
  • +Genuine, intense pain point validated by real engineering discourse — not a solution looking for a problem
  • +Clear differentiation: incident-response-first UX vs dashboard-first competitors
  • +Existing budget line item in target companies (observability spend already approved)
  • +Strong recurring revenue dynamics with natural expansion as services grow
  • +OpenTelemetry standardization lowers the instrumentation barrier and reduces vendor lock-in fear for buyers
  • +Founder can dogfood immediately if they're an on-call engineer themselves
Risks
  • !Technical scope creep: 'auto-instruments request flows' across diverse stacks is a multi-year, multi-engineer problem — MVP must be ruthlessly scoped to 1-2 tech stacks
  • !Feature absorption: Datadog or Honeycomb could ship a 'request timeline' feature that neutralizes the core differentiator within months of traction
  • !Positioning confusion: buyers may categorize this as 'yet another observability tool' and reject it because they already have one
  • !Instrumentation fatigue: teams already have agents from Datadog/New Relic/OTel and may resist adding another agent to production
  • !Sales cycle risk: infrastructure purchases at 10+ microservice companies often require security review and procurement, slowing time-to-revenue
Competition
Honeycomb

Observability platform built on high-cardinality event data with trace visualization, BubbleUp analysis, and query-driven debugging. Strong focus on understanding production behavior.

Pricing: Free tier, Team at $130/mo, Pro/Enterprise custom pricing. Usage-based on events ingested.
Gap: Still dashboard-oriented — requires knowing WHAT to query. No single 'paste an ID, get the full story' workflow optimized for incident panic mode. Steep learning curve. Expensive at scale. Not optimized for state mutation tracking or queue transitions.
Jaeger (open-source, CNCF)

Open-source distributed tracing system originally built by Uber. Collects and visualizes trace data across microservices. Part of the CNCF ecosystem.

Pricing: Free (self-hosted)
Gap: Pure tracing only — no log correlation, no state mutation tracking, no queue/event-bus visibility. Requires significant operational overhead to run. UI is functional but dated. Zero incident-response workflow — it's a tool, not a solution. No auto-instrumentation agent that 'just works'.
Datadog APM & Distributed Tracing

Full-stack observability platform with APM, distributed tracing, log management, infrastructure monitoring, and incident management in one product.

Pricing: APM starts at $31/host/month. Logs at $0.10/GB ingested. Total cost often $50-200+/host/month when combining products.
Gap: Extremely expensive at scale (bill shock is legendary). Overwhelming UI with too many dashboards — the exact '6 dashboards' problem the idea targets. Optimized for monitoring, not incident response. Finding what happened to ONE request still requires navigating multiple views. Vendor lock-in concerns.
Lightstep (now ServiceNow Cloud Observability)

Observability platform focused on change intelligence — correlating deployments and config changes with performance regressions. Strong distributed tracing roots.

Pricing: Enterprise pricing, typically $20-40/host/month. Free tier available with limited retention.
Gap: Post-ServiceNow acquisition, product direction is muddled. Enterprise sales focus means small teams are ignored. No specific incident-response workflow. Doesn't track state mutations or queue transitions. Becoming yet another enterprise observability suite.
Grafana Tempo + Loki + Grafana Stack

Open-source observability stack combining Tempo (distributed traces), Loki (logs), and Grafana (dashboards and visualization).

Pricing: Free self-hosted. Grafana Cloud free tier, Pro at $29/mo, Advanced custom. Usage-based for traces and logs.
Gap: Requires stitching together 3+ tools yourself — you ARE the '6 dashboards' problem. Complex setup and operations. No unified 'request timeline' view that includes state mutations. Optimized for dashboarding engineers, not panicking on-call engineers at 3am. No auto-instrumentation agent.
MVP Suggestion

Scope MVP to ONE stack: Node.js/TypeScript + PostgreSQL + Kafka (or SQS). Build an OpenTelemetry-based collector (not a custom agent) that enriches traces with state mutation data (DB writes) and queue transitions. Ship a dead-simple web UI with ONE input field: paste correlation ID, get a vertical timeline showing every service hop, DB write, and queue publish/consume with timestamps and payloads. Deploy as a Docker Compose stack or Helm chart. Target: under 15 minutes from 'docker compose up' to first traced request. Skip dashboards, skip alerting, skip metrics — just answer 'what happened to this request' faster than anyone else.
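The "one input field, one vertical timeline" view described above can be sketched in a few lines; the event shape here is a placeholder for whatever the collector emits (names are illustrative assumptions):

```typescript
// Minimal illustrative event shape for the MVP's single view.
interface HopEvent {
  tsMs: number;
  service: string;
  label: string; // e.g. "POST /orders", "INSERT orders", "publish orders.created"
}

// Render the vertical timeline: one line per event, with each timestamp
// shown as an offset from the first event in the request.
function renderTimeline(events: HopEvent[]): string {
  const sorted = [...events].sort((a, b) => a.tsMs - b.tsMs);
  const t0 = sorted.length ? sorted[0].tsMs : 0;
  return sorted
    .map((e) => `+${e.tsMs - t0}ms  ${e.service}  ${e.label}`)
    .join("\n");
}
```

Keeping the output this spartan is the product decision: an on-call engineer at 3am needs a readable ordered list, not another interactive dashboard.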

Monetization Path

Free open-source single-node collector (community + adoption) -> Hosted/cloud version at $99/mo for teams up to 5 services and 7-day retention -> Pro at $299/mo for 20 services, 30-day retention, and team features (shared investigations, incident annotations) -> Enterprise at custom pricing for SSO, RBAC, compliance, unlimited retention, and on-prem deployment. Add usage-based trace ingestion pricing ($2-5/million traces) at scale tier.
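The pricing ladder above reduces to a simple function; the tier thresholds come from the text, while the $3/million overage rate is one arbitrary point inside the stated $2-5 range (enterprise pricing is custom in reality and is only approximated here):

```typescript
// Monthly bill sketch: flat tiers plus usage-based trace ingestion at the
// scale tier. $3/million traces is a placeholder within the $2-5 range.
function monthlyBillUsd(services: number, tracesIngested: number): number {
  if (services <= 5) return 99;   // Hosted tier: up to 5 services
  if (services <= 20) return 299; // Pro tier: up to 20 services
  // Scale/enterprise approximation: Pro base plus per-trace overage.
  return 299 + 3 * (tracesIngested / 1_000_000);
}
```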

Time to Revenue

8-12 weeks to MVP with a single-stack focus. First paying customer at 12-16 weeks if founder has a personal network of SRE/platform engineering contacts. Meaningful revenue ($5K+ MRR) at 6-9 months. The long pole is not building the product — it's getting teams to instrument and trust a new tool in production, which takes a proof-of-concept period of 2-4 weeks per customer.

What people are saying
  • Can you answer 'what happened to this request' without opening 6 dashboards and grepping logs like it's 2014?
  • three people spent a day reconstructing state because nobody remembered how the projections worked
  • ops clarity is the one people miss