7.5 · high · GO

LogicLedger

Auto-discovers and documents business logic from existing SQL queries, pipelines, and reports into a living knowledge base.

DevTools · Data engineers inheriting undocumented data stacks, especially solo DEs at sm...
The Gap

When the previous DE leaves, all business logic knowledge walks out the door — new DEs spend months reverse-engineering undocumented transformations.

Solution

Connects to data pipelines, SQL scripts, and BI tools to parse transformation logic, then generates human-readable documentation with lineage graphs and version history.

Revenue Model

Freemium — free for up to 10 queries/pipelines, $49-$149/mo for full workspace with collaboration and change tracking.

Feasibility Scores
Pain Intensity: 9/10

This is a top-3 pain point in data engineering. The Reddit thread captures a universal experience — every DE who inherits an undocumented stack has lived this nightmare. It causes months of lost productivity, costly errors in business reporting, and significant organizational risk. Companies have literally made wrong business decisions because no one understood the legacy logic. The pain is acute, recurring (every time someone leaves), and has real dollar consequences.

Market Size: 6/10

TAM is tricky. There are ~200k data engineers in the US, but the sweet spot is solo/small-team DEs at companies with 50-500 employees: maybe 30-50k potential users. At $99/mo average, that's ~$36-60M in annual TAM (30-50k users × $99 × 12 months). Not venture-scale, but excellent for a bootstrapped SaaS. The constraint is that enterprise (where the big money is) is served by catalogs, and very small teams may resist paying. Mid-market is the wedge but it's a narrower band than it appears.
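The TAM range above is simple arithmetic, sketched here so the assumptions are easy to change. The user counts and the $99 average come from the paragraph above; the function and constant names are illustrative only.

```python
# Back-of-the-envelope annual TAM from the figures quoted above.
MONTHLY_PRICE = 99                      # assumed average price, USD per user per month
USERS_LOW, USERS_HIGH = 30_000, 50_000  # estimated reachable users


def annual_tam(users: int, monthly_price: int = MONTHLY_PRICE) -> int:
    """Annual revenue if every reachable user paid the average price."""
    return users * monthly_price * 12


low, high = annual_tam(USERS_LOW), annual_tam(USERS_HIGH)
print(f"Annual TAM: ${low / 1e6:.1f}M to ${high / 1e6:.1f}M")
```

This is a ceiling, not a forecast: it assumes 100% of the estimated segment pays the average price.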

Willingness to Pay: 7/10

DEs at companies spending $50k+/year on Snowflake can justify $149/mo for documentation tooling — it's a rounding error. The buyer is often the DE themselves (bottoms-up) or their engineering manager. Pain signals suggest this saves weeks of onboarding time, which easily justifies the price. However, there's a cultural challenge: many DEs see documentation as something that 'should be free' or 'should be part of dbt/the warehouse.' The $49-149 range is well-calibrated — low enough for a credit card purchase, high enough to signal value.

Technical Feasibility: 7/10

Core SQL parsing is solved (SQLGlot handles most dialects). LLMs can generate human-readable explanations effectively. The hard parts: (1) connecting to diverse data stacks (Airflow, dbt, Looker, Tableau, stored procs, Python scripts) requires many integrations, (2) resolving cross-pipeline dependencies accurately is non-trivial, (3) keeping documentation 'living' (auto-updating on changes) requires webhooks/polling infrastructure. A solo dev can build an MVP for SQL-only in 6-8 weeks, but the 'connects to everything' vision is a 6-12 month journey. Start narrow: just SQL files + one warehouse.
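To make hard part (2) concrete, here is a minimal sketch of resolving dependencies across a handful of SQL scripts. It deliberately uses a naive regex as a stand-in for a real parser such as SQLGlot, so it will miss CTEs, subqueries, and quoted identifiers. The script names and SQL are hypothetical.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical SQL scripts, keyed by the table each one builds.
SCRIPTS = {
    "stg_orders": "SELECT id, amount FROM raw_orders WHERE status <> 'test'",
    "stg_customers": "SELECT id, region FROM raw_customers",
    "revenue_by_region": (
        "SELECT c.region, SUM(o.amount) AS revenue "
        "FROM stg_orders o JOIN stg_customers c ON o.id = c.id "
        "GROUP BY c.region"
    ),
}

# Naive stand-in for a real SQL parser: the identifier after FROM or JOIN.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def upstream_tables(sql: str) -> set[str]:
    """Tables a query reads from, according to the naive regex."""
    return set(TABLE_REF.findall(sql))


def build_lineage(scripts: dict[str, str]) -> dict[str, set[str]]:
    """Map each target table to the set of tables it depends on."""
    return {target: upstream_tables(sql) for target, sql in scripts.items()}


lineage = build_lineage(SCRIPTS)
# Upstream tables come first, so this is a valid order to document or rebuild in.
build_order = list(TopologicalSorter(lineage).static_order())
print(build_order)
```

Even this toy version shows why the problem grows quickly: every additional source (stored procedures, BI tools, Python scripts) needs its own way of producing the same target-to-upstream mapping.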

Competition Gap: 8/10

The gap is real and significant. Enterprise catalogs are too expensive and complex for SMBs. dbt requires rewriting everything. ChatGPT is ephemeral. Nobody offers: (1) auto-discovery of business logic from raw SQL with (2) persistent, versioned, human-readable documentation at (3) a price point accessible to solo DEs. The intersection of 'automated logic extraction + living docs + affordable' is genuinely unoccupied. The risk is that dbt Cloud or a catalog player adds this feature, but their incentives point upmarket, not down.

Recurring Potential: 8/10

Strong recurring dynamics: (1) pipelines change constantly, so documentation must be continuously updated; this isn't a one-time export, (2) new team members onboard regularly, creating ongoing value, (3) version history becomes more valuable over time as it accumulates (a compounding effect on your own data rather than a true cross-customer network effect), (4) once documentation exists, removing it is painful. Low churn risk once embedded in workflow. The 'living' aspect is the key to retention: static docs would be a one-time purchase, but auto-updating docs are a subscription.

Strengths
  • +Solves a universally recognized, high-pain problem that every data engineer has experienced firsthand — strong emotional resonance for marketing
  • +Clear competition gap: enterprise catalogs cost roughly 17-170x as much ($30k-100k+/year vs. $49-149/mo), dbt requires migration, ChatGPT is ephemeral; no one serves the solo DE at SMBs
  • +LLM advances make the core technical proposition (auto-explain SQL in plain English) dramatically more feasible than it was 2 years ago
  • +Strong bottoms-up adoption potential: individual DEs can sign up without procurement approval at the $49-149 price point
  • +Built-in retention moat: documentation becomes more valuable over time and is painful to abandon once the team relies on it
Risks
  • !Integration breadth is the make-or-break challenge: every data stack is different (Snowflake + dbt + Airflow vs. BigQuery + Dataform + Composer vs. stored procedures in SQL Server), and supporting even the top 3 combinations requires significant engineering effort
  • !dbt Cloud is aggressively adding AI documentation features and could close the gap for dbt users specifically — this would eliminate a significant portion of the target market
  • !Solo DEs at small companies may have the pain but not the budget authority or culture to pay for documentation tooling — they might just paste SQL into ChatGPT and call it good enough
  • !Accuracy risk: if auto-generated business logic explanations are wrong or misleading, it's worse than no documentation — trust is hard to earn and easy to lose
  • !The 'living' aspect requires reliable change detection and re-parsing, which adds operational complexity (CI/CD hooks, warehouse query log polling, git monitoring)
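The change-detection risk in the last bullet can be sketched as content fingerprinting: hash a normalized form of each script so that whitespace or comment edits do not trigger a re-parse. This is one possible approach, not a prescribed design. The normalization is deliberately naive (it would also strip comment markers that appear inside string literals), and the file names are hypothetical.

```python
import hashlib
import re


def normalize_sql(sql: str) -> str:
    """Strip comments and collapse whitespace so cosmetic edits are ignored.

    Naive on purpose: comment markers inside string literals are stripped too.
    """
    sql = re.sub(r"/\*.*?\*/", " ", sql, flags=re.DOTALL)  # block comments
    sql = re.sub(r"--[^\n]*", " ", sql)                    # line comments
    return " ".join(sql.split())


def fingerprint(sql: str) -> str:
    """Stable hash of the normalized SQL text."""
    return hashlib.sha256(normalize_sql(sql).encode("utf-8")).hexdigest()


def changed_scripts(stored: dict[str, str], current: dict[str, str]) -> list[str]:
    """Script names that are new or whose fingerprint differs from the stored one."""
    return sorted(
        name for name, sql in current.items()
        if stored.get(name) != fingerprint(sql)
    )


# Previously stored fingerprints vs. the SQL found on the latest scan.
stored = {"orders.sql": fingerprint("SELECT id  -- primary key\nFROM orders")}
current = {
    "orders.sql": "SELECT id\n  FROM orders /* reformatted only */",
    "revenue.sql": "SELECT SUM(amount) FROM orders",
}
print(changed_scripts(stored, current))
```

Only `revenue.sql` is flagged here, because the edit to `orders.sql` is purely cosmetic. The same comparison could be driven by a git hook, a CI step, or a polling job.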
Competition
dbt (data build tool)

Open-source SQL transformation framework that encourages documenting models, tests, and lineage as code. dbt Cloud adds a hosted catalog and lineage visualization.

Pricing: dbt Core is free/open-source. dbt Cloud: Developer free, Team $100/seat/mo, Enterprise custom.
Gap: Does NOT auto-discover or reverse-engineer existing SQL — you must rewrite queries into dbt models first. Useless for legacy pipelines, stored procedures, or non-dbt SQL scripts. Documentation is manual, not auto-generated. No business logic extraction from existing reports or BI tools.
Atlan / Alation / Collibra (Data Catalogs)

Enterprise data catalog platforms that provide metadata management, data lineage, search, and governance. Atlan is the modern challenger; Alation and Collibra are incumbents.

Pricing: Atlan starts ~$30k+/year. Alation ~$50k+/year. Collibra ~$100k+/year. All enterprise sales.
Gap: Designed for large enterprises with dedicated data governance teams — massive overkill and unaffordable for solo DEs at small companies. Focus on metadata cataloging, NOT on extracting and explaining business logic in human-readable form. Lineage shows table-to-table flow but rarely explains WHY a transformation exists or what business rule it encodes.
SQLGlot / SQLLineage / Open-source parsers

Open-source Python libraries that parse SQL to extract column-level lineage, transformations, and dependencies. SQLGlot can transpile between dialects and analyze query structure.

Pricing: Free / open-source.
Gap: These are libraries, not products. No UI, no collaboration features, no versioning, no business-logic explanation layer. Requires significant engineering effort to turn into anything useful. No BI tool integration. Solo DEs don't have time to build a documentation platform from parsing libraries.
Castor / Select Star

Modern data discovery and documentation platforms that auto-crawl warehouses and BI tools to generate lineage and documentation. Select Star emphasizes automated lineage; Castor focuses on AI-generated documentation.

Pricing: Castor: starts ~$10k/year. Select Star: starts ~$15k/year. Both target mid-market.
Gap: Still too expensive for solo DEs or small teams (~$830-1,250/mo minimum). Focus on column/table descriptions, not deep business logic explanation (e.g., 'why is revenue calculated this way?'). Don't parse raw SQL scripts or stored procedures outside the warehouse. No version history of logic changes over time.
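The price gap across the competitors listed so far can be checked with a few lines of arithmetic. The annual figures are the approximate starting prices quoted in this section; the comparison against the $149/mo plan is an illustrative choice.

```python
# Approximate annual starting prices quoted in this section, in USD.
ANNUAL_STARTING_PRICES = {
    "Castor": 10_000,
    "Select Star": 15_000,
    "Atlan": 30_000,
    "Alation": 50_000,
    "Collibra": 100_000,
}
TEAM_PLAN_MONTHLY = 149  # top of the proposed $49-149/mo range


def monthly_equivalent(annual_usd: int) -> float:
    """Annual contract price expressed per month."""
    return annual_usd / 12


def price_multiple(annual_usd: int, plan_monthly: int = TEAM_PLAN_MONTHLY) -> float:
    """How many times more expensive the competitor is than the given plan."""
    return monthly_equivalent(annual_usd) / plan_monthly


for name, annual in ANNUAL_STARTING_PRICES.items():
    print(f"{name}: ${monthly_equivalent(annual):,.0f}/mo, "
          f"{price_multiple(annual):.1f}x the ${TEAM_PLAN_MONTHLY} plan")
```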
GitHub Copilot / ChatGPT for SQL explanation

Developers paste SQL into ChatGPT or use Copilot to get explanations of what queries do. Ad-hoc but increasingly common workflow for reverse-engineering legacy SQL.

Pricing: ChatGPT Plus $20/mo, Copilot $10-19/mo.
Gap: Completely ephemeral — no persistence, no versioning, no lineage graphs, no cross-query relationships. Doesn't connect to your actual data stack. Can't track changes over time. Hallucinations on business context it doesn't have. No collaboration. Each query is analyzed in isolation with no awareness of the broader pipeline.
MVP Suggestion

  • Week 1-2: Build a web app where users upload or paste SQL files/queries. Use SQLGlot for parsing and an LLM (Claude API) to generate human-readable business logic explanations. Show a basic lineage graph (table/column dependencies).
  • Week 3-4: Add Snowflake and BigQuery direct connections to auto-discover queries from query history.
  • Week 5-6: Add versioning: detect when SQL changes and highlight what business logic changed.
  • Week 7-8: Add a shareable workspace with search. Ship the free tier (10 queries) and start collecting feedback.

Do NOT build Airflow/dbt/Looker integrations until you have 50+ users asking for them.
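The Week 5-6 versioning step could start as a plain text diff between two saved versions of a query, using Python's standard `difflib`. This is a rough sketch under that assumption. A fuller version might diff parsed syntax trees instead, and might pass only the changed lines to the LLM for re-explanation. The example queries are hypothetical.

```python
import difflib


def changed_lines(old_sql: str, new_sql: str) -> list[str]:
    """Added ('+') and removed ('-') lines between two versions of a query."""
    diff = list(
        difflib.unified_diff(old_sql.splitlines(), new_sql.splitlines(), lineterm="")
    )
    # The first two entries are the '---'/'+++' file headers; skip them by
    # position so removed SQL comments (which render as '---...') are kept.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


old_version = "SELECT order_id, amount\nFROM orders\nWHERE status = 'paid'"
new_version = (
    "SELECT order_id, amount\nFROM orders\nWHERE status IN ('paid', 'refunded')"
)

for line in changed_lines(old_version, new_version):
    print(line)
```

Here the diff isolates the `WHERE` clause, which is exactly the kind of change ("refunded orders now count") a business-logic changelog would need to surface.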

Monetization Path

Free tier (10 queries, paste-only) to drive adoption and SEO/word-of-mouth → $49/mo Pro (unlimited queries, warehouse connection, version history) for individual DEs → $149/mo Team (collaboration, shared workspace, change notifications, SSO) for small data teams → $499+/mo Enterprise (API access, custom integrations, audit logs, on-prem) once you have traction. Consider a one-time 'audit report' product ($199-499) for DEs who just need to document a stack once during onboarding — this captures the 'I just joined and need to understand everything' moment.
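The tier ladder above maps naturally onto a small plan table plus a gating check. The sketch below is illustrative: the prices and limits come from the paragraph above, while the field names, the plan keys, and treating $499 as the Enterprise starting price are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    name: str
    monthly_usd: int             # starting price for Enterprise
    max_queries: Optional[int]   # None means unlimited
    warehouse_connection: bool
    collaboration: bool


PLANS = {
    "free": Plan("Free", 0, 10, False, False),
    "pro": Plan("Pro", 49, None, True, False),
    "team": Plan("Team", 149, None, True, True),
    "enterprise": Plan("Enterprise", 499, None, True, True),
}


def can_add_query(plan_key: str, current_queries: int) -> bool:
    """True if a workspace on this plan may document one more query."""
    plan = PLANS[plan_key]
    return plan.max_queries is None or current_queries < plan.max_queries


print(can_add_query("free", 9))   # one slot left on the free tier
print(can_add_query("free", 10))  # free-tier limit reached
```

Keeping the limits in data rather than scattered conditionals makes it cheap to experiment with the free-tier cap.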

Time to Revenue

4-6 weeks to MVP launch, 8-12 weeks to first paying customer. The paste-SQL-and-explain feature can be built and monetized quickly. Target the 'just joined a new company' moment — post in r/dataengineering, data Twitter/Bluesky, and dbt Slack community. First dollar likely comes from a solo DE who just inherited a messy stack and needs to understand it fast. Path to $5k MRR in 4-6 months if execution is strong.
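As a sanity check on the $5k MRR target, the number of paying accounts required depends heavily on the plan mix. A quick calculation using the $49 and $149 price points from above (the function name is illustrative):

```python
import math

TARGET_MRR = 5_000  # USD per month


def customers_needed(monthly_price: int, target_mrr: int = TARGET_MRR) -> int:
    """Smallest number of paying accounts that reaches the MRR target."""
    return math.ceil(target_mrr / monthly_price)


print(customers_needed(49))   # all customers on Pro
print(customers_needed(149))  # all customers on Team
```

That works out to about 103 Pro accounts or 34 Team accounts, so the real figure lies somewhere in between depending on the mix.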

What people are saying
  • not only did no one know how the business logic was set up
  • the old logic they were referring to was incorrect
  • the last person in that role left three years ago
  • source of that data was an Excel sheet that was last updated three years ago