Greenfield DW projects fail because teams pick tools before they understand the workload, and gathering requirements from execs is unstructured and overwhelming.
Async interview tool that sends structured questionnaires to business stakeholders, synthesizes their reporting needs into a prioritized subject-area roadmap, identifies many-to-many source-to-report dependencies, and outputs draft dimensional models with entity relationships.
Subscription - $149/mo per workspace, with per-stakeholder interview credits on the free tier.
The Reddit thread and pain signals are textbook. Greenfield DW projects routinely fail because requirements gathering is unstructured — data engineers spend weeks in meetings with execs who can't articulate what they need, then build the wrong thing. The pain is acute at project kickoff and recurs with every new initiative. The signal 'don't let vendor calls drive your architecture' confirms teams are making tool decisions before understanding workload.
TAM is constrained. Target buyers are data/analytics engineers at companies standing up or re-platforming data warehouses — a real but episodic need. Estimated ~50K-100K companies globally doing greenfield or major DW projects in any given year. At $149/mo, theoretical TAM is ~$90M-180M. Not venture-scale but very healthy for a bootstrapped or seed-funded startup. The niche is deep but narrow.
$149/mo is well within data team budgets — they already pay $100/seat/mo for dbt Cloud and $30K+ for catalogs. A tool that saves 2-4 weeks of requirements gathering easily justifies itself. However, the pain is episodic (project kickoff), not continuous — buyers may churn after the initial planning phase unless you create ongoing value. Per-stakeholder interview credits add good expansion mechanics.
Core MVP is achievable by a solo dev in 6-8 weeks: structured questionnaire engine (forms/async messaging), LLM-powered synthesis of responses into subject areas and entity relationships, and a prioritized roadmap output. No novel AI needed — this is prompt engineering + structured output over GPT-4/Claude. The hardest part is domain expertise in dimensional modeling to make the output actually useful, not the engineering.
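The synthesis step described above can be sketched in a few lines. In practice an LLM structured-output prompt would first reduce each free-text questionnaire into a typed record; the aggregation that follows is plain code. Everything here — `StakeholderResponse`, the field names, the function names — is an illustrative assumption, not a product spec:

```python
# Sketch of the synthesis step: merge structured stakeholder responses into a
# prioritized subject-area list and a source-to-report dependency matrix.
# The LLM call is stubbed out — assume each questionnaire has already been
# reduced to a StakeholderResponse via a structured-output (JSON-mode) prompt.
from collections import Counter
from dataclasses import dataclass

@dataclass
class StakeholderResponse:
    stakeholder: str
    subject_areas: list[str]   # e.g. ["sales", "finance"]
    kpis: list[str]            # e.g. ["gross margin %"]
    source_systems: list[str]  # e.g. ["NetSuite", "Salesforce"]

def prioritize_subject_areas(responses: list[StakeholderResponse]) -> list[tuple[str, int]]:
    """Rank subject areas by how many distinct stakeholders asked for them."""
    counts: Counter[str] = Counter()
    for r in responses:
        counts.update(set(r.subject_areas))  # one vote per stakeholder
    return counts.most_common()

def source_to_report_matrix(responses: list[StakeholderResponse]) -> dict[str, set[str]]:
    """Many-to-many mapping: which source systems feed which subject areas."""
    matrix: dict[str, set[str]] = {}
    for r in responses:
        for area in r.subject_areas:
            matrix.setdefault(area, set()).update(r.source_systems)
    return matrix
```

The point of the sketch is that the hard part is upstream (prompting the LLM to extract faithful, well-typed records from exec prose); once responses are structured, prioritization and dependency mapping are trivial deterministic code.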
This is the strongest signal. NO existing tool covers the 'plan' phase of DW development. The entire market serves catalog/govern (Atlan, Alation), model/design (Erwin, SqlDBM), build/transform (dbt, Matillion), or monitor (Monte Carlo). The requirements-to-roadmap gap is completely unaddressed by any product. Today it's done with spreadsheets, Confluence pages, and unstructured meetings. This is a genuine whitespace.
This is the biggest risk. DW planning is inherently episodic — you plan intensely for 1-3 months, then execute for 6-12 months. Subscription retention requires expanding into ongoing value: requirements change management, roadmap tracking as delivery progresses, new stakeholder onboarding for scope expansion, or periodic re-assessment. Without this, expect high churn after the initial planning burst. The per-workspace model helps but needs a retention hook.
- +Genuine whitespace — no existing tool addresses the pre-build planning phase of data warehouse projects
- +Clear, articulate buyer persona (data/analytics engineers) with existing budget authority and tool-buying habits
- +Pain is well-documented and recurring across the industry (Reddit threads, blog posts, conference talks all echo this)
- +Natural upstream complement to dbt/SqlDBM — can generate output that feeds into existing workflows rather than replacing them
- +AI-native approach is well-timed — LLMs are genuinely good at synthesizing unstructured stakeholder input into structured models
- !Episodic usage pattern: DW planning is intense but infrequent, creating a natural churn window after initial project phase — must solve for ongoing retention
- !Output quality bar is extremely high: if the generated dimensional models or roadmaps are naive or wrong, experienced data architects will dismiss the tool after one try
- !Consulting firms (Slalom, Hashmap, phData) do this manually as part of $200K+ engagements — they could build or white-label a competing tool quickly if the market proves out
- !Enterprise buyers (your best customers) have long procurement cycles and may want SOC2, SSO, and on-prem deployment before they'll onboard stakeholders
- !LLM hallucination risk in entity relationship generation could erode trust — dimensional modeling requires precision, not creativity
Industry-standard data transformation framework with AI copilot for generating documentation, tests, and model suggestions. Lets analytics engineers define and manage data models in SQL.
Cloud-based visual data modeling tool with AI features that can generate models from text descriptions. Supports Snowflake, BigQuery, Redshift, etc.
Traditional enterprise data modeling tool for creating logical and physical data models, supports dimensional modeling, Data Vault, 3NF. Long-standing industry incumbent.
Data catalog and governance platform with AI copilot for metadata management, lineage, collaboration, and data discovery across the data estate.
Data estate automation platform that automates data integration, modeling, and deployment. Covers ingestion through semantic layer with AI-assisted pipeline creation.
Web app with three flows: (1) Data engineer creates a project and invites stakeholders via email/Slack links. (2) Stakeholders complete an async structured questionnaire (guided by AI follow-up questions) about their reporting needs, KPIs, and data sources. (3) System synthesizes all responses into a prioritized subject-area roadmap with draft star-schema models (fact/dimension tables, grain statements, key relationships) and source-to-report dependency matrix. Output as interactive web view + exportable PDF/Markdown + dbt YAML stubs. Skip auth complexity — use magic links for stakeholders.
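To make the "dbt YAML stubs" output concrete, here is a minimal sketch of what the exporter could emit for one draft fact table. The function name, the table/column naming (`fct_`/`dim_`, `*_key`), and the grain-in-description convention are assumptions about the product's output, not a confirmed format; the YAML itself follows standard dbt `schema.yml` test syntax (`not_null`, `relationships`):

```python
# Hypothetical exporter: turn a draft star-schema spec (fact name, grain
# statement, dimension list) into a dbt schema.yml stub with relationship
# tests wired to the dimension tables. All names are illustrative.
def dbt_yaml_stub(fact: str, grain: str, dimensions: list[str]) -> str:
    lines = [
        "version: 2",
        "models:",
        f"  - name: {fact}",
        f'    description: "Grain: {grain}"',
        "    columns:",
    ]
    for dim in dimensions:
        lines += [
            f"      - name: {dim}_key",
            "        tests:",
            "          - not_null",
            "          - relationships:",
            f"              to: ref('dim_{dim}')",
            f"              field: {dim}_key",
        ]
    return "\n".join(lines) + "\n"

print(dbt_yaml_stub("fct_orders", "one row per order line", ["customer", "date"]))
```

Emitting stubs like this (rather than a closed proprietary format) is what makes the tool an upstream complement to dbt instead of a competitor — the engineer pastes the output into an existing project and keeps their workflow.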
- Free tier: 1 project, 3 stakeholder interviews, basic roadmap output
- Pro ($149/mo): unlimited interviews, full dimensional models, dbt/SQL export, roadmap tracking
- Team ($499/mo): multi-project, team collaboration, Jira/Linear integration for roadmap execution
- Enterprise (custom pricing): SSO, audit logs, consulting-firm white-label, API access for integration into existing toolchains
8-12 weeks. 6-8 weeks to build MVP, 2-4 weeks of design-partner iteration with 3-5 data teams from the Reddit/dbt community. First paying customers likely from dbt Slack community, r/dataengineering, and LinkedIn data engineering audience. The community is highly engaged and vocal about this pain — organic distribution is plausible.
- “Sudden surge of demand from top execs for full company reporting solutions”
- “growing many-to-many demand between source system and reporting solutions requirements”
- “Don't let the vendor calls drive your architecture. Figure out what questions the business actually needs answered first”
- “Don't try to build a solution for everything. You need to pick winning projects that are gonna have high ROI”