Data engineers must manually decide when to create aggregate tables for performance, then go back and build them when BI tools choke on volume — a reactive, repetitive cycle.
Monitors BI query logs (Looker, Tableau, Power BI), identifies slow or frequently-run queries hitting granular fact tables, and auto-generates dbt models for optimal aggregate tables with a one-click deploy workflow.
subscription
The pain signals are real and recurring — data engineers repeatedly describe the reactive cycle of building agg tables when dashboards choke. However, it's a 7 not a 9 because it's a productivity pain, not a business-critical emergency. Teams tolerate it for weeks/months before acting. The pain is chronic and annoying, not acute and urgent.
TAM is constrained to companies using dbt + a BI tool + a cloud warehouse, which is a specific but growing segment. Estimated ~15,000-25,000 companies globally fit the profile (mid-size, dbt users, BI tool). At $500-2000/month, that's a $90M-$600M TAM. Solid for a bootstrapped/seed-stage company, but niche compared to broader data tooling.
Data teams have tooling budgets and already pay for dbt Cloud, warehouse compute, and BI licenses. A tool that demonstrably reduces warehouse costs and improves dashboard performance has a clear ROI story. However, the buyer (data engineer) often doesn't control budget — they'd need to convince a data/eng manager. Competing with 'just do it manually' is always a risk at this price point. Mid-market teams are price-sensitive on incremental tooling.
This is harder than it looks. Parsing query logs from 3+ BI tools (Looker, Tableau, Power BI), each with different log formats and APIs, is significant integration work. Analyzing query patterns to identify optimal aggregation strategies requires non-trivial SQL analysis. Generating correct, production-quality dbt models with proper grain, joins, and naming conventions is complex. A solo dev could build a working MVP for ONE BI tool (e.g., Looker only) in 6-8 weeks, but cross-BI support pushes this to 3-4 months. Safely executing a 'one-click deploy' into an existing dbt project is another layer of complexity.
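To make the model-generation difficulty concrete, here is a minimal sketch of the rendering step, assuming an aggregation spec has already been inferred from query logs. The `AggSpec` shape, the `agg_<fact>_by_<grain>` naming convention, and the templating are all illustrative assumptions, not a real implementation:

```python
from dataclasses import dataclass

@dataclass
class AggSpec:
    """Hypothetical output of the query-log analyzer: one aggregate to build."""
    fact_table: str           # source fact model, e.g. "fct_orders"
    grain: list               # dimension columns defining the rollup grain
    measures: dict            # output column -> SQL aggregate expression

def render_dbt_agg_model(spec: AggSpec):
    """Render a dbt model (name, SQL) for the given aggregate spec.

    Uses dbt's ref() so lineage stays intact; names the model
    agg_<fact>_by_<grain> per an assumed naming convention.
    """
    name = f"agg_{spec.fact_table.removeprefix('fct_')}_by_{'_'.join(spec.grain)}"
    dims = ",\n    ".join(spec.grain)
    aggs = ",\n    ".join(f"{expr} as {col}" for col, expr in spec.measures.items())
    sql = (
        "{{ config(materialized='table') }}\n\n"
        "select\n"
        f"    {dims},\n"
        f"    {aggs}\n"
        f"from {{{{ ref('{spec.fact_table}') }}}}\n"
        f"group by {', '.join(str(i) for i in range(1, len(spec.grain) + 1))}\n"
    )
    return name, sql

spec = AggSpec(
    fact_table="fct_orders",
    grain=["order_date", "region"],
    measures={"total_revenue": "sum(revenue)", "order_count": "count(*)"},
)
name, sql = render_dbt_agg_model(spec)
print(name)  # agg_orders_by_order_date_region
```

Even this toy version dodges the hard parts the assessment calls out: choosing the right grain, handling joins across fact and dimension tables, and matching each team's naming conventions.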
The gap is genuinely open. No existing tool combines: (1) BI query log analysis across tools, (2) automatic aggregate table identification, (3) dbt-native model generation, and (4) one-click deploy. AtScale is closest but is enterprise-priced and not dbt-native. Cube requires architectural changes. Looker Aggregate Awareness validates the concept but is manual and single-tool. The dbt-native angle is a strong differentiator nobody else occupies.
Strong subscription fit. Query patterns change as data grows and new dashboards are created — ongoing monitoring and optimization is inherently recurring. Usage-based pricing tied to queries analyzed or models generated aligns value with cost. Teams won't want to lose visibility once they have it. The 'set it and forget it' monitoring creates natural retention.
- +Genuine gap in the market — no tool combines BI log analysis + dbt model generation + one-click deploy
- +dbt-native output is a killer differentiator — fits existing workflows with zero new infrastructure
- +Clear ROI story: reduced warehouse compute costs + faster dashboards = quantifiable savings
- +Recurring value: data grows, queries change, optimization is never 'done'
- +Pain is validated by real community discussions and the existence of manual workarounds (Looker Aggregate Awareness)
- !Platform risk: dbt Labs could build this into dbt Cloud (they have MetricFlow + warehouse query metadata)
- !Integration complexity: supporting 3 BI tools well is 3x the work of supporting 1
- !Snowflake/Databricks enhancing native auto-materialization could make this 'good enough' at the warehouse layer
- !Buyer ≠ user problem: data engineers feel the pain but may not control the tooling budget
- !Generating correct dbt models automatically is a hard problem — bad output kills trust immediately
Open-source semantic layer with a pre-aggregation engine that auto-materializes rollup tables in the warehouse. BI tools query through Cube's API layer, which routes to the fastest available pre-aggregation.
Enterprise semantic layer that creates a virtual data warehouse abstraction. Includes autonomous aggregate management — monitors query patterns and auto-creates/manages aggregate tables in the underlying warehouse.
dbt Labs' metrics engine (MetricFlow, via the Transform acquisition). Centralizes metric definitions in dbt projects and generates SQL at query time through the dbt Semantic Layer, but does not monitor BI query logs or auto-materialize aggregate tables today.
Built-in Looker feature where you define aggregate tables in LookML and Looker automatically routes queries to the smallest/fastest aggregate that can answer the query.
dbt Cloud cost monitoring and optimization tool. Analyzes dbt runs and warehouse queries to surface expensive/slow models and suggest optimizations.
Start with Looker-only + one warehouse (Snowflake or BigQuery). Connect to Looker's System Activity logs, identify the top 10 slowest/most-frequent queries hitting fact tables, and generate a dbt staging model + aggregate model with correct grain. Ship as a CLI tool or lightweight web app that reads the dbt project, proposes changes as a PR/diff, and lets the engineer review before merging. Skip Power BI and Tableau for v1 entirely.
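The candidate-identification step above can be sketched as a pure scoring pass over already-parsed query-log rows. The row fields below and the total-runtime score (frequency × average runtime) are assumptions for illustration; the real Looker System Activity schema differs:

```python
from collections import defaultdict

def top_agg_candidates(history_rows, fact_tables, limit=10):
    """Rank (explore, fields) query patterns by total runtime cost.

    history_rows: dicts with 'explore', 'fields' (iterable of column names),
    and 'runtime_s' -- a simplified stand-in for Looker System Activity rows.
    Only patterns hitting known fact tables are considered.
    """
    stats = defaultdict(lambda: {"count": 0, "runtime": 0.0})
    for row in history_rows:
        if row["explore"] not in fact_tables:
            continue
        key = (row["explore"], tuple(sorted(row["fields"])))
        stats[key]["count"] += 1
        stats[key]["runtime"] += row["runtime_s"]
    # total runtime == frequency * average runtime: the warehouse time
    # an aggregate table for this pattern would save
    ranked = sorted(stats.items(), key=lambda kv: kv[1]["runtime"], reverse=True)
    return [
        {"explore": k[0], "fields": list(k[1]), **v} for k, v in ranked[:limit]
    ]

rows = [
    {"explore": "fct_orders", "fields": ("order_date", "revenue"), "runtime_s": 40.0},
    {"explore": "fct_orders", "fields": ("revenue", "order_date"), "runtime_s": 35.0},
    {"explore": "dim_users", "fields": ("email",), "runtime_s": 90.0},
    {"explore": "fct_orders", "fields": ("region",), "runtime_s": 5.0},
]
best = top_agg_candidates(rows, fact_tables={"fct_orders"})
print(best[0]["fields"], best[0]["count"])  # ['order_date', 'revenue'] 2
```

Sorting field names before keying lets semantically identical queries collapse into one pattern; the top-ranked patterns then feed the dbt model generator and the PR/diff review flow.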
Free CLI tool that analyzes Looker logs and shows optimization opportunities (lead gen) → Paid tier ($300-500/month) that auto-generates dbt models and monitors continuously → Team tier ($1000-2000/month) with multi-BI support, scheduled monitoring, Slack alerts, and warehouse cost savings dashboard → Enterprise with SSO, audit logs, and custom integrations
3-5 months. Month 1-2: build Looker log parser + query analyzer. Month 2-3: dbt model generator + review UI. Month 3-4: beta with 5-10 design partners from dbt community (Reddit, dbt Slack). Month 4-5: first paying customers. The dbt community is tight-knit and early-adopter friendly, which helps, but the technical complexity of generating correct dbt models will likely cause iteration cycles.
- “If your BI tool chokes on volume, build a separate agg table downstream”
- “constantly go back and do back end work whenever someone wants a new filter”
- “BI calculation would significantly impact user experience”
- “also make a agg_table that aggregates the data”