Overall score: 6.4/10 · medium · Verdict: CONDITIONAL GO

AutoAgg

Automatically detects slow BI queries and generates optimized aggregate tables in dbt.

DevTools · Data engineers and analytics engineers at mid-size companies using dbt + a BI tool
The Gap

Data engineers must manually decide when to create aggregate tables for performance, then go back and build them when BI tools choke on volume — a reactive, repetitive cycle.

Solution

Monitors BI query logs (Looker, Tableau, Power BI), identifies slow or frequently-run queries hitting granular fact tables, and auto-generates dbt models for optimal aggregate tables with a one-click deploy workflow.
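The detection side of the solution can be sketched as a simple filter over query-log entries. This is a minimal illustration, not the product's actual logic: the `QueryLogEntry` shape, table names, and thresholds are all invented for the example.

```python
# Minimal sketch of aggregate-candidate detection. All names and thresholds
# here are hypothetical; a real tool would pull these from BI query logs
# (Looker System Activity, Tableau, Power BI) and warehouse metadata.
from dataclasses import dataclass

@dataclass
class QueryLogEntry:
    sql: str
    runtime_seconds: float
    run_count: int

FACT_TABLES = {"fct_orders", "fct_events"}  # assumed granular fact tables
SLOW_THRESHOLD_S = 10.0                     # assumed "slow" cutoff
MIN_RUNS = 5                                # only frequently-run queries matter

def aggregate_candidates(log: list[QueryLogEntry]) -> list[QueryLogEntry]:
    """Return queries worth pre-aggregating: slow, frequent, fact-table scans."""
    return [
        q for q in log
        if q.runtime_seconds >= SLOW_THRESHOLD_S
        and q.run_count >= MIN_RUNS
        and any(t in q.sql for t in FACT_TABLES)
    ]
```

In practice the table-name check would use a real SQL parser rather than substring matching, but the core heuristic (slow + frequent + granular source) is the same.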

Revenue Model

Subscription

Feasibility Scores
Pain Intensity: 7/10

The pain signals are real and recurring — data engineers repeatedly describe the reactive cycle of building agg tables when dashboards choke. However, it's a 7 not a 9 because it's a productivity pain, not a business-critical emergency. Teams tolerate it for weeks/months before acting. The pain is chronic and annoying, not acute and urgent.

Market Size: 6/10

TAM is constrained to companies using dbt + a BI tool + a cloud warehouse, which is a specific but growing segment. Estimated ~15,000-25,000 companies globally fit the profile (mid-size, dbt users, BI tool). At $500-2000/month, that's a $90M-$600M TAM. Solid for a bootstrapped/seed-stage company, but niche compared to broader data tooling.

Willingness to Pay: 6/10

Data teams have tooling budgets and already pay for dbt Cloud, warehouse compute, and BI licenses. A tool that demonstrably reduces warehouse costs and improves dashboard performance has a clear ROI story. However, the buyer (data engineer) often doesn't control budget — they'd need to convince a data/eng manager. Competing with 'just do it manually' is always a risk at this price point. Mid-market teams are price-sensitive on incremental tooling.

Technical Feasibility: 5/10

This is harder than it looks. Parsing query logs from 3+ BI tools (Looker, Tableau, Power BI), each with different log formats and APIs, is significant integration work. Analyzing query patterns to identify optimal aggregation strategies requires non-trivial SQL analysis. Generating correct, production-quality dbt models with proper grain, joins, and naming conventions is complex. A solo dev could build a working MVP for ONE BI tool (e.g., Looker only) in 6-8 weeks, but cross-BI support pushes this to 3-4 months. Safely executing a 'one-click deploy' into a dbt project is yet another layer of complexity.
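One piece of the "non-trivial SQL analysis" is query fingerprinting: the same dashboard query arrives thousands of times with different filter literals, and those runs must be grouped into one pattern before their total cost can be ranked. A crude regex-based sketch (a real tool would use a proper SQL parser):

```python
# Hypothetical fingerprinting sketch: normalize literals so that repeated
# runs of the same dashboard query (with different filter values) collapse
# into one pattern that can be counted and ranked.
import re
from collections import Counter

def fingerprint(sql: str) -> str:
    sql = sql.lower().strip()
    sql = re.sub(r"'[^']*'", "?", sql)           # string/date literals -> ?
    sql = re.sub(r"\b\d+(\.\d+)?\b", "?", sql)   # numeric literals -> ?
    sql = re.sub(r"\s+", " ", sql)               # collapse whitespace
    return sql

def top_patterns(queries: list[str], n: int = 10) -> list[tuple[str, int]]:
    """Rank normalized query patterns by how often they ran."""
    return Counter(fingerprint(q) for q in queries).most_common(n)
```

Regex normalization breaks down on comments, quoted identifiers, and IN-lists of varying length, which is part of why this scores a 5/10 on feasibility.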

Competition Gap: 8/10

The gap is genuinely open. No existing tool combines: (1) BI query log analysis across tools, (2) automatic aggregate table identification, (3) dbt-native model generation, and (4) one-click deploy. AtScale is closest but is enterprise-priced and not dbt-native. Cube requires architectural changes. Looker Aggregate Awareness validates the concept but is manual and single-tool. The dbt-native angle is a strong differentiator nobody else occupies.

Recurring Potential: 8/10

Strong subscription fit. Query patterns change as data grows and new dashboards are created — ongoing monitoring and optimization is inherently recurring. Usage-based pricing tied to queries analyzed or models generated aligns value with cost. Teams won't want to lose visibility once they have it. The 'set it and forget it' monitoring creates natural retention.

Strengths
  • +Genuine gap in the market — no tool combines BI log analysis + dbt model generation + one-click deploy
  • +dbt-native output is a killer differentiator — fits existing workflows with zero new infrastructure
  • +Clear ROI story: reduced warehouse compute costs + faster dashboards = quantifiable savings
  • +Recurring value: data grows, queries change, optimization is never 'done'
  • +Pain is validated by real community discussions and the existence of manual workarounds (Looker Aggregate Awareness)
Risks
  • !Platform risk: dbt Labs could build this into dbt Cloud (they have MetricFlow + warehouse query metadata)
  • !Integration complexity: supporting 3 BI tools well is 3x the work of supporting 1
  • !Snowflake/Databricks enhancing native auto-materialization could make this 'good enough' at the warehouse layer
  • !Buyer ≠ user problem: data engineers feel the pain but may not control the tooling budget
  • !Generating correct dbt models automatically is a hard problem — bad output kills trust immediately
Competition
Cube.dev

Open-source semantic layer with a pre-aggregation engine that auto-materializes rollup tables in the warehouse. BI tools query through Cube's API layer, which routes to the fastest available pre-aggregation.

Pricing: Cube Core is free/open-source. Cube Cloud starts ~$200/month, scaling to enterprise custom pricing.
Gap: Not dbt-native — has its own modeling layer, so dbt teams must maintain a parallel paradigm. Requires BI tools to query through Cube (major architectural change). Does NOT monitor existing Looker/Tableau/Power BI query logs. No dbt model generation output.
AtScale

Enterprise semantic layer that creates a virtual data warehouse abstraction. Includes autonomous aggregate management — monitors query patterns and auto-creates/manages aggregate tables in the underlying warehouse.

Pricing: Enterprise-only, custom pricing (typically six figures annually).
Gap: Prohibitively expensive for mid-market teams. Not dbt-native — proprietary modeling layer, no dbt model output. Heavy platform requiring significant architectural commitment. Weak/no Looker integration. No one-click deploy to dbt workflows.
dbt Semantic Layer (MetricFlow)

dbt Labs' metrics engine, which defines metrics in YAML on top of dbt models and generates the SQL to compute them at query time.

Pricing: MetricFlow engine is free/open-source. Managed Semantic Layer requires dbt Cloud Team ($100/seat/month).
Gap: Does NOT auto-generate aggregate tables — computes metrics at query time or relies on existing models. No query log analysis to find slow queries. No optimization recommendations. You still manually decide what to materialize.
Looker Aggregate Awareness

Built-in Looker feature where you define aggregate tables in LookML and Looker automatically routes queries to the smallest/fastest aggregate that can answer the query.

Pricing: Included with Looker (Google Cloud pricing).
Gap: Completely manual — you must identify which aggregates to create and write the LookML yourself. No automatic detection of slow queries or recommendations. Looker-only, does not work with Tableau or Power BI. Not dbt-native — aggregates defined in LookML, not dbt.
SELECT.dev

dbt Cloud cost monitoring and optimization tool. Analyzes dbt runs and warehouse queries to surface expensive/slow models and suggest optimizations.

Pricing: SaaS, usage-based starting ~$100/month.
Gap: Monitoring only — identifies expensive models but does NOT auto-generate aggregate tables. Focuses on dbt run costs, not BI query performance. No BI query log integration — doesn't know which dashboards are slow for end users. No action layer, just reporting.
MVP Suggestion

Start with Looker-only + one warehouse (Snowflake or BigQuery). Connect to Looker's System Activity logs, identify the top 10 slowest/most-frequent queries hitting fact tables, and generate a dbt staging model + aggregate model with correct grain. Ship as a CLI tool or lightweight web app that reads the dbt project, proposes changes as a PR/diff, and lets the engineer review before merging. Skip Power BI and Tableau for v1 entirely.
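The "generate a dbt aggregate model with correct grain" step of the MVP can be sketched as a template render. Everything here is illustrative: the template, model name, and column lists are invented, and a real tool would derive dimensions and measures from the parsed query rather than take them as arguments.

```python
# Hypothetical sketch: render a dbt aggregate model from a detected pattern.
# Table and column names are invented; a real tool would parse them out of
# the slow BI query and the source model's schema.
AGG_MODEL_TEMPLATE = """\
-- Auto-generated aggregate of {source_model} at {grain} grain.
select
    {dims},
    {measures}
from {{{{ ref('{source_model}') }}}}
group by {group_by}
"""

def render_agg_model(source_model: str,
                     dims: list[str],
                     measures: list[tuple[str, str]]) -> str:
    """Render dbt model SQL; measures are (alias, expression) pairs."""
    return AGG_MODEL_TEMPLATE.format(
        source_model=source_model,
        grain=" x ".join(dims),
        dims=",\n    ".join(dims),
        measures=",\n    ".join(f"{expr} as {name}" for name, expr in measures),
        group_by=", ".join(str(i + 1) for i in range(len(dims))),
    )
```

Emitting the result as a file in `models/` plus a PR diff (rather than writing to the warehouse directly) keeps the engineer in the review loop, which matters given that bad generated output kills trust immediately.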

Monetization Path

Free CLI tool that analyzes Looker logs and shows optimization opportunities (lead gen) → Paid tier ($300-500/month) that auto-generates dbt models and monitors continuously → Team tier ($1000-2000/month) with multi-BI support, scheduled monitoring, Slack alerts, and warehouse cost savings dashboard → Enterprise with SSO, audit logs, and custom integrations.

Time to Revenue

3-5 months. Month 1-2: build Looker log parser + query analyzer. Month 2-3: dbt model generator + review UI. Month 3-4: beta with 5-10 design partners from dbt community (Reddit, dbt Slack). Month 4-5: first paying customers. The dbt community is tight-knit and early-adopter friendly, which helps, but the technical complexity of generating correct dbt models will likely cause iteration cycles.

What people are saying
  • "If your BI tool chokes on volume, build a separate agg table downstream"
  • "constantly go back and do back end work whenever someone wants a new filter"
  • "BI calculation would significantly impact user experience"
  • "also make a agg_table that aggregates the data"