Writing comprehensive data tests (null rates, schema validation, referential integrity, row counts) is time-consuming and requires deep domain knowledge, so most teams check row counts at best or skip testing entirely.
Connects to your pipeline orchestrator (Airflow, dbt, etc.), profiles N historical successful runs to learn expected patterns (row distributions, null rates, value ranges, schema shapes), then auto-generates a test suite with sensible thresholds. Tests are emitted as dbt tests, Great Expectations suites, or standalone checks that plug into CI.
Freemium — free for up to 10 datasets, $99/mo per seat for teams with unlimited datasets and alerting integrations.
The Reddit thread with 126 upvotes directly validates this pain. Every data team knows they under-test. The problem is real, recurring, and causes production incidents that erode trust in data. The 'row counts at best' signal is widespread. Docked 2 points because, painful as it is, teams have survived without a fix — it's important but rarely urgent until something breaks.
TAM: ~200K mid-size companies globally with data teams × ~3 data engineers avg × $99/seat/mo ≈ $700M annualized theoretical ceiling. Realistic SAM for a bootstrapped product targeting English-speaking dbt/Airflow users: $200-500M. Large enough to build a meaningful business, but not so large that enterprise gorillas will crush you immediately. The mid-market focus is smart — Monte Carlo won't chase $99/seat deals.
$99/seat/mo is reasonable but faces headwinds: (1) data engineers are used to open-source tools, (2) competing with free Great Expectations and Elementary, (3) the buyer is often an IC data engineer, not a VP — harder to get budget approval. Upside: teams already pay for Snowflake/Databricks/dbt Cloud so incremental tooling spend is normalized. Docked points because the 'free for 10 datasets' tier may be generous enough that many teams never upgrade.
Core loop is well-defined: connect to metadata/warehouse → query N historical runs → compute statistical profiles → emit test files. No ML moonshots required — basic statistics (percentiles, z-scores, schema diffing) cover 80% of value. Solo dev with data engineering background can build an MVP targeting dbt + Snowflake in 4-6 weeks. Risk: supporting multiple orchestrators (Airflow, Dagster, Prefect) and output formats (dbt, GX, standalone) expands scope quickly. MVP should pick ONE input (dbt) and ONE output (dbt tests).
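The "basic statistics cover 80% of value" claim is easy to make concrete. A minimal sketch of the profiling step, using only the standard library (the function name, metric inputs, and 20% slack margin are illustrative assumptions, not a specification):

```python
import statistics

def profile_runs(row_counts, null_counts):
    """Derive test thresholds from metrics of N historical successful runs.

    row_counts:  row counts observed across the runs
    null_counts: null counts for one column over the same runs
    """
    null_rates = [n / r for n, r in zip(null_counts, row_counts)]
    lo, hi = min(row_counts), max(row_counts)
    # Hypothetical 20% slack around the historical range: wide enough to
    # tolerate normal variance, tight enough to flag a broken upstream load.
    margin = 0.2 * statistics.median(row_counts)
    return {
        "row_count_min": max(0, int(lo - margin)),
        "row_count_max": int(hi + margin),
        # 10% headroom over the worst observed null rate
        "max_null_rate": round(max(null_rates) * 1.1, 4),
    }

profile = profile_runs(
    row_counts=[1000, 1040, 980, 1010, 995],
    null_counts=[10, 9, 12, 11, 10],
)
```

Even this naive version produces thresholds grounded in observed behavior rather than guesses, which is the core of the pitch.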
The specific workflow of 'profile N historical runs → auto-generate portable test suite with calibrated thresholds' is genuinely underserved. Great Expectations profiles but doesn't learn over time. Monte Carlo monitors but doesn't emit tests. Elementary detects anomalies but doesn't generate comprehensive suites. The gap is real. Docked 3 points because these players could add this feature relatively easily — it's a feature, not a moat. Defensibility comes from execution speed and community, not technology.
Natural subscription: pipelines change, schemas evolve, new datasets appear — tests need continuous re-profiling and updating. The 'living test suite' angle supports recurring value. Risk: once tests are generated and stable, some teams may churn because the ongoing value decreases. Need to build continuous monitoring/re-calibration to maintain stickiness, not just one-time generation.
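The re-calibration loop that would keep the suite "living" can be sketched as a simple drift check (the function name, stored-profile shape, and 10% breach tolerance are assumptions for illustration):

```python
def needs_recalibration(stored, fresh_row_counts, tolerance=0.1):
    """Flag a model for re-profiling when recent runs drift outside the
    stored bounds more often than `tolerance` allows (heuristic sketch)."""
    lo, hi = stored["row_count_min"], stored["row_count_max"]
    breaches = [c for c in fresh_row_counts if not (lo <= c <= hi)]
    # Re-profile when more than `tolerance` of recent runs breach the bounds.
    return len(breaches) / len(fresh_row_counts) > tolerance
```

Scheduling this check per model and regenerating thresholds when it fires is what turns one-time generation into a recurring subscription.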
- +Directly validated pain point with strong community signal (126 upvotes, active discussion)
- +Clear gap in market — no tool does 'historical profiling → portable test generation' end-to-end
- +Technical scope is achievable for a solo dev MVP, especially if focused on dbt ecosystem
- +Mid-market positioning avoids direct competition with enterprise players like Monte Carlo
- +Output format (dbt tests, GX suites) creates lock-in through integration, not proprietary platform
- !Feature-not-product risk: Elementary, Soda, or GX could ship this as a feature in a quarter
- !Open-source expectation: data engineers may expect this to be free/OSS and resist paying $99/seat
- !Threshold calibration is deceptively hard — too tight = alert fatigue, too loose = missed issues. Getting this wrong kills trust and adoption
- !Free tier (10 datasets) may be too generous — most small teams have <10 critical datasets
- !Multi-orchestrator support (Airflow, Dagster, Prefect, dbt) could fragment engineering effort before PMF
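The threshold-calibration risk flagged above is where most of the statistical care goes. One common robust approach, shown here as an illustration rather than the product's actual algorithm, is median ± k·MAD: a single corrupted run barely moves the bounds, whereas mean ± k·σ would blow them wide open and cause exactly the missed-issues failure mode:

```python
import statistics

def robust_bounds(values, k=3.0):
    """Median +/- k * MAD bounds. Resistant to outlier runs, unlike
    mean +/- k * stdev; k is the sensitivity knob exposed to users."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    # 1.4826 scales MAD to stdev-equivalent units for normal data
    spread = 1.4826 * mad
    return med - k * spread, med + k * spread

# One corrupted run (9000 rows) among otherwise stable counts:
lo, hi = robust_bounds([1000, 1010, 990, 1005, 9000])
```

Here the bounds stay near the stable cluster, so the 9000-row run is correctly flagged instead of silently widening future thresholds.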
Open-source Python framework for defining, running, and documenting data quality expectations. GX Cloud adds collaboration and scheduling.
Data observability platform that uses ML to detect anomalies in data freshness, volume, schema, distribution, and lineage. Alerts on unexpected changes.
Data quality platform with SodaCL, a YAML-based language for declaring data quality checks that run in pipelines and alert on failures.
Open-source dbt-native data observability. Runs anomaly detection as dbt tests and provides a dashboard for monitoring.
Data diff and regression testing platform. Compares data between environments (e.g., staging vs. production) to catch unintended changes before they ship.
dbt-only integration: connect to a dbt project + Snowflake/BigQuery warehouse, profile the last 30 days of run history for selected models, auto-generate a dbt test YAML file with schema tests, accepted_values, not_null rates, row count bounds, and freshness checks with statistically-derived thresholds. Ship as a CLI tool (`autodatatest init → autodatatest generate`) that outputs a `_autodatatest.yml` file you commit to your dbt project. No UI needed for MVP — just clean CLI output and a generated test file.
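The `generate` step reduces to templating a profile dict into dbt's schema-test YAML. A minimal sketch (the function name and profile shape are assumptions; the row-count bound assumes the project installs the dbt_expectations package, since dbt's built-in generic tests don't cover it):

```python
def render_dbt_tests(model, profile):
    """Render a minimal _autodatatest.yml fragment for one model.

    profile: {"row_count_min": int, "row_count_max": int,
              "not_null_columns": [str, ...]}  -- hypothetical shape
    """
    return (
        "version: 2\n"
        "models:\n"
        f"  - name: {model}\n"
        "    tests:\n"
        "      - dbt_expectations.expect_table_row_count_to_be_between:\n"
        f"          min_value: {profile['row_count_min']}\n"
        f"          max_value: {profile['row_count_max']}\n"
        "    columns:\n"
        + "".join(
            f"      - name: {col}\n"
            "        tests:\n"
            "          - not_null\n"
            for col in profile["not_null_columns"]
        )
    )

yml = render_dbt_tests("orders", {
    "row_count_min": 780,
    "row_count_max": 1240,
    "not_null_columns": ["order_id", "created_at"],
})
```

Emitting plain dbt YAML is also the lock-in-through-integration play: the generated file lives in the user's repo and runs under `dbt test` with no vendor runtime.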
Free CLI (open-source, 10 models) → $99/seat/mo Pro (unlimited models, Slack/PagerDuty alerts, scheduled re-profiling, drift detection) → $299/seat/mo Team (cross-project coverage, data contracts enforcement, SOC2 audit logs) → Enterprise (SSO, custom integrations, SLAs)
8-12 weeks. Weeks 1-4: build CLI MVP for dbt + Snowflake. Weeks 5-6: dogfood with 3-5 design partners from Reddit/dbt Slack. Weeks 7-8: iterate on threshold accuracy based on feedback. Weeks 9-12: launch on dbt Slack, Reddit r/dataengineering, and HackerNews with free tier. First paying customers likely from design partners converting to Pro within 2-3 months of launch.
- “most teams I talk to only do row counts at best”
- “Writing data tests that catch actual bugs and aren't incredibly noisy is difficult”
- “Lack of skill, domain knowledge”
- “What actually stops people from writing more data tests? Is it time, tooling?”