Companies default to expensive big data tools (Spark, Databricks, large warehouse tiers) when their workloads could run on Postgres or SQLite. They overspend on infrastructure because of hype, not actual need.
Analyzes your current data pipeline workloads — query complexity, data volumes, processing patterns — and recommends the simplest (cheapest) infrastructure that can handle them. Shows projected cost savings from downgrading and provides a migration playbook where applicable.
One-time audit fee ($2K-$10K) or a subscription ($500/mo) for continuous right-sizing recommendations as workloads change
The pain is real but often invisible. Companies spending $5K-50K/mo on data infra feel the cost but rarely question the architecture — it's a slow bleed, not an acute crisis. The Reddit thread and 'Big Data is Dead' discourse confirm widespread agreement that over-provisioning is epidemic. However, the people who feel the pain most (finance teams) aren't the ones who chose the stack (data engineers), creating a political gap.
TAM is every company spending >$5K/mo on data infrastructure that doesn't need to — likely tens of thousands of companies, potentially a $500M-$1B addressable market for audit/advisory services. But the realistic serviceable market is smaller: you need companies that (a) are overspending, (b) know or suspect they're overspending, AND (c) are willing to act on it. Many data teams resist downgrading due to ego, career incentives, or genuine uncertainty.
Strong ROI story — if you save a company $100K/yr by moving from Databricks to Postgres, charging $5K-$10K for the audit is an easy sell. One-time audit model maps well to consulting buyers. The challenge is the subscription model: once you've right-sized, what ongoing value justifies $500/mo? Continuous monitoring is a weaker value prop than the initial audit.
This is harder than it looks. To credibly recommend 'switch from Spark to Postgres,' you need to (a) connect to and analyze actual query logs, execution plans, and data volumes across multiple platforms (Databricks, Snowflake, BigQuery, Redshift — each with different APIs), (b) model whether those workloads would perform acceptably on simpler alternatives, and (c) estimate migration effort. The analysis engine — especially modeling Postgres performance for workloads currently running on Spark — requires deep domain expertise. A solo dev with a strong data engineering background could build a credible MVP for ONE platform (e.g., Databricks-only audit) in 6-8 weeks, but multi-platform support is a 3-6 month project.
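To make the feasibility risk concrete, here is a minimal sketch of the kind of first-pass heuristic the analysis engine would need. The record schema (bytes scanned, shuffle volume) is a hypothetical normalized query-log format, not any vendor's actual API, and the thresholds are placeholders; calibrating them against real single-node engines is exactly where the domain expertise lives.

```python
# Illustrative only: a naive first pass at the core audit question,
# "could this query plausibly run on Postgres/DuckDB?"
# Field names and thresholds are assumptions, not a vendor schema.
from dataclasses import dataclass

# Rough single-node comfort zone; real values would need benchmarking.
SINGLE_NODE_SCAN_LIMIT = 100 * 1024**3    # ~100 GB scanned per query
SINGLE_NODE_SHUFFLE_LIMIT = 10 * 1024**3  # ~10 GB of shuffle/join spill

@dataclass
class QueryRecord:
    query_id: str
    bytes_scanned: int
    shuffle_bytes: int
    runtime_s: float

def fits_single_node(q: QueryRecord) -> bool:
    """Heuristic: small scan + small shuffle suggests no distributed compute needed."""
    return (q.bytes_scanned <= SINGLE_NODE_SCAN_LIMIT
            and q.shuffle_bytes <= SINGLE_NODE_SHUFFLE_LIMIT)

def downgrade_candidate_share(log: list[QueryRecord]) -> float:
    """Fraction of the logged workload that looks single-node-viable."""
    if not log:
        return 0.0
    return sum(fits_single_node(q) for q in log) / len(log)
```

The hard part is not this classifier but validating its thresholds against real workloads, which is why the 6-8 week single-platform estimate is the credible one.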
This is the strongest signal. ZERO existing tools recommend architectural downgrading. Every incumbent (Vantage, Unravel, CloudZero) is financially incentivized to keep customers on expensive platforms — they make money from integrations with those platforms. Databricks and Snowflake will never build this. The recommendation to simplify is inherently adversarial to the ecosystem, which means incumbents can't easily copy it. This is a genuine structural gap.
The initial audit is naturally a one-time or annual engagement ($2K-$10K). Continuous monitoring ('your workload grew, now you DO need Spark' or 'your workload shrank, time to downgrade') is a valid subscription concept but harder to justify monthly — workloads don't change that fast. Risk of being a great consulting business that struggles to become a SaaS business. The migration playbook and ongoing cost benchmarking could sustain a subscription, but it's a stretch at $500/mo.
- +Massive, validated structural gap — no tool recommends downgrading, and incumbents are financially incentivized never to build this
- +Perfect cultural timing: 'Big Data is Dead,' DuckDB movement, MDS backlash, and Snowflake cost horror stories are mainstream
- +Incredibly strong ROI story for buyers — $5K audit that saves $100K/yr sells itself
- +Adversarial positioning creates a natural moat — Databricks, Snowflake, and their ecosystem partners cannot copy this
- +Multiple monetization paths: one-time audit, retainer, migration services, and potential SaaS
- !Political resistance: you're telling data engineers 'you over-engineered this,' which feels like a personal attack on their technical judgment and can kill deals
- !Technical depth required: credibly modeling 'can Postgres handle this Spark workload?' demands deep expertise across multiple platforms — easy to get wrong and destroy credibility
- !Consulting trap: the highest-value motion is one-time audits, which don't scale like SaaS and create feast-or-famine revenue
- !Narrow action window: once a company is deeply invested in a stack (years of Spark jobs, trained team), the switching cost makes recommendations feel impractical regardless of merit
- !Low engagement signal: 11 upvotes / 54 comments on one Reddit thread is anecdotal, not market validation
Cloud cost visibility and optimization platform. Connects to AWS, Azure, GCP, Snowflake, Databricks. Provides cost reporting, budgeting, anomaly detection, and right-sizing recommendations within existing infrastructure.
Full-stack data observability and optimization for Databricks, Snowflake, BigQuery, EMR/Spark. Analyzes query performance, resource utilization, provides AI-driven tuning recommendations.
Cloud cost intelligence platform focused on unit economics — maps cloud spend to features, products, teams, and customers. Answers 'what does it cost to serve Customer X?'
Automated data discovery and lineage platform. Maps how data flows through your org — which tables are used, by whom, which queries hit which tables, column-level lineage.
Automated AWS savings through group buying of Reserved Instances and Savings Plans. Pools demand across customers for better rates. Autonomous cost optimization.
Start with a single-platform focus: Databricks-only audit. Connect to Databricks workspace via API, pull query logs and cluster usage data for the last 90 days, generate a report showing: (1) percentage of queries that are simple SQL (no distributed compute needed), (2) max data volume actually processed per query, (3) peak concurrency, (4) estimated Postgres/DuckDB equivalent performance, (5) projected annual savings from migrating. Deliver as a PDF report with a 30-minute walkthrough call. Skip the SaaS platform entirely — start as a productized consulting service with a lightweight analysis script behind the scenes.
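As a rough sketch of what that lightweight analysis script could look like, the pull below hits the documented Databricks SQL Query History REST endpoint (`/api/2.0/sql/history/queries`) and computes report metric (3), peak concurrency. The response field names used here (`res`, `has_next_page`, `next_page_token`, `query_start_time_ms`, `query_end_time_ms`) are assumptions drawn from that API and should be verified against current docs before relying on them.

```python
# Sketch of the audit's data pull plus one report metric. Assumes the
# Databricks SQL Query History REST endpoint; verify field names before use.
import os
import requests

HOST = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
TOKEN = os.environ["DATABRICKS_TOKEN"]  # personal access token

def fetch_query_history(max_pages: int = 50) -> list[dict]:
    """Page through recent SQL warehouse query history (assumed response shape).
    The API also supports time-range filters (filter_by.query_start_time_range)
    for scoping to the last 90 days; omitted here for brevity."""
    headers = {"Authorization": f"Bearer {TOKEN}"}
    params: dict = {"max_results": 1000}
    out: list[dict] = []
    for _ in range(max_pages):
        resp = requests.get(f"{HOST}/api/2.0/sql/history/queries",
                            headers=headers, params=params, timeout=30)
        resp.raise_for_status()
        body = resp.json()
        out.extend(body.get("res", []))
        if not body.get("has_next_page"):
            break
        params = {"page_token": body["next_page_token"], "max_results": 1000}
    return out

def peak_concurrency(queries: list[dict]) -> int:
    """Report metric (3): max queries in flight at once, via an event sweep."""
    events: list[tuple[int, int]] = []
    for q in queries:
        start, end = q.get("query_start_time_ms"), q.get("query_end_time_ms")
        if start and end:
            events += [(start, 1), (end, -1)]
    peak = cur = 0
    for _, delta in sorted(events):
        cur += delta
        peak = max(peak, cur)
    return peak
```

Everything else in the report (simple-SQL percentage, max bytes per query, savings projection) is aggregation over these same records plus current Databricks and Postgres/DuckDB pricing assumptions.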
Phase 1: Productized audit at $2K-$5K (manual analysis + script, target 2-3 clients/month). Phase 2: Build a self-service analysis tool, charge $5K-$10K for automated audit with migration playbook. Phase 3: Subscription ($500/mo) for continuous workload monitoring and right-sizing alerts as data volumes change. Phase 4: Migration-as-a-service partnerships with consultancies, taking a referral fee when clients actually execute the migration.
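If Phase 3 ever ships, the alert logic can start simple. A sketch with made-up thresholds (the ceiling, headroom factor, and 90-day window are assumptions, not a spec):

```python
# Hypothetical Phase 3 right-sizing alert: compare the rolling 90-day peak
# of data scanned against an assumed single-node ceiling, with headroom.
def right_sizing_alert(peak_bytes_90d: int,
                       single_node_ceiling: int = 200 * 1024**3,
                       headroom: float = 0.7) -> str | None:
    if peak_bytes_90d > single_node_ceiling:
        return "UPGRADE: workload now exceeds a single-node engine"
    if peak_bytes_90d < headroom * single_node_ceiling:
        return "DOWNGRADE: workload fits a simpler stack with room to spare"
    return None  # gray zone; no alert, re-evaluate next cycle
```

This also illustrates the subscription-value problem noted above: a check this cheap to run is hard to price at $500/mo unless it is bundled with benchmarking data and migration support.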
4-6 weeks to first dollar if positioned as productized consulting (manual audit with script assistance). 3-4 months if building a self-service tool first. Recommend the consulting-first approach — you'll learn what customers actually need, build credibility with case studies, and generate revenue while building the product.
- “everyone else is just paying for Spark clusters to run queries that Postgres could handle”
- “clean, reliable, actionable data”
- “TBs of noise or a few GBs of insights that actually drive decisions”