Informatica and similar enterprise ETL tools are extremely expensive and make trivial tasks overly complicated with visual block-based interfaces.
A lightweight, code-first ETL framework with a clean UI that generates readable Python/SQL under the hood. Engineers keep their coding skills sharp while getting the productivity benefits of visual orchestration, at a fraction of the cost of Informatica.
Subscription — free open-source core, $50-200/month per team for managed cloud version with monitoring, scheduling, and collaboration features
Real pain exists — Informatica costs $100K+/year and frustrates engineers with its visual-block paradigm. But most teams have already found workarounds (Airflow + dbt + Airbyte). The pain is strongest for mid-market teams currently stuck on enterprise tools they inherited, not greenfield teams who already use OSS.
TAM is large ($15B+ data integration market), but LightETL targets the underserved mid-market: teams of 2-15 data engineers. SAM is probably $500M-1B. At $50-200/team/month, you need thousands of teams to reach meaningful revenue. Enough room for a solid business, not a unicorn.
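To make the "thousands of teams" claim concrete, here is a rough back-of-the-envelope sketch. The $1M annual revenue target is an assumption chosen for illustration, not a figure from the analysis; only the $50-200 per team per month price range comes from the text above.

```python
import math


def teams_needed(target_annual_revenue: float, price_per_team_month: float) -> int:
    """Paying teams required to hit an annual revenue target at a flat monthly price."""
    annual_revenue_per_team = price_per_team_month * 12
    return math.ceil(target_annual_revenue / annual_revenue_per_team)


# Assumption for illustration only: "meaningful revenue" taken as $1M per year.
TARGET_ARR = 1_000_000

for price in (50, 200):
    print(f"${price}/team/month -> {teams_needed(TARGET_ARR, price)} paying teams")
```

Under that assumed target, the low end of the price range needs 1,667 paying teams and the high end needs 417, before accounting for churn or free-tier users who never convert.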
This is the weakest link. Data engineers are accustomed to free OSS tools (Airflow, dbt Core, Singer). The $50-200/mo price point competes with 'free + my weekend.' Enterprise buyers who actually spend money want SOC 2, SSO, and audit logs, all features that take time to build. The Reddit thread signals frustration but not 'I'd pay for an alternative'; commenters just want less Informatica.
A solo dev can build a basic code-generation ETL framework with a web UI in 6-8 weeks. The core (Python/SQL generation, DAG execution, basic scheduling) is well-understood. But building reliable connectors is a grind (every source has its own quirks), and the managed cloud version (auth, multi-tenancy, monitoring) adds months. An MVP is feasible; a production-grade product is hard.
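As a sense of why the DAG-execution core counts as well-understood, here is a minimal sketch using only the Python standard library. The `run_pipeline` function, the task names, and the sample rows are all hypothetical; a real framework would add retries, logging, scheduling, and parallelism on top.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(tasks, deps):
    """Run each task after its upstream tasks and collect the results.

    tasks: dict mapping task name -> callable taking the results-so-far dict
    deps:  dict mapping task name -> iterable of upstream task names
    """
    # Every task gets an entry, even if it has no upstream dependencies.
    graph = {name: set(deps.get(name, ())) for name in tasks}
    results = {}
    # static_order() yields upstream tasks before the tasks that need them
    # and raises graphlib.CycleError if the dependencies form a cycle.
    for name in TopologicalSorter(graph).static_order():
        results[name] = tasks[name](results)
    return results


# Hypothetical three-step pipeline: extract -> transform -> load.
tasks = {
    "extract": lambda r: [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "4.5"}],
    "transform": lambda r: [{**row, "amount": float(row["amount"])} for row in r["extract"]],
    "load": lambda r: sum(row["amount"] for row in r["transform"]),
}
deps = {"transform": ["extract"], "load": ["transform"]}

results = run_pipeline(tasks, deps)
print(results["load"])
```

The ordering logic is a few lines; the months of effort mentioned above go into everything around it, such as connectors, failure handling, and multi-tenant hosting.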
The gap is narrower than it appears. Dagster and Prefect are already code-first. dbt handles SQL transforms. Airbyte handles connectors. The 'unified code-first ETL with visible code' niche exists but Dagster is rapidly filling it. Your true differentiator is simplicity + affordability, which is hard to defend as OSS tools are already free. You'd be competing on UX polish and ease-of-use.
Strong recurring potential. ETL is inherently ongoing — pipelines run daily/hourly forever. Once teams depend on your scheduling, monitoring, and alerting, switching costs are high. Cloud-hosted version with collaboration features is a natural subscription. Data infrastructure has excellent retention rates.
- +Real, validated pain with enterprise ETL pricing and complexity — Informatica hate is widespread
- +Code-first with visible generated code is a genuine differentiator vs visual-only tools
- +ETL is sticky infrastructure with strong recurring revenue potential and high switching costs
- +Fragmented modern data stack (3-5 tools stitched together) creates demand for a simpler unified solution
- +Price point ($50-200/mo) is in a sweet spot below enterprise but above hobbyist
- !Dagster is already winning the 'code-first orchestration' narrative with strong VC backing ($50M+) — you're entering their territory
- !Willingness to pay is unproven: data engineers default to free OSS and resist paying for tools they could build themselves
- !Connector maintenance is a grind that killed Meltano's momentum — each data source is an ongoing maintenance burden
- !The 'generates readable code' value prop may not matter enough: engineers who want code already write code; engineers who want UI don't care about generated code
- !A solo founder selling infra tooling to data teams faces a long sales cycle and low initial conversion
Open-source data orchestrator with a code-first, asset-based approach. Generates a UI from Python code. Strong typing and testability built in.
Python-native workflow orchestration. Decorate existing Python functions to turn them into observable, retryable flows. Lightweight and flexible.
Open-source EL
Open-source, CLI-first DataOps platform built on Singer taps/targets. Handles EL with plugin-based architecture. GitOps-friendly.
SQL-first transformation layer that runs inside the warehouse. Defines transformations as SELECT statements with version control, testing, and documentation.
Open-source Python framework that lets you define ETL pipelines in simple Python/SQL, auto-generates clean readable code, and provides a local web UI showing DAG visualization + run history + logs. Start with 5-10 connectors (Postgres, MySQL, S3, REST APIs, CSV). Skip the cloud version entirely for MVP — focus on making the local experience magical. Think 'Flask for ETL' — minimal, obvious, delightful.
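One way to picture "auto-generates clean readable code" is a small declarative step that renders to plain SQL. This is only a sketch under assumptions: `SelectStep`, `render_sql`, and the `users` table are hypothetical names invented for illustration, not part of any existing library or of the proposed product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SelectStep:
    """Hypothetical declarative description of one transform step."""
    name: str
    source: str
    columns: List[str]
    where: Optional[str] = None


def render_sql(step: SelectStep) -> str:
    """Render a step as formatted SQL that an engineer can read and edit by hand."""
    cols = ",\n    ".join(step.columns)
    sql = f"-- step: {step.name}\nSELECT\n    {cols}\nFROM {step.source}"
    if step.where:
        sql += f"\nWHERE {step.where}"
    return sql + ";"


step = SelectStep(
    name="active_users",
    source="users",
    columns=["id", "email"],
    where="active = 1",
)
print(render_sql(step))
```

The design point is that the generated artifact is ordinary SQL with one column per line and a comment naming the step, so it can be reviewed in a pull request or copied out of the tool without lock-in.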
Free OSS core → build community and GitHub stars → launch hosted cloud version ($50/mo starter) with scheduling, monitoring, alerts, team collaboration → add enterprise tier ($200+/mo) with SSO, audit logs, role-based access → eventually offer managed connectors marketplace where community contributes connectors for rev share
6-9 months. First 8-10 weeks building OSS MVP, 2-3 months building community and getting feedback, then launch cloud beta. First paying customers likely at month 6-9. Reaching $10K MRR could take 12-18 months given the long adoption cycle for infrastructure tools.
- “it costs out the ass”
- “it renders some trivial tasks waaay more complicated than they should be”
- “Do engineers still need to use SQL/Python when using Informatica?”