When reverse-engineering legacy systems, understanding the database schema alone isn't enough: engineers spend weeks reading tangled application code (often in unfamiliar languages) to trace how data flows from ingestion to storage.
Point LegacyFlow at a codebase (Java, Python, etc.) and it statically analyzes the code to extract every database read/write operation, map data transformations, and generate visual data flow diagrams showing how messages move from sources (e.g., Kafka) through enrichment logic into destination tables.
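The product's core output is a directed flow graph: topics, processing classes, and tables as nodes, data movement as edges. A minimal sketch of how that output might be modeled and rendered as Graphviz DOT (all names here are hypothetical, not LegacyFlow's actual schema):

```python
from dataclasses import dataclass, field

# Hypothetical node kinds in the flow graph: a Kafka topic, a
# processing class, or a destination table.
@dataclass(frozen=True)
class Node:
    kind: str   # "topic" | "class" | "table"
    name: str

@dataclass
class FlowGraph:
    edges: list[tuple[Node, Node]] = field(default_factory=list)

    def add_flow(self, src: Node, dst: Node) -> None:
        self.edges.append((src, dst))

    def to_dot(self) -> str:
        """Render the graph as Graphviz DOT for a visual flow diagram."""
        lines = ["digraph flow {", "  rankdir=LR;"]
        for src, dst in self.edges:
            lines.append(f'  "{src.kind}:{src.name}" -> "{dst.kind}:{dst.name}";')
        lines.append("}")
        return "\n".join(lines)

# Example: one message flow from a topic, through an enricher, into a table.
g = FlowGraph()
g.add_flow(Node("topic", "orders"), Node("class", "OrderEnricher"))
g.add_flow(Node("class", "OrderEnricher"), Node("table", "enriched_orders"))
print(g.to_dot())
```

Emitting DOT keeps the diagramming layer swappable: the same graph can feed Graphviz, Mermaid, or an interactive web view.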
Freemium — free for single-repo analysis, paid plans ($99-499/mo) for multi-repo, team features, and integration with migration planning tools.
This is a 'hair on fire' problem. The Reddit post describes 3 weeks of pure code reading — at $150K/yr engineer salary, that's ~$8,600 in direct cost for ONE person on ONE codebase. Multiply across teams doing migrations and it's massive. The pain is visceral: engineers dread this work, it's error-prone, and there's often zero documentation. The fact that even the original maintainer 'doesn't understand it' is extremely common and validates severe pain.
TAM is large ($25B+ modernization market) but LegacyFlow addresses a specific slice: the code-comprehension phase of migration projects. Serviceable market is engineering teams at mid-to-large companies doing active rewrites — estimated 50K-100K such teams globally. At $200/mo average, SAM is ~$120M-240M/yr. Not a trillion-dollar market but very healthy for a startup.
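The back-of-envelope figures above are easy to sanity-check:

```python
# Direct cost of the 3-week code-reading slog at a $150K/yr salary.
salary = 150_000              # $/yr
weekly = salary / 52          # ~$2,885 per engineer-week
print(round(3 * weekly))      # ~$8,654 -> the "~$8,600" estimate

# SAM: 50K-100K teams x $200/mo average x 12 months.
for teams in (50_000, 100_000):
    print(teams * 200 * 12)   # $120M and $240M per year
```

Note these use base salary only; fully loaded cost (benefits, overhead) would push the per-engineer figure higher, making the ROI case stronger, not weaker.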
Companies already pay $50K-200K for CAST licenses and $500K+ for consulting firms to do this manually. A $99-499/mo tool that saves even one engineer-week per quarter delivers no-brainer ROI. However, individual ICs (who feel the pain most) often can't expense tools easily — you'll need to sell to eng managers or modernization project leads. Budget exists, but procurement cycles at enterprises can be slow.
This is the hardest part. Reliable static analysis across multiple languages (Java, Python, etc.) that correctly traces data from Kafka consumers through business logic transformations to DB writes is genuinely difficult. Language-specific parsers, framework-aware analysis (Spring, SQLAlchemy, Hibernate), handling dynamic dispatch, reflection, and metaprogramming — each is a rabbit hole. LLMs can help but hallucinate on complex flows. A solo dev can build a compelling demo for ONE language (e.g., Java + Spring + Kafka) in 6-8 weeks, but multi-language reliability is a multi-year effort. Scope the MVP ruthlessly.
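One concrete failure mode is worth internalizing: when a table name is assembled at runtime (config lookup, reflection, string concatenation), a naive textual scan finds nothing to trace. A toy illustration in Python — the embedded Java snippets are invented for demonstration:

```python
import re

# A naive scanner that looks for literal table names in INSERT statements.
INSERT_RE = re.compile(r'INSERT\s+INTO\s+(\w+)', re.IGNORECASE)

easy_case = '''
    jdbc.update("INSERT INTO enriched_orders (id, total) VALUES (?, ?)");
'''

hard_case = '''
    String table = config.get("orders.sink.table");   // resolved at runtime
    jdbc.update("INSERT INTO " + table + " (id, total) VALUES (?, ?)");
'''

print(INSERT_RE.findall(easy_case))   # ['enriched_orders'] -- traceable
print(INSERT_RE.findall(hard_case))   # [] -- destination table is invisible
```

Real analyzers handle the hard case with constant propagation or by surfacing "unresolved sink" warnings; silently reporting an incomplete flow diagram is exactly the false confidence the weaknesses section warns about.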
The gap is clear and wide. CAST does something adjacent but is enterprise-priced and enterprise-heavy. No tool today lets an IC engineer point at a messy Java repo and get back 'here are all the Kafka topics consumed, here's how each message type flows through enrichment, and here are the destination tables with column mappings.' This specific workflow — data-engineering-aware code comprehension — is completely unserved at the individual/team level.
Migration projects are inherently time-bounded (3-18 months). Once the legacy system is understood and rewritten, the tool's value drops. Retention risk is real. Mitigations: (1) large orgs have MANY legacy systems queued up, (2) add ongoing 'living documentation' features that track drift, (3) target consulting firms who do this repeatedly. The honest truth, though: this is more project-based than perpetual SaaS.
- +Extreme pain intensity — engineers viscerally hate this work and waste weeks/months on it
- +Clear competition gap — nothing self-serve exists for data-flow-aware legacy code comprehension
- +Strong willingness to pay at the organizational level — easy ROI story ($500/mo vs $10K+ in engineer time)
- +Tailwind from 'great retirement' of legacy system authors and cloud migration mandates
- +AI/LLM advances make this newly feasible — static analysis + LLM hybrid approach wasn't possible 2 years ago
- !Technical depth required is high — multi-language static analysis that actually works on messy real-world code is extremely hard; half-working analysis is worse than none (generates false confidence)
- !Churn risk — migration projects end, and the tool may not retain customers unless you expand the use case
- !GitHub Copilot / Cursor / AI IDE incumbents could add 'explain this codebase' features that are 'good enough' for many users, even if less specialized
- !Enterprise sales cycles are slow; the people with budget (managers, VPs) are not the people feeling the pain (ICs)
- !Scope creep danger — every legacy codebase is a unique snowflake; customers will demand support for obscure frameworks, languages, and patterns
Behavioral code analysis platform that identifies hotspots, coupling, and technical debt in codebases. Uses git history and code structure to visualize architectural dependencies.
Code intelligence platform with universal code search, cross-repository navigation, and AI-powered code understanding via Cody. Helps developers navigate and understand large codebases.
Enterprise application intelligence platform. CAST Imaging reverse-engineers application source code to create interactive architecture blueprints showing layers, transactions, and data access patterns.
Static analysis and reverse engineering IDE that creates dependency graphs, call trees, control flow diagrams, and metrics for legacy code in 15+ languages.
AI-powered tool that auto-generates and maintains documentation from code, including flow diagrams and explanations of how code modules interact.
Java-only (Spring Boot + Kafka + JDBC/Hibernate). Single repo upload or git URL. Output: (1) list of all Kafka consumers/producers with topic names, (2) list of all DB tables read/written with the SQL operations, (3) visual flow diagram connecting Kafka topics → processing classes → DB tables. Use Tree-sitter for parsing + LLM for semantic understanding of transformation logic. Ship as a web app with GitHub integration. Don't try to support Python, .NET, or other languages in V1.
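The shape of the V1 extraction pass can be sketched end-to-end. The real MVP would walk a Tree-sitter parse of the Java AST; the regexes below are a dependency-free stand-in, and the Java source is a made-up example of the Spring Boot + Kafka + JDBC pattern the MVP targets:

```python
import re

# Toy Java source resembling the MVP's target stack.
JAVA_SRC = '''
@KafkaListener(topics = "raw-orders")
public void consume(String msg) {
    Order o = enrich(parse(msg));
    jdbc.query("SELECT rate FROM fx_rates WHERE ccy = ?", o.ccy());
    jdbc.update("INSERT INTO enriched_orders VALUES (?, ?)");
}
'''

# Stand-ins for Tree-sitter queries: find consumed topics, reads, writes.
TOPIC_RE = re.compile(r'@KafkaListener\(topics\s*=\s*"([^"]+)"')
WRITE_RE = re.compile(r'INSERT\s+INTO\s+(\w+)', re.IGNORECASE)
READ_RE  = re.compile(r'\bFROM\s+(\w+)', re.IGNORECASE)

# The three MVP outputs: consumers/topics, tables read, tables written.
report = {
    "consumed_topics": TOPIC_RE.findall(JAVA_SRC),
    "tables_read": READ_RE.findall(JAVA_SRC),
    "tables_written": WRITE_RE.findall(JAVA_SRC),
}
print(report)
# {'consumed_topics': ['raw-orders'], 'tables_read': ['fx_rates'],
#  'tables_written': ['enriched_orders']}
```

In the planned architecture, this structural pass produces the nodes and edges, and the LLM layer is confined to annotating what the enrichment logic does between them — keeping hallucination risk out of the graph topology itself.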
Free: single-repo, Java-only, basic flow diagram (PDF export). Paid ($99/mo): multi-repo, team sharing, detailed transformation annotations, Confluence/Notion export. Pro ($299/mo): additional language support, CI integration for ongoing tracking, migration planning features (mark flows as 'migrated'). Enterprise ($499+/mo): SSO, on-prem analysis, custom language/framework support, API access.
8-12 weeks to MVP with Java support. First paying customers likely at week 12-16 via direct outreach to engineering managers at companies doing active Java modernization projects. Target companies posting 'legacy migration' job listings or engineering blog posts about rewrites. $1K MRR achievable within 4-5 months if the Java analysis actually works on real-world messy code.
- “I had a task to rewrite very messy java code which read stuff from kafka, enriched them, saved in some tables”
- “It was especially hard since I don't really know java”
- “I just read the code for like 3 weeks”
- “No docs, the maintainer of that old code was very open about not understanding it”