Existing reading apps either lack TTS, have robotic-sounding voices, or require cloud connectivity for decent quality. No good solution exists for offline, natural-sounding read-aloud with synced highlighting on mobile.
An EPUB reader that runs optimized TTS models (like Kokoro) directly on-device using a split-pipeline architecture for CPU efficiency, enabling background audio playback with word-level highlight sync.
Freemium — free with limited daily listening, paid subscription ($4-8/mo) for unlimited TTS, premium voices, and advanced features.
Pain signals are real and specific. Users are explicitly asking for Kokoro + EPUB + word sync on Reddit. Audiobook listeners pay $15/book for content they may already own as EPUB. Visually impaired users depend on TTS, but current on-device voices are subpar. Commuters need offline playback. The iOS background audio constraint shows people have tried and hit walls, which is a sign of genuine unmet demand, not hypothetical pain.
Global audiobook market is ~$7B and growing 25%+ YoY. EPUB reader market is mature but large. The intersection — people who own ebooks and want audiobook-quality TTS — is a meaningful slice. TAM for a premium TTS reader is likely $200M-500M if you include accessibility, language learning, and general read-along users. Not a trillion-dollar market, but easily supports a profitable product.
Speechify charges $139/year and has millions of users, which is proof that people pay for TTS reading. Voice Dream at $50/year had loyal paying users before the acquisition debacle. Audible at $15/mo proves people pay for spoken books. At $4-8/mo ($48-96/year), this app would be roughly 30-65% cheaper than Speechify's $139/year plan, and cheaper per month than a single Audible credit. The value prop (turn any EPUB into an audiobook) directly displaces a $15/book cost with a flat subscription. Strong.
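The price comparison is easy to check by hand. The sketch below annualizes the proposed $4-8/mo range and compares it with the $139/year Speechify figure quoted above; the function name `percent_cheaper` is illustrative only.

```python
def percent_cheaper(our_monthly: float, competitor_annual: float) -> float:
    """Percent saved per year versus a competitor that bills annually."""
    our_annual = our_monthly * 12
    return (1 - our_annual / competitor_annual) * 100


SPEECHIFY_ANNUAL = 139.0  # annual price quoted in the analysis above

for monthly in (4.0, 8.0):
    saving = percent_cheaper(monthly, SPEECHIFY_ANNUAL)
    print(f"${monthly:.0f}/mo -> {saving:.0f}% cheaper than ${SPEECHIFY_ANNUAL:.0f}/yr")
# $4/mo -> 65% cheaper than $139/yr
# $8/mo -> 31% cheaper than $139/yr
```

The saving shrinks quickly toward the top of the price range, so the "much cheaper than Speechify" pitch is strongest at the lower tiers.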
This is the hardest part. Kokoro TTS on-device is proven but integrating it into a production mobile app with EPUB parsing, word-level alignment, background audio, and smooth UX is non-trivial. The iOS background Metal limitation is a real constraint requiring CPU-only inference. EPUB parsing with proper word-boundary detection for highlight sync is fiddly. A solo dev with mobile + ML experience could build a basic MVP in 6-8 weeks, but polish will take longer. Score reflects the iOS background audio constraint specifically — it is solvable but tricky.
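To make the word-boundary problem concrete, here is a minimal sketch of extracting word spans with character offsets from a chapter's plain text. It assumes the EPUB markup has already been stripped, and the regex is a deliberately simple rule (internal apostrophes and hyphens stay inside a word), not a full Unicode word segmenter. `WordSpan` and `word_spans` are hypothetical names.

```python
import re
from typing import NamedTuple


class WordSpan(NamedTuple):
    text: str
    start: int  # inclusive character offset into the source text
    end: int    # exclusive character offset


# Runs of word characters, optionally joined by an apostrophe or hyphen,
# count as a single word ("Don't", "well-read").
_WORD_RE = re.compile(r"\w+(?:['’\-]\w+)*")


def word_spans(text: str) -> list[WordSpan]:
    """Return every word in `text` with its character range."""
    return [WordSpan(m.group(), m.start(), m.end()) for m in _WORD_RE.finditer(text)]


for span in word_spans("Don't stop, well-read friend."):
    print(span)
```

The offsets are what the highlighter needs: each span can be mapped back to a range in the rendered page. Real books add cases this sketch ignores, such as footnote markers, words hyphenated across lines, and text split across inline tags.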
No existing product combines all four: high-quality TTS + on-device/offline + proper EPUB rendering + word-level highlight sync. Speechify has quality but is cloud-dependent and expensive. Apple Books is on-device but voices are mediocre. Voice Dream is neglected and cloud-dependent for good voices. Moon+ has great EPUB rendering but terrible TTS integration. The gap is clear and validated by user complaints. Kokoro's existence as a capable on-device model makes this gap newly fillable.
Subscription makes sense: premium voices, unlimited listening time, new voice packs, speed/customization features. But there is a risk — once the model runs on-device, users may feel entitled to a one-time purchase since there is no ongoing cloud cost. Mitigate by gating voice variety, daily listening limits, and advanced features (bookmarks, cross-device sync, reading stats). A hybrid model (one-time unlock + optional premium tier) may actually convert better than pure subscription.
- +Clear market gap — no product combines high-quality on-device TTS with proper EPUB reading and word sync
- +Strong cost advantage over Speechify ($4-8/mo vs $12-24/mo) and Audible ($15/credit)
- +Privacy and offline capability as genuine differentiators in a cloud-dependent competitor landscape
- +Kokoro TTS is open source and proven on-device, dramatically lowering the technical barrier vs. 2 years ago
- +Voice Dream's alienated user base is actively looking for alternatives after the subscription pivot
- +Growing accessibility regulations create institutional demand (schools, libraries, government)
- !iOS background audio with on-device inference is a known hard constraint — must use CPU-only pipeline, not Metal/GPU
- !Apple could ship dramatically better on-device TTS in any iOS update, commoditizing the core feature overnight
- !EPUB parsing + word-level alignment is notoriously edge-case-heavy (complex layouts, footnotes, images, tables)
- !Model size vs. quality tradeoff on older/lower-end devices may disappoint users expecting cloud-level quality
- !App Store review risk — Apple may scrutinize or reject apps running large ML models with high resource usage
VC-funded TTS app that reads EPUBs, PDFs, and web pages with AI-generated voices. Available on iOS, Android, web, and as a browser extension. Word-level highlighting included.
Long-regarded gold standard for TTS reading on iOS. Reads EPUBs, PDFs, and documents with smooth word-level highlighting and synced scrolling. Supports system and third-party voices.
TTS app and web service that reads documents, EPUBs, PDFs, and web pages aloud. Available on web, desktop, iOS, and Android with word highlighting.
Apple's built-in book reader with a Read Aloud feature using on-device Siri neural voices. Supports purchased and sideloaded EPUBs with text highlighting.
Popular Android EPUB/PDF reader with TTS as a secondary feature, delegating to whatever TTS engine is installed on the Android device.
iOS app only. Load EPUB files, render cleanly, run Kokoro TTS via CPU-only ONNX pipeline with word-level timing extraction, highlight words in real-time, support background audio playback. Ship with 2-3 voice options. Free tier: 30 min/day. Paid: unlimited. Skip Android, skip cloud, skip sync. Nail the core loop: open book → tap play → hear a good voice → see words highlight → keep listening in background. That is the entire MVP.
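The "see words highlight" step of that loop reduces to a lookup: given per-word start and end times and the current playback position, find the word to highlight. The sketch below assumes the TTS pipeline yields sorted, non-overlapping word timings in seconds; how those timings are actually extracted from the model is not covered here, and `active_word_index` is a hypothetical helper.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def active_word_index(
    starts: Sequence[float], ends: Sequence[float], position: float
) -> Optional[int]:
    """Index of the word being spoken at `position`, or None during a pause.

    `starts` must be sorted ascending; `ends[i]` is the end time of word i.
    """
    i = bisect_right(starts, position) - 1  # last word starting at or before position
    if i < 0 or position >= ends[i]:
        return None  # before the first word, or in a gap between words
    return i


# Hypothetical timings for three words.
starts = [0.00, 0.40, 0.90]
ends = [0.35, 0.85, 1.30]
print(active_word_index(starts, ends, 0.50))  # 1
print(active_word_index(starts, ends, 0.37))  # None
```

Binary search keeps the lookup cheap enough to run on every display refresh, which matters when inference is already competing for CPU.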
Free (30 min/day, 2 voices) → $5.99/mo or $39.99/yr (unlimited listening, all voices, speed controls, bookmarks) → Premium $8.99/mo (custom voice packs, reading stats, cross-device sync via iCloud). Long-term: voice marketplace where users can import/create voice profiles, institutional licensing for schools and accessibility programs.
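The free-tier gate can be sketched as a per-day listening budget. This is one possible shape under stated assumptions: usage is tracked locally per calendar day, subscribers are unlimited, and the 30-minute figure comes from the tiers above. `ListeningQuota` is a hypothetical name, and a real app would persist the counter and guard against clock changes.

```python
from dataclasses import dataclass, field
from datetime import date

FREE_DAILY_SECONDS = 30 * 60  # free tier: 30 min/day


@dataclass
class ListeningQuota:
    is_subscriber: bool = False
    used: dict[date, float] = field(default_factory=dict)  # seconds played per day

    def remaining(self, day: date) -> float:
        """Seconds of playback still allowed on `day`."""
        if self.is_subscriber:
            return float("inf")
        return max(0.0, FREE_DAILY_SECONDS - self.used.get(day, 0.0))

    def record(self, day: date, seconds: float) -> float:
        """Record playback and return the seconds actually allowed."""
        allowed = min(seconds, self.remaining(day))
        self.used[day] = self.used.get(day, 0.0) + allowed
        return allowed


quota = ListeningQuota()
today = date(2025, 1, 1)
quota.record(today, 20 * 60)
print(quota.remaining(today))  # 600.0
```

Because everything runs on-device, a local counter like this is easy to bypass, which is one more reason the gating of voices and features matters alongside the time limit.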
8-12 weeks to MVP with TestFlight beta. 12-16 weeks to App Store launch. First paid subscribers within 1-2 weeks of launch if marketed to r/LocalLLaMA, r/audiobooks, r/accessibility, and Hacker News communities. These are technical early adopters who will pay for a good on-device TTS reader. $1K MRR within 2-3 months of launch is realistic with organic distribution alone.
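For scale, the $1K MRR target translates into a concrete subscriber count. The sketch below uses the $5.99/mo price from the tiers above; the optional `commission` parameter is an assumption added here to show the effect of a store fee (15% is used as an illustrative rate), and `subscribers_needed` is a hypothetical helper.

```python
import math


def subscribers_needed(target_mrr: float, monthly_price: float, commission: float = 0.0) -> int:
    """Smallest subscriber count whose net monthly revenue reaches `target_mrr`."""
    net_per_subscriber = monthly_price * (1 - commission)
    return math.ceil(target_mrr / net_per_subscriber)


print(subscribers_needed(1000, 5.99))        # 167 (gross)
print(subscribers_needed(1000, 5.99, 0.15))  # 197 (net of an assumed 15% fee)
```

Roughly 170 to 200 paying users is a small number relative to the communities named above, though it still depends on the free-to-paid conversion rate, which is not estimated here.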
- “I wanted a reading app where you could read, read and listen or just listen to books with word-by-word highlighting synced to TTS”
- “i wanted the voice to actually sound good”
- “iOS kills Metal access the moment you background the app. If your use case needs background audio, this is a dead end”
- “I need an app that uses Kokoro TTS on Apple”