On-Device TTS Reading App

The Gap

Existing reading apps either lack TTS, have robotic-sounding voices, or require cloud connectivity for decent quality. No good solution exists for offline, natural-sounding read-aloud with synced highlighting on mobile.

Solution

An EPUB reader that runs optimized TTS models (like Kokoro) directly on-device using a split-pipeline architecture for CPU efficiency, enabling background audio playback with word-level highlight sync.
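
The write-up does not spell out what "split-pipeline" means. One possible reading, assumed here, is that text is split into small chunks and the next chunk is synthesized on a background thread while the current one plays. The sketch below shows that idea only; `pipelined`, `synthesize`, and `lookahead` are hypothetical names, and the synthesizer is passed in as a plain function rather than a real Kokoro call.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator


def pipelined(chunks: Iterable[str],
              synthesize: Callable[[str], bytes],
              lookahead: int = 2) -> Iterator[bytes]:
    """Yield audio for each chunk in order while later chunks are
    synthesized on a background thread.

    At most `lookahead` finished chunks wait in the buffer, which bounds
    memory use and avoids synthesizing far ahead of playback.
    """
    buffer = queue.Queue(maxsize=lookahead)
    done = object()  # sentinel marking the end of the stream

    def producer() -> None:
        try:
            for chunk in chunks:
                buffer.put(synthesize(chunk))  # blocks while the buffer is full
        finally:
            buffer.put(done)  # always unblock the consumer, even on error

    threading.Thread(target=producer, daemon=True).start()
    while (item := buffer.get()) is not done:
        yield item
```

A caller would iterate over `pipelined(sentences, tts_fn)` and hand each result to the audio player. The bounded queue is the design choice that matters: it keeps CPU work just ahead of playback instead of rendering a whole chapter up front.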

Revenue Model

Freemium — free with limited daily listening, paid subscription ($4-8/mo) for unlimited TTS, premium voices, and advanced features.

Feasibility Scores

Pain Intensity: 8/10

Pain signals are real and specific. Users are explicitly asking for Kokoro + EPUB + word sync on Reddit. Audiobook listeners pay $15/book for content they may already own as EPUB. Visually impaired users depend on TTS, but current on-device voices are subpar. Commuters need offline playback. The iOS background audio constraint shows people have tried and hit walls, which is a sign of genuine unmet demand, not hypothetical pain.

Market Size: 7/10

Global audiobook market is ~$7B and growing 25%+ YoY. EPUB reader market is mature but large. The intersection — people who own ebooks and want audiobook-quality TTS — is a meaningful slice. TAM for a premium TTS reader is likely $200M-500M if you include accessibility, language learning, and general read-along users. Not a trillion-dollar market, but easily supports a profitable product.

Willingness to Pay: 7/10

Speechify charges $139/year and has millions of users, which is proof that people pay for TTS reading. Voice Dream at $50/year had loyal paying users before the acquisition debacle. Audible at $15/mo proves people pay for spoken books. At $4-8/mo ($48-96/year), this app would be roughly 30-65% cheaper than Speechify's $139/year and cheaper than a single Audible credit. The value prop, turning any EPUB into an audiobook, directly displaces a $15/book cost with a flat subscription. Strong.
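
The price comparison can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the prices quoted in this write-up ($139/year for Speechify, $15 per audiobook, $4-8/mo for the proposed app); the function names are illustrative.

```python
# Prices are the ones quoted in this write-up; nothing else is assumed.
SPEECHIFY_PER_YEAR = 139.0
AUDIOBOOK_PRICE = 15.0          # one audiobook or Audible credit
APP_MONTHLY_RANGE = (4.0, 8.0)  # proposed subscription range


def percent_cheaper(app_monthly: float, rival_yearly: float) -> float:
    """Percent saved per year by paying app_monthly instead of rival_yearly."""
    app_yearly = app_monthly * 12
    return (1 - app_yearly / rival_yearly) * 100


def books_to_break_even(app_monthly: float, book_price: float) -> float:
    """Audiobooks per month at which the flat subscription costs the same."""
    return app_monthly / book_price


for monthly in APP_MONTHLY_RANGE:
    saving = percent_cheaper(monthly, SPEECHIFY_PER_YEAR)
    breakeven = books_to_break_even(monthly, AUDIOBOOK_PRICE)
    print(f"${monthly:.2f}/mo: {saving:.0f}% cheaper than Speechify, "
          f"break-even at {breakeven:.2f} audiobooks/month")
```

Against the $139/year figure, the saving works out to about 65% at $4/mo and about 31% at $8/mo. A listener who would otherwise buy even one $15 audiobook every two months comes out ahead at either price point.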

Technical Feasibility: 6/10

This is the hardest part. Kokoro TTS on-device is proven but integrating it into a production mobile app with EPUB parsing, word-level alignment, background audio, and smooth UX is non-trivial. The iOS background Metal limitation is a real constraint requiring CPU-only inference. EPUB parsing with proper word-boundary detection for highlight sync is fiddly. A solo dev with mobile + ML experience could build a basic MVP in 6-8 weeks, but polish will take longer. Score reflects the iOS background audio constraint specifically — it is solvable but tricky.
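
To give a feel for the word-boundary step, here is a rough sketch that extracts words with character offsets from a chapter's text, which is what a highlighter would need in order to mark a range on screen. It assumes the EPUB's XHTML has already been reduced to plain text, and it uses a simple regular expression instead of a full Unicode word segmenter; `WordSpan` and `word_spans` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List

# Letters or digits, optionally joined by internal apostrophes or hyphens
# (e.g. "don't", "word-level"). An assumption, not a full Unicode segmenter.
WORD_RE = re.compile(r"[^\W_]+(?:['’\-][^\W_]+)*")


@dataclass(frozen=True)
class WordSpan:
    text: str
    start: int  # inclusive character offset into the source string
    end: int    # exclusive character offset


def word_spans(plain_text: str) -> List[WordSpan]:
    """Return every word in plain_text together with its character range."""
    return [WordSpan(m.group(), m.start(), m.end())
            for m in WORD_RE.finditer(plain_text)]
```

Footnote markers, tables, and text split across inline tags are the kinds of edge cases this sketch ignores, and they are where most of the real effort is likely to go.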

Competition Gap: 8/10

No existing product combines all four: high-quality TTS + on-device/offline + proper EPUB rendering + word-level highlight sync. Speechify has quality but is cloud-dependent and expensive. Apple Books is on-device but voices are mediocre. Voice Dream is neglected and cloud-dependent for good voices. Moon+ has great EPUB rendering but terrible TTS integration. The gap is clear and validated by user complaints. Kokoro's existence as a capable on-device model makes this gap newly fillable.

Recurring Potential: 7/10

Subscription makes sense: premium voices, unlimited listening time, new voice packs, speed/customization features. But there is a risk: once the model runs on-device, users may feel entitled to a one-time purchase since there is no ongoing cloud cost. Mitigate by capping daily listening on the free tier and gating voice variety and advanced features (bookmarks, cross-device sync, reading stats). A hybrid model (one-time unlock + optional premium tier) may actually convert better than pure subscription.

Strengths

+ Clear market gap: no product combines high-quality on-device TTS with proper EPUB reading and word sync
+ Strong cost advantage over Speechify ($4-8/mo vs $12-24/mo) and Audible ($15/credit)
+ Privacy and offline capability as genuine differentiators in a cloud-dependent competitor landscape
+ Kokoro TTS is open source and proven on-device, dramatically lowering the technical barrier vs. 2 years ago
+ Voice Dream's alienated user base is actively looking for alternatives after the subscription pivot
+ Growing accessibility regulations create institutional demand (schools, libraries, government)

Risks

! iOS background audio with on-device inference is a known hard constraint: must use CPU-only pipeline, not Metal/GPU
! Apple could ship dramatically better on-device TTS in any iOS update, commoditizing the core feature overnight
! EPUB parsing + word-level alignment is notoriously edge-case-heavy (complex layouts, footnotes, images, tables)
! Model size vs. quality tradeoff on older/lower-end devices may disappoint users expecting cloud-level quality
! App Store review risk: Apple may scrutinize or reject apps running large ML models with high resource usage

Competition

Speechify

VC-funded TTS app that reads EPUBs, PDFs, and web pages with AI-generated voices. Available on iOS, Android, web, and as a browser extension. Word-level highlighting included.

Pricing: Free tier (limited

Gap: Cloud-dependent for quality voices (useless offline), extremely expensive, aggressive upselling and dark patterns, battery drain from streaming, privacy concerns (all content sent to servers), EPUB rendering is mediocre compared to dedicated readers

Voice Dream Reader

Long-regarded gold standard for TTS reading on iOS. Reads EPUBs, PDFs, and documents with smooth word-level highlighting and synced scrolling. Supports system and third-party voices.

Pricing: Shifted from $14.99 one-time to ~$4.99/mo or $49.99/year after acquisition (controversial

Gap: iOS only, premium voices still cloud-dependent, app neglected after ownership change, UI feels dated, subscription pivot alienated core users — there is real user anger here creating an opening

NaturalReader

TTS app and web service that reads documents, EPUBs, PDFs, and web pages aloud. Available on web, desktop, iOS, and Android with word highlighting.

Pricing: Free tier (basic voices

Gap: High-quality voices are cloud-only, EPUB rendering is basic compared to dedicated e-readers, no offline AI voice capability, subscription cost adds up, reading experience feels secondary to the TTS feature

Apple Books (Read Aloud)

Apple's built-in book reader with a Read Aloud feature using on-device Siri neural voices. Supports purchased and sideloaded EPUBs with text highlighting.

Pricing: Free (built into iOS/macOS

Gap: Voice quality is 'good enough' but noticeably synthetic — nowhere near Kokoro quality, very limited voice selection and customization, no speed/pitch fine-tuning, Apple ecosystem lock-in, Read Aloud not available for all books, highlighting is sentence-level not precise word-level

Moon+ Reader Pro

Popular Android EPUB/PDF reader with TTS as a secondary feature, delegating to whatever TTS engine is installed on the Android device.

Pricing: Free with ads; Pro version ~$6.99 one-time

Gap: TTS is an afterthought — sentence-level highlighting only, no word sync, voice quality entirely depends on system TTS engine, no built-in high-quality voices, TTS controls are basic, zero innovation on the audio side

MVP Suggestion

iOS app only. Load EPUB files, render cleanly, run Kokoro TTS via CPU-only ONNX pipeline with word-level timing extraction, highlight words in real-time, support background audio playback. Ship with 2-3 voice options. Free tier: 30 min/day. Paid: unlimited. Skip Android, skip cloud, skip sync. Nail the core loop: open book → tap play → hear a good voice → see words highlight → keep listening in background. That is the entire MVP.
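
The "see words highlight" step of the core loop comes down to mapping the current playback position to a word. A minimal sketch, assuming the synthesis step yields a sorted list of per-word start times in seconds (how those timings are extracted from the model is not covered here), is a binary search; `active_word_index` is a hypothetical helper.

```python
from bisect import bisect_right
from typing import List, Optional


def active_word_index(word_starts: List[float],
                      position_s: float) -> Optional[int]:
    """Index of the word being spoken at position_s.

    word_starts must be sorted ascending; each entry is the time in seconds
    at which that word begins in the synthesized audio. Returns None when
    playback has not yet reached the first word.
    """
    i = bisect_right(word_starts, position_s) - 1
    return i if i >= 0 else None
```

In an app, the position would come from the audio player's clock and the lookup would run on each display refresh or timer tick. Binary search keeps that cheap even for long chapters.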

Monetization Path

Free (30 min/day, 2 voices) → $5.99/mo or $39.99/yr (unlimited listening, all voices, speed controls, bookmarks) → Premium $8.99/mo (custom voice packs, reading stats, cross-device sync via iCloud). Long-term: voice marketplace where users can import/create voice profiles, institutional licensing for schools and accessibility programs.
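
The tier ladder above could be expressed as a small configuration table plus a daily-quota check. This is a sketch only: the prices and the 30-minute free limit come from the text, while the tier keys, the `Tier` class, and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    monthly_usd: float
    daily_limit_min: Optional[int]  # None means unlimited listening


# Prices and the 30-minute free limit are taken from the text;
# the dictionary keys are hypothetical labels.
TIERS = {
    "free": Tier("Free", 0.0, 30),
    "standard": Tier("Standard", 5.99, None),
    "premium": Tier("Premium", 8.99, None),
}


def minutes_remaining(tier: Tier, minutes_used_today: float) -> Optional[float]:
    """Listening minutes left today, or None if the tier is unlimited."""
    if tier.daily_limit_min is None:
        return None
    return max(0.0, tier.daily_limit_min - minutes_used_today)


def annual_discount_percent(monthly_usd: float, yearly_usd: float) -> float:
    """Saving from the yearly plan relative to paying monthly for 12 months."""
    return (1 - yearly_usd / (monthly_usd * 12)) * 100
```

With the quoted prices, `annual_discount_percent(5.99, 39.99)` comes to roughly 44%, so the yearly plan is a steep discount relative to paying month by month.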

Time to Revenue

8-12 weeks to MVP with TestFlight beta. 12-16 weeks to App Store launch. First paid subscribers within 1-2 weeks of launch if marketed to r/LocalLLaMA, r/audiobooks, r/accessibility, and Hacker News communities. These are technical early adopters who will pay for a good on-device TTS reader. $1K MRR within 2-3 months of launch is realistic with organic distribution alone.

What people are saying

“I wanted a reading app where you could read, read and listen or just listen to books with word-by-word highlighting synced to TTS”
“i wanted the voice to actually sound good”
“iOS kills Metal access the moment you background the app. If your use case needs background audio, this is a dead end”
“I need a nap that uses Kokoro TTS on Apple”
