Overall Score: 6.3/10 · Confidence: medium · Verdict: CONDITIONAL GO

On-Device LLM Compatibility Layer

Middleware that abstracts away mobile GPU/NPU quirks for reliable on-device LLM inference

Category: DevTools · Audience: Mobile app developers building AI-powered features that need to run on-device...
The Gap

Mobile LLM inference fails unpredictably across different hardware — GPU delegation crashes, models produce garbage output, and developers have no easy way to detect and route around hardware-specific issues

Solution

An SDK/library that profiles the device hardware at runtime, selects the optimal execution backend (GPU, NPU, CPU), handles fallbacks gracefully, and provides a consistent API for app developers integrating on-device LLMs
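The selection step described above can be sketched compactly. This is a minimal, hypothetical Java sketch, not an existing API: names like `BackendSelector`, `DeviceProfile`, and `select` are assumptions for illustration. The idea is to probe the hardware once, record which accelerated backends actually work, then always prefer the fastest working one.

```java
import java.util.EnumSet;
import java.util.List;

// Hypothetical sketch of the backend-selection step; class and field
// names are illustrative, not part of a real SDK.
public class BackendSelector {
    public enum Backend { NPU, GPU, CPU }

    /** Snapshot produced by runtime probing: SoC name, RAM, and the set
        of accelerated backends that passed a smoke test on this device. */
    public record DeviceProfile(String soc, long ramMb, EnumSet<Backend> working) {}

    /** Fixed preference order NPU > GPU > CPU; CPU always works, so it
        is the unconditional last resort. */
    public static Backend select(DeviceProfile p) {
        for (Backend b : List.of(Backend.NPU, Backend.GPU)) {
            if (p.working().contains(b)) return b;
        }
        return Backend.CPU;
    }
}
```

The key design choice is that selection reads only the probed profile, so app code never branches on chipset names.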

Revenue Model

Subscription SDK license — free tier for indie devs, paid tiers ($50-200/mo) for commercial use with SLA and support

Feasibility Scores
Pain Intensity: 8/10

The pain is real, documented, and growing. Android hardware fragmentation for ML inference is the #1 cited developer challenge. 24,000+ device models, wildly inconsistent GPU/NPU drivers, NNAPI's effective failure as an abstraction layer, and 'works on Pixel, crashes on Samsung' are well-known problems. The Reddit thread and Google Edge Gallery breakage are symptoms of a structural issue. However, it's currently felt by a relatively small (but growing) population of mobile ML developers.

Market Size: 6/10

The addressable market today is small — maybe 50K-100K mobile developers actively deploying on-device LLMs. At $50-200/mo that's a $30M-240M TAM. However, the market is expanding rapidly as on-device AI becomes table stakes for mobile apps (Samsung, Google, Apple all pushing it). In 2-3 years this could be 500K+ developers. The risk is that large platform players (Google, Meta) build 'good enough' solutions that cap the ceiling.

Willingness to Pay: 5/10

Mixed signals. Enterprise mobile teams (fintech, healthcare, automotive) would pay $200+/mo without blinking for reliable cross-device inference — these are teams spending $50K+/yr on device testing already. But the bulk of the current on-device LLM community is hobbyist/indie developers from r/LocalLLaMA who expect everything to be free/open-source. The SDK market has a strong FOSS expectation. Need to find the enterprise wedge early.

Technical Feasibility: 4/10

This is the hardest part. Building a reliable abstraction over QNN, Samsung ONE, MediaTek NeuroPilot, Vulkan, NNAPI, and XNNPACK requires deep systems programming across multiple vendor SDKs, many of which have poor documentation and licensing constraints. The device capability database alone (profiling thousands of devices) is a massive ongoing effort. A solo dev could build a proof-of-concept covering Snapdragon + CPU fallback in 8 weeks, but a production-grade cross-vendor solution is 6-12 months of focused work minimum. Vendor SDK licensing and distribution rights need legal review.

Competition Gap: 8/10

The gap is clear and validated. Every existing solution is either vendor-locked (QNN, Samsung ONE), GPU-only without NPU (MLC LLM, MediaPipe), or provides raw building blocks without the intelligence layer (ExecuTorch, ONNX Runtime). Nobody does automatic hardware detection + optimal backend selection + graceful fallback. NNAPI tried to be this layer and failed. The gap exists because it's genuinely hard to build, which is also why it's defensible.

Recurring Potential: 7/10

Strong subscription fit. New devices launch constantly (device database updates), vendor SDKs change (maintenance), new models need profiling (ongoing value), and SLA/support for production apps justify monthly fees. The device compatibility database is inherently a living product. Risk: open-source pressure could force the core SDK to be free, pushing revenue to support/enterprise tiers only.
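The "living database" above is the recurring product: entries churn as new devices launch and drivers change, which is what the subscription keeps fresh. A minimal Java sketch of that lookup, with a hypothetical schema (`Entry`, `knownGood` are assumptions, not a real format):

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the device compatibility database; the schema
// is an assumption for illustration, not a shipped format.
public class DeviceDb {
    /** One shipped entry: a schema version so updates can be delivered
        over the air, the device model, and the backends verified
        working on that model. */
    public record Entry(int schemaVersion, String model, String knownGood) {}

    private final Map<String, Entry> entries;

    public DeviceDb(Map<String, Entry> entries) { this.entries = entries; }

    /** Unknown device: return empty so the SDK falls back to
        conservative runtime probing instead of trusting missing data. */
    public Optional<Entry> lookup(String model) {
        return Optional.ofNullable(entries.get(model));
    }
}
```

Treating an unknown model as "probe at runtime" rather than "assume CPU" is what lets the database ship incomplete and still be useful.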

Strengths
  • +Clear, validated gap — no existing solution provides intelligent cross-vendor hardware abstraction with automatic fallback for mobile LLM inference
  • +Strong structural moat — the device capability database and vendor SDK integration knowledge compound over time and are extremely hard to replicate
  • +Perfect market timing — on-device LLMs just crossed the usability threshold while the middleware layer remains completely unowned
  • +Recurring revenue natural fit — constant device launches, SDK changes, and new model architectures create ongoing value
Risks
  • !Platform risk: Google could ship this as part of MediaPipe or Android Jetpack, instantly making it free and first-party. Google has both the motivation (Android ecosystem health) and resources, though their NNAPI track record is poor
  • !Technical depth: Building reliable abstractions over poorly-documented, frequently-changing vendor SDKs (QNN, Samsung ONE, NeuroPilot) requires deep systems expertise and ongoing reverse-engineering. Vendor cooperation is not guaranteed
  • !Open-source pressure: The on-device LLM community has strong FOSS expectations. Competitors like MLC LLM and ExecuTorch are free. Monetizing may require going enterprise-first rather than developer-community-first
  • !Small current market: The number of developers shipping production on-device LLM features today is still small. You're betting on growth — if on-device LLMs don't become mainstream (e.g., cloud inference gets cheap enough), the market may not materialize
Competition
Google MediaPipe LLM Inference API

High-level API for running LLMs

Pricing: Free / open-source (Apache 2.0)
Gap: No NPU access at all, no automatic GPU-to-CPU fallback, no device capability detection, no performance normalization across 24,000+ Android device models. If GPU delegate crashes, the developer is on their own.
ExecuTorch (Meta/PyTorch)

Meta's mobile inference runtime with a delegate architecture supporting XNNPACK and vendor backends such as QNN

Pricing: Free / open-source (BSD license)
Gap: Delegates are opt-in and manual — no automatic hardware detection, no fallback chain orchestration, no device capability database. Using QNN delegate still requires Snapdragon-specific expertise. Developer must hand-code the routing logic this idea automates.
MLC LLM

Open-source TVM-based compiler that compiles LLMs for diverse GPU backends

Pricing: Free / open-source (Apache 2.0)
Gap: GPU-only — zero NPU access on any platform. Vulkan driver quality varies wildly across Android devices causing silent crashes. No automatic fallback, no device profiling, complex TVM compilation pipeline intimidates app developers.
Qualcomm AI Engine Direct (QNN SDK)

Qualcomm's proprietary SDK for running AI workloads directly on Snapdragon NPU

Pricing: Free for Qualcomm chipset developers
Gap: Qualcomm-only — useless on ~60% of Android devices (Samsung Exynos, MediaTek Dimensity, Google Tensor). Complex integration, no LLM-specific abstractions (no KV-cache management, tokenization, or sampling). Different Snapdragon generations have different capabilities with no smoothing layer.
ONNX Runtime Mobile

Microsoft's cross-platform inference engine with mobile optimizations; hardware access goes through its Execution Provider abstraction

Pricing: Free / open-source (MIT license)
Gap: Relies on Android NNAPI for hardware acceleration — which is notoriously inconsistent (ops silently fall back to CPU, vendor drivers are buggy). No intelligent fallback, no device capability detection ('can this phone run 3B params?'), 20-40MB binary size overhead, LLM-specific GenAI extension still immature on mobile.
MVP Suggestion

Android SDK covering Qualcomm Snapdragon (via QNN) + Vulkan GPU + CPU fallback. Three features only: (1) runtime hardware detection that returns a device capability profile, (2) automatic backend selection with fallback chain (NPU → GPU → CPU), (3) a simple inference API that wraps the complexity. Support 2-3 popular model architectures (Llama, Gemma, Phi). Target 10 popular devices for the initial compatibility matrix. Ship as a Gradle dependency. Skip iOS, skip Samsung NPU, skip MediaTek — nail Snapdragon + fallback first.
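MVP feature (2), the fallback chain, is the part that absorbs "works on Pixel, crashes on Samsung". A minimal Java sketch under stated assumptions (`Delegate` and `infer` are hypothetical names; a real backend would wrap QNN, Vulkan, or the CPU path):

```java
import java.util.List;

// Hypothetical sketch of the NPU -> GPU -> CPU fallback chain; names are
// illustrative, not a real SDK.
public class FallbackChain {
    /** Stand-in for one execution backend; run() throws when the
        underlying delegate crashes or produces invalid output. */
    public interface Delegate {
        String name();
        String run(String prompt) throws Exception;
    }

    /** Try each delegate in order and return the first result. A
        production SDK would also cache the winning backend per device
        and model so later calls skip the known-bad ones. */
    public static String infer(List<Delegate> chain, String prompt) {
        for (Delegate d : chain) {
            try {
                return d.run(prompt);
            } catch (Exception e) {
                // Swallow and fall through to the next, slower backend.
            }
        }
        throw new IllegalStateException("all backends failed");
    }
}
```

Callers see one `infer` call regardless of which backend ultimately served it, which is the "consistent API" the SDK promises.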

Monetization Path

Free open-source core SDK with basic CPU/GPU inference → Paid tier ($50/mo) adds NPU acceleration, device capability database API, and model-to-device recommendations → Enterprise tier ($200/mo) adds SLA, priority support, custom device profiling, and on-prem compatibility testing dashboard → Scale via per-inference metering for high-volume apps or annual enterprise contracts

Time to Revenue

4-6 months to first paying customer. 8 weeks for MVP (Snapdragon + CPU fallback), 4 weeks for beta testing with 5-10 early adopter teams from Android/ML communities, then launch paid tier. Enterprise deals (the real money) likely take 6-9 months from first contact due to procurement cycles. Expect $1K-5K MRR within 6 months if execution is strong.

What people are saying
  • your hardware ain't enjoying nuttin of that business
  • you have to switch from GPU to CPU
  • Edge Gallery has been broken since launch