Running large LLMs like DeepSeek V3 requires expensive multi-GPU setups (~$14k/month for 8×H100), yet an individual developer uses only a fraction of a node's capacity
A self-serve platform where developers form cohorts to split a dedicated GPU node, with built-in fair scheduling, usage metering, anti-abuse controls, and Stripe-based conditional billing that charges members only once the cohort fills
Subscription with a platform fee (a 10-20% margin on top of compute costs), plus tiered plans for priority scheduling or guaranteed throughput
The pain is real and quantifiable: running DeepSeek V3 on 8×H100 costs ~$14k/month, yet a developer who needs 15-25 tok/s uses perhaps 10-20% of that capacity. The 43 upvotes and 25 comments on the HN/community post confirm the idea resonates. Developers are already manually coordinating cost-sharing in Discord groups, a clear sign of unmet demand.
TAM for GPU cloud inference is $10B+ and growing, but the addressable segment for cooperative sharing is narrow: indie devs and small teams who need dedicated (not serverless) inference but can't afford full nodes. This is a wedge market of perhaps $200-500M. Many potential users will graduate to full nodes or settle for serverless APIs, limiting retention.
Developers are already spending $50-500/month on inference APIs. A cohort share of an 8×H100 node at $1.5-3k/month per member (splitting the node 5-8 ways) hits a viable price point. Stripe conditional billing (members are charged only when the cohort fills) removes commitment anxiety, and the 10-20% platform margin is in line with typical marketplace takes.
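The per-member price point follows from simple cost-split arithmetic. A minimal sketch, using figures from this analysis (~$14k/month node cost, a 15% platform margin, 5-8 member cohorts); `member_price` is an illustrative helper, not any platform's API:

```python
# Hypothetical cost-split math for a cohort: node cost plus platform
# margin, divided evenly among members. Figures are from this analysis:
# ~$14k/month for 8xH100, 10-20% platform margin, 5-8 member cohorts.

def member_price(node_cost: float, margin: float, cohort_size: int) -> float:
    """Monthly price per member: (compute cost + platform margin) / members."""
    return node_cost * (1 + margin) / cohort_size

# At a 15% margin, an 8-way split lands near $2k/member and a 5-way
# split near $3.2k/member, bracketing the $1.5-3k target above.
print(round(member_price(14_000, 0.15, 8), 2))  # 2012.5
print(round(member_price(14_000, 0.15, 5), 2))  # 3220.0
```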
This is significantly harder than it appears. Fair multi-tenant GPU scheduling with isolation, abuse prevention, and guaranteed throughput SLAs is complex distributed-systems work. Required pieces: a vLLM or TGI orchestration layer, fair-queuing algorithms, usage metering at token/GPU-second granularity, Stripe billing integration, node-provisioning automation, and monitoring. A solo dev could build a rough MVP in 8-12 weeks, but production-grade multi-tenant GPU sharing with real isolation is a 6+ month effort.
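One of the fair-scheduling building blocks named above, per-member token-bucket rate limiting, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not vLLM's or any platform's actual scheduler; the `TokenBucket` class and its parameters are hypothetical:

```python
# Sketch of per-member token-bucket rate limiting, one building block
# of fair scheduling across a cohort. Names and numbers are illustrative.
import time

class TokenBucket:
    """Grants each member a sustained rate with a limited burst."""
    def __init__(self, rate: float, burst: float):
        self.rate = rate           # tokens replenished per second
        self.capacity = burst      # maximum stored tokens
        self.tokens = burst
        self.last = time.monotonic()

    def try_consume(self, n: float) -> bool:
        """Spend n tokens if available; otherwise refuse (caller queues)."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

# One bucket per cohort member: requests beyond a member's share are
# queued or rejected instead of starving their neighbors.
buckets = {m: TokenBucket(rate=20, burst=100) for m in ("alice", "bob")}
print(buckets["alice"].try_consume(50))   # True: within burst
print(buckets["alice"].try_consume(100))  # False: burst exhausted
```

In a real serving layer the refused request would be re-queued rather than dropped, and the per-member `rate` would be derived from the node's measured aggregate throughput divided by cohort size.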
No existing platform offers cooperative cohort-based GPU sharing with self-serve formation and fair scheduling. This is a genuine whitespace. Vast.ai is closest as a marketplace but lacks the cooperative/pooling model. Together/Fireworks are API-only with no cost-sharing. The gap exists because incumbents optimize for either enterprise (full nodes) or consumer (serverless) — the cooperative middle ground is unserved.
Developers who rely on inference for their products/research will need ongoing access. Monthly cohort subscriptions are natural. Churn risk comes from: members graduating to full nodes, switching to cheaper serverless options as prices drop, or cohort coordination breaking down. Lock-in is moderate — switching cost is reconfiguring API endpoints.
- +Genuine unmet need validated by organic community behavior (developers already pooling informally)
- +Clear pricing arbitrage: fractional dedicated GPU access at 1/5-1/8 the cost of a full node
- +Open-source approach builds trust and community in a market where transparency matters
- +Stripe conditional billing de-risks commitment for early users
- +No direct competitor offers this exact cooperative model
- !GPU pricing is falling fast — the cost-sharing value prop erodes as 8×H100 equivalent gets cheaper, potentially halving within 12-18 months
- !Cohort coordination is fragile: one member leaving can cascade (the 'empty seat' problem), and filling partial cohorts creates a cold-start chicken-and-egg issue
- !Multi-tenant GPU isolation and fair scheduling is genuinely hard — noisy neighbor problems, abuse prevention, and SLA guarantees are complex engineering challenges
- !Incumbents (Together, RunPod) could trivially add a 'share this deployment' feature if the market proves out
- !Open-source core limits monetization — if the orchestration layer is OSS, power users will self-host without the platform fee
Peer-to-peer GPU marketplace where hosts list idle GPUs and renters bid on them. Supports multi-GPU instances for inference and training.
Serverless and dedicated inference API platform. Offers pre-hosted open-source LLMs including DeepSeek models with pay-per-token pricing.
GPU cloud platform offering serverless inference endpoints and on-demand GPU pods for AI workloads. Strong in the indie/hobbyist developer segment.
GPU cloud focused on AI/ML workloads. Offers on-demand and reserved GPU instances, primarily targeting training and inference.
Inference platform specializing in fast, optimized serving of open-source LLMs. Competes with Together.ai on serverless inference.
Single model (DeepSeek V3), single cloud provider (Lambda or RunPod for the cheapest H100s), fixed cohort size (5 developers per node). Web dashboard for cohort formation with a waitlist. vLLM-based serving with token-bucket fair scheduling. Stripe checkout that activates only when the cohort fills. One API key per member with a usage dashboard. Skip priority tiers and multi-model support; just prove that 5 strangers can reliably share one 8×H100 node and each get acceptable throughput.
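The "checkout that activates only when the cohort fills" step can be sketched as a small state machine. A minimal sketch assuming Stripe's manual-capture PaymentIntents (authorize each member's card at signup, capture only once the cohort is full); the `capture` callback stands in for `stripe.PaymentIntent.capture`, and all names here are illustrative:

```python
# Sketch of conditional billing: hold an authorization per member and
# charge everyone only when the last seat fills. The capture callback
# stands in for a Stripe manual-capture step; names are hypothetical.
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Cohort:
    size: int                                        # seats needed to activate
    holds: Dict[str, str] = field(default_factory=dict)  # member -> payment intent id

    def join(self, member: str, payment_intent_id: str,
             capture: Callable[[str], None]) -> bool:
        """Record an authorized hold; capture all holds once full."""
        self.holds[member] = payment_intent_id
        if len(self.holds) == self.size:
            for pi in self.holds.values():
                capture(pi)      # charge the previously held authorization
            return True          # cohort activated
        return False             # still waiting for members

captured = []
cohort = Cohort(size=2)
print(cohort.join("alice", "pi_1", captured.append))  # False: 1 of 2 seats
print(cohort.join("bob", "pi_2", captured.append))    # True: cohort full
print(captured)                                       # ['pi_1', 'pi_2']
```

A production version would also cancel holds (Stripe authorizations expire after roughly a week on most cards) and handle a member leaving before activation, the "empty seat" risk flagged above.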
Free open-source orchestrator (self-host) -> Managed platform with 15% margin on compute ($1.5-3k/user/month at 5-8 person cohorts) -> Add priority scheduling tiers ($50-200/month premium) -> Enterprise private cohorts with SLAs -> Expand to training workload sharing -> Eventually broker/marketplace model connecting GPU suppliers with cohort demand
8-12 weeks to MVP with first paying cohort. 3-4 months to validate retention and unit economics. Key milestone: fill 3-5 cohorts organically to prove the coordination model works before investing in growth.
- “Running DeepSeek V3 requires 8×H100 GPUs which is about $14k/month”
- “Most developers only need 15-25 tok/s”
- “pooling money with other developers to collectively use something expensive”