OpenRouter
Unified API to 400+ LLMs from 70+ providers through one OpenAI-compatible endpoint, with automatic failover and pass-through token pricing
Run and fine-tune thousands of open-source AI models with one line of code via a cloud API, billed per second of GPU or CPU compute
Replicate runs thousands of open-source AI models — image, video, audio, and language — through one API, billing per second of compute. Hardware ranges from CPU at $0.000025/sec to an Nvidia H100 at $0.001525/sec (about $5.49/hour); some public models bill per run, like FLUX 1.1 Pro at $0.04 per image. There is no permanent free tier. Best for developers shipping model-backed features without managing GPUs.
Replicate is a cloud platform for running open-source AI models through a simple API. Instead of provisioning GPUs, installing model weights, and writing serving code, you call a model by name and Replicate handles the infrastructure — scaling up when traffic arrives and down to zero when it stops. The catalog spans image generation (FLUX), video (Wan), audio, and language models, plus thousands of community-contributed versions.
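To make "call a model by name" concrete, here is a minimal sketch that builds an HTTP request to start a prediction, using only the Python standard library. The base URL, the `/models/{owner}/{name}/predictions` path, the bearer-token header, and the `black-forest-labs/flux-1.1-pro` slug reflect Replicate's public HTTP API as commonly documented, but treat them as assumptions and check the current API reference. `build_prediction_request` is a hypothetical helper name, not part of any official client.

```python
import json
import os
import urllib.request

# Assumed base URL for the HTTP API; verify against the current docs.
API_BASE = "https://api.replicate.com/v1"


def build_prediction_request(model: str, model_input: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts a prediction for `model`."""
    body = json.dumps({"input": model_input}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/models/{model}/predictions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # The request is only sent when a token is present in the environment.
    token = os.environ.get("REPLICATE_API_TOKEN")
    if token:
        req = build_prediction_request(
            "black-forest-labs/flux-1.1-pro",  # illustrative model slug
            {"prompt": "a lighthouse at dusk"},
            token,
        )
        with urllib.request.urlopen(req) as resp:
            print(json.load(resp).get("status"))
```

In practice the official Python or Node.js client wraps this request, so most applications would not build it by hand.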
Billing is usage-based and granular. Public models charge for active processing time, often expressed per output or per token — FLUX 1.1 Pro is $0.04 per image, for example. When you deploy your own model, you pay for the hardware by the second across a range of tiers, from CPU at $0.000025/sec to an Nvidia H100 at $0.001525/sec (about $5.49 per hour), with multi-GPU options for heavier work. There is no permanent free tier, and private deployments bill for setup and idle time as well as inference, so costs can accumulate even between requests. Custom models are packaged with Cog, Replicate’s open-source tool, and can be fine-tuned on your own data.
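The per-second figures are easier to compare as hourly rates. The sketch below converts the rates quoted in this listing and estimates the cost of a single run. The dictionary keys and function names are illustrative, and the rates are copied from this page rather than fetched live.

```python
# Per-second rates as quoted in this listing (USD); not fetched from a live source.
RATES_PER_SEC = {
    "cpu-small": 0.000025,
    "t4": 0.000225,
    "l40s": 0.000975,
    "a100-80gb": 0.001400,
    "h100": 0.001525,
}


def hourly_rate(per_sec: float) -> float:
    """Convert a per-second rate to an hourly rate, rounded to cents."""
    return round(per_sec * 3600, 2)


def run_cost(hardware: str, seconds: float) -> float:
    """Cost of `seconds` of billed time on the named hardware tier."""
    return RATES_PER_SEC[hardware] * seconds


if __name__ == "__main__":
    for name, rate in RATES_PER_SEC.items():
        print(f"{name}: ${hourly_rate(rate):.2f}/hr")
    # A 10-second run on an H100 costs about 1.5 cents.
    print(f"10 s on h100: ${run_cost('h100', 10):.5f}")
```

Because the bill scales linearly with runtime, a model that takes twice as long on the same tier costs twice as much.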
Replicate suits developers who want to ship features backed by open models without becoming infrastructure engineers. It trades the lowest-possible compute cost for not having to manage GPUs, queues, or autoscaling yourself.
Starting price: $0.000025/sec · Free tier: no · Model: usage-based
Price history tracked from June 2026
| Plan | Price | Includes |
|---|---|---|
| CPU (per second) | from $0.000025/sec | CPU Small $0.000025/sec (about $0.09/hr) · CPU Standard $0.000100/sec (about $0.36/hr) · Pay only for active processing time on public models |
| GPU (per second) | $0.000225–$0.003050/sec | Nvidia T4 $0.000225/sec (about $0.81/hr) · L40S $0.000975/sec · A100 80GB $0.001400/sec (about $5.04/hr) · H100 $0.001525/sec (about $5.49/hr) · 2x multi-GPU options available |
| Per-run models | per output or token | FLUX 1.1 Pro $0.04 per output image · Video generation (Wan 2.1) about $0.09–$0.25 per second of output · Hosted LLMs billed per million tokens |
| Pros | Cons |
|---|---|
| A single line of code runs a vast catalog of open-source models — no infra to manage | Private models and deployments bill for all online time, including setup and idle, not just active inference |
| Granular per-second billing means you pay only for compute actually used on public models | Cold starts on scaled-to-zero deployments add latency and, for private models, billable setup time |
| Wide hardware range from cheap CPU to multi-GPU H100/A100 configurations | Cost is hard to predict: per-second compute means runtime variance directly changes the bill |
| Cog packaging, fine-tuning, and deployment cover the full custom-model lifecycle | No permanent free tier — only limited free runs on a curated set, then billing is required |
| Transparent published per-second and per-run rates on the pricing page | |
Free, open-source tool to run open-weight LLMs locally via CLI, desktop, or API — with an optional paid cloud for larger models
API-first AI image generator from Black Forest Labs with per-image pay-as-you-go pricing
Open-weight AI image generation model by Stability AI with API and self-hosted deployment options
★ 27.2k as of Jun 22, 2026
AI video generation platform for cinematic clips, world simulation, and conversational video agents
Replicate pricing is pay-as-you-go. Public models bill by active processing time (or per output or token), while hardware is billed per second by tier, from CPU at $0.000025/sec up to multi-GPU H100 at $0.003050/sec.
An A100 with 80GB is $0.001400 per second, equivalent to about $5.04 per hour. A 2x A100 configuration is $0.002800 per second.
Tiers include Nvidia T4 at $0.000225/sec, L40S at $0.000975/sec, A100 80GB at $0.001400/sec, and H100 at $0.001525/sec, plus 2x multi-GPU options for larger workloads.
There is no permanent free tier. New accounts get limited free runs on a curated model set, and after that you must add billing to keep running models.
Replicate provides an HTTP API with official Python and Node.js client libraries, a CLI, and Cog for packaging and deploying custom models.
For public models, you pay only for active processing time. Private models and deployments, by contrast, bill for all online time, including setup and idle periods.
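The difference between the two billing modes can be sketched with a simplified cost model. It assumes that setup and idle seconds are billed at the same per-second rate as inference, which is how this listing describes it. The function names and the example workload (60 s setup, one hour idle, 120 s active on an A100 80GB) are hypothetical.

```python
A100_80GB_PER_SEC = 0.001400  # USD per second, as quoted in this listing


def public_model_cost(rate_per_sec: float, active_seconds: float) -> float:
    """Public models: only active processing time is billed."""
    return rate_per_sec * active_seconds


def deployment_cost(rate_per_sec: float, setup_seconds: float,
                    idle_seconds: float, active_seconds: float) -> float:
    """Private models and deployments: all online time is billed (simplified)."""
    return rate_per_sec * (setup_seconds + idle_seconds + active_seconds)


if __name__ == "__main__":
    active = 120  # seconds of actual inference
    public = public_model_cost(A100_80GB_PER_SEC, active)
    private = deployment_cost(A100_80GB_PER_SEC, setup_seconds=60,
                              idle_seconds=3600, active_seconds=active)
    print(f"public:     ${public:.3f}")   # $0.168
    print(f"deployment: ${private:.3f}")  # $5.292
```

In this example the same two minutes of inference costs roughly thirty times more on a deployment that sits idle for an hour, which is why idle time matters for low-traffic workloads.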