Replicate

Run and fine-tune thousands of open-source AI models with one line of code via a cloud API, billed per second of GPU or CPU compute

Replicate runs thousands of open-source AI models — image, video, audio, and language — through one API, billing per second of compute. Hardware ranges from CPU at $0.000025/sec to an Nvidia H100 at $0.001525/sec (about $5.49/hour); some public models bill per run, like FLUX 1.1 Pro at $0.04 per image. There is no permanent free tier. Best for developers shipping model-backed features without managing GPUs.

Verified June 23, 2026 · Usage-based · Live

Website: replicate.com · Categories: AI Image Generation, AI Coding, AI Agents & Automation

What is Replicate?

Replicate is a cloud platform for running open-source AI models through a simple API. Instead of provisioning GPUs, installing model weights, and writing serving code, you call a model by name and Replicate handles the infrastructure — scaling up when traffic arrives and down to zero when it stops. The catalog spans image generation (FLUX), video (Wan), audio, and language models, plus thousands of community-contributed versions.

Billing is usage-based and granular. Public models charge only for active processing time, and some are priced per output or per token instead: FLUX 1.1 Pro is $0.04 per image, for example. When you deploy your own model, you pay for the hardware by the second across a range of tiers, from CPU at $0.000025/sec to an Nvidia H100 at $0.001525/sec (about $5.49 per hour), with multi-GPU options for heavier work. There is no permanent free tier, and private deployments bill for setup and idle time as well as inference, so costs can accumulate even between requests. Custom models are packaged with Cog, Replicate’s open-source tool, and can be fine-tuned on your own data.

Who is it for?

Replicate suits developers who want to ship features backed by open models without becoming infrastructure engineers. It trades the lowest-possible compute cost for not having to manage GPUs, queues, or autoscaling yourself.

Product developers adding image, video, audio, or language generation to an app via a single API call.
ML practitioners who fine-tune open models and want a fast path to a deployed, autoscaling endpoint using Cog.
Startups and prototypers that need to test many models quickly and pay only for what they actually run.
Teams avoiding GPU ops who accept per-second pricing and occasional cold starts in exchange for zero infrastructure management.

How much does Replicate cost?

Starting price: $0.000025/sec · Free tier: no · Model: usage-based

Pricing verified JUN 23, 2026

Price history tracked from June 2026

Replicate pricing tiers, verified against the official pricing page
Plan	Price	Includes
CPU (per second)	from $0.000025/sec	CPU Small $0.000025/sec (about $0.09/hr) · CPU Standard $0.000100/sec (about $0.36/hr) · Pay only for active processing time on public models
GPU (per second)	$0.000225–$0.003050/sec	Nvidia T4 $0.000225/sec (about $0.81/hr) · L40S $0.000975/sec (about $3.51/hr) · A100 80GB $0.001400/sec (about $5.04/hr) · H100 $0.001525/sec (about $5.49/hr) · 2x multi-GPU options available, up to $0.003050/sec
Per-run models	per output or token	FLUX 1.1 Pro $0.04 per output image · Video generation (Wan 2.1) about $0.09–$0.25 per second of output · Hosted LLMs billed per million tokens
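The hourly figures in the table follow from multiplying each per-second rate by 3,600. A minimal sketch of that arithmetic, using the rates listed above; the tier keys (`cpu-small`, `h100`, and so on) and the 12-second workload are illustrative labels, not Replicate identifiers:

```python
# Per-second rates copied from the pricing table above (USD per second).
# The dictionary keys are hypothetical labels chosen for this example.
RATES_PER_SEC = {
    "cpu-small": 0.000025,
    "cpu-standard": 0.000100,
    "t4": 0.000225,
    "l40s": 0.000975,
    "a100-80gb": 0.001400,
    "h100": 0.001525,
}


def hourly_rate(tier: str) -> float:
    """Convert a per-second rate to USD per hour (3,600 seconds)."""
    return RATES_PER_SEC[tier] * 3600


def run_cost(tier: str, seconds: float) -> float:
    """Cost in USD of occupying one tier for the given number of seconds."""
    return RATES_PER_SEC[tier] * seconds


if __name__ == "__main__":
    for tier in RATES_PER_SEC:
        print(f"{tier:>12}: ${hourly_rate(tier):.2f}/hr")
    # Hypothetical workload: one 12-second prediction on an H100.
    print(f"12 s on H100: ${run_cost('h100', 12):.4f}")
```

Because the bill scales linearly with runtime, a prediction that takes twice as long costs twice as much on the same tier.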

What are Replicate's key features?

Run thousands of open-source models via a single API call
Per-second hardware billing across CPU and GPU tiers (T4, L40S, A100, H100)
Fine-tune models on custom datasets to create specialized versions
Deploy custom models with Cog, Replicate's open-source packaging tool
Automatic scaling from zero to high traffic, with no charge during idle on public models
Client libraries for Python and Node.js plus a raw HTTP API and CLI
Logging and monitoring to track predictions and model performance
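As an illustration of the raw HTTP API mentioned above, here is a sketch that builds a prediction request using only the Python standard library. The base URL, the `/models/{owner}/{name}/predictions` path, the bearer-token header, the `black-forest-labs/flux-1.1-pro` slug, and the `REPLICATE_API_TOKEN` variable name are assumptions based on Replicate's public documentation as commonly described; check the current API reference before relying on them. The official Python client wraps the same call and is likely the simpler choice in practice.

```python
import json
import urllib.request

# Assumed base URL for Replicate's HTTP API; verify against the official docs.
API_BASE = "https://api.replicate.com/v1"


def build_prediction_request(
    token: str, model: str, model_input: dict
) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts a prediction.

    `model` is an "owner/name" slug. The path layout is an assumption.
    """
    url = f"{API_BASE}/models/{model}/predictions"
    body = json.dumps({"input": model_input}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    import os

    # Sending this request is billable: the listing quotes $0.04 per image
    # for FLUX 1.1 Pro.
    req = build_prediction_request(
        os.environ["REPLICATE_API_TOKEN"],  # assumed environment variable name
        "black-forest-labs/flux-1.1-pro",   # assumed model slug
        {"prompt": "a lighthouse at dusk, watercolor"},
    )
    with urllib.request.urlopen(req) as resp:
        prediction = json.load(resp)
    print(prediction.get("status"), prediction.get("id"))
```

Separating request construction from sending keeps the sketch inspectable without a token or network access.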

What people use Replicate for

01 Generating images via models like FLUX 1.1 Pro through an API
02 Generating text-to-video clips and animating images with models such as Wan 2.1
03 Running hosted LLMs without managing GPU infrastructure
04 Fine-tuning and deploying a custom production model with autoscaling
05 Text-to-speech, music generation, and image restoration tasks

Pros and cons

Pros and cons of Replicate
Pros	Cons
A single line of code runs a vast catalog of open-source models — no infra to manage	Private models and deployments bill for all online time, including setup and idle, not just active inference
Granular per-second billing means you pay only for compute actually used on public models	Cold starts on scaled-to-zero deployments add latency and, for private models, billable setup time
Wide hardware range from cheap CPU to 2x multi-GPU A100 and H100 configurations	Cost is hard to predict: per-second compute means runtime variance directly changes the bill
Cog packaging, fine-tuning, and deployment cover the full custom-model lifecycle	No permanent free tier — only limited free runs on a curated set, then billing is required
Transparent published per-second and per-run rates on the pricing page

What are the best Replicate alternatives?

OpenRouter

Unified API to 400+ LLMs from 70+ providers through one OpenAI-compatible endpoint, with automatic failover and pass-through token pricing

USAGE-BASED Verified JUN 23, 2026

Ollama

Free, open-source tool to run open-weight LLMs locally via CLI, desktop, or API — with an optional paid cloud for larger models

FREEMIUM Verified JUN 23, 2026

FLUX

API-first AI image generator from Black Forest Labs with per-image pay-as-you-go pricing

USAGE-BASED Verified JUN 11, 2026

Stable Diffusion

Open-weight AI image generation model by Stability AI with API and self-hosted deployment options

FREEMIUM Verified JUN 11, 2026

★ 27.2k as of Jun 22, 2026

Runway

AI video generation platform for cinematic clips, world simulation, and conversational video agents

FREEMIUM Verified JUN 11, 2026

How people make money with Replicate

Sell a generative-media API product — image, video, or audio — built on Replicate models, pricing each request above its per-second compute cost to capture margin
Package a fine-tuned, niche model (brand-style images, voice clone, domain LLM) as a paid endpoint, deploying it with Cog and charging per generation
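The margin on either approach comes down to the price charged per request minus the compute it consumes. A rough sketch of that calculation; the $0.05 price and the 8 seconds of L40S time per image are made-up inputs, and only the L40S rate comes from the pricing table:

```python
def request_margin(
    price: float, rate_per_sec: float, seconds: float
) -> tuple[float, float]:
    """Return (profit in USD, margin as a fraction of price) for one request."""
    if price <= 0:
        raise ValueError("price must be positive")
    compute_cost = rate_per_sec * seconds
    profit = price - compute_cost
    return profit, profit / price


if __name__ == "__main__":
    # Hypothetical product: $0.05 per image, 8 s of L40S time per image.
    profit, margin = request_margin(price=0.05, rate_per_sec=0.000975, seconds=8)
    print(f"profit ${profit:.4f} per request, margin {margin:.1%}")
```

This ignores setup and idle time, which private deployments also bill for, so real margins on a dedicated endpoint would be lower.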

Frequently asked questions

How does Replicate pricing work?

It is pay-as-you-go. Public models bill only for active processing time, or per output or per token, while hardware for your own models is billed per second by tier, from CPU at $0.000025/sec up to a 2x H100 configuration at $0.003050/sec.

How much does an Nvidia A100 cost on Replicate?

An A100 with 80GB is $0.001400 per second, equivalent to about $5.04 per hour. A 2x A100 configuration is $0.002800 per second.

What GPUs does Replicate offer?

Tiers include Nvidia T4 at $0.000225/sec, L40S at $0.000975/sec, A100 80GB at $0.001400/sec, and H100 at $0.001525/sec, plus 2x multi-GPU options for larger workloads.

Is there a free tier?

There is no permanent free tier. New accounts get limited free runs on a curated model set, and after that you must add billing to keep running models.

Does Replicate have an API and CLI?

Yes. It provides an HTTP API with official Python and Node.js client libraries, plus a CLI, and Cog for packaging and deploying custom models.

Do I pay when my model is idle?

For public models you only pay for active processing time. But private models and deployments bill for all online time, including setup and idle periods.
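To see how much the difference can matter, the sketch below compares the two billing modes for the same workload. It is a simplified model that assumes a single instance staying online for the whole period; the one-hour window and the split between setup, active, and idle seconds are hypothetical, and only the A100 80GB rate comes from the figures above.

```python
A100_80GB_PER_SEC = 0.001400  # USD per second, from the pricing figures above


def public_model_cost(rate_per_sec: float, active_seconds: float) -> float:
    """Public models: only active processing time is billed."""
    return rate_per_sec * active_seconds


def deployment_cost(
    rate_per_sec: float,
    setup_seconds: float,
    active_seconds: float,
    idle_seconds: float,
) -> float:
    """Private models and deployments: all online time is billed."""
    online_seconds = setup_seconds + active_seconds + idle_seconds
    return rate_per_sec * online_seconds


if __name__ == "__main__":
    # Hypothetical hour: 60 s setup, 600 s of inference, 2,940 s idle.
    public = public_model_cost(A100_80GB_PER_SEC, active_seconds=600)
    private = deployment_cost(
        A100_80GB_PER_SEC, setup_seconds=60, active_seconds=600, idle_seconds=2940
    )
    print(f"public model:       ${public:.2f}")
    print(f"private deployment: ${private:.2f}")
```

Under these assumptions the same ten minutes of inference costs $0.84 on a public model and $5.04 on a deployment that stays online for the full hour, which is why scaling to zero matters for low-traffic private endpoints.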