AITrendTool

Replicate

Run and fine-tune thousands of open-source AI models with one line of code via a cloud API, billed per second of GPU or CPU compute

Replicate runs thousands of open-source AI models — image, video, audio, and language — through one API, billing per second of compute. Hardware ranges from CPU at $0.000025/sec to an Nvidia H100 at $0.001525/sec (about $5.49/hour); some public models bill per run, like FLUX 1.1 Pro at $0.04 per image. There is no permanent free tier. Best for developers shipping model-backed features without managing GPUs.

Verified JUN 23, 2026 · USAGE-BASED · Live

What is Replicate?

Replicate is a cloud platform for running open-source AI models through a simple API. Instead of provisioning GPUs, installing model weights, and writing serving code, you call a model by name and Replicate handles the infrastructure — scaling up when traffic arrives and down to zero when it stops. The catalog spans image generation (FLUX), video (Wan), audio, and language models, plus thousands of community-contributed versions.
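
As a rough illustration of "calling a model by name", the sketch below builds, but does not send, an HTTP request using only the Python standard library. The base URL, the `/models/{owner}/{name}/predictions` route, the bearer-token header, and the `black-forest-labs/flux-1.1-pro` identifier are assumptions drawn from Replicate's public HTTP API rather than from this page; `build_prediction_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Assumption: base URL and route follow Replicate's public HTTP API.
API_BASE = "https://api.replicate.com/v1"


def build_prediction_request(model: str, model_input: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST asking Replicate to run `model`."""
    body = json.dumps({"input": model_input}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/models/{model}/predictions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_prediction_request(
    "black-forest-labs/flux-1.1-pro",      # assumed model identifier
    {"prompt": "a lighthouse at dusk"},    # illustrative input
    token="YOUR_API_TOKEN",                # placeholder, not a real token
)
# Sending it would be: urllib.request.urlopen(request), with a valid token.
```

Nothing is sent until `urlopen` is called, so the sketch can be inspected without an account. The official Python and Node.js clients wrap the same kind of call.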

Billing is usage-based and granular. Public models charge only for active processing time, and some are priced per output or per token instead: FLUX 1.1 Pro is $0.04 per image, for example. When you deploy your own model, you pay for the hardware by the second across a range of tiers, from CPU at $0.000025/sec to an Nvidia H100 at $0.001525/sec (about $5.49 per hour), with 2x multi-GPU options for heavier work. There is no permanent free tier, and private deployments bill for setup and idle time as well as inference, so costs can accumulate even between requests. Custom models are packaged with Cog, Replicate’s open-source tool, and can be fine-tuned on your own data.
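
The per-hour figures quoted alongside the per-second rates are simply the rate multiplied by 3,600. A minimal sketch of that conversion, using rates from this page; the tier keys and the `hourly_cost` helper are made-up names for illustration:

```python
from decimal import Decimal

SECONDS_PER_HOUR = 3600

# Per-second rates in USD as quoted on this page; the keys are illustrative labels.
RATES_PER_SECOND = {
    "cpu-small": Decimal("0.000025"),
    "t4": Decimal("0.000225"),
    "a100-80gb": Decimal("0.001400"),
    "h100": Decimal("0.001525"),
}


def hourly_cost(tier: str) -> Decimal:
    """Convert a per-second rate to a per-hour cost, rounded to cents."""
    return (RATES_PER_SECOND[tier] * SECONDS_PER_HOUR).quantize(Decimal("0.01"))


for tier in RATES_PER_SECOND:
    print(f"{tier}: ${hourly_cost(tier)}/hr")
```

`Decimal` is used instead of floats so that the tiny per-second rates multiply out to exact cent values.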

Who is it for?

Replicate suits developers who want to ship features backed by open models without becoming infrastructure engineers. It trades the lowest-possible compute cost for not having to manage GPUs, queues, or autoscaling yourself.

  • Product developers adding image, video, audio, or language generation to an app via a single API call.
  • ML practitioners who fine-tune open models and want a fast path to a deployed, autoscaling endpoint using Cog.
  • Startups and prototypers that need to test many models quickly and pay only for what they actually run.
  • Teams avoiding GPU ops who accept per-second pricing and occasional cold starts in exchange for zero infrastructure management.

How much does Replicate cost?

Starting price: $0.000025/sec · Free tier: no · Model: usage-based

Pricing verified JUN 23, 2026

Price history tracked from June 2026

Replicate pricing tiers, verified against the official pricing page
Plan | Price | Includes
CPU (per second) | from $0.000025/sec | CPU Small $0.000025/sec (about $0.09/hr) · CPU Standard $0.000100/sec (about $0.36/hr) · Pay only for active processing time on public models
GPU (per second) | $0.000225–$0.003050/sec | Nvidia T4 $0.000225/sec (about $0.81/hr) · L40S $0.000975/sec · A100 80GB $0.001400/sec (about $5.04/hr) · H100 $0.001525/sec (about $5.49/hr) · 2x multi-GPU options available
Per-run models | per output or token | FLUX 1.1 Pro $0.04 per output image · Video generation (Wan 2.1) about $0.09–$0.25 per second of output · Hosted LLMs billed per million tokens
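
To put per-run and per-second prices on the same scale, the sketch below converts a per-run price into the number of hardware seconds the same money would buy, and prices a short video clip from the quoted range. This is only a unit comparison under the figures above: it does not imply that a given per-run model can be self-hosted on that hardware, and the function names are hypothetical.

```python
from decimal import Decimal

# Figures quoted in the pricing table (USD).
FLUX_PRICE_PER_IMAGE = Decimal("0.04")
H100_RATE_PER_SECOND = Decimal("0.001525")
WAN_PRICE_RANGE_PER_OUTPUT_SECOND = (Decimal("0.09"), Decimal("0.25"))


def equivalent_seconds(per_run_price: Decimal, per_second_rate: Decimal) -> Decimal:
    """How many seconds of hardware time cost the same as one per-run output."""
    return per_run_price / per_second_rate


def video_cost_range(output_seconds: int) -> tuple:
    """Low and high cost for a clip, given the quoted per-output-second range."""
    low, high = WAN_PRICE_RANGE_PER_OUTPUT_SECOND
    return low * output_seconds, high * output_seconds


breakeven = equivalent_seconds(FLUX_PRICE_PER_IMAGE, H100_RATE_PER_SECOND).quantize(Decimal("0.1"))
clip_low, clip_high = video_cost_range(5)
print(f"One $0.04 image costs the same as {breakeven} s of H100 time")
print(f"A 5-second clip costs between ${clip_low} and ${clip_high}")
```

With these figures, one FLUX 1.1 Pro image costs the same as about 26.2 seconds of H100 time, and a 5-second Wan 2.1 clip lands between $0.45 and $1.25.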

What are Replicate's key features?

  • Run thousands of open-source models via a single API call
  • Per-second hardware billing across CPU and GPU tiers (T4, L40S, A100, H100)
  • Fine-tune models on custom datasets to create specialized versions
  • Deploy custom models with Cog, Replicate's open-source packaging tool
  • Automatic scaling from zero to high traffic, with no charge during idle on public models
  • Client libraries for Python and Node.js plus a raw HTTP API and CLI
  • Logging and monitoring to track predictions and model performance

What people use Replicate for

  1. Generating images via models like FLUX 1.1 Pro through an API
  2. Text-to-video and image animation generation, such as Wan 2.1
  3. Running hosted LLMs without managing GPU infrastructure
  4. Fine-tuning and deploying a custom production model with autoscaling
  5. Text-to-speech, music generation, and image restoration tasks

Pros and cons

Pros

  • A single line of code runs a vast catalog of open-source models, with no infrastructure to manage
  • Granular per-second billing means you pay only for compute actually used on public models
  • Wide hardware range from cheap CPU to 2x multi-GPU H100/A100 configurations
  • Cog packaging, fine-tuning, and deployment cover the full custom-model lifecycle
  • Transparent published per-second and per-run rates on the pricing page

Cons

  • Private models and deployments bill for all online time, including setup and idle, not just active inference
  • Cold starts on scaled-to-zero deployments add latency and, for private models, billable setup time
  • Cost is hard to predict: per-second compute means runtime variance directly changes the bill
  • No permanent free tier: only limited free runs on a curated set, then billing is required

How people make money with Replicate

  • Sell a generative-media API product — image, video, or audio — built on Replicate models, pricing each request above its per-second compute cost to capture margin
  • Package a fine-tuned, niche model (brand-style images, voice clone, domain LLM) as a paid endpoint, deploying it with Cog and charging per generation
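
The margin in both approaches is the gap between what a customer pays per request and the per-second compute behind it. A minimal sketch of that arithmetic, using the L40S rate from this page; the 8-second runtime and the $0.02 selling price are hypothetical numbers chosen only for illustration:

```python
from decimal import Decimal

L40S_RATE_PER_SECOND = Decimal("0.000975")  # USD/sec, from the pricing table


def request_margin(price: Decimal, runtime_seconds: Decimal, rate: Decimal) -> tuple:
    """Return (compute cost, margin in USD, margin as a fraction of price)."""
    compute_cost = rate * runtime_seconds
    margin = price - compute_cost
    return compute_cost, margin, margin / price


# Hypothetical product: each generation runs for 8 s and sells for $0.02.
cost, margin, margin_share = request_margin(
    price=Decimal("0.02"),
    runtime_seconds=Decimal("8"),
    rate=L40S_RATE_PER_SECOND,
)
print(f"compute ${cost}, margin ${margin} ({margin_share:.0%} of price)")
```

Because runtime varies between requests, the same selling price yields a different margin each time; a slower run on the same hardware eats directly into it.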

Frequently asked questions

How does Replicate pricing work?

It is pay-as-you-go. Public models bill by active processing time (or per output or token), while hardware is billed per second by tier — from CPU at $0.000025/sec up to multi-GPU H100 at $0.003050/sec.

How much does an Nvidia A100 cost on Replicate?

An A100 with 80GB is $0.001400 per second, equivalent to about $5.04 per hour. A 2x A100 configuration is $0.002800 per second.

What GPUs does Replicate offer?

Tiers include Nvidia T4 at $0.000225/sec, L40S at $0.000975/sec, A100 80GB at $0.001400/sec, and H100 at $0.001525/sec, plus 2x multi-GPU options for larger workloads.

Is there a free tier?

There is no permanent free tier. New accounts get limited free runs on a curated model set, and after that you must add billing to keep running models.

Does Replicate have an API and CLI?

Yes. It provides an HTTP API with official Python and Node.js client libraries, plus a CLI, and Cog for packaging and deploying custom models.

Do I pay when my model is idle?

For public models you only pay for active processing time. But private models and deployments bill for all online time, including setup and idle periods.
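
The difference is easiest to see with numbers. The sketch below compares the two billing modes for one hour on an A100 80GB at the rate quoted above; the split into setup, active, and idle seconds is an invented scenario, and the function names are hypothetical.

```python
from decimal import Decimal

A100_RATE_PER_SECOND = Decimal("0.001400")  # USD/sec, from this page


def active_only_cost(active_seconds: int, rate: Decimal) -> Decimal:
    """Public-model style: only active processing time is billed."""
    return rate * active_seconds


def online_time_cost(setup_seconds: int, active_seconds: int, idle_seconds: int, rate: Decimal) -> Decimal:
    """Private model or deployment: setup, active, and idle time are all billed."""
    return rate * (setup_seconds + active_seconds + idle_seconds)


# Hypothetical hour: 2 min of setup, 10 min of inference, 48 min idle.
public_bill = active_only_cost(600, A100_RATE_PER_SECOND)
private_bill = online_time_cost(120, 600, 2880, A100_RATE_PER_SECOND)
print(f"active-only: ${public_bill:.2f}, all online time: ${private_bill:.2f}")
```

In this scenario the same ten minutes of inference cost $0.84 when only active time is billed and $5.04 when the whole online hour is billed.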