Free, open-source tool to run open-weight LLMs locally via CLI, desktop, or API — with an optional paid cloud for larger models
Ollama is a free, open-source runtime for running open-weight LLMs locally through a CLI, a desktop app, or a REST API. It requires no rented GPU and is unlimited on your own hardware. An optional Ollama Cloud subscription adds remote inference for larger models: Free, Pro at $20/month (or $200/year), and Max at $100/month. Best for developers who want private, offline model access.
Ollama is an open-source runtime for running open-weight large language models on your own machine. A single-line install sets up a CLI, a desktop app, and a local REST API, so you can pull a model and start chatting or building against it in minutes — without renting a GPU or sending data to a hosted provider. Because the model runs on your hardware, it works fully offline and keeps prompts and responses local.
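As a minimal sketch of building against the local REST API, the Python below sends one prompt to Ollama's `/api/generate` endpoint using only the standard library. It assumes the server is running at its default address (`http://localhost:11434`) and that the model has already been pulled; the model name `llama3.2` is only an illustrative choice.

```python
import json
import urllib.request

# Assumption: the Ollama server is listening on its default local port, 11434.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for the local /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With "stream": False the whole completion arrives in the "response" field.
    return body["response"]
```

For example, after `ollama pull llama3.2`, calling `generate("llama3.2", "Why is the sky blue?")` returns the model's full answer as a string. Setting `"stream": False` keeps the sketch simple; by default the endpoint streams the answer back as a sequence of JSON objects. Nothing leaves the machine, because the request targets `localhost`.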
Beyond the free local runtime, Ollama offers a paid Ollama Cloud subscription for running larger models remotely when your own hardware is not enough. Cloud is billed by GPU time rather than per token, with infrastructure in the US, Europe, and Singapore. The Free tier covers light cloud usage with one concurrent model; Pro at $20/month (or $200/year) raises that to three concurrent models and 50x the cloud usage of Free; and Max at $100/month allows ten concurrent models and 5x the usage of Pro. A Team plan with SSO and centralized billing is listed as coming soon.
Ollama suits developers and technical users who want direct, private control over which models they run and where. The free local runtime is enough for most experimentation and personal use; the cloud tier matters only when you need models larger than your hardware can load.
Starting price: $0 · Free tier: yes · Model: freemium
Price history tracked from June 2026
| Plan | Price | Includes |
|---|---|---|
| Free | $0 | Unlimited local model runs on your own hardware · CLI, desktop apps, and REST API access · 1 concurrent cloud model with light usage · Fully offline and private operation |
| Pro | $20/mo | $20/month or $200/year · 3 concurrent cloud models · 50x more cloud usage than Free · Access to larger cloud models; upload private models |
| Max | $100/mo | 10 concurrent cloud models · 5x more usage than Pro · Everything in Pro |
| Team | Coming soon | Shared team usage and centralized billing · SSO and model access controls · Not yet available |
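The plan table can be turned into a small decision helper. This is a sketch under the assumption that the number of concurrent cloud models is the deciding factor; it does not model the GPU-time usage quotas, which Ollama describes only in relative terms (light usage, 50x, 5x). The 250x figure for Max is derived here by multiplying Pro's 50x over Free by Max's 5x over Pro, assuming the multipliers compose that way. The names `PLANS`, `cheapest_plan`, and `pro_annual_savings` are hypothetical.

```python
# Hypothetical helper built from the pricing table; Team is omitted because
# it has no published price or limits yet.
PLANS = [
    # (name, monthly price in USD, max concurrent cloud models), cheapest first
    ("Free", 0, 1),
    ("Pro", 20, 3),
    ("Max", 100, 10),
]

PRO_ANNUAL_PRICE = 200  # Pro billed yearly, in USD


def cheapest_plan(concurrent_models: int) -> str:
    """Return the cheapest plan whose concurrency limit covers the need."""
    for name, _price, limit in PLANS:  # PLANS is ordered by price
        if concurrent_models <= limit:
            return name
    raise ValueError("No listed plan supports that many concurrent cloud models")


def pro_annual_savings() -> tuple[int, float]:
    """Dollars and percent saved by paying for Pro yearly instead of monthly."""
    monthly_total = 20 * 12  # twelve monthly Pro payments: $240
    saved = monthly_total - PRO_ANNUAL_PRICE  # $40
    return saved, 100 * saved / monthly_total


# Cloud usage relative to Free (assumption: Max's 5x applies on top of Pro's 50x).
USAGE_VS_FREE = {"Free": 1, "Pro": 50, "Max": 50 * 5}
```

For example, `cheapest_plan(2)` returns `"Pro"`, and `pro_annual_savings()` shows that the $200/year Pro option saves $40 (about 16.7%) compared with twelve monthly payments.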
| Pros | Cons |
|---|---|
| Core local runtime is completely free, open-source, and unlimited on your own hardware | Cloud is billed by GPU time, not tokens, so heavier models burn quota faster and cost is harder to predict |
| Single-command install and a simple CLI/API lower the barrier to running open models | The free cloud tier is limited to 1 concurrent model with light usage, so it is unsuitable for sustained production work |
| Optional cloud scales to larger models without buying GPUs | The Team plan (SSO, centralized billing, access controls) is still 'coming soon', so enterprise governance is not yet available |
| Strong privacy stance: data is not logged or trained on, and a full offline mode is available | Local performance depends entirely on your own hardware; large models are impractical without a capable GPU |
Unified API to 400+ LLMs from 70+ providers through one OpenAI-compatible endpoint, with automatic failover and pass-through token pricing
Run and fine-tune thousands of open-source AI models with one line of code via a cloud API, billed per second of GPU or CPU compute
Mistral's multimodal AI assistant — chat, web search, code execution, and image generation from a European frontier lab
Open-weight AI chatbot and API with 1M-token context and usage-based pricing
Anthropic's AI assistant built for long-context analysis, coding, and safe reasoning
The core Ollama software for running open-weight models locally is free, open-source, and unlimited on your own hardware. The paid Ollama Cloud subscription is optional.
Ollama Cloud has a Free tier, Pro at $20/month (or $200/year), and Max at $100/month. A Team plan with SSO and centralized billing is listed as coming soon.
Free allows 1 concurrent cloud model with light usage. Pro at $20/month allows 3 concurrent models and 50x more cloud usage than Free. Max at $100/month allows 10 concurrent models and 5x more usage than Pro.
Ollama Cloud is billed by GPU time, which depends on model size and request duration, rather than by tokens. Models are graded across usage levels from light to heavy, so heavier models consume your quota faster.
Ollama exposes a REST API for both local and cloud inference, alongside a CLI and desktop apps for macOS, Windows, and Linux.
Local models run fully offline on your own hardware, and Ollama states that prompt and response data is never logged or used for training.