Groq
Fast inference on LPU hardware. Best-in-class throughput for open-weight models including Llama and Mistral.
7 models · 30 RPM, 14.4K req/day
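Several entries in this list pair a per-minute cap with a daily cap. With Groq's listed figures, 30 RPM sustained for 24 hours would allow 43,200 requests, so the 14.4K/day cap is the binding one: it averages out to 10 requests per minute. Below is a minimal client-side sketch, assuming the limits count requests only (no token caps). `DualWindowLimiter` is a hypothetical helper, not part of any provider's SDK.

```python
import time
from collections import deque


class DualWindowLimiter:
    """Hypothetical client-side throttle: enforces a per-minute and a
    per-day request cap using sliding windows of request timestamps."""

    def __init__(self, per_minute: int, per_day: int, clock=time.monotonic):
        # (window length in seconds, max requests in that window)
        self.limits = [(60.0, per_minute), (86_400.0, per_day)]
        self.windows = [deque(), deque()]
        self.clock = clock

    def try_acquire(self) -> bool:
        now = self.clock()
        # Drop timestamps that have aged out of each window.
        for (span, _), q in zip(self.limits, self.windows):
            while q and now - q[0] >= span:
                q.popleft()
        # Refuse the request if either window is already full.
        if any(len(q) >= cap for (_, cap), q in zip(self.limits, self.windows)):
            return False
        # Record the request in both windows.
        for q in self.windows:
            q.append(now)
        return True


# Illustrative use with Groq's listed free-tier figures and a frozen clock.
fake_now = [0.0]
limiter = DualWindowLimiter(30, 14_400, clock=lambda: fake_now[0])
print(sum(limiter.try_acquire() for _ in range(40)))  # 30
```

Providers may also cap tokens per minute or apply limits per model, so treat the constructor arguments as placeholders rather than authoritative values.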
DeepInfra
Affordable GPU cloud for open-weight AI models. Serverless and dedicated instances.
5 models · 200 concurrent requests
Hyperbolic
GPU cloud with extremely low pricing for open-weight AI models.
5 models · $1 free credits
NVIDIA NIM
Optimized inference containers for NVIDIA GPUs. Enterprise-grade performance.
5 models · ~40 RPM
Cloudflare Workers AI
Edge AI inference via Cloudflare's global network. Low latency worldwide.
5 models · 10K neurons/day, 300 RPM
Google AI Studio
Gemini models via Google's AI Studio. Large context windows, multimodal.
5 models · 15–60 RPM, 250–1.5K req/day
SambaNova
Dataflow architecture optimized for AI inference. High throughput for enterprise.
5 models · Generous dev tier
SiliconFlow
Chinese AI API hub with competitive pricing. Good for Qwen and DeepSeek models.
5 models · 100 req/day + $1 free credits
Scaleway
European cloud provider with generative AI endpoints. 1M free tokens.
4 models · 1M free tokens (permanent)
Alibaba DashScope
Alibaba's AI platform. Extensive model library including Qwen and open-source models.
5 models · 1M free tokens/model (90 days)
OVHcloud AI Endpoints
European cloud AI endpoints. Competitive pricing on open models.
4 models · 2 req/min/IP free, 400 RPM with key
Replicate
Cloud for running open-source models. Easy API for Llama, Mistral, FLUX.
5 models · 6 req/min free, 3K RPM with payment
Hugging Face
Inference Endpoints from Hugging Face. Managed deployment for open models.
5 models · ~$0.10/month free credits
Perplexity
API access to Perplexity's Sonar models. Excellent for research and factuality.
4 models · ~50 RPM (tiered by spend)
Mistral La Plateforme
Official Mistral API. Home of Codestral, Mistral Large, and open models.
5 models · 1 req/s, 1B tokens/month
Codestral
Mistral's code-specialized model. Excellent for code completion and refactoring tasks.
2 models · 30 RPM, 2K req/day
Cerebras
Fastest inference on custom Wafer Scale Engine hardware. Up to 1M tokens/day free.
4 models · 1M tokens/day
Kilo Code
Open-source AI coding agent for VS Code, JetBrains, and the CLI. 500+ models.
4 models · 200 req/hr for anonymous users
RunPod
Cloud platform for deploying full-stack AI apps. Used by 750K+ developers.
3 models · Varies by plan
Nous Portal
Nous Research API portal. Home of Hermes and Dischat models.
3 models · Ultra: 1,600 RPM
Z.AI GLM
GLM models via Z.ai. Fast coding plans from $10/month.
3 models · Varies by plan
BytePlus ModelArk
ByteDance's AI platform. 500K free tokens per LLM, 2M vision tokens.
4 models · 500K free tokens per model
MiniMax
Chinese AI leader with abab, M1, M2 models. Coding plans from $10/month.
3 models · Varies by plan
Venice
Private, uncensored AI for text and image generation. No account required to try. Decentralized inference with a focus on privacy and free speech.
4 models · Varies by plan
LLM Gateway
Open-source API gateway for LLMs. Route requests to 180+ models from 60+ providers with one integration. Self-hostable with usage tracking and cost optimization.
6 models · Depends on upstream provider
Synthetic
Privacy-focused AI platform offering private access to multiple open-source models. Founded by ex-Instagram/Meta engineers. Preferred by OpenClaw users.
4 models · Varies by plan
Canopy Wave
High-performance AI inference platform for open-source models. Founded in 2024 in Santa Clara. Optimized for cost, speed, and quality.
4 models · Varies by plan
Mimo
AI coding assistant and learning platform. Learn to code with AI-powered interactive lessons and projects.
0 models · N/A
Anthropic
Claude API by Anthropic. State-of-the-art reasoning models including Claude 3.5 Sonnet, Claude Opus 4, and Claude Haiku 4.5. Usage-based pricing with no monthly subscription required.
5 models · Varies by usage tier
Fireworks AI
Fastest inference for open-source LLMs and image models. Serverless, on-demand GPU deployments, and fine-tuning on one platform. $1 free starter credits.
8 models · High rate limits with postpaid billing
GitHub Models
Run AI models directly in GitHub. Access GPT-4o, Claude, Llama, DeepSeek and more via GitHub's unified API. Free tier with rate limits, paid usage beyond.
5 models · Free-tier rate limits; pay per token unit beyond them
Ollama
Run open-source LLMs locally or in the cloud. Free local inference on your own hardware. Optional cloud plans for larger models with datacenter-grade GPUs.
6 models · Local: unlimited. Cloud: 1 concurrent model (Free)
Baidu Qianfan
Baidu's enterprise AI platform providing access to the ERNIE series models and other LLM capabilities through Baidu Intelligent Cloud.
5 models · Varies by tier
Kimi
Moonshot AI's Kimi platform offering trillion-parameter K2.5 and K2.6 large language models with up to 256K context window and tool calling support.
4 models · Varies by usage tier
StepFun
StepFun's open platform providing access to the Step family of large language models with strong reasoning and agentic capabilities.
4 models · Varies by plan
Chutes
Decentralized serverless AI compute platform on the Bittensor network offering LLM inference at highly competitive rates.
5 models · Varies by model
CrofAI
Multi-model AI inference hosting service providing very low-cost access to large language models.
5 models · 500 requests/day (Hobby)
CanopyWave
Serverless LLM inference and GPU cloud platform optimized for cost, speed, and quality with enterprise-grade infrastructure.
5 models · Varies by model
OpenCode
Low-cost subscription for open coding models. Curated and benchmarked specifically for coding agents with reliable global access.
12 models · $12/5hr, $30/week, $60/month
AI Router
Smart LLM routing platform that automatically selects the optimal model based on quality, cost, and speed requirements.
5 models · Free: 1K req/mo, Starter: 20K req/mo, Pro: 100K req/mo