
Choosing a Provider

Not sure which LLM provider to use? This page compares the providers supported by Moltis so you can pick the best fit for your use case.

Quick Recommendations

| Goal | Provider | Why |
|---|---|---|
| Best overall quality | Anthropic | Claude Sonnet 4 and Opus 4 excel at tool use, long context, and instruction following |
| Widest model range | OpenAI | GPT-4.1, o3/o4-mini reasoning models, image generation |
| Largest context window | Google Gemini | Up to 1M tokens with Gemini 2.5 Pro |
| Best value | DeepSeek | DeepSeek V3 and R1 offer strong performance at low cost |
| Fast inference | Groq | Hardware-accelerated inference, very low latency |
| Free / offline | Ollama | Run open models locally, no API key needed |
| Rising stars | MiniMax, Z.AI | MiniMax and GLM-4 models are gaining traction for quality and price |

Provider Comparison

| Provider | Top Models | Tool Use | Streaming | Context | Price Tier | Speed | Notes |
|---|---|---|---|---|---|---|---|
| Anthropic | Claude Sonnet 4, Opus 4 | Full | Yes | 200K | $$ | Fast | Best tool-use reliability |
| OpenAI | GPT-4.1, o3, o4-mini | Full | Yes | 128K-1M | $$ | Fast | Widest ecosystem, reasoning models |
| Google Gemini | Gemini 2.5 Pro, 2.5 Flash | Full | Yes | 1M | $ | Fast | Largest context, competitive pricing |
| DeepSeek | V3, R1 | Full | Yes | 128K | $ | Medium | Excellent quality-to-price ratio |
| Groq | Llama 3, Mixtral, Gemma | Partial | Yes | 128K | $ | Very fast | Speed-optimized hardware inference |
| xAI | Grok 3, Grok 3 Mini | Full | Yes | 128K | $$ | Fast | Strong reasoning capabilities |
| Mistral | Mistral Large, Medium | Full | Yes | 128K | $$ | Fast | European provider, multilingual |
| OpenRouter | Any (aggregator) | Varies | Yes | Varies | Varies | Varies | Access 100+ models with one key |
| Cerebras | Llama 3 | Partial | Yes | 128K | $ | Very fast | Wafer-scale inference hardware |
| MiniMax | MiniMax-Text-01, abab7 | Full | Yes | 1M | $ | Fast | Strong multilingual, long context |
| Z.AI (Zhipu) | GLM-4, GLM-4 Air | Full | Yes | 128K | $ | Fast | GLM-4 series, competitive quality |
| Z.AI Coding | CodeGeeX, GLM-4 Code | Full | Yes | 128K | $ | Fast | Optimized for code tasks |
| Moonshot | Kimi | Full | Yes | 200K | $ | Medium | Long context, Chinese/English |
| Venice | Various | Varies | Yes | Varies | $ | Medium | Privacy-focused, uncensored models |
| Ollama | Any GGUF model | Varies | Yes | Varies | Free | Varies | Local inference, no API key |
| Local LLM | Any GGUF model | Varies | Yes | Varies | Free | Varies | Built-in GGUF runner, no server needed |
| GitHub Copilot | GPT-4o, Claude (via Copilot) | Full | Yes | Varies | Subscription | Fast | Uses existing Copilot subscription |
| OpenAI Codex | Codex models | Full | Yes | Varies | $$ | Fast | OAuth-based, code-focused |

Price Tier Legend

| Symbol | Meaning |
|---|---|
| Free | No cost (local inference) |
| $ | Budget-friendly (< $1/M input tokens) |
| $$ | Standard pricing ($1-15/M input tokens) |
| $$$ | Premium pricing (> $15/M input tokens) |
| Subscription | Flat monthly fee |
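To see what the tiers mean in practice, per-million-token pricing translates to request cost like this (the rates below are illustrative placeholders, not quotes from any provider):

```python
def prompt_cost_usd(input_tokens: int, price_per_million: float) -> float:
    """Cost of the input side of a request at a given per-million-token rate."""
    return input_tokens / 1_000_000 * price_per_million

# A 10K-token prompt at an example $-tier rate ($0.50/M)
# versus an example $$$-tier rate ($20/M):
print(prompt_cost_usd(10_000, 0.50))   # 0.005
print(prompt_cost_usd(10_000, 20.0))   # 0.2
```

The same prompt costs 40x more at the premium rate, which is why the tier column matters for high-volume agent workloads.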

How to Choose

For personal projects or experimentation

Start with Google Gemini (generous free tier, large context) or Ollama (completely free, runs locally). Both are easy to set up and let you explore without cost pressure.

For production agent workflows

Anthropic and OpenAI are the most battle-tested for tool use and complex multi-step tasks. Anthropic’s Claude models tend to follow instructions more precisely; OpenAI offers a broader model range including reasoning models (o3, o4-mini).

For cost-sensitive workloads

DeepSeek offers the best quality-to-price ratio for most tasks. Groq and Cerebras provide extremely fast inference at low cost, though model selection is more limited.

For local / offline use

Ollama is the easiest path — install it, pull a model, and Moltis auto-detects it. Local LLM runs GGUF models directly without a separate server. Both require sufficient RAM (8GB+ for small models, 16GB+ recommended).
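For a sense of what talking to a local Ollama server looks like, here is a minimal sketch that builds a request body for Ollama's documented `/api/generate` endpoint (the model name `llama3` is just an example; the commented-out POST assumes Ollama's default port 11434):

```python
import json


def ollama_request(model: str, prompt: str) -> dict:
    """Build a request body for Ollama's local /api/generate endpoint.

    "stream": False asks for one complete JSON response instead of a
    stream of partial chunks.
    """
    return {"model": model, "prompt": prompt, "stream": False}


# Sending it requires a running Ollama server (default http://localhost:11434):
#   import urllib.request
#   req = urllib.request.Request(
#       "http://localhost:11434/api/generate",
#       data=json.dumps(ollama_request("llama3", "Hello")).encode(),
#       headers={"Content-Type": "application/json"},
#   )
#   print(json.load(urllib.request.urlopen(req))["response"])

print(json.dumps(ollama_request("llama3", "Hello")))
```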

For access to many models

OpenRouter aggregates 100+ models behind a single API key. Useful if you want to experiment across providers without managing multiple accounts.
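Because OpenRouter exposes an OpenAI-compatible chat-completions endpoint, one key and one request shape reach every upstream model; only the namespaced model ID changes. A sketch (the model ID and key below are illustrative):

```python
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"


def openrouter_request(api_key: str, model: str, user_msg: str):
    """Return (headers, body) for a chat completion via OpenRouter.

    Model IDs are namespaced as "provider/model", so switching providers
    means changing one string, not the request shape.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,  # e.g. "deepseek/deepseek-chat" (illustrative)
        "messages": [{"role": "user", "content": user_msg}],
    }
    return headers, body


headers, body = openrouter_request("sk-or-...", "deepseek/deepseek-chat", "Hi")
print(body["model"])
```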

Setting Up a Provider

See the LLM Providers page for step-by-step setup instructions for each provider, including configuration file options and environment variables.