
Choosing a Provider

Not sure which LLM provider to use? This page compares the providers supported by Moltis so you can pick the best fit for your use case.

Quick Recommendations

| Goal | Provider | Why |
|---|---|---|
| Best overall quality | Anthropic | Claude Sonnet 4 and Opus 4 excel at tool use, long context, and instruction following |
| Widest model range | OpenAI | GPT-4.1, o3/o4-mini reasoning models, image generation |
| Largest context window | Google Gemini | Up to 1M tokens with Gemini 2.5 Pro |
| Best value | DeepSeek | DeepSeek V3 and R1 offer strong performance at low cost |
| Fast inference | Groq | Hardware-accelerated inference, very low latency |
| Free / offline | Ollama | Run open models locally, no API key needed |
| Rising stars | MiniMax, Z.AI | MiniMax and GLM-4 models are gaining traction for quality and price |

Provider Comparison

| Provider | Top Models | Tool Use | Streaming | Context | Price Tier | Speed | Notes |
|---|---|---|---|---|---|---|---|
| Anthropic | Claude Sonnet 4, Opus 4 | Full | Yes | 200K | $$ | Fast | Best tool-use reliability |
| OpenAI | GPT-4.1, o3, o4-mini | Full | Yes | 128K-1M | $$ | Fast | Widest ecosystem, reasoning models |
| Google Gemini | Gemini 2.5 Pro, 2.5 Flash | Full | Yes | 1M | $ | Fast | Largest context, competitive pricing |
| DeepSeek | V3, R1 | Full | Yes | 128K | $ | Medium | Excellent quality-to-price ratio |
| Groq | Llama 3, Mixtral, Gemma | Partial | Yes | 128K | $ | Very fast | Speed-optimized hardware inference |
| xAI | Grok 3, Grok 3 Mini | Full | Yes | 128K | $$ | Fast | Strong reasoning capabilities |
| Mistral | Mistral Large, Medium | Full | Yes | 128K | $$ | Fast | European provider, multilingual |
| OpenRouter | Any (aggregator) | Varies | Yes | Varies | Varies | Varies | Access 100+ models with one key |
| Cerebras | Llama 3 | Partial | Yes | 128K | $ | Very fast | Wafer-scale inference hardware |
| MiniMax | MiniMax-Text-01, abab7 | Full | Yes | 1M | $ | Fast | Strong multilingual, long context |
| Z.AI (Zhipu) | GLM-4, GLM-4 Air | Full | Yes | 128K | $ | Fast | GLM-4 series, competitive quality |
| Z.AI Coding | CodeGeeX, GLM-4 Code | Full | Yes | 128K | $ | Fast | Optimized for code tasks |
| Moonshot | Kimi | Full | Yes | 200K | $ | Medium | Long context, Chinese/English |
| Venice | Various | Varies | Yes | Varies | $ | Medium | Privacy-focused, uncensored models |
| Ollama | Any GGUF model | Varies | Yes | Varies | Free | Varies | Local inference, no API key |
| Local LLM | Any GGUF model | Varies | Yes | Varies | Free | Varies | Built-in GGUF runner, no server needed |
| GitHub Copilot | GPT-4o, Claude (via Copilot) | Full | Yes | Varies | Subscription | Fast | Uses existing Copilot subscription |
| OpenAI Codex | Codex models | Full | Yes | Varies | $$ | Fast | OAuth-based, code-focused |

Price Tier Legend

| Symbol | Meaning |
|---|---|
| Free | No cost (local inference) |
| $ | Budget-friendly (< $1/M input tokens) |
| $$ | Standard pricing ($1-15/M input tokens) |
| $$$ | Premium pricing (> $15/M input tokens) |
| Subscription | Flat monthly fee |
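To see what the tiers mean in practice, per-million-token pricing translates to request cost like this (the rates below are illustrative placeholders, not quotes from any provider):

```python
def prompt_cost_usd(input_tokens: int, price_per_million: float) -> float:
    """Cost of the input side of a request at a given per-million-token rate."""
    return input_tokens / 1_000_000 * price_per_million

# A 10K-token prompt at an example $-tier rate ($0.50/M)
# versus an example $$$-tier rate ($20/M):
print(prompt_cost_usd(10_000, 0.50))   # 0.005
print(prompt_cost_usd(10_000, 20.0))   # 0.2
```

The same prompt costs 40x more at the premium rate, which is why the tier column matters for high-volume agent workloads.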

How to Choose

For personal projects or experimentation

Start with Google Gemini (generous free tier, large context) or Ollama (completely free, runs locally). Both are easy to set up and let you explore without cost pressure.

For production agent workflows

Anthropic and OpenAI are the most battle-tested for tool use and complex multi-step tasks. Anthropic’s Claude models tend to follow instructions more precisely; OpenAI offers a broader model range including reasoning models (o3, o4-mini).

For cost-sensitive workloads

DeepSeek offers the best quality-to-price ratio for most tasks. Groq and Cerebras provide extremely fast inference at low cost, though model selection is more limited.

For local / offline use

Ollama is the easiest path — install it, pull a model, and Moltis auto-detects it. Local LLM runs GGUF models directly without a separate server. Both require sufficient RAM (8GB+ for small models, 16GB+ recommended).
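For a sense of what talking to a local Ollama server looks like, here is a minimal sketch that builds a request body for Ollama's documented `/api/generate` endpoint (the model name `llama3` is just an example; the commented-out POST assumes Ollama's default port 11434):

```python
import json


def ollama_request(model: str, prompt: str) -> dict:
    """Build a request body for Ollama's local /api/generate endpoint.

    "stream": False asks for one complete JSON response instead of a
    stream of partial chunks.
    """
    return {"model": model, "prompt": prompt, "stream": False}


# Sending it requires a running Ollama server (default http://localhost:11434):
#   import urllib.request
#   req = urllib.request.Request(
#       "http://localhost:11434/api/generate",
#       data=json.dumps(ollama_request("llama3", "Hello")).encode(),
#       headers={"Content-Type": "application/json"},
#   )
#   print(json.load(urllib.request.urlopen(req))["response"])

print(json.dumps(ollama_request("llama3", "Hello")))
```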

For access to many models

OpenRouter aggregates 100+ models behind a single API key. Useful if you want to experiment across providers without managing multiple accounts.
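Because OpenRouter exposes an OpenAI-compatible chat-completions endpoint, one key and one request shape reach every upstream model; only the namespaced model ID changes. A sketch (the model ID and key below are illustrative):

```python
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"


def openrouter_request(api_key: str, model: str, user_msg: str):
    """Return (headers, body) for a chat completion via OpenRouter.

    Model IDs are namespaced as "provider/model", so switching providers
    means changing one string, not the request shape.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,  # e.g. "deepseek/deepseek-chat" (illustrative)
        "messages": [{"role": "user", "content": user_msg}],
    }
    return headers, body


headers, body = openrouter_request("sk-or-...", "deepseek/deepseek-chat", "Hi")
print(body["model"])
```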

Setting Up a Provider

See the LLM Providers page for step-by-step setup instructions for each provider, including configuration file options and environment variables.