Moltis
One binary, no runtime, no npm.
Moltis compiles your entire AI gateway — web UI, LLM providers, tools, and all assets — into a single self-contained executable. There’s no Node.js to babysit, no node_modules to sync, no V8 garbage collector introducing latency spikes.
# Quick install (macOS / Linux)
curl -fsSL https://www.moltis.org/install.sh | sh
Why Moltis?
| Feature | Moltis | Other Solutions |
|---|---|---|
| Deployment | Single binary | Node.js + dependencies |
| Memory Safety | Rust ownership | Garbage collection |
| Secret Handling | Zeroed on drop | “Eventually collected” |
| Sandbox | Docker + Apple Container | Docker only |
| Startup | Milliseconds | Seconds |
Key Features
- 30+ LLM Providers — Anthropic, OpenAI, Google, Mistral, local models, and more
- Streaming-First — Responses appear as tokens arrive, not after completion
- Sandboxed Execution — Commands run in isolated containers (Docker or Apple Container)
- MCP Support — Connect to Model Context Protocol servers for extended capabilities
- Multi-Channel — Web UI, Telegram, API access with synchronized responses
- Long-Term Memory — Embeddings-powered knowledge base with hybrid search
- Hook System — Observe, modify, or block actions at any lifecycle point
- Compile-Time Safety — Misconfigurations caught by cargo check, not runtime crashes
Quick Start
# Install
curl -fsSL https://www.moltis.org/install.sh | sh
# Run
moltis
On first launch:
- Open the URL shown in your browser (e.g., http://localhost:13131)
- Add your LLM API key
- Start chatting!
Authentication is only required when accessing Moltis from a non-localhost address. On localhost, you can start using it immediately.
How It Works
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Web UI │ │ Telegram │ │ API │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
│ │ │
└────────────────┴────────────────┘
│
▼
┌───────────────────────────────┐
│ Moltis Gateway │
│ ┌─────────┐ ┌───────────┐ │
│ │ Agent │ │ Tools │ │
│ │ Loop │◄┤ Registry │ │
│ └────┬────┘ └───────────┘ │
│ │ │
│ ┌────▼────────────────┐ │
│ │ Provider Registry │ │
│ │ Claude · GPT · Gemini │ │
│ └─────────────────────┘ │
└───────────────────────────────┘
│
┌───────▼───────┐
│ Sandbox │
│ Docker/Apple │
└───────────────┘
Documentation
Getting Started
- Quickstart — Up and running in 5 minutes
- Installation — All installation methods
- Configuration — moltis.toml reference
Features
- Providers — Configure LLM providers
- MCP Servers — Extend with Model Context Protocol
- Hooks — Lifecycle hooks for customization
- Local LLMs — Run models on your machine
Deployment
- Docker — Container deployment
Architecture
- Streaming — How real-time streaming works
- Metrics & Tracing — Observability
Security
Moltis applies defense in depth:
- Authentication — Password or passkey (WebAuthn) required for non-localhost access
- SSRF Protection — Blocks requests to internal networks
- Secret Handling — secrecy::Secret zeroes memory on drop
- Sandboxed Execution — Commands never run on the host
- Origin Validation — Prevents Cross-Site WebSocket Hijacking
- No Unsafe Code — unsafe is denied workspace-wide
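The SSRF bullet can be pictured as a small address check. The sketch below is illustrative (not the actual Moltis implementation) and rejects private, loopback, and link-local targets:

```python
import ipaddress


def is_internal(ip: str) -> bool:
    """Return True for addresses an SSRF guard would refuse to fetch.

    Illustrative sketch only; covers RFC 1918/loopback/link-local ranges.
    """
    addr = ipaddress.ip_address(ip)
    return addr.is_private or addr.is_loopback or addr.is_link_local
```

A real guard also has to resolve hostnames first and re-check after redirects, since attackers often hide internal targets behind DNS.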
Community
- GitHub: github.com/moltis-org/moltis
- Issues: Report bugs
- Discussions: Ask questions
License
MIT — Free for personal and commercial use.
Quickstart
Get Moltis running in under 5 minutes.
1. Install
curl -fsSL https://www.moltis.org/install.sh | sh
Or via Homebrew:
brew install moltis-org/tap/moltis
2. Start
moltis
You’ll see output like:
🚀 Moltis gateway starting...
🌐 Open http://localhost:13131 in your browser
3. Configure a Provider
You need an LLM API key to chat. The easiest options:
Option A: Anthropic (Recommended)
- Get an API key from console.anthropic.com
- In Moltis, go to Settings → Providers
- Click Anthropic → Enter your API key → Save
Option B: OpenAI
- Get an API key from platform.openai.com
- In Moltis, go to Settings → Providers
- Click OpenAI → Enter your API key → Save
Option C: Local Model (Free)
- Install Ollama: curl -fsSL https://ollama.ai/install.sh | sh
- Pull a model: ollama pull llama3.2
- In Moltis, configure Ollama in Settings → Providers
4. Chat!
Go to the Chat tab and start a conversation:
You: Write a Python function to check if a number is prime
Agent: Here's a Python function to check if a number is prime:
def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(n ** 0.5) + 1):
        if n % i == 0:
            return False
    return True
What’s Next?
Enable Tool Use
Moltis can execute code, browse the web, and more. Tools are enabled by default with sandbox protection.
Try:
You: Create a hello.py file that prints "Hello, World!" and run it
Connect Telegram
Chat with your agent from anywhere:
- Create a bot via @BotFather
- Copy the bot token
- In Moltis: Settings → Telegram → Enter token → Save
- Message your bot!
Add MCP Servers
Extend capabilities with MCP servers:
# In moltis.toml
[[mcp.servers]]
name = "github"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-github"]
env = { GITHUB_TOKEN = "ghp_..." }
Set Up Memory
Enable long-term memory for context across sessions:
# In moltis.toml
[memory]
enabled = true
Add knowledge by placing Markdown files in ~/.moltis/memory/.
Useful Commands
| Command | Description |
|---|---|
| /new | Start a new session |
| /model <name> | Switch models |
| /clear | Clear chat history |
| /help | Show available commands |
File Locations
| Path | Contents |
|---|---|
| ~/.config/moltis/moltis.toml | Configuration |
| ~/.config/moltis/provider_keys.json | API keys |
| ~/.moltis/ | Data (sessions, memory, logs) |
Getting Help
- Documentation: docs.moltis.org
- GitHub Issues: github.com/moltis-org/moltis/issues
- Discussions: github.com/moltis-org/moltis/discussions
Installation
Moltis is distributed as a single self-contained binary. Choose the installation method that works best for your setup.
Quick Install (Recommended)
The fastest way to get started on macOS or Linux:
curl -fsSL https://www.moltis.org/install.sh | sh
This downloads the latest release for your platform and installs it to ~/.local/bin.
Package Managers
Homebrew (macOS / Linux)
brew install moltis-org/tap/moltis
Cargo Binstall (Pre-built Binary)
If you have cargo-binstall installed:
cargo binstall moltis
This downloads a pre-built binary without compiling from source.
Linux Packages
Debian / Ubuntu (.deb)
# Download the latest .deb package
curl -LO https://github.com/moltis-org/moltis/releases/latest/download/moltis_amd64.deb
# Install
sudo dpkg -i moltis_amd64.deb
Fedora / RHEL (.rpm)
# Download the latest .rpm package
curl -LO https://github.com/moltis-org/moltis/releases/latest/download/moltis.x86_64.rpm
# Install
sudo rpm -i moltis.x86_64.rpm
Arch Linux (.pkg.tar.zst)
# Download the latest package
curl -LO https://github.com/moltis-org/moltis/releases/latest/download/moltis.pkg.tar.zst
# Install
sudo pacman -U moltis.pkg.tar.zst
Snap
sudo snap install moltis
AppImage
# Download
curl -LO https://github.com/moltis-org/moltis/releases/latest/download/moltis.AppImage
chmod +x moltis.AppImage
# Run
./moltis.AppImage
Docker
Multi-architecture images (amd64/arm64) are published to GitHub Container Registry:
docker pull ghcr.io/moltis-org/moltis:latest
See Docker Deployment for full instructions on running Moltis in a container.
Build from Source
Prerequisites
- Rust 1.75 or later
- A C compiler (for some dependencies)
Clone and Build
git clone https://github.com/moltis-org/moltis.git
cd moltis
cargo build --release
The binary will be at target/release/moltis.
Install via Cargo
cargo install moltis --git https://github.com/moltis-org/moltis
First Run
After installation, start Moltis:
moltis
On first launch:
- Open http://localhost:<port> in your browser (the port is shown in the terminal output)
- Configure your LLM provider (API key)
- Start chatting!
Moltis picks a random available port on first install to avoid conflicts. The port is saved in your config and reused on subsequent runs.
Authentication is only required when accessing Moltis from a non-localhost address (e.g., over the network). When this happens, a one-time setup code is printed to the terminal for initial authentication setup.
Verify Installation
moltis --version
Updating
Homebrew
brew upgrade moltis
Cargo Binstall
cargo binstall moltis
From Source
cd moltis
git pull
cargo build --release
Uninstalling
Homebrew
brew uninstall moltis
Remove Data
Moltis stores data in two directories:
# Configuration
rm -rf ~/.config/moltis
# Data (sessions, databases, memory)
rm -rf ~/.moltis
Removing these directories deletes all your conversations, memory, and settings permanently.
Configuration
Moltis is configured through moltis.toml, located in ~/.config/moltis/ by default.
On first run, a complete configuration file is generated with sensible defaults. You can edit it to customize behavior.
Configuration File Location
| Platform | Default Path |
|---|---|
| macOS/Linux | ~/.config/moltis/moltis.toml |
| Custom | Set via --config-dir or MOLTIS_CONFIG_DIR |
Basic Settings
[gateway]
port = 13131 # HTTP/WebSocket port
host = "0.0.0.0" # Listen address
[agent]
name = "Moltis" # Agent display name
model = "claude-sonnet-4-20250514" # Default model
timeout = 600 # Agent run timeout (seconds)
max_iterations = 25 # Max tool call iterations per run
LLM Providers
Provider API keys are stored separately in ~/.config/moltis/provider_keys.json for security. Configure them through the web UI or directly in the JSON file.
[providers]
default = "anthropic" # Default provider
[providers.anthropic]
enabled = true
models = [
"claude-sonnet-4-20250514",
"claude-opus-4-20250514",
"claude-3-5-haiku-20241022",
]
[providers.openai]
enabled = true
models = [
"gpt-4o",
"gpt-4o-mini",
"o1-preview",
]
See Providers for detailed provider configuration.
Sandbox Configuration
Commands run inside isolated containers for security:
[tools.exec.sandbox]
enabled = true
backend = "docker" # "docker" or "apple" (macOS 15+)
base_image = "ubuntu:25.10"
# Packages installed in the sandbox image
packages = [
"curl",
"git",
"jq",
"python3",
"python3-pip",
"nodejs",
"npm",
]
When you modify the packages list and restart, Moltis automatically rebuilds the sandbox image with a new tag.
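One way to picture that rebuild trigger: derive the image tag from the base image and package list, so any change yields a new tag. This is an illustrative sketch only; the docs do not specify Moltis's actual tagging scheme:

```python
import hashlib


def sandbox_tag(base_image: str, packages: list[str]) -> str:
    """Derive a deterministic tag from the sandbox inputs.

    Hypothetical scheme: hash the base image plus the sorted package
    list, so editing `packages` in moltis.toml produces a new tag and
    forces a rebuild.
    """
    digest = hashlib.sha256(
        (base_image + "\n" + "\n".join(sorted(packages))).encode()
    ).hexdigest()[:12]
    return f"moltis-sandbox:{digest}"
```

Sorting the package list first keeps the tag stable under reordering, so only a real content change triggers a rebuild.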
Memory System
Long-term memory uses embeddings for semantic search:
[memory]
enabled = true
embedding_model = "text-embedding-3-small" # OpenAI embedding model
chunk_size = 512 # Characters per chunk
chunk_overlap = 50 # Overlap between chunks
# Directories to watch for memory files
watch_dirs = [
"~/.moltis/memory",
]
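The chunk_size and chunk_overlap settings describe a sliding window over each file. A minimal sketch of that windowing (illustrative, not the actual Moltis chunker):

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks via a sliding window.

    Each chunk is at most `chunk_size` characters and shares `overlap`
    characters with its predecessor, matching the config keys above.
    """
    assert chunk_size > overlap >= 0
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the final chunk reached the end of the text
    return chunks
```

The overlap exists so a sentence that straddles a chunk boundary still appears whole in at least one chunk, which keeps embeddings meaningful at the edges.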
Authentication
Authentication is only required when accessing Moltis from a non-localhost address. When running on localhost or 127.0.0.1, no authentication is needed by default.
When you access Moltis from a network address (e.g., http://192.168.1.100:13131), a one-time setup code is printed to the terminal. Use it to set up a password or passkey.
[auth]
disabled = false # Set true to disable auth entirely
# Session settings
session_expiry = 604800 # Session lifetime in seconds (7 days)
Only set disabled = true if Moltis is running on a trusted private network. Never expose an unauthenticated instance to the internet.
Hooks
Configure lifecycle hooks:
[[hooks]]
name = "my-hook"
command = "./hooks/my-hook.sh"
events = ["BeforeToolCall", "AfterToolCall"]
timeout = 5 # Timeout in seconds
[hooks.env]
MY_VAR = "value" # Environment variables for the hook
See Hooks for the full hook system documentation.
MCP Servers
Connect to Model Context Protocol servers:
[[mcp.servers]]
name = "filesystem"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/allowed"]
[[mcp.servers]]
name = "github"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-github"]
env = { GITHUB_TOKEN = "ghp_..." }
Telegram Integration
[telegram]
enabled = true
# Token is stored in provider_keys.json, not here
allowed_users = [123456789] # Telegram user IDs allowed to chat
TLS / HTTPS
[tls]
enabled = true
cert_path = "~/.config/moltis/cert.pem"
key_path = "~/.config/moltis/key.pem"
# If paths don't exist, a self-signed certificate is generated
# Port for the plain-HTTP redirect / CA-download server.
# Defaults to the gateway port + 1 when not set.
# http_redirect_port = 13132
Override via environment variable: MOLTIS_TLS__HTTP_REDIRECT_PORT=8080.
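The double underscore separates nesting levels (here, tls.http_redirect_port). A sketch of that mapping, assuming the convention holds for all keys (the real parser may differ):

```python
def env_to_config_path(var: str) -> list[str]:
    """Map a MOLTIS_* environment variable to a nested config key path.

    Assumed convention: strip the MOLTIS_ prefix, split on double
    underscores for nesting, lowercase each segment.
    """
    prefix = "MOLTIS_"
    assert var.startswith(prefix)
    return [part.lower() for part in var[len(prefix):].split("__")]
```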
Tailscale Integration
Expose Moltis over your Tailscale network:
[tailscale]
enabled = true
mode = "serve" # "serve" (private) or "funnel" (public)
Observability
[telemetry]
enabled = true
otlp_endpoint = "http://localhost:4317" # OpenTelemetry collector
Environment Variables
All settings can be overridden via environment variables:
| Variable | Description |
|---|---|
| MOLTIS_CONFIG_DIR | Configuration directory |
| MOLTIS_DATA_DIR | Data directory |
| MOLTIS_PORT | Gateway port |
| MOLTIS_HOST | Listen address |
CLI Flags
moltis --config-dir /path/to/config --data-dir /path/to/data
Complete Example
[gateway]
port = 13131
host = "0.0.0.0"
[agent]
name = "Atlas"
model = "claude-sonnet-4-20250514"
timeout = 600
max_iterations = 25
[providers]
default = "anthropic"
[tools.exec.sandbox]
enabled = true
backend = "docker"
base_image = "ubuntu:25.10"
packages = ["curl", "git", "jq", "python3", "nodejs"]
[memory]
enabled = true
[auth]
disabled = false
[[hooks]]
name = "audit-log"
command = "./hooks/audit.sh"
events = ["BeforeToolCall"]
timeout = 5
LLM Providers
Moltis supports 30+ LLM providers through a trait-based architecture. Configure providers through the web UI or directly in configuration files.
Supported Providers
Tier 1 (Full Support)
| Provider | Models | Tool Calling | Streaming |
|---|---|---|---|
| Anthropic | Claude 4, Claude 3.5, Claude 3 | ✅ | ✅ |
| OpenAI | GPT-4o, GPT-4, o1, o3 | ✅ | ✅ |
| Google | Gemini 2.0, Gemini 1.5 | ✅ | ✅ |
| GitHub Copilot | GPT-4o, Claude | ✅ | ✅ |
Tier 2 (Good Support)
| Provider | Models | Tool Calling | Streaming |
|---|---|---|---|
| Mistral | Mistral Large, Codestral | ✅ | ✅ |
| Groq | Llama 3, Mixtral | ✅ | ✅ |
| Together | Various open models | ✅ | ✅ |
| Fireworks | Various open models | ✅ | ✅ |
| DeepSeek | DeepSeek V3, Coder | ✅ | ✅ |
Tier 3 (Basic Support)
| Provider | Notes |
|---|---|
| OpenRouter | Aggregator for 100+ models |
| Ollama | Local models |
| Venice | Privacy-focused |
| Cerebras | Fast inference |
| SambaNova | Enterprise |
| Cohere | Command models |
| AI21 | Jamba models |
Configuration
Via Web UI (Recommended)
- Open Moltis in your browser
- Go to Settings → Providers
- Click on a provider card
- Enter your API key
- Select your preferred model
Via Configuration Files
Provider credentials are stored in ~/.config/moltis/provider_keys.json:
{
"anthropic": {
"apiKey": "sk-ant-...",
"model": "claude-sonnet-4-20250514"
},
"openai": {
"apiKey": "sk-...",
"model": "gpt-4o"
}
}
Enable providers in moltis.toml:
[providers]
default = "anthropic"
[providers.anthropic]
enabled = true
models = [
"claude-sonnet-4-20250514",
"claude-opus-4-20250514",
]
[providers.openai]
enabled = true
Provider-Specific Setup
Anthropic
- Get an API key from console.anthropic.com
- Enter it in Settings → Providers → Anthropic
OpenAI
- Get an API key from platform.openai.com
- Enter it in Settings → Providers → OpenAI
GitHub Copilot
GitHub Copilot uses OAuth authentication:
- Click Connect in Settings → Providers → GitHub Copilot
- Complete the GitHub OAuth flow
- Authorize Moltis to access Copilot
Google (Gemini)
- Get an API key from aistudio.google.com
- Enter it in Settings → Providers → Google
Ollama (Local Models)
Run models locally with Ollama:
- Install Ollama: curl -fsSL https://ollama.ai/install.sh | sh
- Pull a model: ollama pull llama3.2
- Configure in Moltis:
{
"ollama": {
"baseUrl": "http://localhost:11434",
"model": "llama3.2"
}
}
OpenRouter
Access 100+ models through one API:
- Get an API key from openrouter.ai
- Enter it in Settings → Providers → OpenRouter
- Specify the model ID you want to use
{
"openrouter": {
"apiKey": "sk-or-...",
"model": "anthropic/claude-3.5-sonnet"
}
}
Custom Base URLs
For providers with custom endpoints (enterprise, proxies):
{
"openai": {
"apiKey": "sk-...",
"baseUrl": "https://your-proxy.example.com/v1",
"model": "gpt-4o"
}
}
Switching Providers
Per-Session
In the chat interface, use the model selector dropdown to switch providers/models for the current session.
Per-Message
Use the /model command to switch models mid-conversation:
/model claude-opus-4-20250514
Default Provider
Set the default in moltis.toml:
[providers]
default = "anthropic"
[agent]
model = "claude-sonnet-4-20250514"
Model Capabilities
Different models have different strengths:
| Use Case | Recommended Model |
|---|---|
| General coding | Claude Sonnet 4, GPT-4o |
| Complex reasoning | Claude Opus 4, o1 |
| Fast responses | Claude Haiku, GPT-4o-mini |
| Long context | Claude (200k), Gemini (1M+) |
| Local/private | Llama 3 via Ollama |
Troubleshooting
“Model not available”
The model may not be enabled for your account or region. Check:
- Your API key has access to the model
- The model ID is spelled correctly
- Your account has sufficient credits
“Rate limited”
You’ve exceeded the provider’s rate limits. Solutions:
- Wait and retry
- Use a different provider
- Upgrade your API plan
“Invalid API key”
- Verify the key is correct (no extra spaces)
- Check the key hasn’t expired
- Ensure the key has the required permissions
MCP Servers
Moltis supports the Model Context Protocol (MCP) for connecting to external tool servers. MCP servers extend your agent’s capabilities without modifying Moltis itself.
What is MCP?
MCP is an open protocol that lets AI assistants connect to external tools and data sources. Think of MCP servers as plugins that provide:
- Tools — Functions the agent can call (e.g., search, file operations, API calls)
- Resources — Data the agent can read (e.g., files, database records)
- Prompts — Pre-defined prompt templates
Supported Transports
| Transport | Description | Use Case |
|---|---|---|
| stdio | Local process via stdin/stdout | npm packages, local scripts |
| HTTP/SSE | Remote server via HTTP | Cloud services, shared servers |
Adding an MCP Server
Via Web UI
- Go to Settings → MCP Servers
- Click Add Server
- Enter the server configuration
- Click Save
Via Configuration
Add servers to moltis.toml:
[[mcp.servers]]
name = "filesystem"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-filesystem", "/Users/me/projects"]
[[mcp.servers]]
name = "github"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-github"]
env = { GITHUB_TOKEN = "ghp_..." }
[[mcp.servers]]
name = "remote-api"
url = "https://mcp.example.com/sse"
transport = "sse"
Popular MCP Servers
Official Servers
| Server | Description | Install |
|---|---|---|
| filesystem | Read/write local files | npx @modelcontextprotocol/server-filesystem |
| github | GitHub API access | npx @modelcontextprotocol/server-github |
| postgres | PostgreSQL queries | npx @modelcontextprotocol/server-postgres |
| sqlite | SQLite database | npx @modelcontextprotocol/server-sqlite |
| puppeteer | Browser automation | npx @modelcontextprotocol/server-puppeteer |
| brave-search | Web search | npx @modelcontextprotocol/server-brave-search |
Community Servers
Explore more at mcp.so and GitHub MCP Servers.
Configuration Options
[[mcp.servers]]
name = "my-server" # Display name
command = "node" # Command to run
args = ["server.js"] # Command arguments
cwd = "/path/to/server" # Working directory
# Environment variables
env = { API_KEY = "secret", DEBUG = "true" }
# Health check settings
health_check_interval = 30 # Seconds between health checks
restart_on_failure = true # Auto-restart on crash
max_restart_attempts = 5 # Give up after N restarts
restart_backoff = "exponential" # "linear" or "exponential"
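The two backoff policies reduce to a simple delay schedule. In this sketch the base delay and cap are hypothetical values, not documented defaults:

```python
def restart_delay(attempt: int, base: float = 1.0,
                  backoff: str = "exponential", cap: float = 60.0) -> float:
    """Seconds to wait before restart attempt N (1-based).

    Implements the "linear" and "exponential" policies named by the
    restart_backoff setting; base and cap are illustrative values.
    """
    if backoff == "exponential":
        delay = base * (2 ** (attempt - 1))  # 1, 2, 4, 8, ...
    else:  # linear
        delay = base * attempt  # 1, 2, 3, 4, ...
    return min(delay, cap)
```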
Server Lifecycle
┌─────────────────────────────────────────────────────┐
│ MCP Server │
│ │
│ Start → Initialize → Ready → [Tool Calls] → Stop │
│ │ │ │
│ ▼ ▼ │
│ Health Check ◄─────────── Heartbeat │
│ │ │ │
│ ▼ ▼ │
│ Crash Detected ───────────► Restart │
│ │ │
│ Backoff Wait │
└─────────────────────────────────────────────────────┘
Health Monitoring
Moltis monitors MCP servers and automatically:
- Detects crashes via process exit
- Restarts with exponential backoff
- Disables after max restart attempts
- Re-enables after cooldown period
Using MCP Tools
Once connected, MCP tools appear alongside built-in tools. The agent can use them naturally:
User: Search GitHub for Rust async runtime projects
Agent: I'll search GitHub for you.
[Calling github.search_repositories with query="rust async runtime"]
Found 15 repositories:
1. tokio-rs/tokio - A runtime for writing reliable async applications
2. async-std/async-std - Async version of the Rust standard library
...
Creating an MCP Server
Simple Node.js Server
// server.js
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
const server = new Server(
{ name: "my-server", version: "1.0.0" },
{ capabilities: { tools: {} } }
);
server.setRequestHandler("tools/list", async () => ({
tools: [{
name: "hello",
description: "Says hello",
inputSchema: {
type: "object",
properties: {
name: { type: "string", description: "Name to greet" }
},
required: ["name"]
}
}]
}));
server.setRequestHandler("tools/call", async (request) => {
if (request.params.name === "hello") {
const name = request.params.arguments.name;
return { content: [{ type: "text", text: `Hello, ${name}!` }] };
}
});
const transport = new StdioServerTransport();
await server.connect(transport);
Configure in Moltis
[[mcp.servers]]
name = "my-server"
command = "node"
args = ["server.js"]
cwd = "/path/to/my-server"
Debugging
Check Server Status
In the web UI, go to Settings → MCP Servers to see:
- Connection status (connected/disconnected/error)
- Available tools
- Recent errors
View Logs
MCP server stderr is captured in Moltis logs:
# View gateway logs
tail -f ~/.moltis/logs/gateway.log | grep mcp
Test Locally
Run the server directly to debug:
echo '{"jsonrpc":"2.0","method":"tools/list","id":1}' | node server.js
Security Considerations
- Review server code before running
- Limit file access — use specific paths, not
/ - Use environment variables for secrets
- Network isolation — run untrusted servers in containers
Troubleshooting
Server won’t start
- Check the command exists: which npx
- Verify the package: npx @modelcontextprotocol/server-filesystem --help
- Check for port conflicts
Tools not appearing
- Server may still be initializing (wait a few seconds)
- Check server logs for errors
- Verify the server implements tools/list
Server keeps restarting
- Check stderr for crash messages
- Increase max_restart_attempts for debugging
- Verify environment variables are set correctly
Memory System
Moltis provides a powerful memory system that enables the agent to recall past conversations, notes, and context across sessions. This document explains the available backends, features, and configuration options.
Backends
Moltis supports two memory backends:
| Feature | Built-in | QMD |
|---|---|---|
| Search Type | Hybrid (vector + FTS5 keyword) | Hybrid (BM25 + vector + LLM reranking) |
| Local Embeddings | GGUF models via llama-cpp-2 | GGUF models |
| Remote Embeddings | OpenAI, Ollama, custom endpoints | Built-in |
| Embedding Cache | SQLite with LRU eviction | Built-in |
| Batch API | OpenAI batch (50% cost saving) | No |
| Circuit Breaker | Fallback chain with auto-recovery | No |
| LLM Reranking | Optional (configurable) | Built-in with query command |
| File Watching | Real-time sync via notify | Built-in |
| External Dependency | None (pure Rust) | Requires QMD binary (Node.js/Bun) |
| Offline Support | Yes (with local embeddings) | Yes |
Built-in Backend
The default backend uses SQLite for storage with FTS5 for keyword search and optional vector embeddings for semantic search. Key advantages:
- Zero external dependencies: Everything is embedded in the moltis binary
- Fallback chain: Automatically switches between embedding providers if one fails
- Batch embedding: Reduces OpenAI API costs by 50% for large sync operations
- Embedding cache: Avoids re-embedding unchanged content
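To make "hybrid" concrete: one common way to fuse a keyword ranking with a vector ranking is reciprocal rank fusion. The sketch below is illustrative; the docs do not state which fusion formula the built-in backend uses:

```python
def hybrid_merge(keyword_ranked: list[str], vector_ranked: list[str],
                 k: int = 60) -> list[str]:
    """Merge two ranked result-ID lists with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per document; documents that
    rank well in both lists accumulate the highest combined score.
    Illustrative only, not necessarily Moltis's formula.
    """
    scores: dict[str, float] = {}
    for ranked in (keyword_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document that appears in both rankings beats one that tops only a single list, which is the intuition behind hybrid search.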
QMD Backend
QMD is an optional external sidecar that provides enhanced search capabilities:
- BM25 keyword search: Fast, instant results (similar to Elasticsearch)
- Vector search: Semantic similarity using local GGUF models
- Hybrid search with LLM reranking: Combines both methods with an LLM pass for optimal relevance
To use QMD:
- Install QMD separately from github.com/qmd/qmd
- Enable it in Settings > Memory > Backend
Features
Citations
Citations append source file and line number information to search results:
Some important content from your notes.
Source: memory/notes.md#42
Configuration options:
- auto (default): Include citations when results come from multiple files
- on: Always include citations
- off: Never include citations
Session Export
When enabled, session transcripts are automatically exported to the memory system for cross-run recall. This allows the agent to remember past conversations even after restarts.
Exported sessions are:
- Stored in memory/sessions/ as markdown files
- Sanitized to remove sensitive tool results and system messages
- Automatically cleaned up based on age/count limits
LLM Reranking
LLM reranking uses the configured language model to re-score and reorder search results based on semantic relevance to the query. This provides better results than keyword or vector matching alone, at the cost of additional latency.
How it works:
- Initial search returns candidate results
- LLM evaluates each result’s relevance (0.0-1.0 score)
- Results are reordered by combined score (70% LLM, 30% original)
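Step 3 is a straightforward weighted sum:

```python
def combined_score(llm_score: float, original_score: float,
                   llm_weight: float = 0.7) -> float:
    """Blend the LLM relevance score with the original retrieval score
    (70% LLM, 30% original, per the steps above)."""
    return llm_weight * llm_score + (1 - llm_weight) * original_score
```

So a result the LLM rates 1.0 but retrieval ranked at 0.5 ends up at 0.85, ahead of a result both methods scored 0.8.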
Configuration
Memory settings can be configured in moltis.toml:
[memory]
# Backend: "builtin" (default) or "qmd"
backend = "builtin"
# Embedding provider: "local", "ollama", "openai", "custom", or auto-detect
provider = "local"
# Citation mode: "on", "off", or "auto"
citations = "auto"
# Enable LLM reranking for hybrid search
llm_reranking = false
# Export sessions to memory for cross-run recall
session_export = true
# QMD-specific settings (only used when backend = "qmd")
[memory.qmd]
command = "qmd"
max_results = 10
timeout_ms = 30000
Or via the web UI: Settings > Memory
Embedding Providers
The built-in backend supports multiple embedding providers:
| Provider | Model | Dimensions | Notes |
|---|---|---|---|
| Local (GGUF) | EmbeddingGemma-300M | 768 | Offline, ~300MB download |
| Ollama | nomic-embed-text | 768 | Requires Ollama running |
| OpenAI | text-embedding-3-small | 1536 | Requires API key |
| Custom | Configurable | Varies | OpenAI-compatible endpoint |
The system auto-detects available providers and creates a fallback chain:
- Try configured provider first
- Fall back to other available providers if it fails
- Use keyword-only search if no embedding provider is available
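The fallback chain above, sketched with providers as callables tried in order (None signals the drop to keyword-only search; the real provider interface is an assumption):

```python
def embed_with_fallback(text, providers):
    """Try each embedding provider in order; return the first result.

    If every provider fails, return None so the caller can fall back
    to keyword-only search. Illustrative sketch of the chain above.
    """
    for provider in providers:
        try:
            return provider(text)
        except Exception:
            continue  # provider unavailable, try the next one
    return None  # no embeddings available: keyword-only search
```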
Memory Directories
By default, moltis indexes markdown files from:
- ~/.moltis/MEMORY.md — Main long-term memory file
- ~/.moltis/memory/*.md — Additional memory files
- ~/.moltis/memory/sessions/*.md — Exported session transcripts
Tools
The memory system exposes two agent tools:
memory_search
Search memory with a natural language query.
{
"query": "what did we discuss about the API design?",
"limit": 5
}
memory_get
Retrieve a specific chunk by ID.
{
"chunk_id": "memory/notes.md:42"
}
Architecture
┌─────────────────────────────────────────────────────────────┐
│ Memory Manager │
├─────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │
│ │ Chunker │ │ Search │ │ Session Export │ │
│ │ (markdown) │ │ (hybrid) │ │ (transcripts) │ │
│ └─────────────┘ └─────────────┘ └─────────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│ Storage Backend │
│ ┌────────────────────────┐ ┌────────────────────────┐ │
│ │ Built-in (SQLite) │ │ QMD (sidecar) │ │
│ │ - FTS5 keyword │ │ - BM25 keyword │ │
│ │ - Vector similarity │ │ - Vector similarity │ │
│ │ - Embedding cache │ │ - LLM reranking │ │
│ └────────────────────────┘ └────────────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│ Embedding Providers │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌───────────────┐ │
│ │ Local │ │ Ollama │ │ OpenAI │ │ Batch/Fallback│ │
│ │ (GGUF) │ │ │ │ │ │ │ │
│ └─────────┘ └─────────┘ └─────────┘ └───────────────┘ │
└─────────────────────────────────────────────────────────────┘
Troubleshooting
Memory not working
- Check status in Settings > Memory
- Ensure at least one embedding provider is available:
- Local: Requires the local-embeddings feature enabled at build time
- Ollama: Must be running at localhost:11434
- OpenAI: Requires the OPENAI_API_KEY environment variable
Search returns no results
- Check that memory files exist in the expected directories
- Trigger a manual sync by restarting moltis
- Check logs for sync errors
QMD not available
- Verify QMD is installed: qmd --version
- Check that the path is correct in settings
- Ensure QMD has indexed your collections: qmd stats
Hooks
Hooks let you observe, modify, or block actions at key points in the agent lifecycle. Use them for auditing, policy enforcement, notifications, and custom integrations.
How Hooks Work
┌─────────────────────────────────────────────────────────┐
│ Agent Loop │
│ │
│ User Message → BeforeToolCall → Tool Execution │
│ │ │ │
│ ▼ ▼ │
│ [Your Hook] AfterToolCall │
│ │ │ │
│ modify/block [Your Hook] │
│ │ │ │
│ ▼ ▼ │
│ Continue → Response → MessageSent │
└─────────────────────────────────────────────────────────┘
Event Types
Modifying Events (Sequential)
These events run hooks sequentially. Hooks can modify the payload or block the action.
| Event | Description | Can Modify | Can Block |
|---|---|---|---|
| BeforeToolCall | Before a tool executes | ✅ | ✅ |
| BeforeCompaction | Before context compaction | ✅ | ✅ |
| MessageSending | Before sending a response | ✅ | ✅ |
| BeforeAgentStart | Before agent loop starts | ✅ | ✅ |
Read-Only Events (Parallel)
These events run hooks in parallel for performance. They cannot modify or block.
| Event | Description |
|---|---|
| AfterToolCall | After a tool completes |
| AfterCompaction | After context is compacted |
| MessageReceived | When a user message arrives |
| MessageSent | After response is delivered |
| AgentEnd | When agent loop completes |
| SessionStart | When a new session begins |
| SessionEnd | When a session ends |
| ToolResultPersist | When tool result is saved |
| GatewayStart | When Moltis starts |
| GatewayStop | When Moltis shuts down |
| Command | When a slash command is used |
Creating a Hook
1. Create the Hook Directory
mkdir -p ~/.moltis/hooks/my-hook
2. Create HOOK.md
+++
name = "my-hook"
description = "Logs all tool calls to a file"
events = ["BeforeToolCall", "AfterToolCall"]
command = "./handler.sh"
timeout = 5
[requires]
os = ["darwin", "linux"]
bins = ["jq"]
env = ["LOG_FILE"]
+++
# My Hook
This hook logs all tool calls for auditing purposes.
3. Create the Handler Script
#!/bin/bash
# handler.sh
# Read event payload from stdin
payload=$(cat)
# Extract event type
event=$(echo "$payload" | jq -r '.event')
# Log to file
echo "$(date -Iseconds) $event: $payload" >> "$LOG_FILE"
# Exit 0 to continue (don't block)
exit 0
4. Make it Executable
chmod +x ~/.moltis/hooks/my-hook/handler.sh
Shell Hook Protocol
Hooks communicate via stdin/stdout and exit codes:
Input
The event payload is passed as JSON on stdin:
{
"event": "BeforeToolCall",
"data": {
"tool": "bash",
"arguments": {
"command": "ls -la"
}
},
"session_id": "abc123",
"timestamp": "2024-01-15T10:30:00Z"
}
Output
| Exit Code | Stdout | Result |
|---|---|---|
| 0 | (empty) | Continue normally |
| 0 | {"action":"modify","data":{...}} | Replace payload data |
| 1 | — | Block (stderr = reason) |
Example: Modify Tool Arguments
#!/bin/bash
payload=$(cat)
tool=$(echo "$payload" | jq -r '.data.tool')
if [ "$tool" = "bash" ]; then
# Add safety flag to all bash commands
modified=$(echo "$payload" | jq '.data.arguments.command = "set -e; " + .data.arguments.command')
echo "$modified" | jq -c '{action: "modify", data: .data}'
fi
exit 0
Example: Block Dangerous Commands
#!/bin/bash
payload=$(cat)
command=$(echo "$payload" | jq -r '.data.arguments.command // ""')
# Block rm -rf /
if echo "$command" | grep -qE 'rm\s+-rf\s+/'; then
echo "Blocked dangerous rm command" >&2
exit 1
fi
exit 0
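Hook logic like this is easy to exercise offline before wiring it into Moltis. A minimal sketch, with the dangerous-command filter wrapped in a hypothetical `check_command` helper so it can be tested in isolation:

```shell
# Sketch: exercising a hook's logic offline. check_command is a
# hypothetical helper that mirrors the dangerous-command filter above.
check_command() {
  if echo "$1" | grep -qE 'rm\s+-rf\s+/'; then
    echo "blocked"
  else
    echo "allowed"
  fi
}
check_command "ls -la"    # prints: allowed
check_command "rm -rf /"  # prints: blocked
```

The same approach works for any hook: pipe a sample payload on stdin, then check stdout and the exit code.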
Hook Discovery
Hooks are discovered from HOOK.md files in these locations (priority order):
- Project-local: `<workspace>/.moltis/hooks/<name>/HOOK.md`
- User-global: `~/.moltis/hooks/<name>/HOOK.md`
Project-local hooks take precedence over global hooks with the same name.
Configuration in moltis.toml
You can also define hooks directly in the config file:
[[hooks]]
name = "audit-log"
command = "./hooks/audit.sh"
events = ["BeforeToolCall", "AfterToolCall"]
timeout = 5
priority = 100 # Higher = runs first
[[hooks]]
name = "notify-slack"
command = "./hooks/slack-notify.sh"
events = ["SessionEnd"]
env = { SLACK_WEBHOOK_URL = "https://hooks.slack.com/..." }
Eligibility Requirements
Hooks can declare requirements that must be met:
[requires]
os = ["darwin", "linux"] # Only run on these OSes
bins = ["jq", "curl"] # Required binaries in PATH
env = ["SLACK_WEBHOOK_URL"] # Required environment variables
If requirements aren’t met, the hook is skipped (not an error).
Circuit Breaker
Hooks that fail repeatedly are automatically disabled:
- Threshold: 5 consecutive failures
- Cooldown: 60 seconds
- Recovery: Auto-re-enabled after cooldown
This prevents a broken hook from blocking all operations.
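The breaker's bookkeeping amounts to a consecutive-failure counter that resets on success. An illustrative sketch (the real logic is internal to Moltis; `record_result` and `state` are invented names for this example):

```shell
# Illustrative sketch of the circuit breaker's bookkeeping; the real
# logic is internal to Moltis. record_result and state are invented names.
failures=0
state="enabled"
record_result() {               # $1 = "ok" or "fail"
  if [ "$1" = "ok" ]; then
    failures=0                  # success resets the consecutive count
  else
    failures=$((failures + 1))
  fi
  if [ "$failures" -ge 5 ]; then
    state="disabled"            # threshold: 5 consecutive failures
  else
    state="enabled"
  fi
}
record_result fail; record_result fail; record_result fail
echo "$state"                   # prints: enabled (only 3 failures)
```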
CLI Commands
# List all discovered hooks
moltis hooks list
# List only eligible hooks (requirements met)
moltis hooks list --eligible
# Output as JSON
moltis hooks list --json
# Show details for a specific hook
moltis hooks info my-hook
Bundled Hooks
Moltis includes several built-in hooks:
boot-md
Reads BOOT.md from the workspace on GatewayStart and injects it into the agent context.
session-memory
Saves session context when you use the /new command, preserving important information for future sessions.
command-logger
Logs all Command events to a JSONL file for auditing.
Example Hooks
Slack Notification on Session End
#!/bin/bash
# slack-notify.sh
payload=$(cat)
session_id=$(echo "$payload" | jq -r '.session_id')
message_count=$(echo "$payload" | jq -r '.data.message_count')
curl -X POST "$SLACK_WEBHOOK_URL" \
-H 'Content-Type: application/json' \
-d "{\"text\":\"Session $session_id ended with $message_count messages\"}"
exit 0
Redact Secrets from Tool Output
#!/bin/bash
# redact-secrets.sh
payload=$(cat)
# Redact common secret patterns
redacted=$(echo "$payload" | sed -E '
s/sk-[a-zA-Z0-9]{32,}/[REDACTED]/g
s/ghp_[a-zA-Z0-9]{36}/[REDACTED]/g
s/password=[^&[:space:]]+/password=[REDACTED]/g
')
echo "$redacted" | jq -c '{action: "modify", data: .data}'
exit 0
Block File Writes Outside Project
#!/bin/bash
# sandbox-writes.sh
payload=$(cat)
tool=$(echo "$payload" | jq -r '.data.tool')
if [ "$tool" = "write_file" ]; then
path=$(echo "$payload" | jq -r '.data.arguments.path')
# Only allow writes under current project
if [[ ! "$path" =~ ^/workspace/ ]]; then
echo "File writes only allowed in /workspace" >&2
exit 1
fi
fi
exit 0
Best Practices
- Keep hooks fast — Set appropriate timeouts (default: 5s)
- Handle errors gracefully — Use `exit 0` unless you want to block
- Log for debugging — Write to a log file, not stdout
- Test locally first — Pipe sample JSON through your script
- Use jq for JSON — It’s reliable and fast for parsing
Local LLM Support
Moltis can run LLM inference locally on your machine without requiring an API key or internet connection. This enables fully offline operation and keeps your conversations private.
Backends
Moltis supports two backends for local inference:
| Backend | Format | Platform | GPU Acceleration |
|---|---|---|---|
| GGUF (llama.cpp) | .gguf files | macOS, Linux, Windows | Metal (macOS), CUDA (NVIDIA) |
| MLX | MLX model repos | macOS (Apple Silicon only) | Metal (Apple Silicon GPU) |
GGUF (llama.cpp)
GGUF is the primary backend, powered by llama.cpp. It supports quantized models in the GGUF format, which significantly reduces memory requirements while maintaining good quality.
Advantages:
- Cross-platform (macOS, Linux, Windows)
- Wide model compatibility (any GGUF model)
- GPU acceleration on both NVIDIA (CUDA) and Apple Silicon (Metal)
- Mature and well-tested
MLX
MLX is Apple’s machine learning framework optimized for Apple Silicon. Models from the mlx-community on HuggingFace are specifically optimized for M1/M2/M3/M4 chips.
Advantages:
- Native Apple Silicon performance
- Efficient unified memory usage
- Lower latency on Macs
Requirements:
- macOS with Apple Silicon (M1/M2/M3/M4)
Memory Requirements
Models are organized by memory tiers based on your system RAM:
| Tier | RAM | Recommended Models |
|---|---|---|
| Tiny | 4GB | Qwen 2.5 Coder 1.5B, Llama 3.2 1B |
| Small | 8GB | Qwen 2.5 Coder 3B, Llama 3.2 3B |
| Medium | 16GB | Qwen 2.5 Coder 7B, Llama 3.1 8B |
| Large | 32GB+ | Qwen 2.5 Coder 14B, DeepSeek Coder V2 Lite |
Moltis automatically detects your system memory and suggests appropriate models in the UI.
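As a rough sketch, the tier boundaries in the table reduce to a simple threshold check (the `tier_for_gb` helper is hypothetical; Moltis performs this detection internally):

```shell
# Sketch: mapping system RAM (in GB) to a tier from the table above.
# tier_for_gb is hypothetical; Moltis detects this internally.
tier_for_gb() {
  if   [ "$1" -lt 8 ];  then echo "tiny"
  elif [ "$1" -lt 16 ]; then echo "small"
  elif [ "$1" -lt 32 ]; then echo "medium"
  else                       echo "large"
  fi
}
tier_for_gb 16   # prints: medium
```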
Configuration
Via Web UI (Recommended)
- Navigate to Providers in the sidebar
- Click Add Provider
- Select Local LLM
- Choose a model from the registry or search HuggingFace
- Click Configure — the model will download automatically
Via Configuration File
Add to ~/.moltis/moltis.toml:
[providers.local]
model = "qwen2.5-coder-7b-q4_k_m"
For custom GGUF files:
[providers.local]
model = "my-custom-model"
model_path = "/path/to/model.gguf"
Model Storage
Downloaded models are cached in ~/.cache/moltis/models/ by default. This
directory can grow large (several GB per model).
To change the cache location:
[providers.local]
cache_dir = "/custom/models/path"
HuggingFace Integration
You can search and download models directly from HuggingFace:
- In the Add Provider dialog, click “Search HuggingFace”
- Enter a search term (e.g., “qwen coder”)
- Select GGUF or MLX backend
- Choose a model from the results
- The model will be downloaded on first use
Finding GGUF Models
Look for repositories with “GGUF” in the name on HuggingFace:
- TheBloke — large collection of quantized models
- bartowski — Llama 3.x GGUF models
- Qwen — official Qwen GGUF models
Finding MLX Models
MLX models are available from mlx-community:
- Pre-converted models optimized for Apple Silicon
- Look for models ending in `-4bit` or `-8bit` for quantized versions
GPU Acceleration
Metal (macOS)
Metal acceleration is enabled by default on macOS. The number of GPU layers can be configured:
[providers.local]
gpu_layers = 99 # Offload all layers to GPU
CUDA (NVIDIA)
Requires building with the local-llm-cuda feature:
cargo build --release --features local-llm-cuda
Limitations
Local LLM models have some limitations compared to cloud providers:
- No tool calling — Local models don’t support function/tool calling. When using a local model, features like file operations, shell commands, and memory search are disabled.
- Slower inference — Depending on your hardware, local inference may be significantly slower than cloud APIs.
- Quality varies — Smaller quantized models may produce lower quality responses than larger cloud models.
- Context window — Local models typically have smaller context windows (8K-32K tokens vs 128K+ for cloud models).
Chat Templates
Different model families use different chat formatting. Moltis automatically detects the correct template for registered models:
- ChatML — Qwen, many instruction-tuned models
- Llama 3 — Meta’s Llama 3.x family
- DeepSeek — DeepSeek Coder models
For custom models, the template is auto-detected from the model metadata when possible.
Troubleshooting
Model fails to load
- Check you have enough RAM (see memory tier table above)
- Verify the GGUF file isn’t corrupted (re-download if needed)
- Ensure the model file matches the expected architecture
Slow inference
- Enable GPU acceleration (Metal on macOS, CUDA on Linux)
- Try a smaller/more quantized model
- Reduce context size in config
Out of memory
- Choose a model from a lower memory tier
- Close other applications to free RAM
- Use a more aggressively quantized model (Q4_K_M vs Q8_0)
Feature Flag
Local LLM support requires the local-llm feature flag at compile time:
cargo build --release --features local-llm
This is enabled by default in release builds.
Sandbox Backends
Moltis runs LLM-generated commands inside containers to protect your host system. The sandbox backend controls which container technology is used.
Backend Selection
Configure in moltis.toml:
[tools.exec.sandbox]
backend = "auto" # default — picks the best available
# backend = "docker" # force Docker
# backend = "apple-container" # force Apple Container (macOS only)
With "auto" (the default), Moltis picks the strongest available backend:
| Priority | Backend | Platform | Isolation |
|---|---|---|---|
| 1 | Apple Container | macOS | VM (Virtualization.framework) |
| 2 | Docker | any | Linux namespaces / cgroups |
| 3 | none (host) | any | no isolation |
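A sketch of what the `auto` probe roughly amounts to (illustrative only; the real probe also verifies that the selected runtime is actually responding):

```shell
# Sketch of the "auto" selection order; illustrative only.
pick_backend() {
  if [ "$(uname)" = "Darwin" ] && command -v container >/dev/null 2>&1; then
    echo "apple-container"    # strongest isolation: per-container VM
  elif command -v docker >/dev/null 2>&1; then
    echo "docker"             # namespace/cgroup isolation
  else
    echo "none"               # no isolation: commands run on the host
  fi
}
pick_backend
```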
Apple Container (recommended on macOS)
Apple Container runs each sandbox in a lightweight virtual machine using Apple’s Virtualization.framework. Every container gets its own kernel, so a kernel exploit inside the sandbox cannot reach the host — unlike Docker, which shares the host kernel.
Install
Download the signed installer from GitHub:
# Download the installer package
gh release download --repo apple/container --pattern "container-installer-signed.pkg" --dir /tmp
# Install (requires admin)
sudo installer -pkg /tmp/container-installer-signed.pkg -target /
# First-time setup — downloads a default Linux kernel
container system start
Alternatively, build from source with brew install container (requires
Xcode 26+).
Verify
container --version
# Run a quick test
container run --rm ubuntu echo "hello from VM"
Once installed, restart `moltis gateway` — the startup banner will show the
`apple-container` sandbox backend.
Docker
Docker is supported on macOS, Linux, and Windows. On macOS it runs inside a Linux VM managed by Docker Desktop, so it is reasonably isolated but adds more overhead than Apple Container.
Install from https://docs.docker.com/get-docker/
No sandbox
If neither runtime is found, commands execute directly on the host. The startup banner will show a warning. This is not recommended for untrusted workloads.
Per-session overrides
The web UI allows toggling sandboxing per session and selecting a custom container image. These overrides persist across gateway restarts.
Resource limits
[tools.exec.sandbox.resource_limits]
memory_limit = "512M"
cpu_quota = 1.0
pids_max = 256
Session State
Moltis provides a per-session key-value store that allows skills, extensions, and the agent itself to persist context across messages within a session.
Overview
Session state is scoped to a (session_key, namespace, key) triple, backed by
SQLite. Each entry stores a string value and is automatically timestamped.
The agent accesses state through the session_state tool, which supports three
operations: get, set, and list.
Agent Tool
The session_state tool is registered as a built-in tool and available in every
session.
Get a value
{
"op": "get",
"namespace": "my-skill",
"key": "last_query"
}
Set a value
{
"op": "set",
"namespace": "my-skill",
"key": "last_query",
"value": "SELECT * FROM users"
}
List all keys in a namespace
{
"op": "list",
"namespace": "my-skill"
}
Namespacing
Every state entry belongs to a namespace. This prevents collisions between different skills or extensions using state in the same session. Use your skill name as the namespace.
Storage
State is stored in the session_state table in the main SQLite database
(moltis.db). The migration is in
crates/sessions/migrations/20260205120000_session_state.sql.
State values are strings. To store structured data, serialize to JSON before writing and parse after reading.
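For example, a structured value can be round-tripped through a JSON string with `jq` (a sketch; the value shown is arbitrary):

```shell
# Sketch: round-trip structured data through a JSON string.
# jq -c gives a compact single-line value suitable for a "set" call.
value=$(jq -cn '{query: "SELECT * FROM users", rows: 3}')
echo "$value"                  # the string stored as the state value
echo "$value" | jq -r '.rows'  # prints: 3
```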
Session Branching
Session branching (forking) lets you create an independent copy of a conversation at any point. The new session diverges without affecting the original — useful for exploring alternative approaches, running “what if” scenarios, or preserving a checkpoint before a risky prompt.
Forking from the UI
There are two ways to fork a session in the web UI:
- Chat header — click the Fork button in the header bar (next to Delete). This is visible for every session except cron sessions.
- Sidebar — hover over a session in the sidebar and click the fork icon that appears in the action buttons.
Both create a new session that copies all messages from the current one and immediately switch you to it.
Forked sessions appear indented under their parent in the sidebar, with a
branch icon to distinguish them from top-level sessions. The metadata line
shows fork@N where N is the message index at which the fork occurred.
Agent Tool
The agent can also fork programmatically using the branch_session tool:
{
"at_message": 5,
"label": "explore-alternative"
}
- `at_message` — the message index to fork at (messages 0..N are copied). If omitted, all messages are copied.
- `label` — optional human-readable label for the new session.
The tool returns the new session key.
RPC Method
The sessions.fork RPC method is the underlying mechanism:
{ "key": "main", "at_message": 5, "label": "my-fork" }
On success the response payload contains { "sessionKey": "session:<uuid>" }.
What Gets Inherited
When forking, the new session inherits:
| Inherited | Not inherited |
|---|---|
| Messages (up to fork point) | Worktree branch |
| Model selection | Sandbox settings |
| Project assignment | Channel binding |
| MCP disabled flag | |
Parent-Child Relationships
Fork relationships are stored directly on the sessions table:
- `parent_session_key` — the key of the session this was forked from.
- `fork_point` — the message index where the fork occurred.
These fields drive the tree rendering in the sidebar. Sessions with a parent appear indented under it; deeply nested forks indent further.
Deleting a parent session does not cascade to its children. Child sessions become top-level sessions — they keep their messages and history but lose their visual nesting in the sidebar.
Navigation After Delete
When you delete a forked session, the UI navigates back to its parent session.
If the deleted session had no parent (or the parent no longer exists), it falls
back to the next sibling or main.
A forked session is fully independent after creation. Changes to the parent do not propagate to the fork, and vice versa.
Skill Self-Extension
Moltis can create, update, and delete skills at runtime through agent tools, enabling the system to extend its own capabilities during a conversation.
Overview
Three agent tools manage project-local skills:
| Tool | Description |
|---|---|
| `create_skill` | Write a new SKILL.md to `.moltis/skills/<name>/` |
| `update_skill` | Overwrite an existing skill’s SKILL.md |
| `delete_skill` | Remove a skill directory |
Skills created this way are project-local and stored in the working directory’s
.moltis/skills/ folder. They become available on the next message
automatically thanks to the skill watcher.
Skill Watcher
The skill watcher (crates/skills/src/watcher.rs) monitors skill directories
for filesystem changes using debounced notifications. When a SKILL.md file is
created, modified, or deleted, the watcher emits a skills.changed event via
the WebSocket event bus so the UI can refresh.
The watcher uses debouncing to avoid firing multiple events for rapid successive edits (e.g. an editor writing a temp file then renaming).
Creating a Skill
The agent can create a skill by calling the create_skill tool:
{
"name": "summarize-pr",
"content": "# summarize-pr\n\nSummarize a GitHub pull request...",
"description": "Summarize GitHub PRs with key changes and review notes"
}
This writes .moltis/skills/summarize-pr/SKILL.md with the provided content.
The skill discoverer picks it up on the next message.
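For illustration, the on-disk effect is equivalent to the following sketch (the tool does this for you; `mktemp` stands in for the project workspace):

```shell
# Illustrative sketch of create_skill's on-disk effect.
root=$(mktemp -d)                 # stand-in for the project workspace
dir="$root/.moltis/skills/summarize-pr"
mkdir -p "$dir"
cat > "$dir/SKILL.md" <<'EOF'
# summarize-pr

Summarize a GitHub pull request...
EOF
ls "$dir"   # prints: SKILL.md
```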
Updating a Skill
{
"name": "summarize-pr",
"content": "# summarize-pr\n\nUpdated instructions..."
}
Deleting a Skill
{
"name": "summarize-pr"
}
This removes the entire .moltis/skills/summarize-pr/ directory.
Deleted skills cannot be recovered. The agent should confirm with the user before deleting a skill.
Mobile PWA and Push Notifications
Moltis can be installed as a Progressive Web App (PWA) on mobile devices, providing a native app-like experience with push notifications.
Installing on Mobile
iOS (Safari)
- Open moltis in Safari
- Tap the Share button (box with arrow)
- Scroll down and tap “Add to Home Screen”
- Tap “Add” to confirm
The app will appear on your home screen with the moltis icon.
Android (Chrome)
- Open moltis in Chrome
- You should see an install banner at the bottom - tap “Install”
- Or tap the three-dot menu and select “Install app” or “Add to Home Screen”
- Tap “Install” to confirm
The app will appear in your app drawer and home screen.
PWA Features
When installed as a PWA, moltis provides:
- Standalone mode: Full-screen experience without browser UI
- Offline support: Previously loaded content remains accessible
- Fast loading: Assets are cached locally
- Home screen icon: Quick access from your device’s home screen
- Safe area support: Proper spacing for notched devices (iPhone X+)
Push Notifications
Push notifications allow you to receive alerts when the LLM responds, even when you’re not actively viewing the app.
Enabling Push Notifications
- Open the moltis app (must be installed as PWA on Safari/iOS)
- Go to Settings > Notifications
- Click Enable to subscribe to push notifications
- When prompted, allow notification permissions
Safari/iOS Note: Push notifications only work when the app is installed as a PWA. If you see “Installation required”, add moltis to your Dock first:
- macOS: File → Add to Dock
- iOS: Share → Add to Home Screen
Managing Subscriptions
The Settings > Notifications page shows all subscribed devices:
- Device name: Parsed from user agent (e.g., “Safari on macOS”, “iPhone”)
- IP address: Client IP at subscription time (supports proxies via X-Forwarded-For)
- Subscription date: When the device subscribed
You can remove any subscription by clicking the Remove button. This works from any device - useful for revoking access to old devices.
Subscription changes are broadcast in real-time via WebSocket, so all connected clients see updates immediately.
How It Works
Moltis uses the Web Push API with VAPID (Voluntary Application Server Identification) keys:
- VAPID Keys: On first run, the server generates a P-256 ECDSA key pair
- Subscription: The browser creates a push subscription using the server’s public key
- Registration: The subscription details are sent to the server and stored
- Notification: When you need to be notified, the server encrypts and sends a push message
Push API Routes
The gateway exposes these API endpoints for push notifications:
| Endpoint | Method | Description |
|---|---|---|
| `/api/push/vapid-key` | GET | Get the VAPID public key for subscription |
| `/api/push/subscribe` | POST | Register a push subscription |
| `/api/push/unsubscribe` | POST | Remove a push subscription |
| `/api/push/status` | GET | Get push service status and subscription list |
Subscribe Request
{
"endpoint": "https://fcm.googleapis.com/fcm/send/...",
"keys": {
"p256dh": "base64url-encoded-key",
"auth": "base64url-encoded-auth"
}
}
Status Response
{
"enabled": true,
"subscription_count": 2,
"subscriptions": [
{
"endpoint": "https://fcm.googleapis.com/...",
"device": "Safari on macOS",
"ip": "192.168.1.100",
"created_at": "2025-02-05T23:30:00Z"
}
]
}
Notification Payload
Push notifications include:
{
"title": "moltis",
"body": "New response available",
"url": "/chats",
"sessionKey": "session-id"
}
Clicking a notification will open or focus the app and navigate to the relevant chat.
Configuration
Feature Flag
Push notifications are controlled by the push-notifications feature flag, which is enabled by default. To disable:
# In your Cargo.toml or when building
[dependencies]
moltis-gateway = { default-features = false, features = ["web-ui", "tls"] }
Or build without the feature:
cargo build --no-default-features --features web-ui,tls,tailscale,file-watcher
Data Storage
Push notification data is stored in push.json in the data directory:
- VAPID keys: Generated once and reused
- Subscriptions: List of all registered browser subscriptions
The VAPID keys are persisted so subscriptions remain valid across restarts.
Mobile UI Considerations
The mobile interface adapts for smaller screens:
- Navigation drawer: The sidebar becomes a slide-out drawer on mobile
- Sessions panel: Displayed as a bottom sheet that can be swiped
- Touch targets: Minimum 44px touch targets for accessibility
- Safe areas: Proper insets for devices with notches or home indicators
Responsive Breakpoints
- Mobile: < 768px width (drawer navigation)
- Desktop: ≥ 768px width (sidebar navigation)
Browser Support
| Feature | Chrome | Safari | Firefox | Edge |
|---|---|---|---|---|
| PWA Install | ✅ | ✅ (iOS) | ❌ | ✅ |
| Push Notifications | ✅ | ✅ (iOS 16.4+) | ✅ | ✅ |
| Service Worker | ✅ | ✅ | ✅ | ✅ |
| Offline Support | ✅ | ✅ | ✅ | ✅ |
Note: iOS push notifications require iOS 16.4 or later and the app must be installed as a PWA.
Troubleshooting
Notifications Not Working
- Check permissions: Ensure notifications are allowed in browser/OS settings
- Check subscription: Go to Settings > Notifications to see if your device is listed
- Check server logs: Look for `push:`-prefixed log messages for delivery status
- Safari/iOS specific:
  - Must be installed as PWA (Add to Dock/Home Screen)
  - iOS requires version 16.4 or later
  - The Enable button is disabled until installed as PWA
- Behind a proxy: Ensure your proxy forwards `X-Forwarded-For` or `X-Real-IP` headers
PWA Not Installing
- HTTPS required: PWAs require a secure connection (or localhost)
- Valid manifest: Ensure `/manifest.json` loads correctly
- Service worker: Check that `/sw.js` registers without errors
- Clear cache: Try clearing browser cache and reloading
Service Worker Issues
Clear the service worker registration:
- Open browser DevTools
- Go to Application > Service Workers
- Click “Unregister” on the moltis service worker
- Reload the page
Security Architecture
Moltis is designed with a defense-in-depth security model. This document explains the key security features and provides guidance for production deployments.
Overview
Moltis runs AI agents that can execute code and interact with external systems. This power requires multiple layers of protection:
- Human-in-the-loop approval for dangerous commands
- Sandbox isolation for command execution
- Channel authorization for external integrations
- Rate limiting to prevent resource abuse
- Scope-based access control for API authorization
Command Execution Approval
By default, Moltis requires explicit user approval before executing potentially dangerous commands. This “human-in-the-loop” design ensures the AI cannot take destructive actions without consent.
How It Works
When the agent wants to run a command:
- The command is analyzed against approval policies
- If approval is required, the user sees a prompt in the UI
- The user can approve, deny, or modify the command
- Only approved commands execute
Approval Policies
Configure approval behavior in moltis.toml:
[tools.exec]
approval_mode = "always" # always require approval
# approval_mode = "smart" # auto-approve safe commands (default)
# approval_mode = "never" # dangerous: never require approval
Recommendation: Keep approval_mode = "smart" (the default) for most use
cases. Only use "never" in fully automated, sandboxed environments.
Sandbox Isolation
Commands execute inside isolated containers (Docker or Apple Container) by default. This protects your host system from:
- Accidental file deletion or modification
- Malicious code execution
- Resource exhaustion (memory, CPU, disk)
See sandbox.md for backend configuration.
Resource Limits
[tools.exec.sandbox.resource_limits]
memory_limit = "512M"
cpu_quota = 1.0
pids_max = 256
Network Isolation
Sandbox containers have limited network access by default. Outbound connections are allowed but the sandbox cannot bind to host ports.
Channel Authorization
Channels (Telegram, Slack, etc.) allow external parties to interact with your Moltis agent. This requires careful access control.
Sender Allowlisting
When a new sender contacts the agent through a channel, they are placed in a pending queue. You must explicitly approve or deny each sender before they can interact with the agent.
UI: Settings > Channels > Pending Senders
Per-Channel Permissions
Each channel can have different permission levels:
- Read-only: Sender can ask questions, agent responds
- Execute: Sender can trigger actions (with approval still required)
- Admin: Full access including configuration changes
Channel Isolation
Channels run in isolated sessions by default. A malicious message from one channel cannot affect another channel’s session or the main UI session.
Cron Job Security
Scheduled tasks (cron jobs) can run agent turns automatically. Security considerations:
Rate Limiting
To prevent prompt injection attacks from rapidly creating many cron jobs:
[cron]
rate_limit_max = 10 # max jobs per window
rate_limit_window_secs = 60 # window duration (1 minute)
This limits job creation to 10 per minute by default. System jobs (like heartbeat) bypass this limit.
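The defaults above behave like a fixed-window counter. An illustrative sketch (`allow_job` is invented for this example; the real limiter is internal to Moltis):

```shell
# Sketch of a fixed-window limiter matching the defaults above
# (10 jobs per 60-second window). Illustrative only.
window_start=0
count=0
allow_job() {                   # $1 = current time in seconds
  local now=$1 max=10 window=60
  if [ $((now - window_start)) -ge "$window" ]; then
    window_start=$now           # new window: reset the counter
    count=0
  fi
  count=$((count + 1))
  [ "$count" -le "$max" ]       # non-zero exit means rate limited
}
allow_job 0 && echo "job accepted"
```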
Job Notifications
When cron jobs are created, updated, or removed, Moltis broadcasts events:
- `cron.job.created` - A new job was created
- `cron.job.updated` - An existing job was modified
- `cron.job.removed` - A job was deleted
Monitor these events to detect suspicious automated job creation.
Sandbox for Cron Jobs
Cron job execution uses sandbox isolation by default:
# Per-job configuration
[cron.job.sandbox]
enabled = true # run in sandbox (default)
# image = "custom:latest" # optional custom image
Identity Protection
The agent’s identity (name, personality “soul”) is stored in moltis.toml.
Modifying identity requires the operator.write scope, not just operator.read.
This prevents prompt injection attacks from subtly modifying the agent’s personality to make it more compliant with malicious requests.
API Authorization
The gateway API uses role-based access control with scopes:
| Scope | Permissions |
|---|---|
| `operator.read` | View status, list jobs, read history |
| `operator.write` | Send messages, create jobs, modify configuration |
| `operator.admin` | All permissions (includes all other scopes) |
| `operator.approvals` | Handle command approval requests |
| `operator.pairing` | Manage device/node pairing |
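A sketch of how these scopes compose, with `operator.admin` implying every other scope (the `has_scope` helper is hypothetical, for illustration only):

```shell
# Sketch of scope checking; has_scope is a hypothetical helper.
has_scope() {        # $1 = granted scopes (comma-separated), $2 = wanted scope
  case ",$1," in
    *,operator.admin,*) return 0 ;;   # admin implies every other scope
    *,"$2",*)           return 0 ;;
    *)                  return 1 ;;
  esac
}
has_scope "operator.read,operator.write" "operator.write" && echo "allowed"
```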
API Keys
API keys authenticate external tools and scripts connecting to Moltis. Keys can have full access (all scopes) or be restricted to specific scopes for defense-in-depth.
Creating API Keys
Web UI: Settings > Security > API Keys
- Enter a label describing the key’s purpose
- Choose “Full access” or select specific scopes
- Click “Generate key”
- Copy the key immediately — it’s only shown once
CLI:
# Full access key
moltis auth create-api-key --label "CI pipeline"
# Scoped key (comma-separated scopes)
moltis auth create-api-key --label "Monitor" --scopes "operator.read"
moltis auth create-api-key --label "Automation" --scopes "operator.read,operator.write"
Using API Keys
Pass the key in the connect handshake over WebSocket:
{
"method": "connect",
"params": {
"client": { "id": "my-tool", "version": "1.0.0" },
"auth": { "api_key": "mk_abc123..." }
}
}
Or use Bearer authentication for REST API calls:
Authorization: Bearer mk_abc123...
Scope Recommendations
| Use Case | Recommended Scopes |
|---|---|
| Read-only monitoring | operator.read |
| Automated workflows | operator.read, operator.write |
| Approval handling | operator.read, operator.approvals |
| Full automation | Full access (no scope restrictions) |
Best practice: Use the minimum necessary scopes. If a key only needs to
read status and logs, don’t grant operator.write.
Backward Compatibility
Existing API keys (created before scopes were added) have full access. Newly created keys without explicit scopes also have full access.
Network Security
TLS Encryption
HTTPS is enabled by default with auto-generated certificates:
[tls]
enabled = true
auto_generate = true
For production, use certificates from a trusted CA or configure custom certificates.
Origin Validation
WebSocket connections validate the Origin header to prevent cross-site
WebSocket hijacking (CSWSH). Connections from untrusted origins are rejected.
SSRF Protection
The web_fetch tool resolves DNS and blocks requests to private IP ranges
(loopback, RFC 1918, link-local, CGNAT). This prevents server-side request
forgery attacks.
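The blocked ranges can be pictured as a simple IPv4 prefix check (illustrative only; the real check runs after DNS resolution and also covers CGNAT and IPv6):

```shell
# Sketch of the private-range check (IPv4 globs only; CGNAT
# 100.64.0.0/10 omitted for brevity).
is_private_ip() {
  case "$1" in
    127.*|10.*|192.168.*|169.254.*)        return 0 ;;  # loopback, RFC 1918, link-local
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;  # 172.16.0.0/12
    *)                                     return 1 ;;
  esac
}
is_private_ip "169.254.1.1" && echo "blocked"
```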
Production Recommendations
1. Enable Authentication
By default, Moltis requires a password when accessed from non-localhost:
[auth]
disabled = false # keep this false in production
2. Use Sandbox Isolation
Always run with sandbox enabled in production:
[tools.exec.sandbox]
enabled = true
backend = "auto" # uses strongest available
3. Tighten Rate Limits
Tighten rate limits for untrusted environments:
[cron]
rate_limit_max = 5
rate_limit_window_secs = 300 # 5 per 5 minutes
4. Review Channel Senders
Regularly audit approved senders and revoke access for unknown parties.
5. Monitor Events
Watch for these suspicious patterns:
- Rapid cron job creation
- Identity modification attempts
- Unusual command patterns in approval requests
- New channel senders from unexpected sources
6. Network Segmentation
Run Moltis on a private network or behind a reverse proxy with:
- IP allowlisting
- Rate limiting
- Web Application Firewall (WAF) rules
7. Keep Software Updated
Subscribe to security advisories and update promptly when vulnerabilities are disclosed.
Reporting Security Issues
Report security vulnerabilities privately to the maintainers. Do not open public issues for security bugs.
See the repository’s SECURITY.md for contact information.
Running Moltis in Docker
Moltis is available as a multi-architecture Docker image supporting both
linux/amd64 and linux/arm64. The image is published to GitHub Container
Registry on every release.
Quick Start
docker run -d \
--name moltis \
-p 13131:13131 \
-v moltis-config:/home/moltis/.config/moltis \
-v moltis-data:/home/moltis/.moltis \
-v /var/run/docker.sock:/var/run/docker.sock \
ghcr.io/penso/moltis:latest
Open http://localhost:13131 in your browser and configure your LLM provider to start chatting.
When accessing from localhost, no authentication is required. If you access Moltis from a different machine (e.g., over the network), a setup code is printed to the container logs for authentication setup:
docker logs moltis
Volume Mounts
Moltis uses two directories that should be persisted:
| Path | Contents |
|---|---|
| `/home/moltis/.config/moltis` | Configuration files: `moltis.toml`, `credentials.json`, `mcp-servers.json` |
| `/home/moltis/.moltis` | Runtime data: databases, sessions, memory files, logs |
You can use named volumes (as shown above) or bind mounts to local directories for easier access to configuration files:
docker run -d \
--name moltis \
-p 13131:13131 \
-v ./config:/home/moltis/.config/moltis \
-v ./data:/home/moltis/.moltis \
-v /var/run/docker.sock:/var/run/docker.sock \
ghcr.io/penso/moltis:latest
With bind mounts, you can edit config/moltis.toml directly on the host.
Docker Socket (Sandbox Execution)
Moltis runs LLM-generated shell commands inside isolated containers for security. When Moltis itself runs in a container, it needs access to the host’s container runtime to create these sandbox containers.
Without the socket mount, sandbox execution is disabled. The agent will still work for chat-only interactions, but any tool that runs shell commands will fail.
# Required for sandbox execution
-v /var/run/docker.sock:/var/run/docker.sock
Security Consideration
Mounting the Docker socket gives the container full access to the Docker
daemon. This is equivalent to root access on the host for practical purposes.
Only run Moltis containers from trusted sources (official images from
ghcr.io/penso/moltis).
If you cannot mount the Docker socket, Moltis will run in “no sandbox” mode — commands execute directly inside the Moltis container itself, which provides no isolation.
Docker Compose
See examples/docker-compose.yml for a
complete example:
services:
moltis:
image: ghcr.io/penso/moltis:latest
container_name: moltis
restart: unless-stopped
ports:
- "13131:13131"
volumes:
- ./config:/home/moltis/.config/moltis
- ./data:/home/moltis/.moltis
- /var/run/docker.sock:/var/run/docker.sock
Start with:
docker compose up -d
docker compose logs -f moltis # watch for startup messages
Podman Support
Moltis works with Podman using its Docker-compatible API. Mount the Podman socket instead of the Docker socket:
# Podman rootless
podman run -d \
--name moltis \
-p 13131:13131 \
-v moltis-config:/home/moltis/.config/moltis \
-v moltis-data:/home/moltis/.moltis \
-v /run/user/$(id -u)/podman/podman.sock:/var/run/docker.sock \
ghcr.io/penso/moltis:latest
# Podman rootful
podman run -d \
--name moltis \
-p 13131:13131 \
-v moltis-config:/home/moltis/.config/moltis \
-v moltis-data:/home/moltis/.moltis \
-v /run/podman/podman.sock:/var/run/docker.sock \
ghcr.io/penso/moltis:latest
You may need to enable the Podman socket service first:
# Rootless
systemctl --user enable --now podman.socket
# Rootful
sudo systemctl enable --now podman.socket
Environment Variables
| Variable | Description |
|---|---|
MOLTIS_CONFIG_DIR | Override config directory (default: ~/.config/moltis) |
MOLTIS_DATA_DIR | Override data directory (default: ~/.moltis) |
Example:
docker run -d \
--name moltis \
-p 13131:13131 \
-e MOLTIS_CONFIG_DIR=/config \
-e MOLTIS_DATA_DIR=/data \
-v ./config:/config \
-v ./data:/data \
-v /var/run/docker.sock:/var/run/docker.sock \
ghcr.io/penso/moltis:latest
Building Locally
To build the Docker image from source:
# Single architecture (current platform)
docker build -t moltis:local .
# Multi-architecture (requires buildx)
docker buildx build --platform linux/amd64,linux/arm64 -t moltis:local .
OrbStack
OrbStack on macOS works identically to Docker — use the same socket path
(/var/run/docker.sock). OrbStack’s lightweight Linux VM provides good
isolation with lower resource usage than Docker Desktop.
Troubleshooting
“Cannot connect to Docker daemon”
The Docker socket is not mounted or the Moltis user doesn’t have permission to access it. Verify:
docker exec moltis ls -la /var/run/docker.sock
Setup code not appearing in logs (for network access)
The setup code only appears when accessing from a non-localhost address. If you’re accessing from the same machine via localhost, no setup code is needed. For network access, wait a few seconds for the gateway to start, then check logs:
docker logs moltis 2>&1 | grep -i setup
Permission denied on bind mounts
When using bind mounts, ensure the directories exist and are writable:
mkdir -p ./config ./data
chmod 755 ./config ./data
The container runs as user moltis (UID 1000). If you see permission errors,
you may need to adjust ownership:
sudo chown -R 1000:1000 ./config ./data
Streaming Architecture
This document explains how streaming responses work in Moltis, from the LLM provider through to the web UI.
Overview
Moltis supports real-time token streaming for LLM responses, providing a much better user experience than waiting for the complete response. Streaming works even when tools are enabled, allowing users to see text as it arrives while tool calls are accumulated and executed.
Components
1. StreamEvent Enum (crates/agents/src/model.rs)
The StreamEvent enum defines all events that can occur during a streaming
LLM response:
pub enum StreamEvent {
    /// Text content delta - a chunk of text from the LLM.
    Delta(String),
    /// A tool call has started (for providers with native tool support).
    ToolCallStart { id: String, name: String, index: usize },
    /// Streaming delta for tool call arguments (JSON fragment).
    ToolCallArgumentsDelta { index: usize, delta: String },
    /// A tool call's arguments are complete.
    ToolCallComplete { index: usize },
    /// Stream completed successfully with token usage.
    Done(Usage),
    /// An error occurred.
    Error(String),
}
2. LlmProvider Trait (crates/agents/src/model.rs)
The LlmProvider trait defines two streaming methods:
- `stream()` — Basic streaming without tool support
- `stream_with_tools()` — Streaming with tool schemas passed to the API
Providers that support streaming with tools (like Anthropic) override
stream_with_tools(). Others fall back to stream() which ignores the tools
parameter.
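The fallback can be sketched as a trait with a default method (simplified signatures for illustration — the real trait returns a pinned `Stream` and takes richer message types):

```rust
// Illustrative sketch: providers without native tool streaming inherit a
// default stream_with_tools() that ignores the tool schemas.
trait LlmProvider {
    fn stream(&self, messages: Vec<String>) -> Vec<String>;

    // Default implementation: drop `tools` and delegate to stream().
    fn stream_with_tools(&self, messages: Vec<String>, _tools: Vec<String>) -> Vec<String> {
        self.stream(messages)
    }
}

// A provider that only implements basic streaming still satisfies the trait.
struct EchoProvider;

impl LlmProvider for EchoProvider {
    fn stream(&self, messages: Vec<String>) -> Vec<String> {
        messages
    }
}
```

Providers like Anthropic override `stream_with_tools()` to pass the schemas through to the API; everyone else gets the tool-less path for free.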
3. Anthropic Provider (crates/agents/src/providers/anthropic.rs)
The Anthropic provider implements streaming by:
- Making a POST request to `/v1/messages` with `"stream": true`
- Reading Server-Sent Events (SSE) from the response
- Parsing events and yielding appropriate `StreamEvent` variants:
| SSE Event Type | StreamEvent |
|---|---|
content_block_start (text) | (none, just tracking) |
content_block_start (tool_use) | ToolCallStart |
content_block_delta (text_delta) | Delta |
content_block_delta (input_json_delta) | ToolCallArgumentsDelta |
content_block_stop | ToolCallComplete (for tool blocks) |
message_delta | (usage tracking) |
message_stop | Done |
error | Error |
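The table can be read as a single match. A sketch with illustrative names (the real provider tracks per-index block state rather than receiving a `block_kind` string):

```rust
// Hypothetical mapping of Anthropic SSE event names to StreamEvent variant
// names, mirroring the table above. `block_kind` stands in for the content
// block type the provider tracks from content_block_start events.
fn map_sse_event(event: &str, block_kind: Option<&str>) -> Option<&'static str> {
    match (event, block_kind) {
        ("content_block_start", Some("tool_use")) => Some("ToolCallStart"),
        ("content_block_start", _) => None, // text blocks: tracked, nothing yielded
        ("content_block_delta", Some("text_delta")) => Some("Delta"),
        ("content_block_delta", Some("input_json_delta")) => Some("ToolCallArgumentsDelta"),
        ("content_block_stop", Some("tool_use")) => Some("ToolCallComplete"),
        ("message_stop", _) => Some("Done"),
        ("error", _) => Some("Error"),
        _ => None, // e.g. message_delta: usage tracking only
    }
}
```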
4. Agent Runner (crates/agents/src/runner.rs)
The run_agent_loop_streaming() function orchestrates the streaming agent
loop:
┌─────────────────────────────────────────────────────────┐
│ Agent Loop │
│ │
│ 1. Call provider.stream_with_tools() │
│ │
│ 2. While stream has events: │
│ ├─ Delta(text) → emit RunnerEvent::TextDelta │
│ ├─ ToolCallStart → accumulate tool call │
│ ├─ ToolCallArgumentsDelta → accumulate args │
│ ├─ ToolCallComplete → finalize args │
│ ├─ Done → record usage │
│ └─ Error → return error │
│ │
│ 3. If no tool calls → return accumulated text │
│ │
│ 4. Execute tool calls concurrently │
│ ├─ Emit ToolCallStart events │
│ ├─ Run tools in parallel │
│ └─ Emit ToolCallEnd events │
│ │
│ 5. Append tool results to messages │
│ │
│ 6. Loop back to step 1 │
└─────────────────────────────────────────────────────────┘
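Step 2 of the loop can be sketched as follows (illustrative names, not the actual runner code): text deltas stream straight through, while tool-call argument fragments are stitched together by index until the stream ends.

```rust
// Simplified StreamEvent mirroring the enum described earlier.
#[derive(Debug)]
enum StreamEvent {
    Delta(String),
    ToolCallStart { id: String, name: String, index: usize },
    ToolCallArgumentsDelta { index: usize, delta: String },
    ToolCallComplete { index: usize },
}

#[derive(Debug, Clone, Default)]
struct PendingToolCall {
    id: String,
    name: String,
    arguments: String, // JSON built up from streamed fragments
}

// Accumulate a finished stream into (assistant text, pending tool calls).
fn accumulate(events: Vec<StreamEvent>) -> (String, Vec<PendingToolCall>) {
    let mut text = String::new();
    let mut calls: Vec<PendingToolCall> = Vec::new();
    for ev in events {
        match ev {
            StreamEvent::Delta(t) => text.push_str(&t), // emitted to UI as it arrives
            StreamEvent::ToolCallStart { id, name, index } => {
                if calls.len() <= index {
                    calls.resize(index + 1, PendingToolCall::default());
                }
                calls[index].id = id;
                calls[index].name = name;
            }
            StreamEvent::ToolCallArgumentsDelta { index, delta } => {
                calls[index].arguments.push_str(&delta);
            }
            StreamEvent::ToolCallComplete { .. } => {} // arguments now final
        }
    }
    (text, calls)
}
```

If `calls` ends up empty, the loop returns the accumulated text (step 3); otherwise the finalized JSON arguments feed the concurrent tool execution in step 4.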
5. Gateway (crates/gateway/src/chat.rs)
The gateway’s run_with_tools() function:
- Sets up an event callback that broadcasts `RunnerEvent`s via WebSocket
- Calls `run_agent_loop_streaming()`
- Broadcasts events to connected clients as JSON frames
Event types broadcast to the UI:
| RunnerEvent | WebSocket State |
|---|---|
Thinking | thinking |
ThinkingDone | thinking_done |
TextDelta(text) | delta with text field |
ToolCallStart | tool_call_start |
ToolCallEnd | tool_call_end |
Iteration(n) | iteration |
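The table's mapping can be sketched as a match from `RunnerEvent` to the state string sent in each WebSocket frame (variant shapes simplified; the actual gateway serializes richer JSON payloads):

```rust
// Simplified RunnerEvent mirroring the table above.
enum RunnerEvent {
    Thinking,
    ThinkingDone,
    TextDelta(String),
    ToolCallStart,
    ToolCallEnd,
    Iteration(u32),
}

// Map each event to the state string used in the WebSocket frame.
fn ws_state(ev: &RunnerEvent) -> &'static str {
    match ev {
        RunnerEvent::Thinking => "thinking",
        RunnerEvent::ThinkingDone => "thinking_done",
        RunnerEvent::TextDelta(_) => "delta",
        RunnerEvent::ToolCallStart => "tool_call_start",
        RunnerEvent::ToolCallEnd => "tool_call_end",
        RunnerEvent::Iteration(_) => "iteration",
    }
}
```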
6. Frontend (crates/gateway/src/assets/js/)
The JavaScript frontend handles streaming via WebSocket:
- websocket.js - Receives WebSocket frames and dispatches to handlers
- events.js - Event bus for distributing events to components
- state.js - Manages streaming state (`streamText`, `streamEl`)
When a delta event arrives:
function handleChatDelta(p, isActive, isChatPage) {
if (!(p.text && isActive && isChatPage)) return;
removeThinking();
if (!S.streamEl) {
S.setStreamText("");
S.setStreamEl(document.createElement("div"));
S.streamEl.className = "msg assistant";
S.chatMsgBox.appendChild(S.streamEl);
}
S.setStreamText(S.streamText + p.text);
setSafeMarkdownHtml(S.streamEl, S.streamText);
S.chatMsgBox.scrollTop = S.chatMsgBox.scrollHeight;
}
Data Flow
┌──────────────┐ SSE ┌──────────────┐ StreamEvent ┌──────────────┐
│ Anthropic │─────────────▶│ Provider │────────────────▶│ Runner │
│ API │ │ │ │ │
└──────────────┘ └──────────────┘ └──────┬───────┘
│
RunnerEvent
│
▼
┌──────────────┐ WebSocket ┌──────────────┐ Callback ┌──────────────┐
│ Browser │◀─────────────│ Gateway │◀────────────────│ Callback │
│ │ │ │ │ (on_event) │
└──────────────┘ └──────────────┘ └──────────────┘
Adding Streaming to New Providers
To add streaming support for a new LLM provider:
- Implement the `stream()` method (basic streaming)
- If the provider supports tools in streaming mode, override `stream_with_tools()`
- Parse the provider's streaming format and yield appropriate `StreamEvent` variants
- Handle errors gracefully with `StreamEvent::Error`
- Always emit `StreamEvent::Done` with usage statistics when complete
Example skeleton:
fn stream_with_tools(
    &self,
    messages: Vec<serde_json::Value>,
    tools: Vec<serde_json::Value>,
) -> Pin<Box<dyn Stream<Item = StreamEvent> + Send + '_>> {
    Box::pin(async_stream::stream! {
        // Make streaming request to provider API. `?` cannot propagate out
        // of the stream! block, so errors are yielded as StreamEvent::Error.
        let resp = match self.client.post(...).json(&body).send().await {
            Ok(resp) => resp,
            Err(e) => {
                yield StreamEvent::Error(e.to_string());
                return;
            }
        };

        // Read SSE or streaming response
        let mut byte_stream = resp.bytes_stream();
        while let Some(chunk) = byte_stream.next().await {
            // Parse chunk and yield events
            match parse_event(&chunk) {
                TextDelta(text) => yield StreamEvent::Delta(text),
                ToolStart { id, name, idx } => {
                    yield StreamEvent::ToolCallStart { id, name, index: idx }
                }
                // ... handle other event types
            }
        }
        yield StreamEvent::Done(usage);
    })
}
Performance Considerations
- Unbounded channels: WebSocket send channels are unbounded, so slow clients can accumulate messages in memory
- Markdown re-rendering: The frontend re-renders full markdown on each delta, which is O(n) work per delta. For very long responses, this can cause UI lag
- Concurrent tool execution: Multiple tool calls are executed in parallel using `futures::join_all()`, improving throughput when the LLM requests several tools at once
SQLite Database Migrations
Moltis uses sqlx for database access and its built-in migration system for schema management. Each crate owns its migrations, keeping schema definitions close to the code that uses them.
Architecture
Each crate that uses SQLite has its own migrations/ directory and exposes a
run_migrations() function. The gateway orchestrates running all migrations at
startup in the correct dependency order.
crates/
├── projects/
│ ├── migrations/
│ │ └── 20240205100000_init.sql # projects table
│ └── src/lib.rs # run_migrations()
├── sessions/
│ ├── migrations/
│ │ └── 20240205100001_init.sql # sessions, channel_sessions
│ └── src/lib.rs # run_migrations()
├── cron/
│ ├── migrations/
│ │ └── 20240205100002_init.sql # cron_jobs, cron_runs
│ └── src/lib.rs # run_migrations()
├── gateway/
│ ├── migrations/
│ │ └── 20240205100003_init.sql # auth, message_log, channels
│ └── src/server.rs # orchestrates moltis.db migrations
└── memory/
├── migrations/
│ └── 20240205100004_init.sql # files, chunks, embedding_cache, FTS
└── src/lib.rs # run_migrations() (separate memory.db)
How It Works
Migration Ownership
Each crate is autonomous and owns its schema:
| Crate | Database | Tables | Migration File |
|---|---|---|---|
moltis-projects | moltis.db | projects | 20240205100000_init.sql |
moltis-sessions | moltis.db | sessions, channel_sessions | 20240205100001_init.sql |
moltis-cron | moltis.db | cron_jobs, cron_runs | 20240205100002_init.sql |
moltis-gateway | moltis.db | auth_*, passkeys, api_keys, env_variables, message_log, channels | 20240205100003_init.sql |
moltis-memory | memory.db | files, chunks, embedding_cache, chunks_fts | 20240205100004_init.sql |
Startup Sequence
The gateway runs migrations in dependency order:
// server.rs
moltis_projects::run_migrations(&db_pool).await?;    // 1. projects first
moltis_sessions::run_migrations(&db_pool).await?;    // 2. sessions (FK → projects)
moltis_cron::run_migrations(&db_pool).await?;        // 3. cron (independent)
sqlx::migrate!("./migrations").run(&db_pool).await?; // 4. gateway tables
Sessions depends on projects due to a foreign key (sessions.project_id references
projects.id), so projects must migrate first.
Version Tracking
sqlx tracks applied migrations in the _sqlx_migrations table:
SELECT version, description, installed_on, success FROM _sqlx_migrations;
Migrations are identified by their timestamp prefix (e.g., 20240205100000), which
must be globally unique across all crates.
Database Files
| Database | Location | Crates |
|---|---|---|
moltis.db | ~/.moltis/moltis.db | projects, sessions, cron, gateway |
memory.db | ~/.moltis/memory.db | memory (separate, managed internally) |
Adding New Migrations
Adding a Column to an Existing Table
- Create a new migration file in the owning crate:
# Example: adding tags to sessions
touch crates/sessions/migrations/20240301120000_add_tags.sql
- Write the migration SQL:
-- 20240301120000_add_tags.sql
ALTER TABLE sessions ADD COLUMN tags TEXT;
CREATE INDEX IF NOT EXISTS idx_sessions_tags ON sessions(tags);
- Rebuild to embed the migration:
cargo build
Adding a New Table to an Existing Crate
- Create the migration file with a new timestamp:
touch crates/sessions/migrations/20240302100000_session_bookmarks.sql
- Write the CREATE TABLE statement:
-- 20240302100000_session_bookmarks.sql
CREATE TABLE IF NOT EXISTS session_bookmarks (
id INTEGER PRIMARY KEY AUTOINCREMENT,
session_key TEXT NOT NULL,
name TEXT NOT NULL,
message_id INTEGER NOT NULL,
created_at INTEGER NOT NULL
);
Adding Tables to a New Crate
- Create the migrations directory:
mkdir -p crates/new-feature/migrations
- Create the migration file with a globally unique timestamp:
touch crates/new-feature/migrations/20240401100000_init.sql
- Add `run_migrations()` to the crate's `lib.rs`:

pub async fn run_migrations(pool: &sqlx::SqlitePool) -> anyhow::Result<()> {
    sqlx::migrate!("./migrations").run(pool).await?;
    Ok(())
}
- Call it from `server.rs` in the appropriate order:

moltis_new_feature::run_migrations(&db_pool).await?;
Timestamp Convention
Use YYYYMMDDHHMMSS format for migration filenames:
- `YYYY` — 4-digit year
- `MM` — 2-digit month
- `DD` — 2-digit day
- `HH` — 2-digit hour (24h)
- `MM` — 2-digit minute
- `SS` — 2-digit second
This ensures global uniqueness across crates. When adding migrations, use the current timestamp to avoid collisions.
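As an illustration of the convention (a hypothetical helper, not part of Moltis or sqlx), a migration filename can be split into its numeric version and description:

```rust
// Parse "20240301120000_add_tags.sql" into (20240301120000, "add_tags").
// Returns None if the name doesn't follow the YYYYMMDDHHMMSS convention.
fn parse_migration(filename: &str) -> Option<(u64, String)> {
    let stem = filename.strip_suffix(".sql")?;
    let (version, description) = stem.split_once('_')?;
    if version.len() != 14 || !version.chars().all(|c| c.is_ascii_digit()) {
        return None; // not a 14-digit timestamp prefix
    }
    Some((version.parse().ok()?, description.to_string()))
}
```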
SQLite Limitations
ALTER TABLE
SQLite has limited ALTER TABLE support:
- ADD COLUMN: Supported ✓
- DROP COLUMN: SQLite 3.35+ only
- Rename column: Requires table recreation
- Change column type: Requires table recreation
For complex schema changes, use the table recreation pattern:
-- Create new table with desired schema
CREATE TABLE sessions_new (
-- new schema
);
-- Copy data (map old columns to new)
INSERT INTO sessions_new SELECT ... FROM sessions;
-- Swap tables
DROP TABLE sessions;
ALTER TABLE sessions_new RENAME TO sessions;
-- Recreate indexes
CREATE INDEX idx_sessions_created_at ON sessions(created_at);
Foreign Keys
SQLite foreign keys are checked at insert/update time, not migration time. Ensure migrations run in dependency order (parent table first).
Testing
Unit tests use in-memory databases with the crate’s init() method:
#[tokio::test]
async fn test_session_operations() {
    let pool = SqlitePool::connect("sqlite::memory:").await.unwrap();

    // Create schema for tests (init() retained for this purpose)
    SqliteSessionMetadata::init(&pool).await.unwrap();
    let meta = SqliteSessionMetadata::new(pool);

    // ... test code
}
The init() methods are retained (marked #[doc(hidden)]) specifically for tests.
In production, migrations handle schema creation.
Troubleshooting
“failed to run migrations”
- Check file permissions on `~/.moltis/`
- Ensure the database file isn't locked by another process
- Check for syntax errors in migration SQL files
Migration Order Issues
If you see foreign key errors, verify the migration order in server.rs. Parent
tables must be created before child tables with FK references.
Checking Migration Status
sqlite3 ~/.moltis/moltis.db "SELECT version, description, success FROM _sqlx_migrations ORDER BY version"
Resetting Migrations (Development Only)
# Backup first!
rm ~/.moltis/moltis.db
cargo run # Creates fresh database with all migrations
Best Practices
DO
- Use timestamp-based version numbers for global uniqueness
- Keep each crate’s migrations in its own directory
- Use `IF NOT EXISTS` for idempotent initial migrations
- Test migrations on a copy of production data before deploying
- Keep migrations small and focused
DON’T
- Modify existing migration files after deployment
- Reuse timestamps across crates
- Put multiple crates’ tables in one migration file
- Skip the dependency order in `server.rs`
Metrics and Tracing
Moltis includes comprehensive observability support through Prometheus metrics and tracing integration. This document explains how to enable, configure, and use these features.
Overview
The metrics system is built on the metrics crate
facade, which provides a unified interface similar to the log crate. When the
prometheus feature is enabled, metrics are exported in Prometheus text format
for scraping by Grafana, Prometheus, or other monitoring tools.
All metrics are feature-gated — they add zero overhead when disabled.
Feature Flags
Metrics are controlled by two feature flags:
| Feature | Description | Default |
|---|---|---|
metrics | Enables metrics collection and the /api/metrics JSON API | Enabled |
prometheus | Enables the /metrics Prometheus endpoint (requires metrics) | Enabled |
Compile-Time Configuration
# Enable only metrics collection (no Prometheus endpoint)
moltis-gateway = { version = "0.1", features = ["metrics"] }
# Enable metrics with Prometheus export (default)
moltis-gateway = { version = "0.1", features = ["metrics", "prometheus"] }
# Enable metrics for specific crates
moltis-agents = { version = "0.1", features = ["metrics"] }
moltis-cron = { version = "0.1", features = ["metrics"] }
To build without metrics entirely:
cargo build --release --no-default-features --features "file-watcher,tailscale,tls,web-ui"
Prometheus Endpoint
When the prometheus feature is enabled, the gateway exposes a /metrics endpoint:
GET http://localhost:18789/metrics
This endpoint is unauthenticated to allow Prometheus scrapers to access it. It returns metrics in Prometheus text format:
# HELP moltis_http_requests_total Total number of HTTP requests handled
# TYPE moltis_http_requests_total counter
moltis_http_requests_total{method="GET",status="200",endpoint="/api/chat"} 42
# HELP moltis_llm_completion_duration_seconds Duration of LLM completion requests
# TYPE moltis_llm_completion_duration_seconds histogram
moltis_llm_completion_duration_seconds_bucket{provider="anthropic",model="claude-3-opus",le="1.0"} 5
Grafana Integration
To scrape metrics with Prometheus and visualize in Grafana:
- Add moltis to your `prometheus.yml`:
scrape_configs:
- job_name: 'moltis'
static_configs:
- targets: ['localhost:18789']
metrics_path: /metrics
scrape_interval: 15s
- Import or create Grafana dashboards using the `moltis_*` metrics.
JSON API Endpoints
For the web UI dashboard and programmatic access, authenticated JSON endpoints are available:
| Endpoint | Description |
|---|---|
GET /api/metrics | Full metrics snapshot with aggregates and per-provider breakdown |
GET /api/metrics/summary | Lightweight counts for navigation badges |
GET /api/metrics/history | Time-series data points for charts (last hour, 10s intervals) |
History Endpoint
The /api/metrics/history endpoint returns historical metrics data for rendering
time-series charts:
{
"enabled": true,
"interval_seconds": 10,
"max_points": 60480,
"points": [
{
"timestamp": 1706832000000,
"llm_completions": 42,
"llm_input_tokens": 15000,
"llm_output_tokens": 8000,
"http_requests": 150,
"ws_active": 3,
"tool_executions": 25,
"mcp_calls": 12,
"active_sessions": 2
}
]
}
Metrics Persistence
Metrics history is persisted to SQLite, so historical data survives server
restarts. The database is stored at ~/.moltis/metrics.db (or the configured
data directory).
Key features:
- 7-day retention: History is kept for 7 days (60,480 data points at 10-second intervals)
- Automatic cleanup: Old data is automatically removed hourly
- Startup recovery: History is loaded from the database when the server starts
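The 60,480 figure follows directly from the retention settings — 7 days of points sampled every 10 seconds:

```rust
// Number of history points kept: days of retention divided into
// fixed-length sampling intervals.
fn retention_points(days: u64, interval_seconds: u64) -> u64 {
    days * 24 * 60 * 60 / interval_seconds
}
```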
The storage backend uses a trait-based design (MetricsStore), allowing
alternative implementations (e.g., TimescaleDB) for larger deployments.
Storage Architecture
// The MetricsStore trait defines the storage interface
#[async_trait]
pub trait MetricsStore: Send + Sync {
    async fn save_point(&self, point: &MetricsHistoryPoint) -> Result<()>;
    async fn load_history(&self, since: u64, limit: usize) -> Result<Vec<MetricsHistoryPoint>>;
    async fn cleanup_before(&self, before: u64) -> Result<u64>;
    async fn latest_point(&self) -> Result<Option<MetricsHistoryPoint>>;
}
The default SqliteMetricsStore implementation stores data in a single table
with an index on the timestamp column for efficient range queries.
Web UI Dashboard
The gateway includes a built-in metrics dashboard at /monitoring in the web UI.
This page displays:
Overview Tab:
- System metrics (uptime, connected clients, active sessions)
- LLM usage (completions, tokens, cache statistics)
- Tool execution statistics
- MCP server status
- Provider breakdown table
- Prometheus endpoint (with copy button)
Charts Tab:
- Token usage over time (input/output)
- HTTP requests and LLM completions
- WebSocket connections and active sessions
- Tool executions and MCP calls
The dashboard uses uPlot for lightweight, high-performance time-series charts. Data updates every 10 seconds for current metrics and every 30 seconds for history.
Available Metrics
HTTP Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
moltis_http_requests_total | Counter | method, status, endpoint | Total HTTP requests |
moltis_http_request_duration_seconds | Histogram | method, status, endpoint | Request latency |
moltis_http_requests_in_flight | Gauge | — | Currently processing requests |
LLM/Agent Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
moltis_llm_completions_total | Counter | provider, model | Total completions requested |
moltis_llm_completion_duration_seconds | Histogram | provider, model | Completion latency |
moltis_llm_input_tokens_total | Counter | provider, model | Input tokens processed |
moltis_llm_output_tokens_total | Counter | provider, model | Output tokens generated |
moltis_llm_completion_errors_total | Counter | provider, model, error_type | Completion failures |
moltis_llm_time_to_first_token_seconds | Histogram | provider, model | Streaming TTFT |
Provider Aliases
When you have multiple instances of the same provider type (e.g., separate API keys
for work and personal use), you can use the alias configuration option to
differentiate them in metrics:
[providers.anthropic]
api_key = "sk-work-..."
alias = "anthropic-work"
# Note: You would need separate config sections for multiple instances
# of the same provider. This is a placeholder for future functionality.
The alias appears in the provider label of all LLM metrics:
moltis_llm_input_tokens_total{provider="anthropic-work", model="claude-3-opus"} 5000
moltis_llm_input_tokens_total{provider="anthropic-personal", model="claude-3-opus"} 3000
This allows you to:
- Track token usage separately for billing purposes
- Create separate Grafana dashboards per provider instance
- Monitor rate limits and quotas independently
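A hypothetical sketch (not the actual config code) of how an alias would take precedence over the provider's type name when the `provider` label is built:

```rust
// If an alias is configured, it becomes the `provider` metric label;
// otherwise the provider's type name ("anthropic", "openai", ...) is used.
fn provider_label(provider_type: &str, alias: Option<&str>) -> String {
    alias.unwrap_or(provider_type).to_string()
}
```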
MCP (Model Context Protocol) Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
moltis_mcp_tool_calls_total | Counter | server, tool | Tool invocations |
moltis_mcp_tool_call_duration_seconds | Histogram | server, tool | Tool call latency |
moltis_mcp_tool_call_errors_total | Counter | server, tool, error_type | Tool call failures |
moltis_mcp_servers_connected | Gauge | — | Active MCP server connections |
Tool Execution Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
moltis_tool_executions_total | Counter | tool | Tool executions |
moltis_tool_execution_duration_seconds | Histogram | tool | Execution time |
moltis_sandbox_command_executions_total | Counter | — | Sandbox commands run |
Session Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
moltis_sessions_created_total | Counter | — | Sessions created |
moltis_sessions_active | Gauge | — | Currently active sessions |
moltis_session_messages_total | Counter | role | Messages by role |
Cron Job Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
moltis_cron_jobs_scheduled | Gauge | — | Number of scheduled jobs |
moltis_cron_executions_total | Counter | — | Job executions |
moltis_cron_execution_duration_seconds | Histogram | — | Job duration |
moltis_cron_errors_total | Counter | — | Failed jobs |
moltis_cron_stuck_jobs_cleared_total | Counter | — | Jobs exceeding 2h timeout |
moltis_cron_input_tokens_total | Counter | — | Input tokens from cron runs |
moltis_cron_output_tokens_total | Counter | — | Output tokens from cron runs |
Memory/Search Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
moltis_memory_searches_total | Counter | search_type | Searches performed |
moltis_memory_search_duration_seconds | Histogram | search_type | Search latency |
moltis_memory_embeddings_generated_total | Counter | provider | Embeddings created |
Channel Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
moltis_channels_active | Gauge | — | Loaded channel plugins |
moltis_channel_messages_received_total | Counter | channel | Inbound messages |
moltis_channel_messages_sent_total | Counter | channel | Outbound messages |
Telegram-Specific Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
moltis_telegram_messages_received_total | Counter | — | Messages from Telegram |
moltis_telegram_access_control_denials_total | Counter | — | Access denied events |
moltis_telegram_polling_duration_seconds | Histogram | — | Message handling time |
OAuth Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
moltis_oauth_flow_starts_total | Counter | — | OAuth flows initiated |
moltis_oauth_flow_completions_total | Counter | — | Successful completions |
moltis_oauth_token_refresh_total | Counter | — | Token refreshes |
moltis_oauth_token_refresh_failures_total | Counter | — | Refresh failures |
Skills Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
moltis_skills_installation_attempts_total | Counter | — | Installation attempts |
moltis_skills_installation_duration_seconds | Histogram | — | Installation time |
moltis_skills_git_clone_total | Counter | — | Successful git clones |
moltis_skills_git_clone_fallback_total | Counter | — | Fallbacks to HTTP tarball |
Tracing Integration
The moltis-metrics crate includes optional tracing integration via the
tracing feature. This allows span context to propagate to metric labels.
Enabling Tracing
moltis-metrics = { version = "0.1", features = ["prometheus", "tracing"] }
Initialization
use moltis_metrics::tracing_integration::init_tracing;

fn main() {
    // Initialize tracing with metrics context propagation
    init_tracing();

    // Now spans will add labels to metrics
}
How It Works
When tracing is enabled, span fields are automatically added as metric labels:
use tracing::instrument;

#[instrument(fields(operation = "fetch_user", component = "api"))]
async fn fetch_user(id: u64) -> User {
    // Metrics recorded here will include:
    //   - operation="fetch_user"
    //   - component="api"
    counter!("api_calls_total").increment(1);
}
Span Labels
The following span fields are propagated to metrics:
| Field | Description |
|---|---|
operation | The operation being performed |
component | The component/module name |
span.name | The span’s target/name |
Adding Custom Metrics
In Your Code
Use the metrics macros re-exported from moltis-metrics:
use moltis_metrics::{counter, gauge, histogram, labels};

// Simple counter
counter!("my_custom_requests_total").increment(1);

// Counter with labels
counter!(
    "my_custom_requests_total",
    labels::ENDPOINT => "/api/users",
    labels::METHOD => "GET"
).increment(1);

// Gauge (current value)
gauge!("my_queue_size").set(42.0);

// Histogram (distribution)
histogram!("my_operation_duration_seconds").record(0.123);
Feature-Gating
Always gate metrics code to avoid overhead when disabled:
#[cfg(feature = "metrics")]
use moltis_metrics::{counter, histogram};

pub async fn my_function() {
    #[cfg(feature = "metrics")]
    let start = std::time::Instant::now();

    // ... do work ...

    #[cfg(feature = "metrics")]
    {
        counter!("my_operations_total").increment(1);
        histogram!("my_operation_duration_seconds")
            .record(start.elapsed().as_secs_f64());
    }
}
Adding New Metric Definitions
For consistency, add metric name constants to crates/metrics/src/definitions.rs:
/// My feature metrics
pub mod my_feature {
    /// Total operations performed
    pub const OPERATIONS_TOTAL: &str = "moltis_my_feature_operations_total";

    /// Operation duration in seconds
    pub const OPERATION_DURATION_SECONDS: &str =
        "moltis_my_feature_operation_duration_seconds";
}
Then use them:
use moltis_metrics::{counter, my_feature};

counter!(my_feature::OPERATIONS_TOTAL).increment(1);
Configuration
Metrics configuration in moltis.toml:
[metrics]
enabled = true # Enable metrics collection (default: true)
prometheus_endpoint = true # Expose /metrics endpoint (default: true)
labels = { env = "prod" } # Add custom labels to all metrics
Environment variables:
- `RUST_LOG=moltis_metrics=debug` — Enable debug logging for metrics initialization
Best Practices
- Use consistent naming: Follow the pattern `moltis_<subsystem>_<metric>_<unit>`
- Add units to names: `_total` for counters, `_seconds` for durations, `_bytes` for sizes
- Keep cardinality low: Avoid high-cardinality labels (like user IDs or request IDs)
- Feature-gate everything: Use `#[cfg(feature = "metrics")]` to ensure zero overhead when disabled
- Use predefined buckets: The `buckets` module has standard histogram buckets for common metric types
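As an example of keeping cardinality low, a hypothetical normalizer (not part of moltis-metrics) can collapse numeric path segments before a request path becomes an `endpoint` label, so `/api/users/42` and `/api/users/7` count against one series:

```rust
// Replace purely numeric path segments with ":id" so the endpoint label
// has one value per route template instead of one per resource.
fn normalize_endpoint(path: &str) -> String {
    path.split('/')
        .map(|seg| {
            if !seg.is_empty() && seg.chars().all(|c| c.is_ascii_digit()) {
                ":id"
            } else {
                seg
            }
        })
        .collect::<Vec<_>>()
        .join("/")
}
```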
Troubleshooting
Metrics not appearing
- Verify the `metrics` feature is enabled at compile time
- Check that the metrics recorder is initialized (happens automatically in the gateway)
- Ensure you're hitting the correct `/metrics` endpoint
- Check that `moltis.toml` has `[metrics] enabled = true`
Prometheus endpoint not available
- Ensure the `prometheus` feature is enabled (it's separate from `metrics`)
- Check your build: `cargo build --features prometheus`
High memory usage
- Check for high-cardinality labels (many unique label combinations)
- Consider reducing histogram bucket counts
Missing labels
- Ensure labels are passed consistently across all metric recordings
- Check that tracing spans include the expected fields
Tool Registry
The tool registry manages all tools available to the agent during a conversation. It tracks where each tool comes from and supports filtering by source.
Tool Sources
Every registered tool has a ToolSource that identifies its origin:
- `Builtin` — tools shipped with the binary (exec, web_fetch, etc.)
- `Mcp { server }` — tools provided by an MCP server, tagged with the server name
This replaces the previous convention of identifying MCP tools by their
mcp__ name prefix, providing type-safe filtering instead of string matching.
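The idea can be sketched as follows (illustrative types — the real registry lives in the moltis crates):

```rust
// A minimal ToolSource, matching the two variants described above.
#[derive(Debug, Clone, PartialEq)]
enum ToolSource {
    Builtin,
    Mcp { server: String },
}

struct RegisteredTool {
    name: String,
    source: ToolSource,
}

// Type-safe filtering: match on the variant instead of checking for an
// "mcp__" name prefix.
fn without_mcp(tools: Vec<RegisteredTool>) -> Vec<RegisteredTool> {
    tools
        .into_iter()
        .filter(|t| !matches!(t.source, ToolSource::Mcp { .. }))
        .collect()
}
```

Because the source is carried in the type rather than the name, renaming a tool can never silently change how it is filtered.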
Registration
// Built-in tool
registry.register(Box::new(MyTool::new()));

// MCP tool — tagged with server name
registry.register_mcp(Box::new(adapter), "github".to_string());
Filtering
When MCP tools are disabled for a session, the registry can produce a filtered copy:
// Type-safe: filters by ToolSource::Mcp variant
let no_mcp = registry.clone_without_mcp();

// Remove all MCP tools in-place (used during sync)
let removed_count = registry.unregister_mcp();
Schema Output
list_schemas() includes source metadata in every tool schema:
{
"name": "exec",
"description": "Execute a command",
"parameters": { ... },
"source": "builtin"
}
{
"name": "mcp__github__search",
"description": "Search GitHub",
"parameters": { ... },
"source": "mcp",
"mcpServer": "github"
}
The `source` and `mcpServer` fields are available to the UI for rendering tools grouped by origin.
Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased]
[0.1.4] - 2026-02-06
Added
- Config Check Command: `moltis config check` validates the configuration file, detects unknown/misspelled fields with Levenshtein-based suggestions, warns about security misconfigurations, and checks file references
- Memory Usage Indicator: Display process RSS and system free memory in the header bar, updated every 30 seconds via the tick WebSocket broadcast
- QMD Backend Support: Optional QMD (Query Memory Daemon) backend for hybrid search with BM25 + vector + LLM reranking
  - Gated behind the `qmd` feature flag (enabled by default)
  - Web UI shows installation instructions and QMD status
  - Comparison table between built-in SQLite and QMD backends
- Citations: Configurable citation mode (on/off/auto) for memory search results
  - Auto mode includes citations when results span multiple files
- Session Export: Option to export session transcripts to memory for future reference
- LLM Reranking: Use LLM to rerank search results for improved relevance (requires QMD)
- Memory Documentation: Added `docs/src/memory.md` with comprehensive memory system documentation
- Mobile PWA Support: Install moltis as a Progressive Web App on iOS, Android, and desktop
  - Standalone mode with full-screen experience
  - Custom app icon (crab mascot)
  - Service worker for offline support and caching
  - Safe area support for notched devices
- Push Notifications: Receive alerts when the LLM responds
  - VAPID key generation and storage for the Web Push API
  - Subscribe/unsubscribe toggle in Settings > Notifications
  - Subscription management UI showing device name, IP address, and date
  - Remove any subscription from any device
  - Real-time subscription updates via WebSocket
  - Client IP detection from X-Forwarded-For, X-Real-IP, and CF-Connecting-IP headers
  - Notifications sent for both streaming and agent (tool-using) chat modes
- Safari/iOS PWA Detection: Show “Add to Dock” instructions when push notifications require PWA installation (Safari doesn’t support push in browser mode)
- Session state store: per-session key-value persistence scoped by namespace, backed by SQLite (`session_state` tool).
- Session branching: `branch_session` tool forks a conversation at any message index into an independent copy.
- Session fork from UI: Fork button in the chat header and sidebar action buttons let users fork sessions without asking the LLM. Forked sessions appear indented under their parent with a branch icon.
- Skill self-extension: `create_skill`, `update_skill`, and `delete_skill` tools let the agent manage project-local skills at runtime.
- Skill hot-reload: filesystem watcher on skill directories emits `skills.changed` events via WebSocket when SKILL.md files change.
- Typed tool sources: `ToolSource` enum (`Builtin` / `Mcp { server }`) replaces string-prefix identification of MCP tools in the tool registry.
- Tool registry metadata: `list_schemas()` now includes `source` and `mcpServer` fields so the UI can group tools by origin.
- Per-session MCP toggle: sessions store an `mcp_disabled` flag; the chat header exposes a toggle button to enable/disable MCP tools per session.
- Debug panel convergence: the debug side-panel now renders the same seven sections as the `/context` slash command, eliminating duplicated rendering logic.
- Documentation pages for session state, session branching, skill self-extension, and the tool registry architecture.
Changed
- Memory settings UI enhanced with backend comparison and feature explanations
- Added `memory.qmd.status` RPC method for checking QMD availability
- Extended `memory.config.get` to include a `qmd_feature_enabled` flag
- Push notifications feature is now enabled by default in the CLI
- TLS HTTP redirect port now defaults to `gateway_port + 1` instead of the hardcoded port `18790`. This makes the Dockerfile simpler (both ports are adjacent) and avoids collisions when running multiple instances. Override via `[tls] http_redirect_port` in `moltis.toml` or the `MOLTIS_TLS__HTTP_REDIRECT_PORT` environment variable.
- TLS certificates use the `moltis.localhost` domain. Auto-generated server certs now include `moltis.localhost`, `*.moltis.localhost`, `localhost`, `127.0.0.1`, and `::1` as SANs. Banner and redirect URLs use `https://moltis.localhost:<port>` when bound to loopback, so the cert matches the displayed URL. Existing certs are automatically regenerated on next startup.
- Certificate validity uses dynamic dates. Cert `notBefore`/`notAfter` are now computed from the current system time instead of being hardcoded. CA certs are valid for 10 years, server certs for 1 year from generation.
- `McpToolBridge` now stores and exposes `server_name()` for typed registration.
- `mcp_service::sync_mcp_tools()` uses `unregister_mcp()`/`register_mcp()` instead of scanning tool names by prefix.
- `chat.rs` uses `clone_without_mcp()` instead of `clone_without_prefix("mcp__")` in all three call sites.
Fixed
- Push notifications not sending when chat uses agent mode (`run_with_tools`)
- Missing space in Safari install instructions (“usingFile” → “using File”)
- WebSocket origin validation now treats `.localhost` subdomains (e.g. `moltis.localhost`) as loopback equivalents per RFC 6761.
- Fork/branch icon in session sidebar now renders cleanly at 16px (replaced complex git-branch SVG with simple trunk+branch path).
- Deleting a forked session now navigates to the parent session instead of an unrelated sibling.
- Streaming tool calls for non-Anthropic providers: `OpenAiProvider`, `GitHubCopilotProvider`, `KimiCodeProvider`, `OpenAiCodexProvider`, and `ProviderChain` now implement `stream_with_tools()` so tool schemas are sent in the streaming API request and tool-call events are properly parsed. Previously only `AnthropicProvider` supported streaming tool calls; all other providers silently dropped the tools parameter, causing the LLM to emit tool invocations as plain text instead of structured function calls.
- Streaming tool call arguments dropped when index ≠ 0: When a provider (e.g. GitHub Copilot proxying Claude) emits a text content block at streaming index 0 and a `tool_use` block at index 1, the runner’s argument finalization used the streaming index as the vector position directly. Since `tool_calls` has only 1 element at position 0, the condition `1 < 1` was false and arguments were silently dropped (empty `{}`). Fixed by mapping streaming indices to vector positions via a `HashMap`.
- Skill tools wrote to wrong directory: `create_skill`, `update_skill`, and `delete_skill` used `std::env::current_dir()` captured at gateway startup, writing skills to `<cwd>/.moltis/skills/` instead of `~/.moltis/skills/`. Skills now write to `<data_dir>/skills/` (Personal source), which is always discovered regardless of where the gateway was started.
- Skills page missing personal/project skills: The `/api/skills` endpoint only returned manifest-based registry skills. Personal and project-local skills were never shown in the navigation or skills page. The endpoint now discovers and includes them alongside registry skills.
Documentation
- Added `mobile-pwa.md` with PWA installation and push notification documentation
- Updated `CLAUDE.md` with cargo feature policy (features enabled by default)
- Rewrote `session-branching.md` with accurate fork details, UI methods, RPC API, inheritance table, and deletion behavior.