
Moltis

A personal AI gateway written in Rust.
One binary, no runtime, no npm.

Moltis compiles your entire AI gateway — web UI, LLM providers, tools, and all assets — into a single self-contained executable. There’s no Node.js to babysit, no node_modules to sync, no V8 garbage collector introducing latency spikes.

# Quick install (macOS / Linux)
curl -fsSL https://www.moltis.org/install.sh | sh

Why Moltis?

| Feature | Moltis | Other Solutions |
|---|---|---|
| Deployment | Single binary | Node.js + dependencies |
| Memory Safety | Rust ownership | Garbage collection |
| Secret Handling | Zeroed on drop | "Eventually collected" |
| Sandbox | Docker + Apple Container | Docker only |
| Startup | Milliseconds | Seconds |

Key Features

  • 30+ LLM Providers — Anthropic, OpenAI, Google, Mistral, local models, and more
  • Streaming-First — Responses appear as tokens arrive, not after completion
  • Sandboxed Execution — Commands run in isolated containers (Docker or Apple Container)
  • MCP Support — Connect to Model Context Protocol servers for extended capabilities
  • Multi-Channel — Web UI, Telegram, API access with synchronized responses
  • Long-Term Memory — Embeddings-powered knowledge base with hybrid search
  • Hook System — Observe, modify, or block actions at any lifecycle point
  • Compile-Time Safety — Misconfigurations caught by cargo check, not runtime crashes

Quick Start

# Install
curl -fsSL https://www.moltis.org/install.sh | sh

# Run
moltis

On first launch:

  1. Open the URL printed in the terminal (e.g., http://localhost:13131) in your browser
  2. Add your LLM API key
  3. Start chatting!

Note

Authentication is only required when accessing Moltis from a non-localhost address. On localhost, you can start using it immediately.

Full Quickstart Guide

How It Works

┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│   Web UI    │  │  Telegram   │  │     API     │
└──────┬──────┘  └──────┬──────┘  └──────┬──────┘
       │                │                │
       └────────────────┴────────────────┘
                        │
                        ▼
        ┌───────────────────────────────┐
        │       Moltis Gateway          │
        │   ┌─────────┐ ┌───────────┐   │
        │   │  Agent  │ │   Tools   │   │
        │   │  Loop   │◄┤  Registry │   │
        │   └────┬────┘ └───────────┘   │
        │        │                      │
        │   ┌────▼────────────────┐     │
        │   │  Provider Registry  │     │
        │   │Claude · GPT · Gemini│     │
        │   └─────────────────────┘     │
        └───────────────────────────────┘
                        │
                ┌───────▼───────┐
                │    Sandbox    │
                │ Docker/Apple  │
                └───────────────┘

Documentation

Getting Started

Features

  • Providers — Configure LLM providers
  • MCP Servers — Extend with Model Context Protocol
  • Hooks — Lifecycle hooks for customization
  • Local LLMs — Run models on your machine

Deployment

  • Docker — Container deployment

Architecture

Security

Moltis applies defense in depth:

  • Authentication — Password or passkey (WebAuthn) required for non-localhost access
  • SSRF Protection — Blocks requests to internal networks
  • Secret Handling — secrecy::Secret zeroes memory on drop
  • Sandboxed Execution — Commands never run on the host
  • Origin Validation — Prevents Cross-Site WebSocket Hijacking
  • No Unsafe Code — unsafe is denied workspace-wide

Community

License

MIT — Free for personal and commercial use.

Quickstart

Get Moltis running in under 5 minutes.

1. Install

curl -fsSL https://www.moltis.org/install.sh | sh

Or via Homebrew:

brew install moltis-org/tap/moltis

2. Start

moltis

You’ll see output like:

🚀 Moltis gateway starting...
🌐 Open http://localhost:13131 in your browser

3. Configure a Provider

You need an LLM API key to chat. The easiest options:

Option A: Anthropic

  1. Get an API key from console.anthropic.com
  2. In Moltis, go to Settings → Providers
  3. Click Anthropic → Enter your API key → Save

Option B: OpenAI

  1. Get an API key from platform.openai.com
  2. In Moltis, go to Settings → Providers
  3. Click OpenAI → Enter your API key → Save

Option C: Local Model (Free)

  1. Install Ollama: curl -fsSL https://ollama.ai/install.sh | sh
  2. Pull a model: ollama pull llama3.2
  3. In Moltis, configure Ollama in Settings → Providers

4. Chat!

Go to the Chat tab and start a conversation:

You: Write a Python function to check if a number is prime

Agent: Here's a Python function to check if a number is prime:

def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(n ** 0.5) + 1):
        if n % i == 0:
            return False
    return True

What’s Next?

Enable Tool Use

Moltis can execute code, browse the web, and more. Tools are enabled by default with sandbox protection.

Try:

You: Create a hello.py file that prints "Hello, World!" and run it

Connect Telegram

Chat with your agent from anywhere:

  1. Create a bot via @BotFather
  2. Copy the bot token
  3. In Moltis: Settings → Telegram → Enter token → Save
  4. Message your bot!

Add MCP Servers

Extend capabilities with MCP servers:

# In moltis.toml
[[mcp.servers]]
name = "github"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-github"]
env = { GITHUB_TOKEN = "ghp_..." }

Set Up Memory

Enable long-term memory for context across sessions:

# In moltis.toml
[memory]
enabled = true

Add knowledge by placing Markdown files in ~/.moltis/memory/.
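For example, to add a note the agent can later recall (the filename here is just an illustration; any Markdown file in the directory is indexed):

```shell
# Create the memory directory and drop in a note; the file watcher
# picks up new Markdown files automatically
mkdir -p ~/.moltis/memory
cat > ~/.moltis/memory/team-glossary.md <<'EOF'
# Team Glossary
"Gateway" refers to the Moltis process that fronts all chat channels.
EOF
```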

Useful Commands

| Command | Description |
|---|---|
| /new | Start a new session |
| /model <name> | Switch models |
| /clear | Clear chat history |
| /help | Show available commands |

File Locations

| Path | Contents |
|---|---|
| ~/.config/moltis/moltis.toml | Configuration |
| ~/.config/moltis/provider_keys.json | API keys |
| ~/.moltis/ | Data (sessions, memory, logs) |

Getting Help

Installation

Moltis is distributed as a single self-contained binary. Choose the installation method that works best for your setup.

The fastest way to get started on macOS or Linux:

curl -fsSL https://www.moltis.org/install.sh | sh

This downloads the latest release for your platform and installs it to ~/.local/bin.
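If the moltis command isn't found afterwards, ~/.local/bin may not be on your PATH. A quick fix for the current session (append the export line to your shell profile to make it permanent):

```shell
# Put the install directory at the front of PATH for this session
export PATH="$HOME/.local/bin:$PATH"
# The first PATH entry should now be the install directory
echo "$PATH" | tr ':' '\n' | head -n 1
```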

Package Managers

Homebrew (macOS / Linux)

brew install moltis-org/tap/moltis

Cargo Binstall (Pre-built Binary)

If you have cargo-binstall installed:

cargo binstall moltis

This downloads a pre-built binary without compiling from source.

Linux Packages

Debian / Ubuntu (.deb)

# Download the latest .deb package
curl -LO https://github.com/moltis-org/moltis/releases/latest/download/moltis_amd64.deb

# Install
sudo dpkg -i moltis_amd64.deb

Fedora / RHEL (.rpm)

# Download the latest .rpm package
curl -LO https://github.com/moltis-org/moltis/releases/latest/download/moltis.x86_64.rpm

# Install
sudo rpm -i moltis.x86_64.rpm

Arch Linux (.pkg.tar.zst)

# Download the latest package
curl -LO https://github.com/moltis-org/moltis/releases/latest/download/moltis.pkg.tar.zst

# Install
sudo pacman -U moltis.pkg.tar.zst

Snap

sudo snap install moltis

AppImage

# Download
curl -LO https://github.com/moltis-org/moltis/releases/latest/download/moltis.AppImage
chmod +x moltis.AppImage

# Run
./moltis.AppImage

Docker

Multi-architecture images (amd64/arm64) are published to GitHub Container Registry:

docker pull ghcr.io/moltis-org/moltis:latest

See Docker Deployment for full instructions on running Moltis in a container.

Build from Source

Prerequisites

  • Rust 1.75 or later
  • A C compiler (for some dependencies)

Clone and Build

git clone https://github.com/moltis-org/moltis.git
cd moltis
cargo build --release

The binary will be at target/release/moltis.

Install via Cargo

cargo install moltis --git https://github.com/moltis-org/moltis

First Run

After installation, start Moltis:

moltis

On first launch:

  1. Open http://localhost:<port> in your browser (the port is shown in the terminal output)
  2. Configure your LLM provider (API key)
  3. Start chatting!

Tip

Moltis picks a random available port on first install to avoid conflicts. The port is saved in your config and reused on subsequent runs.

Note

Authentication is only required when accessing Moltis from a non-localhost address (e.g., over the network). When this happens, a one-time setup code is printed to the terminal for initial authentication setup.

Verify Installation

moltis --version

Updating

Homebrew

brew upgrade moltis

Cargo Binstall

cargo binstall moltis

From Source

cd moltis
git pull
cargo build --release

Uninstalling

Homebrew

brew uninstall moltis

Remove Data

Moltis stores data in two directories:

# Configuration
rm -rf ~/.config/moltis

# Data (sessions, databases, memory)
rm -rf ~/.moltis

Warning

Removing these directories deletes all your conversations, memory, and settings permanently.

Configuration

Moltis is configured through moltis.toml, located in ~/.config/moltis/ by default.

On first run, a complete configuration file is generated with sensible defaults. You can edit it to customize behavior.

Configuration File Location

| Platform | Default Path |
|---|---|
| macOS/Linux | ~/.config/moltis/moltis.toml |
| Custom | Set via --config-dir or MOLTIS_CONFIG_DIR |

Basic Settings

[gateway]
port = 13131                    # HTTP/WebSocket port
host = "0.0.0.0"               # Listen address

[agent]
name = "Moltis"                 # Agent display name
model = "claude-sonnet-4-20250514"  # Default model
timeout = 600                   # Agent run timeout (seconds)
max_iterations = 25             # Max tool call iterations per run

LLM Providers

Provider API keys are stored separately in ~/.config/moltis/provider_keys.json for security. Configure them through the web UI or directly in the JSON file.

[providers]
default = "anthropic"           # Default provider

[providers.anthropic]
enabled = true
models = [
    "claude-sonnet-4-20250514",
    "claude-opus-4-20250514",
    "claude-3-5-haiku-20241022",
]

[providers.openai]
enabled = true
models = [
    "gpt-4o",
    "gpt-4o-mini",
    "o1-preview",
]

See Providers for detailed provider configuration.

Sandbox Configuration

Commands run inside isolated containers for security:

[tools.exec.sandbox]
enabled = true
backend = "docker"              # "docker" or "apple" (macOS 15+)
base_image = "ubuntu:25.10"

# Packages installed in the sandbox image
packages = [
    "curl",
    "git",
    "jq",
    "python3",
    "python3-pip",
    "nodejs",
    "npm",
]

Info

When you modify the packages list and restart, Moltis automatically rebuilds the sandbox image with a new tag.

Memory System

Long-term memory uses embeddings for semantic search:

[memory]
enabled = true
embedding_model = "text-embedding-3-small"  # OpenAI embedding model
chunk_size = 512                # Characters per chunk
chunk_overlap = 50              # Overlap between chunks

# Directories to watch for memory files
watch_dirs = [
    "~/.moltis/memory",
]
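To see what chunk_size and chunk_overlap mean in practice, here is a toy sketch (assumption: chunks are cut by character count and each chunk re-reads the last chunk_overlap characters of its predecessor; tiny sizes are used so the output is readable):

```shell
text="abcdefghijklmnopqrstuvwxyz"
size=10     # stands in for chunk_size
overlap=3   # stands in for chunk_overlap
step=$((size - overlap))
i=0
# Each chunk starts `step` characters after the previous one,
# so consecutive chunks share `overlap` characters
while [ "$i" -lt "${#text}" ]; do
  printf '%s' "$text" | cut -c "$((i + 1))-$((i + size))"
  i=$((i + step))
done
```

Each printed line shares its first three characters with the tail of the previous line.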

Authentication

Authentication is only required when accessing Moltis from a non-localhost address. When running on localhost or 127.0.0.1, no authentication is needed by default.

When you access Moltis from a network address (e.g., http://192.168.1.100:13131), a one-time setup code is printed to the terminal. Use it to set up a password or passkey.

[auth]
disabled = false                # Set true to disable auth entirely

# Session settings
session_expiry = 604800         # Session lifetime in seconds (7 days)

Warning

Only set disabled = true if Moltis is running on a trusted private network. Never expose an unauthenticated instance to the internet.

Hooks

Configure lifecycle hooks:

[[hooks]]
name = "my-hook"
command = "./hooks/my-hook.sh"
events = ["BeforeToolCall", "AfterToolCall"]
timeout = 5                     # Timeout in seconds

[hooks.env]
MY_VAR = "value"               # Environment variables for the hook

See Hooks for the full hook system documentation.

MCP Servers

Connect to Model Context Protocol servers:

[[mcp.servers]]
name = "filesystem"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/allowed"]

[[mcp.servers]]
name = "github"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-github"]
env = { GITHUB_TOKEN = "ghp_..." }

Telegram Integration

[telegram]
enabled = true
# Token is stored in provider_keys.json, not here
allowed_users = [123456789]     # Telegram user IDs allowed to chat

TLS / HTTPS

[tls]
enabled = true
cert_path = "~/.config/moltis/cert.pem"
key_path = "~/.config/moltis/key.pem"
# If paths don't exist, a self-signed certificate is generated

# Port for the plain-HTTP redirect / CA-download server.
# Defaults to the gateway port + 1 when not set.
# http_redirect_port = 13132

Override via environment variable: MOLTIS_TLS__HTTP_REDIRECT_PORT=8080.

Tailscale Integration

Expose Moltis over your Tailscale network:

[tailscale]
enabled = true
mode = "serve"                  # "serve" (private) or "funnel" (public)

Observability

[telemetry]
enabled = true
otlp_endpoint = "http://localhost:4317"  # OpenTelemetry collector

Environment Variables

All settings can be overridden via environment variables:

| Variable | Description |
|---|---|
| MOLTIS_CONFIG_DIR | Configuration directory |
| MOLTIS_DATA_DIR | Data directory |
| MOLTIS_PORT | Gateway port |
| MOLTIS_HOST | Listen address |
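Overrides are per process, so you can test a setting without touching moltis.toml. You can confirm the variable reaches the child process before pointing it at Moltis itself:

```shell
# The override is visible only to the child process; the parent shell keeps its config
MOLTIS_PORT=8080 env | grep '^MOLTIS_PORT='
echo "parent sees: ${MOLTIS_PORT:-unset}"
```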

CLI Flags

moltis --config-dir /path/to/config --data-dir /path/to/data

Complete Example

[gateway]
port = 13131
host = "0.0.0.0"

[agent]
name = "Atlas"
model = "claude-sonnet-4-20250514"
timeout = 600
max_iterations = 25

[providers]
default = "anthropic"

[tools.exec.sandbox]
enabled = true
backend = "docker"
base_image = "ubuntu:25.10"
packages = ["curl", "git", "jq", "python3", "nodejs"]

[memory]
enabled = true

[auth]
disabled = false

[[hooks]]
name = "audit-log"
command = "./hooks/audit.sh"
events = ["BeforeToolCall"]
timeout = 5

LLM Providers

Moltis supports 30+ LLM providers through a trait-based architecture. Configure providers through the web UI or directly in configuration files.

Supported Providers

Tier 1 (Full Support)

| Provider | Models | Tool Calling | Streaming |
|---|---|---|---|
| Anthropic | Claude 4, Claude 3.5, Claude 3 | ✓ | ✓ |
| OpenAI | GPT-4o, GPT-4, o1, o3 | ✓ | ✓ |
| Google | Gemini 2.0, Gemini 1.5 | ✓ | ✓ |
| GitHub Copilot | GPT-4o, Claude | ✓ | ✓ |

Tier 2 (Good Support)

| Provider | Models | Tool Calling | Streaming |
|---|---|---|---|
| Mistral | Mistral Large, Codestral | ✓ | ✓ |
| Groq | Llama 3, Mixtral | ✓ | ✓ |
| Together | Various open models | ✓ | ✓ |
| Fireworks | Various open models | ✓ | ✓ |
| DeepSeek | DeepSeek V3, Coder | ✓ | ✓ |

Tier 3 (Basic Support)

| Provider | Notes |
|---|---|
| OpenRouter | Aggregator for 100+ models |
| Ollama | Local models |
| Venice | Privacy-focused |
| Cerebras | Fast inference |
| SambaNova | Enterprise |
| Cohere | Command models |
| AI21 | Jamba models |

Configuration

Via Web UI

  1. Open Moltis in your browser
  2. Go to Settings → Providers
  3. Click on a provider card
  4. Enter your API key
  5. Select your preferred model

Via Configuration Files

Provider credentials are stored in ~/.config/moltis/provider_keys.json:

{
  "anthropic": {
    "apiKey": "sk-ant-...",
    "model": "claude-sonnet-4-20250514"
  },
  "openai": {
    "apiKey": "sk-...",
    "model": "gpt-4o"
  }
}

Enable providers in moltis.toml:

[providers]
default = "anthropic"

[providers.anthropic]
enabled = true
models = [
    "claude-sonnet-4-20250514",
    "claude-opus-4-20250514",
]

[providers.openai]
enabled = true

Provider-Specific Setup

Anthropic

  1. Get an API key from console.anthropic.com
  2. Enter it in Settings → Providers → Anthropic

Tip

Claude Sonnet 4 offers the best balance of capability and cost for most coding tasks.

OpenAI

  1. Get an API key from platform.openai.com
  2. Enter it in Settings → Providers → OpenAI

GitHub Copilot

GitHub Copilot uses OAuth authentication:

  1. Click Connect in Settings → Providers → GitHub Copilot
  2. Complete the GitHub OAuth flow
  3. Authorize Moltis to access Copilot

Info

Requires an active GitHub Copilot subscription.

Google (Gemini)

  1. Get an API key from aistudio.google.com
  2. Enter it in Settings → Providers → Google

Ollama (Local Models)

Run models locally with Ollama:

  1. Install Ollama: curl -fsSL https://ollama.ai/install.sh | sh
  2. Pull a model: ollama pull llama3.2
  3. Configure in Moltis:
{
  "ollama": {
    "baseUrl": "http://localhost:11434",
    "model": "llama3.2"
  }
}

OpenRouter

Access 100+ models through one API:

  1. Get an API key from openrouter.ai
  2. Enter it in Settings → Providers → OpenRouter
  3. Specify the model ID you want to use
{
  "openrouter": {
    "apiKey": "sk-or-...",
    "model": "anthropic/claude-3.5-sonnet"
  }
}

Custom Base URLs

For providers with custom endpoints (enterprise, proxies):

{
  "openai": {
    "apiKey": "sk-...",
    "baseUrl": "https://your-proxy.example.com/v1",
    "model": "gpt-4o"
  }
}

Switching Providers

Per-Session

In the chat interface, use the model selector dropdown to switch providers/models for the current session.

Per-Message

Use the /model command to switch models mid-conversation:

/model claude-opus-4-20250514

Default Provider

Set the default in moltis.toml:

[providers]
default = "anthropic"

[agent]
model = "claude-sonnet-4-20250514"

Model Capabilities

Different models have different strengths:

Use CaseRecommended Model
General codingClaude Sonnet 4, GPT-4o
Complex reasoningClaude Opus 4, o1
Fast responsesClaude Haiku, GPT-4o-mini
Long contextClaude (200k), Gemini (1M+)
Local/privateLlama 3 via Ollama

Troubleshooting

“Model not available”

The model may not be enabled for your account or region. Check:

  • Your API key has access to the model
  • The model ID is spelled correctly
  • Your account has sufficient credits

“Rate limited”

You’ve exceeded the provider’s rate limits. Solutions:

  • Wait and retry
  • Use a different provider
  • Upgrade your API plan

“Invalid API key”

  • Verify the key is correct (no extra spaces)
  • Check the key hasn’t expired
  • Ensure the key has the required permissions

MCP Servers

Moltis supports the Model Context Protocol (MCP) for connecting to external tool servers. MCP servers extend your agent’s capabilities without modifying Moltis itself.

What is MCP?

MCP is an open protocol that lets AI assistants connect to external tools and data sources. Think of MCP servers as plugins that provide:

  • Tools — Functions the agent can call (e.g., search, file operations, API calls)
  • Resources — Data the agent can read (e.g., files, database records)
  • Prompts — Pre-defined prompt templates

Supported Transports

| Transport | Description | Use Case |
|---|---|---|
| stdio | Local process via stdin/stdout | npm packages, local scripts |
| HTTP/SSE | Remote server via HTTP | Cloud services, shared servers |

Adding an MCP Server

Via Web UI

  1. Go to Settings → MCP Servers
  2. Click Add Server
  3. Enter the server configuration
  4. Click Save

Via Configuration

Add servers to moltis.toml:

[[mcp.servers]]
name = "filesystem"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-filesystem", "/Users/me/projects"]

[[mcp.servers]]
name = "github"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-github"]
env = { GITHUB_TOKEN = "ghp_..." }

[[mcp.servers]]
name = "remote-api"
url = "https://mcp.example.com/sse"
transport = "sse"

Official Servers

| Server | Description | Install |
|---|---|---|
| filesystem | Read/write local files | npx @modelcontextprotocol/server-filesystem |
| github | GitHub API access | npx @modelcontextprotocol/server-github |
| postgres | PostgreSQL queries | npx @modelcontextprotocol/server-postgres |
| sqlite | SQLite database | npx @modelcontextprotocol/server-sqlite |
| puppeteer | Browser automation | npx @modelcontextprotocol/server-puppeteer |
| brave-search | Web search | npx @modelcontextprotocol/server-brave-search |

Community Servers

Explore more at mcp.so and GitHub MCP Servers.

Configuration Options

[[mcp.servers]]
name = "my-server"              # Display name
command = "node"                # Command to run
args = ["server.js"]            # Command arguments
cwd = "/path/to/server"         # Working directory

# Environment variables
env = { API_KEY = "secret", DEBUG = "true" }

# Health check settings
health_check_interval = 30      # Seconds between health checks
restart_on_failure = true       # Auto-restart on crash
max_restart_attempts = 5        # Give up after N restarts
restart_backoff = "exponential" # "linear" or "exponential"

Server Lifecycle

┌─────────────────────────────────────────────────────┐
│                   MCP Server                         │
│                                                      │
│  Start → Initialize → Ready → [Tool Calls] → Stop   │
│            │                       │                 │
│            ▼                       ▼                 │
│     Health Check ◄─────────── Heartbeat             │
│            │                       │                 │
│            ▼                       ▼                 │
│    Crash Detected ───────────► Restart              │
│                                    │                 │
│                              Backoff Wait            │
└─────────────────────────────────────────────────────┘

Health Monitoring

Moltis monitors MCP servers and automatically:

  • Detects crashes via process exit
  • Restarts with exponential backoff
  • Disables after max restart attempts
  • Re-enables after cooldown period
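The backoff schedule can be pictured with a small sketch (assumption: the delay doubles per attempt from a 1-second base; the actual parameters are internal to Moltis):

```shell
# Double the wait before each successive restart attempt
base=1
for attempt in 1 2 3 4 5; do
  echo "restart attempt $attempt: wait $((base << (attempt - 1)))s"
done
```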

Using MCP Tools

Once connected, MCP tools appear alongside built-in tools. The agent can use them naturally:

User: Search GitHub for Rust async runtime projects

Agent: I'll search GitHub for you.
[Calling github.search_repositories with query="rust async runtime"]

Found 15 repositories:
1. tokio-rs/tokio - A runtime for writing reliable async applications
2. async-std/async-std - Async version of the Rust standard library
...

Creating an MCP Server

Simple Node.js Server

// server.js
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";

const server = new Server(
  { name: "my-server", version: "1.0.0" },
  { capabilities: { tools: {} } }
);

server.setRequestHandler("tools/list", async () => ({
  tools: [{
    name: "hello",
    description: "Says hello",
    inputSchema: {
      type: "object",
      properties: {
        name: { type: "string", description: "Name to greet" }
      },
      required: ["name"]
    }
  }]
}));

server.setRequestHandler("tools/call", async (request) => {
  if (request.params.name === "hello") {
    const name = request.params.arguments.name;
    return { content: [{ type: "text", text: `Hello, ${name}!` }] };
  }
});

const transport = new StdioServerTransport();
await server.connect(transport);

Configure in Moltis

[[mcp.servers]]
name = "my-server"
command = "node"
args = ["server.js"]
cwd = "/path/to/my-server"

Debugging

Check Server Status

In the web UI, go to Settings → MCP Servers to see:

  • Connection status (connected/disconnected/error)
  • Available tools
  • Recent errors

View Logs

MCP server stderr is captured in Moltis logs:

# View gateway logs
tail -f ~/.moltis/logs/gateway.log | grep mcp

Test Locally

Run the server directly to debug:

echo '{"jsonrpc":"2.0","method":"tools/list","id":1}' | node server.js

Security Considerations

Warning

MCP servers run with the same permissions as Moltis. Only use servers from trusted sources.

  • Review server code before running
  • Limit file access — use specific paths, not /
  • Use environment variables for secrets
  • Network isolation — run untrusted servers in containers

Troubleshooting

Server won’t start

  • Check the command exists: which npx
  • Verify the package: npx @modelcontextprotocol/server-filesystem --help
  • Check for port conflicts

Tools not appearing

  • Server may still be initializing (wait a few seconds)
  • Check server logs for errors
  • Verify the server implements tools/list

Server keeps restarting

  • Check stderr for crash messages
  • Increase max_restart_attempts for debugging
  • Verify environment variables are set correctly

Memory System

Moltis provides a powerful memory system that enables the agent to recall past conversations, notes, and context across sessions. This document explains the available backends, features, and configuration options.

Backends

Moltis supports two memory backends:

| Feature | Built-in | QMD |
|---|---|---|
| Search Type | Hybrid (vector + FTS5 keyword) | Hybrid (BM25 + vector + LLM reranking) |
| Local Embeddings | GGUF models via llama-cpp-2 | GGUF models |
| Remote Embeddings | OpenAI, Ollama, custom endpoints | Built-in |
| Embedding Cache | SQLite with LRU eviction | Built-in |
| Batch API | OpenAI batch (50% cost saving) | No |
| Circuit Breaker | Fallback chain with auto-recovery | No |
| LLM Reranking | Optional (configurable) | Built-in with query command |
| File Watching | Real-time sync via notify | Built-in |
| External Dependency | None (pure Rust) | Requires QMD binary (Node.js/Bun) |
| Offline Support | Yes (with local embeddings) | Yes |

Built-in Backend

The default backend uses SQLite for storage with FTS5 for keyword search and optional vector embeddings for semantic search. Key advantages:

  • Zero external dependencies: Everything is embedded in the moltis binary
  • Fallback chain: Automatically switches between embedding providers if one fails
  • Batch embedding: Reduces OpenAI API costs by 50% for large sync operations
  • Embedding cache: Avoids re-embedding unchanged content
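A cache like this is typically keyed on a hash of the chunk text, so an unchanged chunk resolves to the same key and skips the API call. A sketch of that idea (the actual key derivation is internal to Moltis):

```shell
# Same text -> same key -> cache hit; edited text -> new key -> re-embed
chunk="Some important content from your notes."
key=$(printf '%s' "$chunk" | sha256sum | cut -d ' ' -f 1)
echo "cache key: $key"
```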

QMD Backend

QMD is an optional external sidecar that provides enhanced search capabilities:

  • BM25 keyword search: Fast, instant results (similar to Elasticsearch)
  • Vector search: Semantic similarity using local GGUF models
  • Hybrid search with LLM reranking: Combines both methods with an LLM pass for optimal relevance

To use QMD:

  1. Install QMD separately from github.com/qmd/qmd
  2. Enable it in Settings > Memory > Backend

Features

Citations

Citations append source file and line number information to search results:

Some important content from your notes.

Source: memory/notes.md#42

Configuration options:

  • auto (default): Include citations when results come from multiple files
  • on: Always include citations
  • off: Never include citations

Session Export

When enabled, session transcripts are automatically exported to the memory system for cross-run recall. This allows the agent to remember past conversations even after restarts.

Exported sessions are:

  • Stored in memory/sessions/ as markdown files
  • Sanitized to remove sensitive tool results and system messages
  • Automatically cleaned up based on age/count limits

LLM Reranking

LLM reranking uses the configured language model to re-score and reorder search results based on semantic relevance to the query. This provides better results than keyword or vector matching alone, at the cost of additional latency.

How it works:

  1. Initial search returns candidate results
  2. LLM evaluates each result’s relevance (0.0-1.0 score)
  3. Results are reordered by combined score (70% LLM, 30% original)
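The blended score from step 3 is a plain weighted average. For one result with an LLM relevance of 0.9 and an original search score of 0.5:

```shell
# 70% LLM relevance + 30% original score
awk 'BEGIN { printf "%.2f\n", 0.7 * 0.9 + 0.3 * 0.5 }'
```

which prints 0.78.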

Configuration

Memory settings can be configured in moltis.toml:

[memory]
# Backend: "builtin" (default) or "qmd"
backend = "builtin"

# Embedding provider: "local", "ollama", "openai", "custom", or auto-detect
provider = "local"

# Citation mode: "on", "off", or "auto"
citations = "auto"

# Enable LLM reranking for hybrid search
llm_reranking = false

# Export sessions to memory for cross-run recall
session_export = true

# QMD-specific settings (only used when backend = "qmd")
[memory.qmd]
command = "qmd"
max_results = 10
timeout_ms = 30000

Or via the web UI: Settings > Memory

Embedding Providers

The built-in backend supports multiple embedding providers:

| Provider | Model | Dimensions | Notes |
|---|---|---|---|
| Local (GGUF) | EmbeddingGemma-300M | 768 | Offline, ~300MB download |
| Ollama | nomic-embed-text | 768 | Requires Ollama running |
| OpenAI | text-embedding-3-small | 1536 | Requires API key |
| Custom | Configurable | Varies | OpenAI-compatible endpoint |

The system auto-detects available providers and creates a fallback chain:

  1. Try configured provider first
  2. Fall back to other available providers if it fails
  3. Use keyword-only search if no embedding provider is available
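The chain amounts to trying providers in order until one answers. A sketch with a hypothetical try_provider stand-in (each real step is an embedding call that can fail):

```shell
# Hypothetical stand-in: pretend only OpenAI is reachable right now
try_provider() {
  [ "$1" = "openai" ]
}

for p in local ollama openai; do
  if try_provider "$p"; then
    echo "embedding with: $p"
    break
  fi
done
```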

Memory Directories

By default, moltis indexes markdown files from:

  • ~/.moltis/MEMORY.md - Main long-term memory file
  • ~/.moltis/memory/*.md - Additional memory files
  • ~/.moltis/memory/sessions/*.md - Exported session transcripts

Tools

The memory system exposes two agent tools:

memory_search

Search memory with a natural language query.

{
  "query": "what did we discuss about the API design?",
  "limit": 5
}

memory_get

Retrieve a specific chunk by ID.

{
  "chunk_id": "memory/notes.md:42"
}

Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Memory Manager                          │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │   Chunker   │  │   Search    │  │  Session Export     │  │
│  │ (markdown)  │  │  (hybrid)   │  │  (transcripts)      │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
├─────────────────────────────────────────────────────────────┤
│                    Storage Backend                          │
│  ┌────────────────────────┐  ┌────────────────────────┐    │
│  │   Built-in (SQLite)    │  │   QMD (sidecar)        │    │
│  │  - FTS5 keyword        │  │  - BM25 keyword        │    │
│  │  - Vector similarity   │  │  - Vector similarity   │    │
│  │  - Embedding cache     │  │  - LLM reranking       │    │
│  └────────────────────────┘  └────────────────────────┘    │
├─────────────────────────────────────────────────────────────┤
│                  Embedding Providers                        │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌───────────────┐  │
│  │  Local  │  │ Ollama  │  │ OpenAI  │  │ Batch/Fallback│  │
│  │  (GGUF) │  │         │  │         │  │               │  │
│  └─────────┘  └─────────┘  └─────────┘  └───────────────┘  │
└─────────────────────────────────────────────────────────────┘

Troubleshooting

Memory not working

  1. Check status in Settings > Memory
  2. Ensure at least one embedding provider is available:
    • Local: Requires local-embeddings feature enabled at build
    • Ollama: Must be running at localhost:11434
    • OpenAI: Requires OPENAI_API_KEY environment variable

Search returns no results

  1. Check that memory files exist in the expected directories
  2. Trigger a manual sync by restarting moltis
  3. Check logs for sync errors

QMD not available

  1. Verify QMD is installed: qmd --version
  2. Check that the path is correct in settings
  3. Ensure QMD has indexed your collections: qmd stats

Hooks

Hooks let you observe, modify, or block actions at key points in the agent lifecycle. Use them for auditing, policy enforcement, notifications, and custom integrations.

How Hooks Work

┌─────────────────────────────────────────────────────────┐
│                      Agent Loop                         │
│                                                         │
│  User Message → BeforeToolCall → Tool Execution         │
│                       │                 │               │
│                       ▼                 ▼               │
│                 [Your Hook]      AfterToolCall          │
│                       │                 │               │
│                 modify/block      [Your Hook]           │
│                       │                 │               │
│                       ▼                 ▼               │
│                   Continue → Response → MessageSent     │
└─────────────────────────────────────────────────────────┘

Event Types

Modifying Events (Sequential)

These events run hooks sequentially. Hooks can modify the payload or block the action.

| Event | Description | Can Modify | Can Block |
|---|---|---|---|
| BeforeToolCall | Before a tool executes | ✓ | ✓ |
| BeforeCompaction | Before context compaction | ✓ | ✓ |
| MessageSending | Before sending a response | ✓ | ✓ |
| BeforeAgentStart | Before agent loop starts | ✓ | ✓ |

Read-Only Events (Parallel)

These events run hooks in parallel for performance. They cannot modify or block.

| Event | Description |
|---|---|
| AfterToolCall | After a tool completes |
| AfterCompaction | After context is compacted |
| MessageReceived | When a user message arrives |
| MessageSent | After response is delivered |
| AgentEnd | When agent loop completes |
| SessionStart | When a new session begins |
| SessionEnd | When a session ends |
| ToolResultPersist | When tool result is saved |
| GatewayStart | When Moltis starts |
| GatewayStop | When Moltis shuts down |
| Command | When a slash command is used |

Creating a Hook

1. Create the Hook Directory

mkdir -p ~/.moltis/hooks/my-hook

2. Create HOOK.md

+++
name = "my-hook"
description = "Logs all tool calls to a file"
events = ["BeforeToolCall", "AfterToolCall"]
command = "./handler.sh"
timeout = 5

[requires]
os = ["darwin", "linux"]
bins = ["jq"]
env = ["LOG_FILE"]
+++

# My Hook

This hook logs all tool calls for auditing purposes.

3. Create the Handler Script

#!/bin/bash
# handler.sh

# Read event payload from stdin
payload=$(cat)

# Extract event type
event=$(echo "$payload" | jq -r '.event')

# Log to file
echo "$(date -Iseconds) $event: $payload" >> "$LOG_FILE"

# Exit 0 to continue (don't block)
exit 0

4. Make it Executable

chmod +x ~/.moltis/hooks/my-hook/handler.sh

Shell Hook Protocol

Hooks communicate via stdin/stdout and exit codes:

Input

The event payload is passed as JSON on stdin:

{
  "event": "BeforeToolCall",
  "data": {
    "tool": "bash",
    "arguments": {
      "command": "ls -la"
    }
  },
  "session_id": "abc123",
  "timestamp": "2024-01-15T10:30:00Z"
}

Output

| Exit Code | Stdout | Result |
|---|---|---|
| 0 | (empty) | Continue normally |
| 0 | {"action":"modify","data":{...}} | Replace payload data |
| 1 | (ignored) | Block (stderr = reason) |

Example: Modify Tool Arguments

#!/bin/bash
payload=$(cat)
tool=$(echo "$payload" | jq -r '.data.tool')

if [ "$tool" = "bash" ]; then
    # Add safety flag to all bash commands
    modified=$(echo "$payload" | jq '.data.arguments.command = "set -e; " + .data.arguments.command')
    echo "{\"action\":\"modify\",\"data\":$(echo "$modified" | jq -c '.data')}"
fi

exit 0

Example: Block Dangerous Commands

#!/bin/bash
payload=$(cat)
command=$(echo "$payload" | jq -r '.data.arguments.command // ""')

# Block rm -rf /
if echo "$command" | grep -qE 'rm[[:space:]]+-rf[[:space:]]+/'; then
    echo "Blocked dangerous rm command" >&2
    exit 1
fi

exit 0

Hook Discovery

Hooks are discovered from HOOK.md files in these locations (priority order):

  1. Project-local: <workspace>/.moltis/hooks/<name>/HOOK.md
  2. User-global: ~/.moltis/hooks/<name>/HOOK.md

Project-local hooks take precedence over global hooks with the same name.

Configuration in moltis.toml

You can also define hooks directly in the config file:

[[hooks]]
name = "audit-log"
command = "./hooks/audit.sh"
events = ["BeforeToolCall", "AfterToolCall"]
timeout = 5
priority = 100  # Higher = runs first

[[hooks]]
name = "notify-slack"
command = "./hooks/slack-notify.sh"
events = ["SessionEnd"]
env = { SLACK_WEBHOOK_URL = "https://hooks.slack.com/..." }

Eligibility Requirements

Hooks can declare requirements that must be met:

[requires]
os = ["darwin", "linux"]       # Only run on these OSes
bins = ["jq", "curl"]          # Required binaries in PATH
env = ["SLACK_WEBHOOK_URL"]    # Required environment variables

If requirements aren’t met, the hook is skipped (not an error).

Circuit Breaker

Hooks that fail repeatedly are automatically disabled:

  • Threshold: 5 consecutive failures
  • Cooldown: 60 seconds
  • Recovery: Auto-re-enabled after cooldown

This prevents a broken hook from blocking all operations.

CLI Commands

# List all discovered hooks
moltis hooks list

# List only eligible hooks (requirements met)
moltis hooks list --eligible

# Output as JSON
moltis hooks list --json

# Show details for a specific hook
moltis hooks info my-hook

Bundled Hooks

Moltis includes several built-in hooks:

boot-md

Reads BOOT.md from the workspace on GatewayStart and injects it into the agent context.

session-memory

Saves session context when you use the /new command, preserving important information for future sessions.

command-logger

Logs all Command events to a JSONL file for auditing.

Example Hooks

Slack Notification on Session End

#!/bin/bash
# slack-notify.sh
payload=$(cat)
session_id=$(echo "$payload" | jq -r '.session_id')
message_count=$(echo "$payload" | jq -r '.data.message_count')

curl -X POST "$SLACK_WEBHOOK_URL" \
  -H 'Content-Type: application/json' \
  -d "{\"text\":\"Session $session_id ended with $message_count messages\"}"

exit 0

Redact Secrets from Tool Output

#!/bin/bash
# redact-secrets.sh
payload=$(cat)

# Redact common secret patterns
redacted=$(echo "$payload" | sed -E '
  s/sk-[a-zA-Z0-9]{32,}/[REDACTED]/g
  s/ghp_[a-zA-Z0-9]{36}/[REDACTED]/g
  s/password=[^&[:space:]]+/password=[REDACTED]/g
')

echo "{\"action\":\"modify\",\"data\":$(echo "$redacted" | jq -c '.data')}"
exit 0

Block File Writes Outside Project

#!/bin/bash
# sandbox-writes.sh
payload=$(cat)
tool=$(echo "$payload" | jq -r '.data.tool')

if [ "$tool" = "write_file" ]; then
    path=$(echo "$payload" | jq -r '.data.arguments.path')

    # Only allow writes under current project
    if [[ ! "$path" =~ ^/workspace/ ]]; then
        echo "File writes only allowed in /workspace" >&2
        exit 1
    fi
fi

exit 0

Best Practices

  1. Keep hooks fast — Set appropriate timeouts (default: 5s)
  2. Handle errors gracefully — Use exit 0 unless you want to block
  3. Log for debugging — Write to a log file, not stdout
  4. Test locally first — Pipe sample JSON through your script
  5. Use jq for JSON — It’s reliable and fast for parsing

Local LLM Support

Moltis can run LLM inference locally on your machine without requiring an API key or internet connection. This enables fully offline operation and keeps your conversations private.

Backends

Moltis supports two backends for local inference:

| Backend | Format | Platform | GPU Acceleration |
|---|---|---|---|
| GGUF (llama.cpp) | .gguf files | macOS, Linux, Windows | Metal (macOS), CUDA (NVIDIA) |
| MLX | MLX model repos | macOS (Apple Silicon only) | Apple Silicon neural engine |

GGUF (llama.cpp)

GGUF is the primary backend, powered by llama.cpp. It supports quantized models in the GGUF format, which significantly reduces memory requirements while maintaining good quality.

Advantages:

  • Cross-platform (macOS, Linux, Windows)
  • Wide model compatibility (any GGUF model)
  • GPU acceleration on both NVIDIA (CUDA) and Apple Silicon (Metal)
  • Mature and well-tested

MLX

MLX is Apple’s machine learning framework optimized for Apple Silicon. Models from the mlx-community on HuggingFace are specifically optimized for M1/M2/M3/M4 chips.

Advantages:

  • Native Apple Silicon performance
  • Efficient unified memory usage
  • Lower latency on Macs

Requirements:

  • macOS with Apple Silicon (M1/M2/M3/M4)

Memory Requirements

Models are organized by memory tiers based on your system RAM:

| Tier | RAM | Recommended Models |
|---|---|---|
| Tiny | 4GB | Qwen 2.5 Coder 1.5B, Llama 3.2 1B |
| Small | 8GB | Qwen 2.5 Coder 3B, Llama 3.2 3B |
| Medium | 16GB | Qwen 2.5 Coder 7B, Llama 3.1 8B |
| Large | 32GB+ | Qwen 2.5 Coder 14B, DeepSeek Coder V2 Lite |

Moltis automatically detects your system memory and suggests appropriate models in the UI.

Configuration

  1. Navigate to Providers in the sidebar
  2. Click Add Provider
  3. Select Local LLM
  4. Choose a model from the registry or search HuggingFace
  5. Click Configure — the model will download automatically

Via Configuration File

Add to ~/.moltis/moltis.toml:

[providers.local]
model = "qwen2.5-coder-7b-q4_k_m"

For custom GGUF files:

[providers.local]
model = "my-custom-model"
model_path = "/path/to/model.gguf"

Model Storage

Downloaded models are cached in ~/.cache/moltis/models/ by default. This directory can grow large (several GB per model).

To change the cache location:

[providers.local]
cache_dir = "/custom/models/path"

HuggingFace Integration

You can search and download models directly from HuggingFace:

  1. In the Add Provider dialog, click “Search HuggingFace”
  2. Enter a search term (e.g., “qwen coder”)
  3. Select GGUF or MLX backend
  4. Choose a model from the results
  5. The model will be downloaded on first use

Finding GGUF Models

Look for repositories with “GGUF” in the name on HuggingFace:

  • TheBloke — large collection of quantized models
  • bartowski — Llama 3.x GGUF models
  • Qwen — official Qwen GGUF models

Finding MLX Models

MLX models are available from mlx-community:

  • Pre-converted models optimized for Apple Silicon
  • Look for models ending in -4bit or -8bit for quantized versions

GPU Acceleration

Metal (macOS)

Metal acceleration is enabled by default on macOS. The number of GPU layers can be configured:

[providers.local]
gpu_layers = 99  # Offload all layers to GPU

CUDA (NVIDIA)

Requires building with the local-llm-cuda feature:

cargo build --release --features local-llm-cuda

Limitations

Local LLM models have some limitations compared to cloud providers:

  1. No tool calling — Local models don’t support function/tool calling. When using a local model, features like file operations, shell commands, and memory search are disabled.

  2. Slower inference — Depending on your hardware, local inference may be significantly slower than cloud APIs.

  3. Quality varies — Smaller quantized models may produce lower quality responses than larger cloud models.

  4. Context window — Local models typically have smaller context windows (8K-32K tokens vs 128K+ for cloud models).

Chat Templates

Different model families use different chat formatting. Moltis automatically detects the correct template for registered models:

  • ChatML — Qwen, many instruction-tuned models
  • Llama 3 — Meta’s Llama 3.x family
  • DeepSeek — DeepSeek Coder models

For custom models, the template is auto-detected from the model metadata when possible.

Troubleshooting

Model fails to load

  • Check you have enough RAM (see memory tier table above)
  • Verify the GGUF file isn’t corrupted (re-download if needed)
  • Ensure the model file matches the expected architecture

Slow inference

  • Enable GPU acceleration (Metal on macOS, CUDA on Linux)
  • Try a smaller/more quantized model
  • Reduce context size in config

Out of memory

  • Choose a model from a lower memory tier
  • Close other applications to free RAM
  • Use a more aggressively quantized model (Q4_K_M vs Q8_0)

Feature Flag

Local LLM support requires the local-llm feature flag at compile time:

cargo build --release --features local-llm

This is enabled by default in release builds.

Sandbox Backends

Moltis runs LLM-generated commands inside containers to protect your host system. The sandbox backend controls which container technology is used.

Backend Selection

Configure in moltis.toml:

[tools.exec.sandbox]
backend = "auto"          # default — picks the best available
# backend = "docker"      # force Docker
# backend = "apple-container"  # force Apple Container (macOS only)

With "auto" (the default), Moltis picks the strongest available backend:

| Priority | Backend | Platform | Isolation |
|---|---|---|---|
| 1 | Apple Container | macOS | VM (Virtualization.framework) |
| 2 | Docker | any | Linux namespaces / cgroups |
| 3 | none (host) | any | no isolation |

Apple Container runs each sandbox in a lightweight virtual machine using Apple’s Virtualization.framework. Every container gets its own kernel, so a kernel exploit inside the sandbox cannot reach the host — unlike Docker, which shares the host kernel.

Install

Download the signed installer from GitHub:

# Download the installer package
gh release download --repo apple/container --pattern "container-installer-signed.pkg" --dir /tmp

# Install (requires admin)
sudo installer -pkg /tmp/container-installer-signed.pkg -target /

# First-time setup — downloads a default Linux kernel
container system start

Alternatively, build from source via Homebrew with brew install container (requires Xcode 26+).

Verify

container --version
# Run a quick test
container run --rm ubuntu echo "hello from VM"

Once installed, restart moltis gateway — the startup banner will show sandbox: apple-container backend.

Docker

Docker is supported on macOS, Linux, and Windows. On macOS it runs inside a Linux VM managed by Docker Desktop, so it is reasonably isolated but adds more overhead than Apple Container.

Install from https://docs.docker.com/get-docker/

No sandbox

If neither runtime is found, commands execute directly on the host. The startup banner will show a warning. This is not recommended for untrusted workloads.

Per-session overrides

The web UI allows toggling sandboxing per session and selecting a custom container image. These overrides persist across gateway restarts.

Resource limits

[tools.exec.sandbox.resource_limits]
memory_limit = "512M"
cpu_quota = 1.0
pids_max = 256

Session State

Moltis provides a per-session key-value store that allows skills, extensions, and the agent itself to persist context across messages within a session.

Overview

Session state is scoped to a (session_key, namespace, key) triple, backed by SQLite. Each entry stores a string value and is automatically timestamped.

The agent accesses state through the session_state tool, which supports three operations: get, set, and list.

Agent Tool

The session_state tool is registered as a built-in tool and available in every session.

Get a value

{
  "op": "get",
  "namespace": "my-skill",
  "key": "last_query"
}

Set a value

{
  "op": "set",
  "namespace": "my-skill",
  "key": "last_query",
  "value": "SELECT * FROM users"
}

List all keys in a namespace

{
  "op": "list",
  "namespace": "my-skill"
}

Namespacing

Every state entry belongs to a namespace. This prevents collisions between different skills or extensions using state in the same session. Use your skill name as the namespace.

Storage

State is stored in the session_state table in the main SQLite database (moltis.db). The migration is in crates/sessions/migrations/20260205120000_session_state.sql.

Tip

State values are strings. To store structured data, serialize to JSON before writing and parse after reading.
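For example, a set call whose value field carries a serialized JSON object, escaped as a string (the key and object fields here are illustrative):

```json
{
  "op": "set",
  "namespace": "my-skill",
  "key": "last_result",
  "value": "{\"query\":\"SELECT * FROM users\",\"rows\":42}"
}
```

After a get, parse the returned string back into an object before using it.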

Session Branching

Session branching (forking) lets you create an independent copy of a conversation at any point. The new session diverges without affecting the original — useful for exploring alternative approaches, running “what if” scenarios, or preserving a checkpoint before a risky prompt.

Forking from the UI

There are two ways to fork a session in the web UI:

  • Chat header — click the Fork button in the header bar (next to Delete). This is visible for every session except cron sessions.
  • Sidebar — hover over a session in the sidebar and click the fork icon that appears in the action buttons.

Both create a new session that copies all messages from the current one and immediately switch you to it.

Forked sessions appear indented under their parent in the sidebar, with a branch icon to distinguish them from top-level sessions. The metadata line shows fork@N where N is the message index at which the fork occurred.

Agent Tool

The agent can also fork programmatically using the branch_session tool:

{
  "at_message": 5,
  "label": "explore-alternative"
}
  • at_message — the message index to fork at (messages 0..N are copied). If omitted, all messages are copied.
  • label — optional human-readable label for the new session.

The tool returns the new session key.

RPC Method

The sessions.fork RPC method is the underlying mechanism:

{ "key": "main", "at_message": 5, "label": "my-fork" }

On success the response payload contains { "sessionKey": "session:<uuid>" }.

What Gets Inherited

When forking, the new session inherits:

| Inherited | Not inherited |
|---|---|
| Messages (up to fork point) | Worktree branch |
| Model selection | Sandbox settings |
| Project assignment | Channel binding |
| MCP disabled flag | |

Parent-Child Relationships

Fork relationships are stored directly on the sessions table:

  • parent_session_key — the key of the session this was forked from.
  • fork_point — the message index where the fork occurred.

These fields drive the tree rendering in the sidebar. Sessions with a parent appear indented under it; deeply nested forks indent further.

Deleting a parent

Deleting a parent session does not cascade to its children. Child sessions become top-level sessions — they keep their messages and history but lose their visual nesting in the sidebar.

When you delete a forked session, the UI navigates back to its parent session. If the deleted session had no parent (or the parent no longer exists), it falls back to the next sibling or main.

Independence

A forked session is fully independent after creation. Changes to the parent do not propagate to the fork, and vice versa.

Skill Self-Extension

Moltis can create, update, and delete skills at runtime through agent tools, enabling the system to extend its own capabilities during a conversation.

Overview

Three agent tools manage project-local skills:

| Tool | Description |
|---|---|
| create_skill | Write a new SKILL.md to .moltis/skills/<name>/ |
| update_skill | Overwrite an existing skill’s SKILL.md |
| delete_skill | Remove a skill directory |

Skills created this way are project-local and stored in the working directory’s .moltis/skills/ folder. They become available on the next message automatically thanks to the skill watcher.

Skill Watcher

The skill watcher (crates/skills/src/watcher.rs) monitors skill directories for filesystem changes using debounced notifications. When a SKILL.md file is created, modified, or deleted, the watcher emits a skills.changed event via the WebSocket event bus so the UI can refresh.

Tip

The watcher uses debouncing to avoid firing multiple events for rapid successive edits (e.g. an editor writing a temp file then renaming).

Creating a Skill

The agent can create a skill by calling the create_skill tool:

{
  "name": "summarize-pr",
  "content": "# summarize-pr\n\nSummarize a GitHub pull request...",
  "description": "Summarize GitHub PRs with key changes and review notes"
}

This writes .moltis/skills/summarize-pr/SKILL.md with the provided content. The skill discoverer picks it up on the next message.

Updating a Skill

{
  "name": "summarize-pr",
  "content": "# summarize-pr\n\nUpdated instructions..."
}

Deleting a Skill

{
  "name": "summarize-pr"
}

This removes the entire .moltis/skills/summarize-pr/ directory.

Warning

Deleted skills cannot be recovered. The agent should confirm with the user before deleting a skill.

Mobile PWA and Push Notifications

Moltis can be installed as a Progressive Web App (PWA) on mobile devices, providing a native app-like experience with push notifications.

Installing on Mobile

iOS (Safari)

  1. Open moltis in Safari
  2. Tap the Share button (box with arrow)
  3. Scroll down and tap “Add to Home Screen”
  4. Tap “Add” to confirm

The app will appear on your home screen with the moltis icon.

Android (Chrome)

  1. Open moltis in Chrome
  2. You should see an install banner at the bottom; tap “Install”
  3. Or tap the three-dot menu and select “Install app” or “Add to Home Screen”
  4. Tap “Install” to confirm

The app will appear in your app drawer and home screen.

PWA Features

When installed as a PWA, moltis provides:

  • Standalone mode: Full-screen experience without browser UI
  • Offline support: Previously loaded content remains accessible
  • Fast loading: Assets are cached locally
  • Home screen icon: Quick access from your device’s home screen
  • Safe area support: Proper spacing for notched devices (iPhone X+)

Push Notifications

Push notifications allow you to receive alerts when the LLM responds, even when you’re not actively viewing the app.

Enabling Push Notifications

  1. Open the moltis app (must be installed as PWA on Safari/iOS)
  2. Go to Settings > Notifications
  3. Click Enable to subscribe to push notifications
  4. When prompted, allow notification permissions

Safari/iOS Note: Push notifications only work when the app is installed as a PWA. If you see “Installation required”, add moltis to your Dock first:

  • macOS: File → Add to Dock
  • iOS: Share → Add to Home Screen

Managing Subscriptions

The Settings > Notifications page shows all subscribed devices:

  • Device name: Parsed from user agent (e.g., “Safari on macOS”, “iPhone”)
  • IP address: Client IP at subscription time (supports proxies via X-Forwarded-For)
  • Subscription date: When the device subscribed

You can remove any subscription by clicking the Remove button. This works from any device, which is useful for revoking access to old devices.

Subscription changes are broadcast in real-time via WebSocket, so all connected clients see updates immediately.

How It Works

Moltis uses the Web Push API with VAPID (Voluntary Application Server Identification) keys:

  1. VAPID Keys: On first run, the server generates a P-256 ECDSA key pair
  2. Subscription: The browser creates a push subscription using the server’s public key
  3. Registration: The subscription details are sent to the server and stored
  4. Notification: When you need to be notified, the server encrypts and sends a push message

Push API Routes

The gateway exposes these API endpoints for push notifications:

| Endpoint | Method | Description |
|---|---|---|
| /api/push/vapid-key | GET | Get the VAPID public key for subscription |
| /api/push/subscribe | POST | Register a push subscription |
| /api/push/unsubscribe | POST | Remove a push subscription |
| /api/push/status | GET | Get push service status and subscription list |

Subscribe Request

{
  "endpoint": "https://fcm.googleapis.com/fcm/send/...",
  "keys": {
    "p256dh": "base64url-encoded-key",
    "auth": "base64url-encoded-auth"
  }
}

Status Response

{
  "enabled": true,
  "subscription_count": 2,
  "subscriptions": [
    {
      "endpoint": "https://fcm.googleapis.com/...",
      "device": "Safari on macOS",
      "ip": "192.168.1.100",
      "created_at": "2025-02-05T23:30:00Z"
    }
  ]
}

Notification Payload

Push notifications include:

{
  "title": "moltis",
  "body": "New response available",
  "url": "/chats",
  "sessionKey": "session-id"
}

Clicking a notification will open or focus the app and navigate to the relevant chat.

Configuration

Feature Flag

Push notifications are controlled by the push-notifications feature flag, which is enabled by default. To disable:

# In your Cargo.toml or when building
[dependencies]
moltis-gateway = { default-features = false, features = ["web-ui", "tls"] }

Or build without the feature:

cargo build --no-default-features --features web-ui,tls,tailscale,file-watcher

Data Storage

Push notification data is stored in push.json in the data directory:

  • VAPID keys: Generated once and reused
  • Subscriptions: List of all registered browser subscriptions

The VAPID keys are persisted so subscriptions remain valid across restarts.

Mobile UI Considerations

The mobile interface adapts for smaller screens:

  • Navigation drawer: The sidebar becomes a slide-out drawer on mobile
  • Sessions panel: Displayed as a bottom sheet that can be swiped
  • Touch targets: Minimum 44px touch targets for accessibility
  • Safe areas: Proper insets for devices with notches or home indicators

Responsive Breakpoints

  • Mobile: < 768px width (drawer navigation)
  • Desktop: ≥ 768px width (sidebar navigation)

Browser Support

| Feature | Chrome | Safari | Firefox | Edge |
|---|---|---|---|---|
| PWA Install | ✅ | ✅ (iOS) | ❌ | ✅ |
| Push Notifications | ✅ | ✅ (iOS 16.4+) | ✅ | ✅ |
| Service Worker | ✅ | ✅ | ✅ | ✅ |
| Offline Support | ✅ | ✅ | ✅ | ✅ |

Note: iOS push notifications require iOS 16.4 or later and the app must be installed as a PWA.

Troubleshooting

Notifications Not Working

  1. Check permissions: Ensure notifications are allowed in browser/OS settings
  2. Check subscription: Go to Settings > Notifications to see if your device is listed
  3. Check server logs: Look for push: prefixed log messages for delivery status
  4. Safari/iOS specific:
    • Must be installed as PWA (Add to Dock/Home Screen)
    • iOS requires version 16.4 or later
    • The Enable button is disabled until installed as PWA
  5. Behind a proxy: Ensure your proxy forwards X-Forwarded-For or X-Real-IP headers

PWA Not Installing

  1. HTTPS required: PWAs require a secure connection (or localhost)
  2. Valid manifest: Ensure /manifest.json loads correctly
  3. Service worker: Check that /sw.js registers without errors
  4. Clear cache: Try clearing browser cache and reloading

Service Worker Issues

Clear the service worker registration:

  1. Open browser DevTools
  2. Go to Application > Service Workers
  3. Click “Unregister” on the moltis service worker
  4. Reload the page

Security Architecture

Moltis is designed with a defense-in-depth security model. This document explains the key security features and provides guidance for production deployments.

Overview

Moltis runs AI agents that can execute code and interact with external systems. This power requires multiple layers of protection:

  1. Human-in-the-loop approval for dangerous commands
  2. Sandbox isolation for command execution
  3. Channel authorization for external integrations
  4. Rate limiting to prevent resource abuse
  5. Scope-based access control for API authorization

Command Execution Approval

By default, Moltis requires explicit user approval before executing potentially dangerous commands. This “human-in-the-loop” design ensures the AI cannot take destructive actions without consent.

How It Works

When the agent wants to run a command:

  1. The command is analyzed against approval policies
  2. If approval is required, the user sees a prompt in the UI
  3. The user can approve, deny, or modify the command
  4. Only approved commands execute

Approval Policies

Configure approval behavior in moltis.toml:

[tools.exec]
approval_mode = "always"  # always require approval
# approval_mode = "smart" # auto-approve safe commands (default)
# approval_mode = "never" # dangerous: never require approval

Recommendation: Keep approval_mode = "smart" (the default) for most use cases. Only use "never" in fully automated, sandboxed environments.

Sandbox Isolation

Commands execute inside isolated containers (Docker or Apple Container) by default. This protects your host system from:

  • Accidental file deletion or modification
  • Malicious code execution
  • Resource exhaustion (memory, CPU, disk)

See sandbox.md for backend configuration.

Resource Limits

[tools.exec.sandbox.resource_limits]
memory_limit = "512M"
cpu_quota = 1.0
pids_max = 256

Network Isolation

Sandbox containers have limited network access by default. Outbound connections are allowed but the sandbox cannot bind to host ports.

Channel Authorization

Channels (Telegram, Slack, etc.) allow external parties to interact with your Moltis agent. This requires careful access control.

Sender Allowlisting

When a new sender contacts the agent through a channel, they are placed in a pending queue. You must explicitly approve or deny each sender before they can interact with the agent.

UI: Settings > Channels > Pending Senders

Per-Channel Permissions

Each channel can have different permission levels:

  • Read-only: Sender can ask questions, agent responds
  • Execute: Sender can trigger actions (with approval still required)
  • Admin: Full access including configuration changes

Channel Isolation

Channels run in isolated sessions by default. A malicious message from one channel cannot affect another channel’s session or the main UI session.

Cron Job Security

Scheduled tasks (cron jobs) can run agent turns automatically. Security considerations:

Rate Limiting

To prevent prompt injection attacks from rapidly creating many cron jobs:

[cron]
rate_limit_max = 10           # max jobs per window
rate_limit_window_secs = 60   # window duration (1 minute)

This limits job creation to 10 per minute by default. System jobs (like heartbeat) bypass this limit.

Job Notifications

When cron jobs are created, updated, or removed, Moltis broadcasts events:

  • cron.job.created - A new job was created
  • cron.job.updated - An existing job was modified
  • cron.job.removed - A job was deleted

Monitor these events to detect suspicious automated job creation.

Sandbox for Cron Jobs

Cron job execution uses sandbox isolation by default:

# Per-job configuration
[cron.job.sandbox]
enabled = true              # run in sandbox (default)
# image = "custom:latest"   # optional custom image

Identity Protection

The agent’s identity (name, personality “soul”) is stored in moltis.toml. Modifying identity requires the operator.write scope, not just operator.read.

This prevents prompt injection attacks from subtly modifying the agent’s personality to make it more compliant with malicious requests.

API Authorization

The gateway API uses role-based access control with scopes:

| Scope | Permissions |
|---|---|
| operator.read | View status, list jobs, read history |
| operator.write | Send messages, create jobs, modify configuration |
| operator.admin | All permissions (includes all other scopes) |
| operator.approvals | Handle command approval requests |
| operator.pairing | Manage device/node pairing |

API Keys

API keys authenticate external tools and scripts connecting to Moltis. Keys can have full access (all scopes) or be restricted to specific scopes for defense-in-depth.

Creating API Keys

Web UI: Settings > Security > API Keys

  1. Enter a label describing the key’s purpose
  2. Choose “Full access” or select specific scopes
  3. Click “Generate key”
  4. Copy the key immediately — it’s only shown once

CLI:

# Full access key
moltis auth create-api-key --label "CI pipeline"

# Scoped key (comma-separated scopes)
moltis auth create-api-key --label "Monitor" --scopes "operator.read"
moltis auth create-api-key --label "Automation" --scopes "operator.read,operator.write"

Using API Keys

Pass the key in the connect handshake over WebSocket:

{
  "method": "connect",
  "params": {
    "client": { "id": "my-tool", "version": "1.0.0" },
    "auth": { "api_key": "mk_abc123..." }
  }
}

Or use Bearer authentication for REST API calls:

Authorization: Bearer mk_abc123...

Scope Recommendations

| Use Case | Recommended Scopes |
|---|---|
| Read-only monitoring | operator.read |
| Automated workflows | operator.read, operator.write |
| Approval handling | operator.read, operator.approvals |
| Full automation | Full access (no scope restrictions) |

Best practice: Use the minimum necessary scopes. If a key only needs to read status and logs, don’t grant operator.write.

Backward Compatibility

Existing API keys (created before scopes were added) have full access. Newly created keys without explicit scopes also have full access.

Network Security

TLS Encryption

HTTPS is enabled by default with auto-generated certificates:

[tls]
enabled = true
auto_generate = true

For production, use certificates from a trusted CA or configure custom certificates.

Origin Validation

WebSocket connections validate the Origin header to prevent cross-site WebSocket hijacking (CSWSH). Connections from untrusted origins are rejected.

SSRF Protection

The web_fetch tool resolves DNS and blocks requests to private IP ranges (loopback, RFC 1918, link-local, CGNAT). This prevents server-side request forgery attacks.

Production Recommendations

1. Enable Authentication

By default, Moltis requires a password when accessed from non-localhost:

[auth]
disabled = false  # keep this false in production

2. Use Sandbox Isolation

Always run with sandbox enabled in production:

[tools.exec.sandbox]
enabled = true
backend = "auto"  # uses strongest available

3. Tighten Rate Limits

Tighten rate limits for untrusted environments:

[cron]
rate_limit_max = 5
rate_limit_window_secs = 300  # 5 per 5 minutes

4. Review Channel Senders

Regularly audit approved senders and revoke access for unknown parties.

5. Monitor Events

Watch for these suspicious patterns:

  • Rapid cron job creation
  • Identity modification attempts
  • Unusual command patterns in approval requests
  • New channel senders from unexpected sources

6. Network Segmentation

Run Moltis on a private network or behind a reverse proxy with:

  • IP allowlisting
  • Rate limiting
  • Web Application Firewall (WAF) rules

7. Keep Software Updated

Subscribe to security advisories and update promptly when vulnerabilities are disclosed.

Reporting Security Issues

Report security vulnerabilities privately to the maintainers. Do not open public issues for security bugs.

See the repository’s SECURITY.md for contact information.

Running Moltis in Docker

Moltis is available as a multi-architecture Docker image supporting both linux/amd64 and linux/arm64. The image is published to GitHub Container Registry on every release.

Quick Start

docker run -d \
  --name moltis \
  -p 13131:13131 \
  -v moltis-config:/home/moltis/.config/moltis \
  -v moltis-data:/home/moltis/.moltis \
  -v /var/run/docker.sock:/var/run/docker.sock \
  ghcr.io/penso/moltis:latest

Open http://localhost:13131 in your browser and configure your LLM provider to start chatting.

Note

When accessing from localhost, no authentication is required. If you access Moltis from a different machine (e.g., over the network), a setup code is printed to the container logs for authentication setup:

docker logs moltis

Volume Mounts

Moltis uses two directories that should be persisted:

| Path | Contents |
|---|---|
| /home/moltis/.config/moltis | Configuration files: moltis.toml, credentials.json, mcp-servers.json |
| /home/moltis/.moltis | Runtime data: databases, sessions, memory files, logs |

You can use named volumes (as shown above) or bind mounts to local directories for easier access to configuration files:

docker run -d \
  --name moltis \
  -p 13131:13131 \
  -v ./config:/home/moltis/.config/moltis \
  -v ./data:/home/moltis/.moltis \
  -v /var/run/docker.sock:/var/run/docker.sock \
  ghcr.io/penso/moltis:latest

With bind mounts, you can edit config/moltis.toml directly on the host.

Docker Socket (Sandbox Execution)

Moltis runs LLM-generated shell commands inside isolated containers for security. When Moltis itself runs in a container, it needs access to the host’s container runtime to create these sandbox containers.

Without the socket mount, sandbox execution is disabled. The agent will still work for chat-only interactions, but any tool that runs shell commands will fail.

# Required for sandbox execution
-v /var/run/docker.sock:/var/run/docker.sock

Security Consideration

Mounting the Docker socket gives the container full access to the Docker daemon. This is equivalent to root access on the host for practical purposes. Only run Moltis containers from trusted sources (official images from ghcr.io/penso/moltis).

If you cannot mount the Docker socket, Moltis will run in “no sandbox” mode — commands execute directly inside the Moltis container itself, which provides no isolation.

Docker Compose

See examples/docker-compose.yml for a complete example:

services:
  moltis:
    image: ghcr.io/penso/moltis:latest
    container_name: moltis
    restart: unless-stopped
    ports:
      - "13131:13131"
    volumes:
      - ./config:/home/moltis/.config/moltis
      - ./data:/home/moltis/.moltis
      - /var/run/docker.sock:/var/run/docker.sock

Start with:

docker compose up -d
docker compose logs -f moltis  # watch for startup messages

Podman Support

Moltis works with Podman using its Docker-compatible API. Mount the Podman socket instead of the Docker socket:

# Podman rootless
podman run -d \
  --name moltis \
  -p 13131:13131 \
  -v moltis-config:/home/moltis/.config/moltis \
  -v moltis-data:/home/moltis/.moltis \
  -v /run/user/$(id -u)/podman/podman.sock:/var/run/docker.sock \
  ghcr.io/penso/moltis:latest

# Podman rootful
podman run -d \
  --name moltis \
  -p 13131:13131 \
  -v moltis-config:/home/moltis/.config/moltis \
  -v moltis-data:/home/moltis/.moltis \
  -v /run/podman/podman.sock:/var/run/docker.sock \
  ghcr.io/penso/moltis:latest

You may need to enable the Podman socket service first:

# Rootless
systemctl --user enable --now podman.socket

# Rootful
sudo systemctl enable --now podman.socket

Environment Variables

| Variable | Description |
|---|---|
| MOLTIS_CONFIG_DIR | Override config directory (default: ~/.config/moltis) |
| MOLTIS_DATA_DIR | Override data directory (default: ~/.moltis) |

Example:

docker run -d \
  --name moltis \
  -p 13131:13131 \
  -e MOLTIS_CONFIG_DIR=/config \
  -e MOLTIS_DATA_DIR=/data \
  -v ./config:/config \
  -v ./data:/data \
  -v /var/run/docker.sock:/var/run/docker.sock \
  ghcr.io/penso/moltis:latest

Building Locally

To build the Docker image from source:

# Single architecture (current platform)
docker build -t moltis:local .

# Multi-architecture (requires buildx)
docker buildx build --platform linux/amd64,linux/arm64 -t moltis:local .

OrbStack

OrbStack on macOS works identically to Docker — use the same socket path (/var/run/docker.sock). OrbStack’s lightweight Linux VM provides good isolation with lower resource usage than Docker Desktop.

Troubleshooting

“Cannot connect to Docker daemon”

The Docker socket is not mounted or the Moltis user doesn’t have permission to access it. Verify:

docker exec moltis ls -la /var/run/docker.sock

Setup code not appearing in logs (for network access)

The setup code only appears when accessing from a non-localhost address. If you’re accessing from the same machine via localhost, no setup code is needed. For network access, wait a few seconds for the gateway to start, then check logs:

docker logs moltis 2>&1 | grep -i setup

Permission denied on bind mounts

When using bind mounts, ensure the directories exist and are writable:

mkdir -p ./config ./data
chmod 755 ./config ./data

The container runs as user moltis (UID 1000). If you see permission errors, you may need to adjust ownership:

sudo chown -R 1000:1000 ./config ./data

Streaming Architecture

This document explains how streaming responses work in Moltis, from the LLM provider through to the web UI.

Overview

Moltis supports real-time token streaming for LLM responses, providing a much better user experience than waiting for the complete response. Streaming works even when tools are enabled, allowing users to see text as it arrives while tool calls are accumulated and executed.

Components

1. StreamEvent Enum (crates/agents/src/model.rs)

The StreamEvent enum defines all events that can occur during a streaming LLM response:

pub enum StreamEvent {
    /// Text content delta - a chunk of text from the LLM.
    Delta(String),

    /// A tool call has started (for providers with native tool support).
    ToolCallStart { id: String, name: String, index: usize },

    /// Streaming delta for tool call arguments (JSON fragment).
    ToolCallArgumentsDelta { index: usize, delta: String },

    /// A tool call's arguments are complete.
    ToolCallComplete { index: usize },

    /// Stream completed successfully with token usage.
    Done(Usage),

    /// An error occurred.
    Error(String),
}

2. LlmProvider Trait (crates/agents/src/model.rs)

The LlmProvider trait defines two streaming methods:

  • stream() - Basic streaming without tool support
  • stream_with_tools() - Streaming with tool schemas passed to the API

Providers that support streaming with tools (like Anthropic) override stream_with_tools(). Others fall back to stream() which ignores the tools parameter.
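The fallback can be expressed as a default trait method. Signatures below are simplified for illustration — the real trait in crates/agents/src/model.rs is async and returns a stream of StreamEvent:

```rust
// Simplified sketch: stream_with_tools() is a default method that ignores
// the tool schemas and delegates to stream().
trait LlmProvider {
    fn stream(&self, messages: Vec<String>) -> Vec<String>;

    // Providers without native tool streaming inherit this default.
    fn stream_with_tools(&self, messages: Vec<String>, _tools: Vec<String>) -> Vec<String> {
        self.stream(messages)
    }
}

// A provider that only implements stream() still accepts a tools argument
// via the default method.
struct Echo;
impl LlmProvider for Echo {
    fn stream(&self, messages: Vec<String>) -> Vec<String> {
        messages
    }
}
```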

3. Anthropic Provider (crates/agents/src/providers/anthropic.rs)

The Anthropic provider implements streaming by:

  1. Making a POST request to /v1/messages with "stream": true
  2. Reading Server-Sent Events (SSE) from the response
  3. Parsing events and yielding appropriate StreamEvent variants:
| SSE Event Type | StreamEvent |
|---|---|
| content_block_start (text) | (none, just tracking) |
| content_block_start (tool_use) | ToolCallStart |
| content_block_delta (text_delta) | Delta |
| content_block_delta (input_json_delta) | ToolCallArgumentsDelta |
| content_block_stop | ToolCallComplete (for tool blocks) |
| message_delta | (usage tracking) |
| message_stop | Done |
| error | Error |

4. Agent Runner (crates/agents/src/runner.rs)

The run_agent_loop_streaming() function orchestrates the streaming agent loop:

┌─────────────────────────────────────────────────────────┐
│                    Agent Loop                           │
│                                                         │
│  1. Call provider.stream_with_tools()                   │
│                                                         │
│  2. While stream has events:                            │
│     ├─ Delta(text) → emit RunnerEvent::TextDelta        │
│     ├─ ToolCallStart → accumulate tool call             │
│     ├─ ToolCallArgumentsDelta → accumulate args         │
│     ├─ ToolCallComplete → finalize args                 │
│     ├─ Done → record usage                              │
│     └─ Error → return error                             │
│                                                         │
│  3. If no tool calls → return accumulated text          │
│                                                         │
│  4. Execute tool calls concurrently                     │
│     ├─ Emit ToolCallStart events                        │
│     ├─ Run tools in parallel                            │
│     └─ Emit ToolCallEnd events                          │
│                                                         │
│  5. Append tool results to messages                     │
│                                                         │
│  6. Loop back to step 1                                 │
└─────────────────────────────────────────────────────────┘

5. Gateway (crates/gateway/src/chat.rs)

The gateway’s run_with_tools() function:

  1. Sets up an event callback that broadcasts RunnerEvents via WebSocket
  2. Calls run_agent_loop_streaming()
  3. Broadcasts events to connected clients as JSON frames

Event types broadcast to the UI:

| RunnerEvent | WebSocket State |
|---|---|
| Thinking | thinking |
| ThinkingDone | thinking_done |
| TextDelta(text) | delta with text field |
| ToolCallStart | tool_call_start |
| ToolCallEnd | tool_call_end |
| Iteration(n) | iteration |

6. Frontend (crates/gateway/src/assets/js/)

The JavaScript frontend handles streaming via WebSocket:

  1. websocket.js - Receives WebSocket frames and dispatches to handlers
  2. events.js - Event bus for distributing events to components
  3. state.js - Manages streaming state (streamText, streamEl)

When a delta event arrives:

function handleChatDelta(p, isActive, isChatPage) {
  if (!(p.text && isActive && isChatPage)) return;
  removeThinking();
  if (!S.streamEl) {
    S.setStreamText("");
    S.setStreamEl(document.createElement("div"));
    S.streamEl.className = "msg assistant";
    S.chatMsgBox.appendChild(S.streamEl);
  }
  S.setStreamText(S.streamText + p.text);
  setSafeMarkdownHtml(S.streamEl, S.streamText);
  S.chatMsgBox.scrollTop = S.chatMsgBox.scrollHeight;
}

Data Flow

┌──────────────┐     SSE      ┌──────────────┐   StreamEvent   ┌──────────────┐
│   Anthropic  │─────────────▶│   Provider   │────────────────▶│    Runner    │
│     API      │              │              │                 │              │
└──────────────┘              └──────────────┘                 └──────┬───────┘
                                                                      │
                                                               RunnerEvent
                                                                      │
                                                                      ▼
┌──────────────┐   WebSocket  ┌──────────────┐    Callback     ┌──────────────┐
│   Browser    │◀─────────────│   Gateway    │◀────────────────│   Callback   │
│              │              │              │                 │   (on_event) │
└──────────────┘              └──────────────┘                 └──────────────┘

Adding Streaming to New Providers

To add streaming support for a new LLM provider:

  1. Implement the stream() method (basic streaming)
  2. If the provider supports tools in streaming mode, override stream_with_tools()
  3. Parse the provider’s streaming format and yield appropriate StreamEvent variants
  4. Handle errors gracefully with StreamEvent::Error
  5. Always emit StreamEvent::Done with usage statistics when complete

Example skeleton:

fn stream_with_tools(
    &self,
    messages: Vec<serde_json::Value>,
    tools: Vec<serde_json::Value>,
) -> Pin<Box<dyn Stream<Item = StreamEvent> + Send + '_>> {
    Box::pin(async_stream::stream! {
        // Make the streaming request to the provider API.
        // Note: `?` is not available here because the stream yields
        // StreamEvent, not Result, so errors are surfaced as events.
        let resp = match self.client.post(...).json(&body).send().await {
            Ok(resp) => resp,
            Err(e) => {
                yield StreamEvent::Error(e.to_string());
                return;
            }
        };

        // Read SSE or streaming response
        let mut byte_stream = resp.bytes_stream();

        while let Some(chunk) = byte_stream.next().await {
            // Parse chunk and yield events
            match parse_event(&chunk) {
                TextDelta(text) => yield StreamEvent::Delta(text),
                ToolStart { id, name, idx } => {
                    yield StreamEvent::ToolCallStart { id, name, index: idx }
                }
                // ... handle other event types
            }
        }

        yield StreamEvent::Done(usage);
    })
}

Performance Considerations

  • Unbounded channels: WebSocket send channels are unbounded, so slow clients can accumulate messages in memory
  • Markdown re-rendering: The frontend re-renders full markdown on each delta, which is O(n) work per delta. For very long responses, this can cause UI lag
  • Concurrent tool execution: Multiple tool calls are executed in parallel using futures::join_all(), improving throughput when the LLM requests several tools at once
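One common mitigation for the per-delta re-rendering cost is to coalesce deltas and render at most once per animation frame. A sketch — illustrative only, not the actual Moltis frontend code (`render` stands in for setSafeMarkdownHtml):

```javascript
// Coalesce streaming deltas so the expensive markdown render runs at most
// once per frame, not once per token. Names here are illustrative.
function createDeltaBatcher(render) {
  let pending = "";
  let scheduled = false;
  // requestAnimationFrame in the browser; setTimeout(fn, 0) as a fallback.
  const schedule =
    typeof requestAnimationFrame === "function"
      ? requestAnimationFrame
      : (fn) => setTimeout(fn, 0);
  return function push(text) {
    pending += text;
    if (scheduled) return;
    scheduled = true;
    schedule(() => {
      scheduled = false;
      render(pending); // one render for however many deltas arrived
    });
  };
}
```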

SQLite Database Migrations

Moltis uses sqlx for database access and its built-in migration system for schema management. Each crate owns its migrations, keeping schema definitions close to the code that uses them.

Architecture

Each crate that uses SQLite has its own migrations/ directory and exposes a run_migrations() function. The gateway orchestrates running all migrations at startup in the correct dependency order.

crates/
├── projects/
│   ├── migrations/
│   │   └── 20240205100000_init.sql   # projects table
│   └── src/lib.rs                     # run_migrations()
├── sessions/
│   ├── migrations/
│   │   └── 20240205100001_init.sql   # sessions, channel_sessions
│   └── src/lib.rs                     # run_migrations()
├── cron/
│   ├── migrations/
│   │   └── 20240205100002_init.sql   # cron_jobs, cron_runs
│   └── src/lib.rs                     # run_migrations()
├── gateway/
│   ├── migrations/
│   │   └── 20240205100003_init.sql   # auth, message_log, channels
│   └── src/server.rs                  # orchestrates moltis.db migrations
└── memory/
    ├── migrations/
    │   └── 20240205100004_init.sql   # files, chunks, embedding_cache, FTS
    └── src/lib.rs                     # run_migrations() (separate memory.db)

How It Works

Migration Ownership

Each crate is autonomous and owns its schema:

| Crate | Database | Tables | Migration File |
|---|---|---|---|
| moltis-projects | moltis.db | projects | 20240205100000_init.sql |
| moltis-sessions | moltis.db | sessions, channel_sessions | 20240205100001_init.sql |
| moltis-cron | moltis.db | cron_jobs, cron_runs | 20240205100002_init.sql |
| moltis-gateway | moltis.db | auth_*, passkeys, api_keys, env_variables, message_log, channels | 20240205100003_init.sql |
| moltis-memory | memory.db | files, chunks, embedding_cache, chunks_fts | 20240205100004_init.sql |

Startup Sequence

The gateway runs migrations in dependency order:

// server.rs
moltis_projects::run_migrations(&db_pool).await?;   // 1. projects first
moltis_sessions::run_migrations(&db_pool).await?;   // 2. sessions (FK → projects)
moltis_cron::run_migrations(&db_pool).await?;       // 3. cron (independent)
sqlx::migrate!("./migrations").run(&db_pool).await?; // 4. gateway tables

Sessions depends on projects due to a foreign key (sessions.project_id references projects.id), so projects must migrate first.

Version Tracking

sqlx tracks applied migrations in the _sqlx_migrations table:

SELECT version, description, installed_on, success FROM _sqlx_migrations;

Migrations are identified by their timestamp prefix (e.g., 20240205100000), which must be globally unique across all crates.

Database Files

| Database | Location | Crates |
|---|---|---|
| moltis.db | ~/.moltis/moltis.db | projects, sessions, cron, gateway |
| memory.db | ~/.moltis/memory.db | memory (separate, managed internally) |

Adding New Migrations

Adding a Column to an Existing Table

  1. Create a new migration file in the owning crate:
# Example: adding tags to sessions
touch crates/sessions/migrations/20240301120000_add_tags.sql
  2. Write the migration SQL:
-- 20240301120000_add_tags.sql
ALTER TABLE sessions ADD COLUMN tags TEXT;
CREATE INDEX IF NOT EXISTS idx_sessions_tags ON sessions(tags);
  3. Rebuild to embed the migration:
cargo build

Adding a New Table to an Existing Crate

  1. Create the migration file with a new timestamp:
touch crates/sessions/migrations/20240302100000_session_bookmarks.sql
  2. Write the CREATE TABLE statement:
-- 20240302100000_session_bookmarks.sql
CREATE TABLE IF NOT EXISTS session_bookmarks (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    session_key TEXT NOT NULL,
    name       TEXT NOT NULL,
    message_id INTEGER NOT NULL,
    created_at INTEGER NOT NULL
);

Adding Tables to a New Crate

  1. Create the migrations directory:
mkdir -p crates/new-feature/migrations
  2. Create the migration file with a globally unique timestamp:
touch crates/new-feature/migrations/20240401100000_init.sql
  3. Add run_migrations() to the crate’s lib.rs:
pub async fn run_migrations(pool: &sqlx::SqlitePool) -> anyhow::Result<()> {
    sqlx::migrate!("./migrations").run(pool).await?;
    Ok(())
}
  4. Call it from server.rs in the appropriate order:
moltis_new_feature::run_migrations(&db_pool).await?;

Timestamp Convention

Use YYYYMMDDHHMMSS format for migration filenames:

  • YYYY - 4-digit year
  • MM - 2-digit month
  • DD - 2-digit day
  • HH - 2-digit hour (24h)
  • MM - 2-digit minute
  • SS - 2-digit second

This ensures global uniqueness across crates. When adding migrations, use the current timestamp to avoid collisions.
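A shell one-liner produces a filename-ready timestamp in this format:

```shell
# Print the current time as YYYYMMDDHHMMSS, suitable for a migration filename.
date +%Y%m%d%H%M%S
# e.g. touch "crates/sessions/migrations/$(date +%Y%m%d%H%M%S)_add_tags.sql"
```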

SQLite Limitations

ALTER TABLE

SQLite has limited ALTER TABLE support:

  • ADD COLUMN: Supported ✓
  • DROP COLUMN: SQLite 3.35+ only
  • Rename column: Requires table recreation
  • Change column type: Requires table recreation

For complex schema changes, use the table recreation pattern:

-- Create new table with desired schema
CREATE TABLE sessions_new (
    -- new schema
);

-- Copy data (map old columns to new)
INSERT INTO sessions_new SELECT ... FROM sessions;

-- Swap tables
DROP TABLE sessions;
ALTER TABLE sessions_new RENAME TO sessions;

-- Recreate indexes
CREATE INDEX idx_sessions_created_at ON sessions(created_at);

Foreign Keys

SQLite foreign keys are checked at insert/update time, not migration time. Ensure migrations run in dependency order (parent table first).
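Note also that SQLite itself ships with foreign-key enforcement disabled; it must be enabled per connection (whether your driver does this automatically depends on its connection options — check them rather than assuming):

```sql
-- Enable foreign-key enforcement for this connection.
PRAGMA foreign_keys = ON;
```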

Testing

Unit tests use in-memory databases with the crate’s init() method:

#[tokio::test]
async fn test_session_operations() {
    let pool = SqlitePool::connect("sqlite::memory:").await.unwrap();

    // Create schema for tests (init() retained for this purpose)
    SqliteSessionMetadata::init(&pool).await.unwrap();

    let meta = SqliteSessionMetadata::new(pool);
    // ... test code
}

The init() methods are retained (marked #[doc(hidden)]) specifically for tests. In production, migrations handle schema creation.

Troubleshooting

“failed to run migrations”

  1. Check file permissions on ~/.moltis/
  2. Ensure the database file isn’t locked by another process
  3. Check for syntax errors in migration SQL files

Migration Order Issues

If you see foreign key errors, verify the migration order in server.rs. Parent tables must be created before child tables with FK references.

Checking Migration Status

sqlite3 ~/.moltis/moltis.db "SELECT version, description, success FROM _sqlx_migrations ORDER BY version"

Resetting Migrations (Development Only)

# Backup first!
rm ~/.moltis/moltis.db
cargo run  # Creates fresh database with all migrations

Best Practices

DO

  • Use timestamp-based version numbers for global uniqueness
  • Keep each crate’s migrations in its own directory
  • Use IF NOT EXISTS for idempotent initial migrations
  • Test migrations on a copy of production data before deploying
  • Keep migrations small and focused

DON’T

  • Modify existing migration files after deployment
  • Reuse timestamps across crates
  • Put multiple crates’ tables in one migration file
  • Skip the dependency order in server.rs

Metrics and Tracing

Moltis includes comprehensive observability support through Prometheus metrics and tracing integration. This document explains how to enable, configure, and use these features.

Overview

The metrics system is built on the metrics crate facade, which provides a unified interface similar to the log crate. When the prometheus feature is enabled, metrics are exported in Prometheus text format for scraping by Grafana, Prometheus, or other monitoring tools.

All metrics are feature-gated — they add zero overhead when disabled.

Feature Flags

Metrics are controlled by two feature flags:

| Feature | Description | Default |
|---|---|---|
| metrics | Enables metrics collection and the /api/metrics JSON API | Enabled |
| prometheus | Enables the /metrics Prometheus endpoint (requires metrics) | Enabled |

Compile-Time Configuration

# Enable only metrics collection (no Prometheus endpoint)
moltis-gateway = { version = "0.1", features = ["metrics"] }

# Enable metrics with Prometheus export (default)
moltis-gateway = { version = "0.1", features = ["metrics", "prometheus"] }

# Enable metrics for specific crates
moltis-agents = { version = "0.1", features = ["metrics"] }
moltis-cron = { version = "0.1", features = ["metrics"] }

To build without metrics entirely:

cargo build --release --no-default-features --features "file-watcher,tailscale,tls,web-ui"

Prometheus Endpoint

When the prometheus feature is enabled, the gateway exposes a /metrics endpoint:

GET http://localhost:18789/metrics

This endpoint is unauthenticated to allow Prometheus scrapers to access it. It returns metrics in Prometheus text format:

# HELP moltis_http_requests_total Total number of HTTP requests handled
# TYPE moltis_http_requests_total counter
moltis_http_requests_total{method="GET",status="200",endpoint="/api/chat"} 42

# HELP moltis_llm_completion_duration_seconds Duration of LLM completion requests
# TYPE moltis_llm_completion_duration_seconds histogram
moltis_llm_completion_duration_seconds_bucket{provider="anthropic",model="claude-3-opus",le="1.0"} 5

Grafana Integration

To scrape metrics with Prometheus and visualize in Grafana:

  1. Add moltis to your prometheus.yml:
scrape_configs:
  - job_name: 'moltis'
    static_configs:
      - targets: ['localhost:18789']
    metrics_path: /metrics
    scrape_interval: 15s
  2. Import or create Grafana dashboards using the moltis_* metrics.

JSON API Endpoints

For the web UI dashboard and programmatic access, authenticated JSON endpoints are available:

| Endpoint | Description |
|---|---|
| GET /api/metrics | Full metrics snapshot with aggregates and per-provider breakdown |
| GET /api/metrics/summary | Lightweight counts for navigation badges |
| GET /api/metrics/history | Time-series data points for charts (last hour, 10s intervals) |

History Endpoint

The /api/metrics/history endpoint returns historical metrics data for rendering time-series charts:

{
  "enabled": true,
  "interval_seconds": 10,
  "max_points": 60480,
  "points": [
    {
      "timestamp": 1706832000000,
      "llm_completions": 42,
      "llm_input_tokens": 15000,
      "llm_output_tokens": 8000,
      "http_requests": 150,
      "ws_active": 3,
      "tool_executions": 25,
      "mcp_calls": 12,
      "active_sessions": 2
    }
  ]
}

Metrics Persistence

Metrics history is persisted to SQLite, so historical data survives server restarts. The database is stored at ~/.moltis/metrics.db (or the configured data directory).

Key features:

  • 7-day retention: History is kept for 7 days (60,480 data points at 10-second intervals)
  • Automatic cleanup: Old data is automatically removed hourly
  • Startup recovery: History is loaded from the database when the server starts

The storage backend uses a trait-based design (MetricsStore), allowing alternative implementations (e.g., TimescaleDB) for larger deployments.

Storage Architecture

// The MetricsStore trait defines the storage interface
#[async_trait]
pub trait MetricsStore: Send + Sync {
    async fn save_point(&self, point: &MetricsHistoryPoint) -> Result<()>;
    async fn load_history(&self, since: u64, limit: usize) -> Result<Vec<MetricsHistoryPoint>>;
    async fn cleanup_before(&self, before: u64) -> Result<u64>;
    async fn latest_point(&self) -> Result<Option<MetricsHistoryPoint>>;
}

The default SqliteMetricsStore implementation stores data in a single table with an index on the timestamp column for efficient range queries.
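A plausible shape for that table — hypothetical, with column names mirroring the MetricsHistoryPoint fields shown in the history endpoint example; the real schema may differ:

```sql
-- Hypothetical sketch of the single-table layout SqliteMetricsStore implies.
CREATE TABLE IF NOT EXISTS metrics_history (
    timestamp         INTEGER NOT NULL,  -- unix millis, indexed for range scans
    llm_completions   INTEGER NOT NULL,
    llm_input_tokens  INTEGER NOT NULL,
    llm_output_tokens INTEGER NOT NULL,
    http_requests     INTEGER NOT NULL,
    ws_active         INTEGER NOT NULL,
    tool_executions   INTEGER NOT NULL,
    mcp_calls         INTEGER NOT NULL,
    active_sessions   INTEGER NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_metrics_history_timestamp
    ON metrics_history(timestamp);
```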

Web UI Dashboard

The gateway includes a built-in metrics dashboard at /monitoring in the web UI. This page displays:

Overview Tab:

  • System metrics (uptime, connected clients, active sessions)
  • LLM usage (completions, tokens, cache statistics)
  • Tool execution statistics
  • MCP server status
  • Provider breakdown table
  • Prometheus endpoint (with copy button)

Charts Tab:

  • Token usage over time (input/output)
  • HTTP requests and LLM completions
  • WebSocket connections and active sessions
  • Tool executions and MCP calls

The dashboard uses uPlot for lightweight, high-performance time-series charts. Data updates every 10 seconds for current metrics and every 30 seconds for history.

Available Metrics

HTTP Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| moltis_http_requests_total | Counter | method, status, endpoint | Total HTTP requests |
| moltis_http_request_duration_seconds | Histogram | method, status, endpoint | Request latency |
| moltis_http_requests_in_flight | Gauge | | Currently processing requests |

LLM/Agent Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| moltis_llm_completions_total | Counter | provider, model | Total completions requested |
| moltis_llm_completion_duration_seconds | Histogram | provider, model | Completion latency |
| moltis_llm_input_tokens_total | Counter | provider, model | Input tokens processed |
| moltis_llm_output_tokens_total | Counter | provider, model | Output tokens generated |
| moltis_llm_completion_errors_total | Counter | provider, model, error_type | Completion failures |
| moltis_llm_time_to_first_token_seconds | Histogram | provider, model | Streaming TTFT |

Provider Aliases

When you have multiple instances of the same provider type (e.g., separate API keys for work and personal use), you can use the alias configuration option to differentiate them in metrics:

[providers.anthropic]
api_key = "sk-work-..."
alias = "anthropic-work"

# Note: You would need separate config sections for multiple instances
# of the same provider. This is a placeholder for future functionality.

The alias appears in the provider label of all LLM metrics:

moltis_llm_input_tokens_total{provider="anthropic-work", model="claude-3-opus"} 5000
moltis_llm_input_tokens_total{provider="anthropic-personal", model="claude-3-opus"} 3000

This allows you to:

  • Track token usage separately for billing purposes
  • Create separate Grafana dashboards per provider instance
  • Monitor rate limits and quotas independently

MCP (Model Context Protocol) Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| moltis_mcp_tool_calls_total | Counter | server, tool | Tool invocations |
| moltis_mcp_tool_call_duration_seconds | Histogram | server, tool | Tool call latency |
| moltis_mcp_tool_call_errors_total | Counter | server, tool, error_type | Tool call failures |
| moltis_mcp_servers_connected | Gauge | | Active MCP server connections |

Tool Execution Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| moltis_tool_executions_total | Counter | tool | Tool executions |
| moltis_tool_execution_duration_seconds | Histogram | tool | Execution time |
| moltis_sandbox_command_executions_total | Counter | | Sandbox commands run |

Session Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| moltis_sessions_created_total | Counter | | Sessions created |
| moltis_sessions_active | Gauge | | Currently active sessions |
| moltis_session_messages_total | Counter | role | Messages by role |

Cron Job Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| moltis_cron_jobs_scheduled | Gauge | | Number of scheduled jobs |
| moltis_cron_executions_total | Counter | | Job executions |
| moltis_cron_execution_duration_seconds | Histogram | | Job duration |
| moltis_cron_errors_total | Counter | | Failed jobs |
| moltis_cron_stuck_jobs_cleared_total | Counter | | Jobs exceeding 2h timeout |
| moltis_cron_input_tokens_total | Counter | | Input tokens from cron runs |
| moltis_cron_output_tokens_total | Counter | | Output tokens from cron runs |

Memory/Search Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| moltis_memory_searches_total | Counter | search_type | Searches performed |
| moltis_memory_search_duration_seconds | Histogram | search_type | Search latency |
| moltis_memory_embeddings_generated_total | Counter | provider | Embeddings created |

Channel Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| moltis_channels_active | Gauge | | Loaded channel plugins |
| moltis_channel_messages_received_total | Counter | channel | Inbound messages |
| moltis_channel_messages_sent_total | Counter | channel | Outbound messages |

Telegram-Specific Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| moltis_telegram_messages_received_total | Counter | | Messages from Telegram |
| moltis_telegram_access_control_denials_total | Counter | | Access denied events |
| moltis_telegram_polling_duration_seconds | Histogram | | Message handling time |

OAuth Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| moltis_oauth_flow_starts_total | Counter | | OAuth flows initiated |
| moltis_oauth_flow_completions_total | Counter | | Successful completions |
| moltis_oauth_token_refresh_total | Counter | | Token refreshes |
| moltis_oauth_token_refresh_failures_total | Counter | | Refresh failures |

Skills Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| moltis_skills_installation_attempts_total | Counter | | Installation attempts |
| moltis_skills_installation_duration_seconds | Histogram | | Installation time |
| moltis_skills_git_clone_total | Counter | | Successful git clones |
| moltis_skills_git_clone_fallback_total | Counter | | Fallbacks to HTTP tarball |

Tracing Integration

The moltis-metrics crate includes optional tracing integration via the tracing feature. This allows span context to propagate to metric labels.

Enabling Tracing

moltis-metrics = { version = "0.1", features = ["prometheus", "tracing"] }

Initialization

use moltis_metrics::tracing_integration::init_tracing;

fn main() {
    // Initialize tracing with metrics context propagation
    init_tracing();

    // Now spans will add labels to metrics
}

How It Works

When tracing is enabled, span fields are automatically added as metric labels:

use moltis_metrics::counter;
use tracing::instrument;

#[instrument(fields(operation = "fetch_user", component = "api"))]
async fn fetch_user(id: u64) -> User {
    // Metrics recorded here will include:
    // - operation="fetch_user"
    // - component="api"
    counter!("api_calls_total").increment(1);
}

Span Labels

The following span fields are propagated to metrics:

| Field | Description |
|---|---|
| operation | The operation being performed |
| component | The component/module name |
| span.name | The span’s target/name |

Adding Custom Metrics

In Your Code

Use the metrics macros re-exported from moltis-metrics:

use moltis_metrics::{counter, gauge, histogram, labels};

// Simple counter
counter!("my_custom_requests_total").increment(1);

// Counter with labels
counter!(
    "my_custom_requests_total",
    labels::ENDPOINT => "/api/users",
    labels::METHOD => "GET"
).increment(1);

// Gauge (current value)
gauge!("my_queue_size").set(42.0);

// Histogram (distribution)
histogram!("my_operation_duration_seconds").record(0.123);

Feature-Gating

Always gate metrics code to avoid overhead when disabled:

#[cfg(feature = "metrics")]
use moltis_metrics::{counter, histogram};

pub async fn my_function() {
    #[cfg(feature = "metrics")]
    let start = std::time::Instant::now();

    // ... do work ...

    #[cfg(feature = "metrics")]
    {
        counter!("my_operations_total").increment(1);
        histogram!("my_operation_duration_seconds")
            .record(start.elapsed().as_secs_f64());
    }
}

Adding New Metric Definitions

For consistency, add metric name constants to crates/metrics/src/definitions.rs:

/// My feature metrics
pub mod my_feature {
    /// Total operations performed
    pub const OPERATIONS_TOTAL: &str = "moltis_my_feature_operations_total";
    /// Operation duration in seconds
    pub const OPERATION_DURATION_SECONDS: &str = "moltis_my_feature_operation_duration_seconds";
}

Then use them:

#![allow(unused)]
fn main() {
use moltis_metrics::{counter, my_feature};

counter!(my_feature::OPERATIONS_TOTAL).increment(1);
}

Configuration

Metrics configuration in moltis.toml:

[metrics]
enabled = true              # Enable metrics collection (default: true)
prometheus_endpoint = true  # Expose /metrics endpoint (default: true)
labels = { env = "prod" }   # Add custom labels to all metrics

Environment variables:

  • RUST_LOG=moltis_metrics=debug — Enable debug logging for metrics initialization
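If Moltis applies the same double-underscore environment override convention used for other sections (seen with MOLTIS_TLS__HTTP_REDIRECT_PORT), the [metrics] settings could presumably be overridden the same way. The variable name below is an assumption based on that convention, not a documented flag:

```shell
# Assumed naming, following the MOLTIS_<SECTION>__<KEY> convention:
MOLTIS_METRICS__ENABLED=false moltis
```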

Best Practices

  1. Use consistent naming: Follow the pattern moltis_<subsystem>_<metric>_<unit>
  2. Add units to names: _total for counters, _seconds for durations, _bytes for sizes
  3. Keep cardinality low: Avoid high-cardinality labels (like user IDs or request IDs)
  4. Feature-gate everything: Use #[cfg(feature = "metrics")] to ensure zero overhead when disabled
  5. Use predefined buckets: The buckets module has standard histogram buckets for common metric types
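Cardinality multiplies across labels: each unique label combination becomes its own time series that the backend must store. A quick back-of-envelope sketch (plain Rust, no metrics crate involved; the counts are illustrative):

```rust
fn main() {
    // Each label multiplies the number of time series.
    let endpoints = 50; // distinct API routes
    let methods = 4;    // GET, POST, PUT, DELETE
    let statuses = 10;  // distinct status codes observed
    let series = endpoints * methods * statuses;
    println!("bounded labels: {series} series"); // 2000: manageable

    // Adding a user-ID label with 100k active users multiplies the
    // series count by five orders of magnitude.
    let users: u64 = 100_000;
    println!("with user_id label: {} series", series as u64 * users);
}
```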

Troubleshooting

Metrics not appearing

  1. Verify the metrics feature is enabled at compile time
  2. Check that the metrics recorder is initialized (happens automatically in gateway)
  3. Ensure you’re hitting the correct /metrics endpoint
  4. Check moltis.toml has [metrics] enabled = true

Prometheus endpoint not available

  1. Ensure the prometheus feature is enabled (it’s separate from metrics)
  2. Check your build: cargo build --features prometheus

High memory usage

  • Check for high-cardinality labels (many unique label combinations)
  • Consider reducing histogram bucket counts

Missing labels

  • Ensure labels are passed consistently across all metric recordings
  • Check that tracing spans include the expected fields

Tool Registry

The tool registry manages all tools available to the agent during a conversation. It tracks where each tool comes from and supports filtering by source.

Tool Sources

Every registered tool has a ToolSource that identifies its origin:

  • Builtin — tools shipped with the binary (exec, web_fetch, etc.)
  • Mcp { server } — tools provided by an MCP server, tagged with the server name

This replaces the previous convention of identifying MCP tools by their mcp__ name prefix, providing type-safe filtering instead of string matching.

Registration

#![allow(unused)]
fn main() {
// Built-in tool
registry.register(Box::new(MyTool::new()));

// MCP tool — tagged with server name
registry.register_mcp(Box::new(adapter), "github".to_string());
}

Filtering

When MCP tools are disabled for a session, the registry can produce a filtered copy:

#![allow(unused)]
fn main() {
// Type-safe: filters by ToolSource::Mcp variant
let no_mcp = registry.clone_without_mcp();

// Remove all MCP tools in-place (used during sync)
let removed_count = registry.unregister_mcp();
}

Schema Output

list_schemas() includes source metadata in every tool schema:

{
  "name": "exec",
  "description": "Execute a command",
  "parameters": { ... },
  "source": "builtin"
}
{
  "name": "mcp__github__search",
  "description": "Search GitHub",
  "parameters": { ... },
  "source": "mcp",
  "mcpServer": "github"
}

The source and mcpServer fields are available to the UI for rendering tools grouped by origin.

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

[0.1.4] - 2026-02-06

Added

  • Config Check Command: moltis config check validates the configuration file, detects unknown/misspelled fields with Levenshtein-based suggestions, warns about security misconfigurations, and checks file references

  • Memory Usage Indicator: Display process RSS and system free memory in the header bar, updated every 30 seconds via the tick WebSocket broadcast

  • QMD Backend Support: Optional QMD (Query Memory Daemon) backend for hybrid search with BM25 + vector + LLM reranking

    • Gated behind qmd feature flag (enabled by default)
    • Web UI shows installation instructions and QMD status
    • Comparison table between built-in SQLite and QMD backends
  • Citations: Configurable citation mode (on/off/auto) for memory search results

    • Auto mode includes citations when results span multiple files
  • Session Export: Option to export session transcripts to memory for future reference

  • LLM Reranking: Use LLM to rerank search results for improved relevance (requires QMD)

  • Memory Documentation: Added docs/src/memory.md with comprehensive memory system documentation

  • Mobile PWA Support: Install moltis as a Progressive Web App on iOS, Android, and desktop

    • Standalone mode with full-screen experience
    • Custom app icon (crab mascot)
    • Service worker for offline support and caching
    • Safe area support for notched devices
  • Push Notifications: Receive alerts when the LLM responds

    • VAPID key generation and storage for Web Push API
    • Subscribe/unsubscribe toggle in Settings > Notifications
    • Subscription management UI showing device name, IP address, and date
    • Remove any subscription from any device
    • Real-time subscription updates via WebSocket
    • Client IP detection from X-Forwarded-For, X-Real-IP, CF-Connecting-IP headers
    • Notifications sent for both streaming and agent (tool-using) chat modes
  • Safari/iOS PWA Detection: Show “Add to Dock” instructions when push notifications require PWA installation (Safari doesn’t support push in browser mode)

  • Session state store: per-session key-value persistence scoped by namespace, backed by SQLite (session_state tool).

  • Session branching: branch_session tool forks a conversation at any message index into an independent copy.

  • Session fork from UI: Fork button in the chat header and sidebar action buttons let users fork sessions without asking the LLM. Forked sessions appear indented under their parent with a branch icon.

  • Skill self-extension: create_skill, update_skill, delete_skill tools let the agent manage project-local skills at runtime.

  • Skill hot-reload: filesystem watcher on skill directories emits skills.changed events via WebSocket when SKILL.md files change.

  • Typed tool sources: ToolSource enum (Builtin / Mcp { server }) replaces string-prefix identification of MCP tools in the tool registry.

  • Tool registry metadata: list_schemas() now includes source and mcpServer fields so the UI can group tools by origin.

  • Per-session MCP toggle: sessions store an mcp_disabled flag; the chat header exposes a toggle button to enable/disable MCP tools per session.

  • Debug panel convergence: the debug side-panel now renders the same seven sections as the /context slash command, eliminating duplicated rendering logic.

  • Documentation pages for session state, session branching, skill self-extension, and the tool registry architecture.

Changed

  • Memory settings UI enhanced with backend comparison and feature explanations

  • Added memory.qmd.status RPC method for checking QMD availability

  • Extended memory.config.get to include qmd_feature_enabled flag

  • Push notifications feature is now enabled by default in the CLI

  • TLS HTTP redirect port now defaults to gateway_port + 1 instead of the hardcoded port 18790. This makes the Dockerfile simpler (both ports are adjacent) and avoids collisions when running multiple instances. Override via [tls] http_redirect_port in moltis.toml or the MOLTIS_TLS__HTTP_REDIRECT_PORT environment variable.

  • TLS certificates use moltis.localhost domain. Auto-generated server certs now include moltis.localhost, *.moltis.localhost, localhost, 127.0.0.1, and ::1 as SANs. Banner and redirect URLs use https://moltis.localhost:<port> when bound to loopback, so the cert matches the displayed URL. Existing certs are automatically regenerated on next startup.

  • Certificate validity uses dynamic dates. Cert notBefore/notAfter are now computed from the current system time instead of being hardcoded. CA certs are valid for 10 years, server certs for 1 year from generation.

  • McpToolBridge now stores and exposes server_name() for typed registration.

  • mcp_service::sync_mcp_tools() uses unregister_mcp() / register_mcp() instead of scanning tool names by prefix.

  • chat.rs uses clone_without_mcp() instead of clone_without_prefix("mcp__") in all three call sites.

Fixed

  • Push notifications not sending when chat uses agent mode (run_with_tools)
  • Missing space in Safari install instructions (“usingFile” → “using File”)
  • WebSocket origin validation now treats .localhost subdomains (e.g. moltis.localhost) as loopback equivalents per RFC 6761.
  • Fork/branch icon in session sidebar now renders cleanly at 16px (replaced complex git-branch SVG with simple trunk+branch path).
  • Deleting a forked session now navigates to the parent session instead of an unrelated sibling.
  • Streaming tool calls for non-Anthropic providers: OpenAiProvider, GitHubCopilotProvider, KimiCodeProvider, OpenAiCodexProvider, and ProviderChain now implement stream_with_tools() so tool schemas are sent in the streaming API request and tool-call events are properly parsed. Previously only AnthropicProvider supported streaming tool calls; all other providers silently dropped the tools parameter, causing the LLM to emit tool invocations as plain text instead of structured function calls.
  • Streaming tool call arguments dropped when index ≠ 0: When a provider (e.g. GitHub Copilot proxying Claude) emits a text content block at streaming index 0 and a tool_use block at index 1, the runner’s argument finalization used the streaming index as the vector position directly. Since tool_calls has only 1 element at position 0, the condition 1 < 1 was false and arguments were silently dropped (empty {}). Fixed by mapping streaming indices to vector positions via a HashMap.
  • Skill tools wrote to wrong directory: create_skill, update_skill, and delete_skill used std::env::current_dir() captured at gateway startup, writing skills to <cwd>/.moltis/skills/ instead of ~/.moltis/skills/. Skills now write to <data_dir>/skills/ (Personal source), which is always discovered regardless of where the gateway was started.
  • Skills page missing personal/project skills: The /api/skills endpoint only returned manifest-based registry skills. Personal and project-local skills were never shown in the navigation or skills page. The endpoint now discovers and includes them alongside registry skills.

Documentation

  • Added mobile-pwa.md with PWA installation and push notification documentation
  • Updated CLAUDE.md with cargo feature policy (features enabled by default)
  • Rewrote session-branching.md with accurate fork details, UI methods, RPC API, inheritance table, and deletion behavior.