
Moltis

A personal AI gateway written in Rust.
One binary, no runtime, no npm.

Moltis compiles your entire AI gateway — web UI, LLM providers, tools, and all assets — into a single self-contained executable. There’s no Node.js to babysit, no node_modules to sync, no V8 garbage collector introducing latency spikes.

# Quick install (macOS / Linux)
curl -fsSL https://www.moltis.org/install.sh | sh

Why Moltis?

| Feature | Moltis | Other Solutions |
|---|---|---|
| Deployment | Single binary | Node.js + dependencies |
| Memory Safety | Rust ownership | Garbage collection |
| Secret Handling | Zeroed on drop | "Eventually collected" |
| Sandbox | Docker + Apple Container | Docker only |
| Startup | Milliseconds | Seconds |

Key Features

  • 30+ LLM Providers — Anthropic, OpenAI, Google, Mistral, local models, and more
  • Streaming-First — Responses appear as tokens arrive, not after completion
  • Sandboxed Execution — Commands run in isolated containers (Docker or Apple Container)
  • MCP Support — Connect to Model Context Protocol servers for extended capabilities
  • Multi-Channel — Web UI, Telegram, API access with synchronized responses
  • Long-Term Memory — Embeddings-powered knowledge base with hybrid search
  • Hook System — Observe, modify, or block actions at any lifecycle point
  • Compile-Time Safety — Misconfigurations caught by cargo check, not runtime crashes

Quick Start

# Install
curl -fsSL https://www.moltis.org/install.sh | sh

# Run
moltis

On first launch:

  1. Open the URL printed in the terminal (e.g., http://localhost:13131) in your browser
  2. Add your LLM API key
  3. Start chatting!

Note

Authentication is only required when accessing Moltis from a non-localhost address. On localhost, you can start using it immediately.

Full Quickstart Guide

How It Works

┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│   Web UI    │  │  Telegram   │  │     API     │
└──────┬──────┘  └──────┬──────┘  └──────┬──────┘
       │                │                │
       └────────────────┴────────────────┘
                        │
                        ▼
        ┌───────────────────────────────┐
        │       Moltis Gateway          │
        │   ┌─────────┐ ┌───────────┐   │
        │   │  Agent  │ │   Tools   │   │
        │   │  Loop   │◄┤  Registry │   │
        │   └────┬────┘ └───────────┘   │
        │        │                      │
        │   ┌────▼────────────────┐     │
        │   │  Provider Registry  │     │
        │   │Claude · GPT · Gemini│     │
        │   └─────────────────────┘     │
        └───────────────────────────────┘
                        │
                ┌───────▼───────┐
                │    Sandbox    │
                │ Docker/Apple  │
                └───────────────┘

Documentation

Getting Started

Features

  • Providers — Configure LLM providers
  • MCP Servers — Extend with Model Context Protocol
  • Hooks — Lifecycle hooks for customization
  • Local LLMs — Run models on your machine

Deployment

  • Docker — Container deployment

Architecture

Security

Moltis applies defense in depth:

  • Authentication — Password or passkey (WebAuthn) required for non-localhost access
  • SSRF Protection — Blocks requests to internal networks
  • Secret Handling — secrecy::Secret zeroes memory on drop
  • Sandboxed Execution — Commands never run on the host
  • Origin Validation — Prevents Cross-Site WebSocket Hijacking
  • No Unsafe Code — unsafe is denied workspace-wide

Community

License

MIT — Free for personal and commercial use.

Quickstart

Get Moltis running in under 5 minutes.

1. Install

curl -fsSL https://www.moltis.org/install.sh | sh

Or via Homebrew:

brew install moltis-org/tap/moltis

2. Start

moltis

You’ll see output like:

🚀 Moltis gateway starting...
🌐 Open http://localhost:13131 in your browser

3. Configure a Provider

You need an LLM API key to chat. The easiest options:

Option A: Anthropic

  1. Get an API key from console.anthropic.com
  2. In Moltis, go to Settings → Providers
  3. Click Anthropic → Enter your API key → Save

Option B: OpenAI

  1. Get an API key from platform.openai.com
  2. In Moltis, go to Settings → Providers
  3. Click OpenAI → Enter your API key → Save

Option C: Local Model (Free)

  1. Install Ollama: curl -fsSL https://ollama.ai/install.sh | sh
  2. Pull a model: ollama pull llama3.2
  3. In Moltis, configure Ollama in Settings → Providers

4. Chat!

Go to the Chat tab and start a conversation:

You: Write a Python function to check if a number is prime

Agent: Here's a Python function to check if a number is prime:

def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(n ** 0.5) + 1):
        if n % i == 0:
            return False
    return True

What’s Next?

Enable Tool Use

Moltis can execute code, browse the web, and more. Tools are enabled by default with sandbox protection.

Try:

You: Create a hello.py file that prints "Hello, World!" and run it

Connect Telegram

Chat with your agent from anywhere:

  1. Create a bot via @BotFather
  2. Copy the bot token
  3. In Moltis: Settings → Telegram → Enter token → Save
  4. Message your bot!

Add MCP Servers

Extend capabilities with MCP servers:

# In moltis.toml
[[mcp.servers]]
name = "github"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-github"]
env = { GITHUB_TOKEN = "ghp_..." }

Set Up Memory

Enable long-term memory for context across sessions:

# In moltis.toml
[memory]
enabled = true

Add knowledge by placing Markdown files in ~/.moltis/memory/.
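For example, to add a note the agent can later recall (the filename here is just an illustration; any Markdown file in the directory is indexed):

```shell
# Create the memory directory and drop in a note; the file watcher
# picks up new Markdown files automatically
mkdir -p ~/.moltis/memory
cat > ~/.moltis/memory/team-glossary.md <<'EOF'
# Team Glossary
"Gateway" refers to the Moltis process that fronts all chat channels.
EOF
```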

Useful Commands

| Command | Description |
|---|---|
| /new | Start a new session |
| /model <name> | Switch models |
| /clear | Clear chat history |
| /help | Show available commands |

File Locations

| Path | Contents |
|---|---|
| ~/.config/moltis/moltis.toml | Configuration |
| ~/.config/moltis/provider_keys.json | API keys |
| ~/.moltis/ | Data (sessions, memory, logs) |

Getting Help

Installation

Moltis is distributed as a single self-contained binary. Choose the installation method that works best for your setup.

The fastest way to get started on macOS or Linux:

curl -fsSL https://www.moltis.org/install.sh | sh

This downloads the latest release for your platform and installs it to ~/.local/bin.
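If the moltis command isn't found afterwards, ~/.local/bin may not be on your PATH. A quick fix for the current session (append the export line to your shell profile to make it permanent):

```shell
# Put the install directory at the front of PATH for this session
export PATH="$HOME/.local/bin:$PATH"
# The first PATH entry should now be the install directory
echo "$PATH" | tr ':' '\n' | head -n 1
```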

Package Managers

Homebrew (macOS / Linux)

brew install moltis-org/tap/moltis

Cargo Binstall (Pre-built Binary)

If you have cargo-binstall installed:

cargo binstall moltis

This downloads a pre-built binary without compiling from source.

Linux Packages

Debian / Ubuntu (.deb)

# Download the latest .deb package
curl -LO https://github.com/moltis-org/moltis/releases/latest/download/moltis_amd64.deb

# Install
sudo dpkg -i moltis_amd64.deb

Fedora / RHEL (.rpm)

# Download the latest .rpm package
curl -LO https://github.com/moltis-org/moltis/releases/latest/download/moltis.x86_64.rpm

# Install
sudo rpm -i moltis.x86_64.rpm

Arch Linux (.pkg.tar.zst)

# Download the latest package
curl -LO https://github.com/moltis-org/moltis/releases/latest/download/moltis.pkg.tar.zst

# Install
sudo pacman -U moltis.pkg.tar.zst

Snap

sudo snap install moltis

AppImage

# Download
curl -LO https://github.com/moltis-org/moltis/releases/latest/download/moltis.AppImage
chmod +x moltis.AppImage

# Run
./moltis.AppImage

Docker

Multi-architecture images (amd64/arm64) are published to GitHub Container Registry:

docker pull ghcr.io/moltis-org/moltis:latest

See Docker Deployment for full instructions on running Moltis in a container.

Build from Source

Prerequisites

  • Rust 1.75 or later
  • A C compiler (for some dependencies)

Clone and Build

git clone https://github.com/moltis-org/moltis.git
cd moltis
cargo build --release

The binary will be at target/release/moltis.

Install via Cargo

cargo install moltis --git https://github.com/moltis-org/moltis

First Run

After installation, start Moltis:

moltis

On first launch:

  1. Open http://localhost:<port> in your browser (the port is shown in the terminal output)
  2. Configure your LLM provider (API key)
  3. Start chatting!

Tip

Moltis picks a random available port on first install to avoid conflicts. The port is saved in your config and reused on subsequent runs.

Note

Authentication is only required when accessing Moltis from a non-localhost address (e.g., over the network). When this happens, a one-time setup code is printed to the terminal for initial authentication setup.

Verify Installation

moltis --version

Updating

Homebrew

brew upgrade moltis

Cargo Binstall

cargo binstall moltis

From Source

cd moltis
git pull
cargo build --release

Uninstalling

Homebrew

brew uninstall moltis

Remove Data

Moltis stores data in two directories:

# Configuration
rm -rf ~/.config/moltis

# Data (sessions, databases, memory)
rm -rf ~/.moltis

Warning

Removing these directories deletes all your conversations, memory, and settings permanently.

Configuration

Moltis is configured through moltis.toml, located in ~/.config/moltis/ by default.

On first run, a complete configuration file is generated with sensible defaults. You can edit it to customize behavior.

Configuration File Location

| Platform | Default Path |
|---|---|
| macOS/Linux | ~/.config/moltis/moltis.toml |
| Custom | Set via --config-dir or MOLTIS_CONFIG_DIR |

Basic Settings

[gateway]
port = 13131                    # HTTP/WebSocket port
host = "0.0.0.0"               # Listen address

[agent]
name = "Moltis"                 # Agent display name
model = "claude-sonnet-4-20250514"  # Default model
timeout = 600                   # Agent run timeout (seconds)
max_iterations = 25             # Max tool call iterations per run

LLM Providers

Provider API keys are stored separately in ~/.config/moltis/provider_keys.json for security. Configure them through the web UI or directly in the JSON file.

[providers]
default = "anthropic"           # Default provider

[providers.anthropic]
enabled = true
models = [
    "claude-sonnet-4-20250514",
    "claude-opus-4-20250514",
    "claude-3-5-haiku-20241022",
]

[providers.openai]
enabled = true
models = [
    "gpt-4o",
    "gpt-4o-mini",
    "o1-preview",
]

See Providers for detailed provider configuration.

Sandbox Configuration

Commands run inside isolated containers for security:

[tools.exec.sandbox]
enabled = true
backend = "docker"              # "docker" or "apple" (macOS 15+)
base_image = "ubuntu:25.10"

# Packages installed in the sandbox image
packages = [
    "curl",
    "git",
    "jq",
    "python3",
    "python3-pip",
    "nodejs",
    "npm",
]

Info

When you modify the packages list and restart, Moltis automatically rebuilds the sandbox image with a new tag.

Memory System

Long-term memory uses embeddings for semantic search:

[memory]
enabled = true
embedding_model = "text-embedding-3-small"  # OpenAI embedding model
chunk_size = 512                # Characters per chunk
chunk_overlap = 50              # Overlap between chunks

# Directories to watch for memory files
watch_dirs = [
    "~/.moltis/memory",
]
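To see what chunk_size and chunk_overlap mean in practice, here is a toy sketch (assumption: chunks are cut by character count and each chunk re-reads the last chunk_overlap characters of its predecessor; tiny sizes are used so the output is readable):

```shell
text="abcdefghijklmnopqrstuvwxyz"
size=10     # stands in for chunk_size
overlap=3   # stands in for chunk_overlap
step=$((size - overlap))
i=0
# Each chunk starts `step` characters after the previous one,
# so consecutive chunks share `overlap` characters
while [ "$i" -lt "${#text}" ]; do
  printf '%s' "$text" | cut -c "$((i + 1))-$((i + size))"
  i=$((i + step))
done
```

Each printed line shares its first three characters with the tail of the previous line.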

Authentication

Authentication is only required when accessing Moltis from a non-localhost address. When running on localhost or 127.0.0.1, no authentication is needed by default.

When you access Moltis from a network address (e.g., http://192.168.1.100:13131), a one-time setup code is printed to the terminal. Use it to set up a password or passkey.

[auth]
disabled = false                # Set true to disable auth entirely

# Session settings
session_expiry = 604800         # Session lifetime in seconds (7 days)

Warning

Only set disabled = true if Moltis is running on a trusted private network. Never expose an unauthenticated instance to the internet.

Hooks

Configure lifecycle hooks:

[[hooks]]
name = "my-hook"
command = "./hooks/my-hook.sh"
events = ["BeforeToolCall", "AfterToolCall"]
timeout = 5                     # Timeout in seconds

[hooks.env]
MY_VAR = "value"               # Environment variables for the hook

See Hooks for the full hook system documentation.

MCP Servers

Connect to Model Context Protocol servers:

[[mcp.servers]]
name = "filesystem"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/allowed"]

[[mcp.servers]]
name = "github"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-github"]
env = { GITHUB_TOKEN = "ghp_..." }

Telegram Integration

[telegram]
enabled = true
# Token is stored in provider_keys.json, not here
allowed_users = [123456789]     # Telegram user IDs allowed to chat

TLS / HTTPS

[tls]
enabled = true
cert_path = "~/.config/moltis/cert.pem"
key_path = "~/.config/moltis/key.pem"
# If paths don't exist, a self-signed certificate is generated

# Port for the plain-HTTP redirect / CA-download server.
# Defaults to the gateway port + 1 when not set.
# http_redirect_port = 13132

Override via environment variable: MOLTIS_TLS__HTTP_REDIRECT_PORT=8080.

Tailscale Integration

Expose Moltis over your Tailscale network:

[tailscale]
enabled = true
mode = "serve"                  # "serve" (private) or "funnel" (public)

Observability

[telemetry]
enabled = true
otlp_endpoint = "http://localhost:4317"  # OpenTelemetry collector

Environment Variables

All settings can be overridden via environment variables:

| Variable | Description |
|---|---|
| MOLTIS_CONFIG_DIR | Configuration directory |
| MOLTIS_DATA_DIR | Data directory |
| MOLTIS_PORT | Gateway port |
| MOLTIS_HOST | Listen address |
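Overrides are per process, so you can test a setting without touching moltis.toml. You can confirm the variable reaches the child process before pointing it at Moltis itself:

```shell
# The override is visible only to the child process; the parent shell keeps its config
MOLTIS_PORT=8080 env | grep '^MOLTIS_PORT='
echo "parent sees: ${MOLTIS_PORT:-unset}"
```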

CLI Flags

moltis --config-dir /path/to/config --data-dir /path/to/data

Complete Example

[gateway]
port = 13131
host = "0.0.0.0"

[agent]
name = "Atlas"
model = "claude-sonnet-4-20250514"
timeout = 600
max_iterations = 25

[providers]
default = "anthropic"

[tools.exec.sandbox]
enabled = true
backend = "docker"
base_image = "ubuntu:25.10"
packages = ["curl", "git", "jq", "python3", "nodejs"]

[memory]
enabled = true

[auth]
disabled = false

[[hooks]]
name = "audit-log"
command = "./hooks/audit.sh"
events = ["BeforeToolCall"]
timeout = 5

LLM Providers

Moltis supports 30+ LLM providers through a trait-based architecture. Configure providers through the web UI or directly in configuration files.

Supported Providers

Tier 1 (Full Support)

| Provider | Models | Tool Calling | Streaming |
|---|---|---|---|
| Anthropic | Claude 4, Claude 3.5, Claude 3 | ✓ | ✓ |
| OpenAI | GPT-4o, GPT-4, o1, o3 | ✓ | ✓ |
| Google | Gemini 2.0, Gemini 1.5 | ✓ | ✓ |
| GitHub Copilot | GPT-4o, Claude | ✓ | ✓ |

Tier 2 (Good Support)

| Provider | Models | Tool Calling | Streaming |
|---|---|---|---|
| Mistral | Mistral Large, Codestral | ✓ | ✓ |
| Groq | Llama 3, Mixtral | ✓ | ✓ |
| Together | Various open models | ✓ | ✓ |
| Fireworks | Various open models | ✓ | ✓ |
| DeepSeek | DeepSeek V3, Coder | ✓ | ✓ |

Tier 3 (Basic Support)

| Provider | Notes |
|---|---|
| OpenRouter | Aggregator for 100+ models |
| Ollama | Local models |
| Venice | Privacy-focused |
| Cerebras | Fast inference |
| SambaNova | Enterprise |
| Cohere | Command models |
| AI21 | Jamba models |

Configuration

Via Web UI

  1. Open Moltis in your browser
  2. Go to Settings → Providers
  3. Click on a provider card
  4. Enter your API key
  5. Select your preferred model

Via Configuration Files

Provider credentials are stored in ~/.config/moltis/provider_keys.json:

{
  "anthropic": {
    "apiKey": "sk-ant-...",
    "model": "claude-sonnet-4-20250514"
  },
  "openai": {
    "apiKey": "sk-...",
    "model": "gpt-4o"
  }
}

Enable providers in moltis.toml:

[providers]
default = "anthropic"

[providers.anthropic]
enabled = true
models = [
    "claude-sonnet-4-20250514",
    "claude-opus-4-20250514",
]

[providers.openai]
enabled = true

Provider-Specific Setup

Anthropic

  1. Get an API key from console.anthropic.com
  2. Enter it in Settings → Providers → Anthropic

Tip

Claude Sonnet 4 offers the best balance of capability and cost for most coding tasks.

OpenAI

  1. Get an API key from platform.openai.com
  2. Enter it in Settings → Providers → OpenAI

GitHub Copilot

GitHub Copilot uses OAuth authentication:

  1. Click Connect in Settings → Providers → GitHub Copilot
  2. Complete the GitHub OAuth flow
  3. Authorize Moltis to access Copilot

Info

Requires an active GitHub Copilot subscription.

Google (Gemini)

  1. Get an API key from aistudio.google.com
  2. Enter it in Settings → Providers → Google

Ollama (Local Models)

Run models locally with Ollama:

  1. Install Ollama: curl -fsSL https://ollama.ai/install.sh | sh
  2. Pull a model: ollama pull llama3.2
  3. Configure in Moltis:
{
  "ollama": {
    "baseUrl": "http://localhost:11434",
    "model": "llama3.2"
  }
}

OpenRouter

Access 100+ models through one API:

  1. Get an API key from openrouter.ai
  2. Enter it in Settings → Providers → OpenRouter
  3. Specify the model ID you want to use
{
  "openrouter": {
    "apiKey": "sk-or-...",
    "model": "anthropic/claude-3.5-sonnet"
  }
}

Custom Base URLs

For providers with custom endpoints (enterprise, proxies):

{
  "openai": {
    "apiKey": "sk-...",
    "baseUrl": "https://your-proxy.example.com/v1",
    "model": "gpt-4o"
  }
}

Switching Providers

Per-Session

In the chat interface, use the model selector dropdown to switch providers/models for the current session.

Per-Message

Use the /model command to switch models mid-conversation:

/model claude-opus-4-20250514

Default Provider

Set the default in moltis.toml:

[providers]
default = "anthropic"

[agent]
model = "claude-sonnet-4-20250514"

Model Capabilities

Different models have different strengths:

Use CaseRecommended Model
General codingClaude Sonnet 4, GPT-4o
Complex reasoningClaude Opus 4, o1
Fast responsesClaude Haiku, GPT-4o-mini
Long contextClaude (200k), Gemini (1M+)
Local/privateLlama 3 via Ollama

Troubleshooting

“Model not available”

The model may not be enabled for your account or region. Check:

  • Your API key has access to the model
  • The model ID is spelled correctly
  • Your account has sufficient credits

“Rate limited”

You’ve exceeded the provider’s rate limits. Solutions:

  • Wait and retry
  • Use a different provider
  • Upgrade your API plan

“Invalid API key”

  • Verify the key is correct (no extra spaces)
  • Check the key hasn’t expired
  • Ensure the key has the required permissions

MCP Servers

Moltis supports the Model Context Protocol (MCP) for connecting to external tool servers. MCP servers extend your agent’s capabilities without modifying Moltis itself.

What is MCP?

MCP is an open protocol that lets AI assistants connect to external tools and data sources. Think of MCP servers as plugins that provide:

  • Tools — Functions the agent can call (e.g., search, file operations, API calls)
  • Resources — Data the agent can read (e.g., files, database records)
  • Prompts — Pre-defined prompt templates

Supported Transports

| Transport | Description | Use Case |
|---|---|---|
| stdio | Local process via stdin/stdout | npm packages, local scripts |
| HTTP/SSE | Remote server via HTTP | Cloud services, shared servers |

Adding an MCP Server

Via Web UI

  1. Go to Settings → MCP Servers
  2. Click Add Server
  3. Enter the server configuration
  4. Click Save

Via Configuration

Add servers to moltis.toml:

[[mcp.servers]]
name = "filesystem"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-filesystem", "/Users/me/projects"]

[[mcp.servers]]
name = "github"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-github"]
env = { GITHUB_TOKEN = "ghp_..." }

[[mcp.servers]]
name = "remote-api"
url = "https://mcp.example.com/sse"
transport = "sse"

Official Servers

| Server | Description | Install |
|---|---|---|
| filesystem | Read/write local files | npx @modelcontextprotocol/server-filesystem |
| github | GitHub API access | npx @modelcontextprotocol/server-github |
| postgres | PostgreSQL queries | npx @modelcontextprotocol/server-postgres |
| sqlite | SQLite database | npx @modelcontextprotocol/server-sqlite |
| puppeteer | Browser automation | npx @modelcontextprotocol/server-puppeteer |
| brave-search | Web search | npx @modelcontextprotocol/server-brave-search |

Community Servers

Explore more at mcp.so and GitHub MCP Servers.

Configuration Options

[[mcp.servers]]
name = "my-server"              # Display name
command = "node"                # Command to run
args = ["server.js"]            # Command arguments
cwd = "/path/to/server"         # Working directory

# Environment variables
env = { API_KEY = "secret", DEBUG = "true" }

# Health check settings
health_check_interval = 30      # Seconds between health checks
restart_on_failure = true       # Auto-restart on crash
max_restart_attempts = 5        # Give up after N restarts
restart_backoff = "exponential" # "linear" or "exponential"

Server Lifecycle

┌─────────────────────────────────────────────────────┐
│                   MCP Server                         │
│                                                      │
│  Start → Initialize → Ready → [Tool Calls] → Stop   │
│            │                       │                 │
│            ▼                       ▼                 │
│     Health Check ◄─────────── Heartbeat             │
│            │                       │                 │
│            ▼                       ▼                 │
│    Crash Detected ───────────► Restart              │
│                                    │                 │
│                              Backoff Wait            │
└─────────────────────────────────────────────────────┘

Health Monitoring

Moltis monitors MCP servers and automatically:

  • Detects crashes via process exit
  • Restarts with exponential backoff
  • Disables after max restart attempts
  • Re-enables after cooldown period
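The backoff schedule can be pictured with a small sketch (assumption: the delay doubles per attempt from a 1-second base; the actual parameters are internal to Moltis):

```shell
# Double the wait before each successive restart attempt
base=1
for attempt in 1 2 3 4 5; do
  echo "restart attempt $attempt: wait $((base << (attempt - 1)))s"
done
```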

Using MCP Tools

Once connected, MCP tools appear alongside built-in tools. The agent can use them naturally:

User: Search GitHub for Rust async runtime projects

Agent: I'll search GitHub for you.
[Calling github.search_repositories with query="rust async runtime"]

Found 15 repositories:
1. tokio-rs/tokio - A runtime for writing reliable async applications
2. async-std/async-std - Async version of the Rust standard library
...

Creating an MCP Server

Simple Node.js Server

// server.js
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";

const server = new Server(
  { name: "my-server", version: "1.0.0" },
  { capabilities: { tools: {} } }
);

server.setRequestHandler("tools/list", async () => ({
  tools: [{
    name: "hello",
    description: "Says hello",
    inputSchema: {
      type: "object",
      properties: {
        name: { type: "string", description: "Name to greet" }
      },
      required: ["name"]
    }
  }]
}));

server.setRequestHandler("tools/call", async (request) => {
  if (request.params.name === "hello") {
    const name = request.params.arguments.name;
    return { content: [{ type: "text", text: `Hello, ${name}!` }] };
  }
});

const transport = new StdioServerTransport();
await server.connect(transport);

Configure in Moltis

[[mcp.servers]]
name = "my-server"
command = "node"
args = ["server.js"]
cwd = "/path/to/my-server"

Debugging

Check Server Status

In the web UI, go to Settings → MCP Servers to see:

  • Connection status (connected/disconnected/error)
  • Available tools
  • Recent errors

View Logs

MCP server stderr is captured in Moltis logs:

# View gateway logs
tail -f ~/.moltis/logs/gateway.log | grep mcp

Test Locally

Run the server directly to debug:

echo '{"jsonrpc":"2.0","method":"tools/list","id":1}' | node server.js

Security Considerations

Warning

MCP servers run with the same permissions as Moltis. Only use servers from trusted sources.

  • Review server code before running
  • Limit file access — use specific paths, not /
  • Use environment variables for secrets
  • Network isolation — run untrusted servers in containers

Troubleshooting

Server won’t start

  • Check the command exists: which npx
  • Verify the package: npx @modelcontextprotocol/server-filesystem --help
  • Check for port conflicts

Tools not appearing

  • Server may still be initializing (wait a few seconds)
  • Check server logs for errors
  • Verify the server implements tools/list

Server keeps restarting

  • Check stderr for crash messages
  • Increase max_restart_attempts for debugging
  • Verify environment variables are set correctly

Memory System

Moltis provides a powerful memory system that enables the agent to recall past conversations, notes, and context across sessions. This document explains the available backends, features, and configuration options.

Backends

Moltis supports two memory backends:

| Feature | Built-in | QMD |
|---|---|---|
| Search Type | Hybrid (vector + FTS5 keyword) | Hybrid (BM25 + vector + LLM reranking) |
| Local Embeddings | GGUF models via llama-cpp-2 | GGUF models |
| Remote Embeddings | OpenAI, Ollama, custom endpoints | Built-in |
| Embedding Cache | SQLite with LRU eviction | Built-in |
| Batch API | OpenAI batch (50% cost saving) | No |
| Circuit Breaker | Fallback chain with auto-recovery | No |
| LLM Reranking | Optional (configurable) | Built-in with query command |
| File Watching | Real-time sync via notify | Built-in |
| External Dependency | None (pure Rust) | Requires QMD binary (Node.js/Bun) |
| Offline Support | Yes (with local embeddings) | Yes |

Built-in Backend

The default backend uses SQLite for storage with FTS5 for keyword search and optional vector embeddings for semantic search. Key advantages:

  • Zero external dependencies: Everything is embedded in the moltis binary
  • Fallback chain: Automatically switches between embedding providers if one fails
  • Batch embedding: Reduces OpenAI API costs by 50% for large sync operations
  • Embedding cache: Avoids re-embedding unchanged content
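A cache like this is typically keyed on a hash of the chunk text, so an unchanged chunk resolves to the same key and skips the API call. A sketch of that idea (the actual key derivation is internal to Moltis):

```shell
# Same text -> same key -> cache hit; edited text -> new key -> re-embed
chunk="Some important content from your notes."
key=$(printf '%s' "$chunk" | sha256sum | cut -d ' ' -f 1)
echo "cache key: $key"
```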

QMD Backend

QMD is an optional external sidecar that provides enhanced search capabilities:

  • BM25 keyword search: Fast, instant results (similar to Elasticsearch)
  • Vector search: Semantic similarity using local GGUF models
  • Hybrid search with LLM reranking: Combines both methods with an LLM pass for optimal relevance

To use QMD:

  1. Install QMD separately from github.com/qmd/qmd
  2. Enable it in Settings > Memory > Backend

Features

Citations

Citations append source file and line number information to search results:

Some important content from your notes.

Source: memory/notes.md#42

Configuration options:

  • auto (default): Include citations when results come from multiple files
  • on: Always include citations
  • off: Never include citations

Session Export

When enabled, session transcripts are automatically exported to the memory system for cross-run recall. This allows the agent to remember past conversations even after restarts.

Exported sessions are:

  • Stored in memory/sessions/ as markdown files
  • Sanitized to remove sensitive tool results and system messages
  • Automatically cleaned up based on age/count limits

LLM Reranking

LLM reranking uses the configured language model to re-score and reorder search results based on semantic relevance to the query. This provides better results than keyword or vector matching alone, at the cost of additional latency.

How it works:

  1. Initial search returns candidate results
  2. LLM evaluates each result’s relevance (0.0-1.0 score)
  3. Results are reordered by combined score (70% LLM, 30% original)
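The blended score from step 3 is a plain weighted average. For one result with an LLM relevance of 0.9 and an original search score of 0.5:

```shell
# 70% LLM relevance + 30% original score
awk 'BEGIN { printf "%.2f\n", 0.7 * 0.9 + 0.3 * 0.5 }'
```

which prints 0.78.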

Configuration

Memory settings can be configured in moltis.toml:

[memory]
# Backend: "builtin" (default) or "qmd"
backend = "builtin"

# Embedding provider: "local", "ollama", "openai", "custom", or auto-detect
provider = "local"

# Citation mode: "on", "off", or "auto"
citations = "auto"

# Enable LLM reranking for hybrid search
llm_reranking = false

# Export sessions to memory for cross-run recall
session_export = true

# QMD-specific settings (only used when backend = "qmd")
[memory.qmd]
command = "qmd"
max_results = 10
timeout_ms = 30000

Or via the web UI: Settings > Memory

Embedding Providers

The built-in backend supports multiple embedding providers:

| Provider | Model | Dimensions | Notes |
|---|---|---|---|
| Local (GGUF) | EmbeddingGemma-300M | 768 | Offline, ~300MB download |
| Ollama | nomic-embed-text | 768 | Requires Ollama running |
| OpenAI | text-embedding-3-small | 1536 | Requires API key |
| Custom | Configurable | Varies | OpenAI-compatible endpoint |

The system auto-detects available providers and creates a fallback chain:

  1. Try configured provider first
  2. Fall back to other available providers if it fails
  3. Use keyword-only search if no embedding provider is available
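The chain amounts to trying providers in order until one answers. A sketch with a hypothetical try_provider stand-in (each real step is an embedding call that can fail):

```shell
# Hypothetical stand-in: pretend only OpenAI is reachable right now
try_provider() {
  [ "$1" = "openai" ]
}

for p in local ollama openai; do
  if try_provider "$p"; then
    echo "embedding with: $p"
    break
  fi
done
```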

Memory Directories

By default, moltis indexes markdown files from:

  • ~/.moltis/MEMORY.md - Main long-term memory file
  • ~/.moltis/memory/*.md - Additional memory files
  • ~/.moltis/memory/sessions/*.md - Exported session transcripts

Tools

The memory system exposes two agent tools:

memory_search

Search memory with a natural language query.

{
  "query": "what did we discuss about the API design?",
  "limit": 5
}

memory_get

Retrieve a specific chunk by ID.

{
  "chunk_id": "memory/notes.md:42"
}

Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Memory Manager                          │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │   Chunker   │  │   Search    │  │  Session Export     │  │
│  │ (markdown)  │  │  (hybrid)   │  │  (transcripts)      │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
├─────────────────────────────────────────────────────────────┤
│                    Storage Backend                          │
│  ┌────────────────────────┐  ┌────────────────────────┐    │
│  │   Built-in (SQLite)    │  │   QMD (sidecar)        │    │
│  │  - FTS5 keyword        │  │  - BM25 keyword        │    │
│  │  - Vector similarity   │  │  - Vector similarity   │    │
│  │  - Embedding cache     │  │  - LLM reranking       │    │
│  └────────────────────────┘  └────────────────────────┘    │
├─────────────────────────────────────────────────────────────┤
│                  Embedding Providers                        │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌───────────────┐  │
│  │  Local  │  │ Ollama  │  │ OpenAI  │  │ Batch/Fallback│  │
│  │  (GGUF) │  │         │  │         │  │               │  │
│  └─────────┘  └─────────┘  └─────────┘  └───────────────┘  │
└─────────────────────────────────────────────────────────────┘

Troubleshooting

Memory not working

  1. Check status in Settings > Memory
  2. Ensure at least one embedding provider is available:
    • Local: Requires local-embeddings feature enabled at build
    • Ollama: Must be running at localhost:11434
    • OpenAI: Requires OPENAI_API_KEY environment variable

Search returns no results

  1. Check that memory files exist in the expected directories
  2. Trigger a manual sync by restarting moltis
  3. Check logs for sync errors

QMD not available

  1. Verify QMD is installed: qmd --version
  2. Check that the path is correct in settings
  3. Ensure QMD has indexed your collections: qmd stats

Hooks

Hooks let you observe, modify, or block actions at key points in the agent lifecycle. Use them for auditing, policy enforcement, notifications, and custom integrations.

How Hooks Work

┌─────────────────────────────────────────────────────────┐
│                      Agent Loop                         │
│                                                         │
│  User Message → BeforeToolCall → Tool Execution         │
│                       │                 │               │
│                       ▼                 ▼               │
│                 [Your Hook]      AfterToolCall          │
│                       │                 │               │
│                 modify/block      [Your Hook]           │
│                       │                 │               │
│                       ▼                 ▼               │
│                   Continue → Response → MessageSent     │
└─────────────────────────────────────────────────────────┘

Event Types

Modifying Events (Sequential)

These events run hooks sequentially. Hooks can modify the payload or block the action.

| Event | Description | Can Modify | Can Block |
|---|---|---|---|
| BeforeToolCall | Before a tool executes | ✓ | ✓ |
| BeforeCompaction | Before context compaction | ✓ | ✓ |
| MessageSending | Before sending a response | ✓ | ✓ |
| BeforeAgentStart | Before agent loop starts | ✓ | ✓ |

Read-Only Events (Parallel)

These events run hooks in parallel for performance. They cannot modify or block.

| Event | Description |
|---|---|
| AfterToolCall | After a tool completes |
| AfterCompaction | After context is compacted |
| MessageReceived | When a user message arrives |
| MessageSent | After response is delivered |
| AgentEnd | When agent loop completes |
| SessionStart | When a new session begins |
| SessionEnd | When a session ends |
| ToolResultPersist | When tool result is saved |
| GatewayStart | When Moltis starts |
| GatewayStop | When Moltis shuts down |
| Command | When a slash command is used |

Creating a Hook

1. Create the Hook Directory

mkdir -p ~/.moltis/hooks/my-hook

2. Create HOOK.md

+++
name = "my-hook"
description = "Logs all tool calls to a file"
events = ["BeforeToolCall", "AfterToolCall"]
command = "./handler.sh"
timeout = 5

[requires]
os = ["darwin", "linux"]
bins = ["jq"]
env = ["LOG_FILE"]
+++

# My Hook

This hook logs all tool calls for auditing purposes.

3. Create the Handler Script

#!/bin/bash
# handler.sh

# Read event payload from stdin
payload=$(cat)

# Extract event type
event=$(echo "$payload" | jq -r '.event')

# Log to file
echo "$(date -Iseconds) $event: $payload" >> "$LOG_FILE"

# Exit 0 to continue (don't block)
exit 0

4. Make it Executable

chmod +x ~/.moltis/hooks/my-hook/handler.sh

Shell Hook Protocol

Hooks communicate via stdin/stdout and exit codes:

Input

The event payload is passed as JSON on stdin:

{
  "event": "BeforeToolCall",
  "data": {
    "tool": "bash",
    "arguments": {
      "command": "ls -la"
    }
  },
  "session_id": "abc123",
  "timestamp": "2024-01-15T10:30:00Z"
}

Output

| Exit Code | Stdout | Result |
|---|---|---|
| 0 | (empty) | Continue normally |
| 0 | {"action":"modify","data":{...}} | Replace payload data |
| 1 | (ignored) | Block (stderr = reason) |

Example: Modify Tool Arguments

#!/bin/bash
payload=$(cat)
tool=$(echo "$payload" | jq -r '.data.tool')

if [ "$tool" = "bash" ]; then
    # Add safety flag to all bash commands
    modified=$(echo "$payload" | jq '.data.arguments.command = "set -e; " + .data.arguments.command')
    echo "{\"action\":\"modify\",\"data\":$(echo "$modified" | jq -c '.data')}"
fi

exit 0

Example: Block Dangerous Commands

#!/bin/bash
payload=$(cat)
command=$(echo "$payload" | jq -r '.data.arguments.command // ""')

# Block rm -rf /
if echo "$command" | grep -qE 'rm[[:space:]]+-rf[[:space:]]+/'; then
    echo "Blocked dangerous rm command" >&2
    exit 1
fi

exit 0

Hook Discovery

Hooks are discovered from HOOK.md files in these locations (priority order):

  1. Project-local: <workspace>/.moltis/hooks/<name>/HOOK.md
  2. User-global: ~/.moltis/hooks/<name>/HOOK.md

Project-local hooks take precedence over global hooks with the same name.

Configuration in moltis.toml

You can also define hooks directly in the config file:

[[hooks]]
name = "audit-log"
command = "./hooks/audit.sh"
events = ["BeforeToolCall", "AfterToolCall"]
timeout = 5
priority = 100  # Higher = runs first

[[hooks]]
name = "notify-slack"
command = "./hooks/slack-notify.sh"
events = ["SessionEnd"]
env = { SLACK_WEBHOOK_URL = "https://hooks.slack.com/..." }

Eligibility Requirements

Hooks can declare requirements that must be met:

[requires]
os = ["darwin", "linux"]       # Only run on these OSes
bins = ["jq", "curl"]          # Required binaries in PATH
env = ["SLACK_WEBHOOK_URL"]    # Required environment variables

If requirements aren’t met, the hook is skipped (not an error).

Circuit Breaker

Hooks that fail repeatedly are automatically disabled:

  • Threshold: 5 consecutive failures
  • Cooldown: 60 seconds
  • Recovery: Auto-re-enabled after cooldown

This prevents a broken hook from blocking all operations.

CLI Commands

# List all discovered hooks
moltis hooks list

# List only eligible hooks (requirements met)
moltis hooks list --eligible

# Output as JSON
moltis hooks list --json

# Show details for a specific hook
moltis hooks info my-hook

Bundled Hooks

Moltis includes several built-in hooks:

boot-md

Reads BOOT.md from the workspace on GatewayStart and injects it into the agent context.

session-memory

Saves session context when you use the /new command, preserving important information for future sessions.

command-logger

Logs all Command events to a JSONL file for auditing.

Example Hooks

Slack Notification on Session End

#!/bin/bash
# slack-notify.sh
payload=$(cat)
session_id=$(echo "$payload" | jq -r '.session_id')
message_count=$(echo "$payload" | jq -r '.data.message_count')

curl -X POST "$SLACK_WEBHOOK_URL" \
  -H 'Content-Type: application/json' \
  -d "{\"text\":\"Session $session_id ended with $message_count messages\"}"

exit 0

Redact Secrets from Tool Output

#!/bin/bash
# redact-secrets.sh
payload=$(cat)

# Redact common secret patterns
redacted=$(echo "$payload" | sed -E '
  s/sk-[a-zA-Z0-9]{32,}/[REDACTED]/g
  s/ghp_[a-zA-Z0-9]{36}/[REDACTED]/g
  s/password=[^&[:space:]]+/password=[REDACTED]/g
')

echo "{\"action\":\"modify\",\"data\":$(echo "$redacted" | jq -c '.data')}"
exit 0

Block File Writes Outside Project

#!/bin/bash
# sandbox-writes.sh
payload=$(cat)
tool=$(echo "$payload" | jq -r '.data.tool')

if [ "$tool" = "write_file" ]; then
    path=$(echo "$payload" | jq -r '.data.arguments.path')

    # Only allow writes under current project
    if [[ ! "$path" =~ ^/workspace/ ]]; then
        echo "File writes only allowed in /workspace" >&2
        exit 1
    fi
fi

exit 0

Best Practices

  1. Keep hooks fast — Set appropriate timeouts (default: 5s)
  2. Handle errors gracefully — Use exit 0 unless you want to block
  3. Log for debugging — Write to a log file, not stdout
  4. Test locally first — Pipe sample JSON through your script
  5. Use jq for JSON — It’s reliable and fast for parsing

Local LLM Support

Moltis can run LLM inference locally on your machine without requiring an API key or internet connection. This enables fully offline operation and keeps your conversations private.

Backends

Moltis supports two backends for local inference:

| Backend | Format | Platform | GPU Acceleration |
|---|---|---|---|
| GGUF (llama.cpp) | .gguf files | macOS, Linux, Windows | Metal (macOS), CUDA (NVIDIA) |
| MLX | MLX model repos | macOS (Apple Silicon only) | Apple Silicon neural engine |

GGUF (llama.cpp)

GGUF is the primary backend, powered by llama.cpp. It supports quantized models in the GGUF format, which significantly reduces memory requirements while maintaining good quality.

Advantages:

  • Cross-platform (macOS, Linux, Windows)
  • Wide model compatibility (any GGUF model)
  • GPU acceleration on both NVIDIA (CUDA) and Apple Silicon (Metal)
  • Mature and well-tested

MLX

MLX is Apple’s machine learning framework optimized for Apple Silicon. Models from the mlx-community on HuggingFace are specifically optimized for M1/M2/M3/M4 chips.

Advantages:

  • Native Apple Silicon performance
  • Efficient unified memory usage
  • Lower latency on Macs

Requirements:

  • macOS with Apple Silicon (M1/M2/M3/M4)

Memory Requirements

Models are organized by memory tiers based on your system RAM:

| Tier | RAM | Recommended Models |
|---|---|---|
| Tiny | 4GB | Qwen 2.5 Coder 1.5B, Llama 3.2 1B |
| Small | 8GB | Qwen 2.5 Coder 3B, Llama 3.2 3B |
| Medium | 16GB | Qwen 2.5 Coder 7B, Llama 3.1 8B |
| Large | 32GB+ | Qwen 2.5 Coder 14B, DeepSeek Coder V2 Lite |

Moltis automatically detects your system memory and suggests appropriate models in the UI.

Configuration

  1. Navigate to Providers in the sidebar
  2. Click Add Provider
  3. Select Local LLM
  4. Choose a model from the registry or search HuggingFace
  5. Click Configure — the model will download automatically

Via Configuration File

Add to ~/.moltis/moltis.toml:

[providers.local]
model = "qwen2.5-coder-7b-q4_k_m"

For custom GGUF files:

[providers.local]
model = "my-custom-model"
model_path = "/path/to/model.gguf"

Model Storage

Downloaded models are cached in ~/.cache/moltis/models/ by default. This directory can grow large (several GB per model).

To change the cache location:

[providers.local]
cache_dir = "/custom/models/path"

HuggingFace Integration

You can search and download models directly from HuggingFace:

  1. In the Add Provider dialog, click “Search HuggingFace”
  2. Enter a search term (e.g., “qwen coder”)
  3. Select GGUF or MLX backend
  4. Choose a model from the results
  5. The model will be downloaded on first use

Finding GGUF Models

Look for repositories with “GGUF” in the name on HuggingFace:

  • TheBloke — large collection of quantized models
  • bartowski — Llama 3.x GGUF models
  • Qwen — official Qwen GGUF models

Finding MLX Models

MLX models are available from mlx-community:

  • Pre-converted models optimized for Apple Silicon
  • Look for models ending in -4bit or -8bit for quantized versions

GPU Acceleration

Metal (macOS)

Metal acceleration is enabled by default on macOS. The number of GPU layers can be configured:

[providers.local]
gpu_layers = 99  # Offload all layers to GPU

CUDA (NVIDIA)

Requires building with the local-llm-cuda feature:

cargo build --release --features local-llm-cuda

Limitations

Local LLM models have some limitations compared to cloud providers:

  1. No tool calling — Local models don’t support function/tool calling. When using a local model, features like file operations, shell commands, and memory search are disabled.

  2. Slower inference — Depending on your hardware, local inference may be significantly slower than cloud APIs.

  3. Quality varies — Smaller quantized models may produce lower quality responses than larger cloud models.

  4. Context window — Local models typically have smaller context windows (8K-32K tokens vs 128K+ for cloud models).

Chat Templates

Different model families use different chat formatting. Moltis automatically detects the correct template for registered models:

  • ChatML — Qwen, many instruction-tuned models
  • Llama 3 — Meta’s Llama 3.x family
  • DeepSeek — DeepSeek Coder models

For custom models, the template is auto-detected from the model metadata when possible.

Troubleshooting

Model fails to load

  • Check you have enough RAM (see memory tier table above)
  • Verify the GGUF file isn’t corrupted (re-download if needed)
  • Ensure the model file matches the expected architecture

Slow inference

  • Enable GPU acceleration (Metal on macOS, CUDA on Linux)
  • Try a smaller/more quantized model
  • Reduce context size in config

Out of memory

  • Choose a model from a lower memory tier
  • Close other applications to free RAM
  • Use a more aggressively quantized model (Q4_K_M vs Q8_0)

Feature Flag

Local LLM support requires the local-llm feature flag at compile time:

cargo build --release --features local-llm

This is enabled by default in release builds.

Sandbox Backends

Moltis runs LLM-generated commands inside containers to protect your host system. The sandbox backend controls which container technology is used.

Backend Selection

Configure in moltis.toml:

[tools.exec.sandbox]
backend = "auto"          # default — picks the best available
# backend = "docker"      # force Docker
# backend = "apple-container"  # force Apple Container (macOS only)

With "auto" (the default), Moltis picks the strongest available backend:

| Priority | Backend | Platform | Isolation |
|---|---|---|---|
| 1 | Apple Container | macOS | VM (Virtualization.framework) |
| 2 | Docker | any | Linux namespaces / cgroups |
| 3 | none (host) | any | no isolation |

Apple Container runs each sandbox in a lightweight virtual machine using Apple’s Virtualization.framework. Every container gets its own kernel, so a kernel exploit inside the sandbox cannot reach the host — unlike Docker, which shares the host kernel.

Install

Download the signed installer from GitHub:

# Download the installer package
gh release download --repo apple/container --pattern "container-installer-signed.pkg" --dir /tmp

# Install (requires admin)
sudo installer -pkg /tmp/container-installer-signed.pkg -target /

# First-time setup — downloads a default Linux kernel
container system start

Alternatively, build from source via Homebrew with brew install container (requires Xcode 26+).

Verify

container --version
# Run a quick test
container run --rm ubuntu echo "hello from VM"

Once installed, restart moltis gateway — the startup banner will show sandbox: apple-container backend.

Docker

Docker is supported on macOS, Linux, and Windows. On macOS it runs inside a Linux VM managed by Docker Desktop, so it is reasonably isolated but adds more overhead than Apple Container.

Install from https://docs.docker.com/get-docker/

No sandbox

If neither runtime is found, commands execute directly on the host. The startup banner will show a warning. This is not recommended for untrusted workloads.

Per-session overrides

The web UI allows toggling sandboxing per session and selecting a custom container image. These overrides persist across gateway restarts.

Resource limits

[tools.exec.sandbox.resource_limits]
memory_limit = "512M"
cpu_quota = 1.0
pids_max = 256

Session State

Moltis provides a per-session key-value store that allows skills, extensions, and the agent itself to persist context across messages within a session.

Overview

Session state is scoped to a (session_key, namespace, key) triple, backed by SQLite. Each entry stores a string value and is automatically timestamped.

The agent accesses state through the session_state tool, which supports three operations: get, set, and list.

Agent Tool

The session_state tool is registered as a built-in tool and available in every session.

Get a value

{
  "op": "get",
  "namespace": "my-skill",
  "key": "last_query"
}

Set a value

{
  "op": "set",
  "namespace": "my-skill",
  "key": "last_query",
  "value": "SELECT * FROM users"
}

List all keys in a namespace

{
  "op": "list",
  "namespace": "my-skill"
}

Namespacing

Every state entry belongs to a namespace. This prevents collisions between different skills or extensions using state in the same session. Use your skill name as the namespace.

Storage

State is stored in the session_state table in the main SQLite database (moltis.db). The migration is in crates/sessions/migrations/20260205120000_session_state.sql.

Tip

State values are strings. To store structured data, serialize to JSON before writing and parse after reading.
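For example, a set call whose value field carries a serialized JSON object, escaped as a string (the key and object fields here are illustrative):

```json
{
  "op": "set",
  "namespace": "my-skill",
  "key": "last_result",
  "value": "{\"query\":\"SELECT * FROM users\",\"rows\":42}"
}
```

After a get, parse the returned string back into an object before using it.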

Session Branching

Session branching (forking) lets you create an independent copy of a conversation at any point. The new session diverges without affecting the original — useful for exploring alternative approaches, running “what if” scenarios, or preserving a checkpoint before a risky prompt.

Forking from the UI

There are two ways to fork a session in the web UI:

  • Chat header — click the Fork button in the header bar (next to Delete). This is visible for every session except cron sessions.
  • Sidebar — hover over a session in the sidebar and click the fork icon that appears in the action buttons.

Both create a new session that copies all messages from the current one and immediately switch you to it.

Forked sessions appear indented under their parent in the sidebar, with a branch icon to distinguish them from top-level sessions. The metadata line shows fork@N where N is the message index at which the fork occurred.

Agent Tool

The agent can also fork programmatically using the branch_session tool:

{
  "at_message": 5,
  "label": "explore-alternative"
}
  • at_message — the message index to fork at (messages 0..N are copied). If omitted, all messages are copied.
  • label — optional human-readable label for the new session.

The tool returns the new session key.

RPC Method

The sessions.fork RPC method is the underlying mechanism:

{ "key": "main", "at_message": 5, "label": "my-fork" }

On success the response payload contains { "sessionKey": "session:<uuid>" }.

What Gets Inherited

When forking, the new session inherits:

| Inherited | Not inherited |
|---|---|
| Messages (up to fork point) | Worktree branch |
| Model selection | Sandbox settings |
| Project assignment | Channel binding |
| MCP disabled flag | |

Parent-Child Relationships

Fork relationships are stored directly on the sessions table:

  • parent_session_key — the key of the session this was forked from.
  • fork_point — the message index where the fork occurred.

These fields drive the tree rendering in the sidebar. Sessions with a parent appear indented under it; deeply nested forks indent further.

Deleting a parent

Deleting a parent session does not cascade to its children. Child sessions become top-level sessions — they keep their messages and history but lose their visual nesting in the sidebar.

When you delete a forked session, the UI navigates back to its parent session. If the deleted session had no parent (or the parent no longer exists), it falls back to the next sibling or main.

Independence

A forked session is fully independent after creation. Changes to the parent do not propagate to the fork, and vice versa.

Skill Self-Extension

Moltis can create, update, and delete skills at runtime through agent tools, enabling the system to extend its own capabilities during a conversation.

Overview

Three agent tools manage project-local skills:

| Tool | Description |
|---|---|
| create_skill | Write a new SKILL.md to .moltis/skills/<name>/ |
| update_skill | Overwrite an existing skill’s SKILL.md |
| delete_skill | Remove a skill directory |

Skills created this way are project-local and stored in the working directory’s .moltis/skills/ folder. They become available on the next message automatically thanks to the skill watcher.

Skill Watcher

The skill watcher (crates/skills/src/watcher.rs) monitors skill directories for filesystem changes using debounced notifications. When a SKILL.md file is created, modified, or deleted, the watcher emits a skills.changed event via the WebSocket event bus so the UI can refresh.

Tip

The watcher uses debouncing to avoid firing multiple events for rapid successive edits (e.g. an editor writing a temp file then renaming).

Creating a Skill

The agent can create a skill by calling the create_skill tool:

{
  "name": "summarize-pr",
  "content": "# summarize-pr\n\nSummarize a GitHub pull request...",
  "description": "Summarize GitHub PRs with key changes and review notes"
}

This writes .moltis/skills/summarize-pr/SKILL.md with the provided content. The skill discoverer picks it up on the next message.

Updating a Skill

{
  "name": "summarize-pr",
  "content": "# summarize-pr\n\nUpdated instructions..."
}

Deleting a Skill

{
  "name": "summarize-pr"
}

This removes the entire .moltis/skills/summarize-pr/ directory.

Warning

Deleted skills cannot be recovered. The agent should confirm with the user before deleting a skill.

Mobile PWA and Push Notifications

Moltis can be installed as a Progressive Web App (PWA) on mobile devices, providing a native app-like experience with push notifications.

Installing on Mobile

iOS (Safari)

  1. Open moltis in Safari
  2. Tap the Share button (box with arrow)
  3. Scroll down and tap “Add to Home Screen”
  4. Tap “Add” to confirm

The app will appear on your home screen with the moltis icon.

Android (Chrome)

  1. Open moltis in Chrome
  2. You should see an install banner at the bottom; tap “Install”
  3. Or tap the three-dot menu and select “Install app” or “Add to Home Screen”
  4. Tap “Install” to confirm

The app will appear in your app drawer and home screen.

PWA Features

When installed as a PWA, moltis provides:

  • Standalone mode: Full-screen experience without browser UI
  • Offline support: Previously loaded content remains accessible
  • Fast loading: Assets are cached locally
  • Home screen icon: Quick access from your device’s home screen
  • Safe area support: Proper spacing for notched devices (iPhone X+)

Push Notifications

Push notifications allow you to receive alerts when the LLM responds, even when you’re not actively viewing the app.

Enabling Push Notifications

  1. Open the moltis app (must be installed as PWA on Safari/iOS)
  2. Go to Settings > Notifications
  3. Click Enable to subscribe to push notifications
  4. When prompted, allow notification permissions

Safari/iOS Note: Push notifications only work when the app is installed as a PWA. If you see “Installation required”, add moltis to your Dock first:

  • macOS: File → Add to Dock
  • iOS: Share → Add to Home Screen

Managing Subscriptions

The Settings > Notifications page shows all subscribed devices:

  • Device name: Parsed from user agent (e.g., “Safari on macOS”, “iPhone”)
  • IP address: Client IP at subscription time (supports proxies via X-Forwarded-For)
  • Subscription date: When the device subscribed

You can remove any subscription by clicking the Remove button. This works from any device, which is useful for revoking access to old devices.

Subscription changes are broadcast in real-time via WebSocket, so all connected clients see updates immediately.

How It Works

Moltis uses the Web Push API with VAPID (Voluntary Application Server Identification) keys:

  1. VAPID Keys: On first run, the server generates a P-256 ECDSA key pair
  2. Subscription: The browser creates a push subscription using the server’s public key
  3. Registration: The subscription details are sent to the server and stored
  4. Notification: When you need to be notified, the server encrypts and sends a push message

Push API Routes

The gateway exposes these API endpoints for push notifications:

| Endpoint | Method | Description |
|---|---|---|
| /api/push/vapid-key | GET | Get the VAPID public key for subscription |
| /api/push/subscribe | POST | Register a push subscription |
| /api/push/unsubscribe | POST | Remove a push subscription |
| /api/push/status | GET | Get push service status and subscription list |

Subscribe Request

{
  "endpoint": "https://fcm.googleapis.com/fcm/send/...",
  "keys": {
    "p256dh": "base64url-encoded-key",
    "auth": "base64url-encoded-auth"
  }
}

Status Response

{
  "enabled": true,
  "subscription_count": 2,
  "subscriptions": [
    {
      "endpoint": "https://fcm.googleapis.com/...",
      "device": "Safari on macOS",
      "ip": "192.168.1.100",
      "created_at": "2025-02-05T23:30:00Z"
    }
  ]
}

Notification Payload

Push notifications include:

{
  "title": "moltis",
  "body": "New response available",
  "url": "/chats",
  "sessionKey": "session-id"
}

Clicking a notification will open or focus the app and navigate to the relevant chat.

Configuration

Feature Flag

Push notifications are controlled by the push-notifications feature flag, which is enabled by default. To disable:

# In your Cargo.toml or when building
[dependencies]
moltis-gateway = { default-features = false, features = ["web-ui", "tls"] }

Or build without the feature:

cargo build --no-default-features --features web-ui,tls,tailscale,file-watcher

Data Storage

Push notification data is stored in push.json in the data directory:

  • VAPID keys: Generated once and reused
  • Subscriptions: List of all registered browser subscriptions

The VAPID keys are persisted so subscriptions remain valid across restarts.

Mobile UI Considerations

The mobile interface adapts for smaller screens:

  • Navigation drawer: The sidebar becomes a slide-out drawer on mobile
  • Sessions panel: Displayed as a bottom sheet that can be swiped
  • Touch targets: Minimum 44px touch targets for accessibility
  • Safe areas: Proper insets for devices with notches or home indicators

Responsive Breakpoints

  • Mobile: < 768px width (drawer navigation)
  • Desktop: ≥ 768px width (sidebar navigation)

Browser Support

| Feature | Chrome | Safari | Firefox | Edge |
|---|---|---|---|---|
| PWA Install | ✅ | ✅ (iOS) | ❌ | ✅ |
| Push Notifications | ✅ | ✅ (iOS 16.4+) | ✅ | ✅ |
| Service Worker | ✅ | ✅ | ✅ | ✅ |
| Offline Support | ✅ | ✅ | ✅ | ✅ |

Note: iOS push notifications require iOS 16.4 or later and the app must be installed as a PWA.

Troubleshooting

Notifications Not Working

  1. Check permissions: Ensure notifications are allowed in browser/OS settings
  2. Check subscription: Go to Settings > Notifications to see if your device is listed
  3. Check server logs: Look for push: prefixed log messages for delivery status
  4. Safari/iOS specific:
    • Must be installed as PWA (Add to Dock/Home Screen)
    • iOS requires version 16.4 or later
    • The Enable button is disabled until installed as PWA
  5. Behind a proxy: Ensure your proxy forwards X-Forwarded-For or X-Real-IP headers

PWA Not Installing

  1. HTTPS required: PWAs require a secure connection (or localhost)
  2. Valid manifest: Ensure /manifest.json loads correctly
  3. Service worker: Check that /sw.js registers without errors
  4. Clear cache: Try clearing browser cache and reloading

Service Worker Issues

Clear the service worker registration:

  1. Open browser DevTools
  2. Go to Application > Service Workers
  3. Click “Unregister” on the moltis service worker
  4. Reload the page

Security Architecture

Moltis is designed with a defense-in-depth security model. This document explains the key security features and provides guidance for production deployments.

Overview

Moltis runs AI agents that can execute code and interact with external systems. This power requires multiple layers of protection:

  1. Human-in-the-loop approval for dangerous commands
  2. Sandbox isolation for command execution
  3. Channel authorization for external integrations
  4. Rate limiting to prevent resource abuse
  5. Scope-based access control for API authorization

Command Execution Approval

By default, Moltis requires explicit user approval before executing potentially dangerous commands. This “human-in-the-loop” design ensures the AI cannot take destructive actions without consent.

How It Works

When the agent wants to run a command:

  1. The command is analyzed against approval policies
  2. If approval is required, the user sees a prompt in the UI
  3. The user can approve, deny, or modify the command
  4. Only approved commands execute

Approval Policies

Configure approval behavior in moltis.toml:

[tools.exec]
approval_mode = "always"  # always require approval
# approval_mode = "smart" # auto-approve safe commands (default)
# approval_mode = "never" # dangerous: never require approval

Recommendation: Keep approval_mode = "smart" (the default) for most use cases. Only use "never" in fully automated, sandboxed environments.

Sandbox Isolation

Commands execute inside isolated containers (Docker or Apple Container) by default. This protects your host system from:

  • Accidental file deletion or modification
  • Malicious code execution
  • Resource exhaustion (memory, CPU, disk)

See sandbox.md for backend configuration.

Resource Limits

[tools.exec.sandbox.resource_limits]
memory_limit = "512M"
cpu_quota = 1.0
pids_max = 256

Network Isolation

Sandbox containers have limited network access by default. Outbound connections are allowed but the sandbox cannot bind to host ports.

Channel Authorization

Channels (Telegram, Slack, etc.) allow external parties to interact with your Moltis agent. This requires careful access control.

Sender Allowlisting

When a new sender contacts the agent through a channel, they are placed in a pending queue. You must explicitly approve or deny each sender before they can interact with the agent.

UI: Settings > Channels > Pending Senders

Per-Channel Permissions

Each channel can have different permission levels:

  • Read-only: Sender can ask questions, agent responds
  • Execute: Sender can trigger actions (with approval still required)
  • Admin: Full access including configuration changes

Channel Isolation

Channels run in isolated sessions by default. A malicious message from one channel cannot affect another channel’s session or the main UI session.

Cron Job Security

Scheduled tasks (cron jobs) can run agent turns automatically. Security considerations:

Rate Limiting

To prevent prompt injection attacks from rapidly creating many cron jobs:

[cron]
rate_limit_max = 10           # max jobs per window
rate_limit_window_secs = 60   # window duration (1 minute)

This limits job creation to 10 per minute by default. System jobs (like heartbeat) bypass this limit.

Job Notifications

When cron jobs are created, updated, or removed, Moltis broadcasts events:

  • cron.job.created - A new job was created
  • cron.job.updated - An existing job was modified
  • cron.job.removed - A job was deleted

Monitor these events to detect suspicious automated job creation.

Sandbox for Cron Jobs

Cron job execution uses sandbox isolation by default:

# Per-job configuration
[cron.job.sandbox]
enabled = true              # run in sandbox (default)
# image = "custom:latest"   # optional custom image

Identity Protection

The agent’s identity (name, personality “soul”) is stored in moltis.toml. Modifying identity requires the operator.write scope, not just operator.read.

This prevents prompt injection attacks from subtly modifying the agent’s personality to make it more compliant with malicious requests.

API Authorization

The gateway API uses role-based access control with scopes:

| Scope | Permissions |
|---|---|
| operator.read | View status, list jobs, read history |
| operator.write | Send messages, create jobs, modify configuration |
| operator.admin | All permissions (includes all other scopes) |
| operator.approvals | Handle command approval requests |
| operator.pairing | Manage device/node pairing |

API Keys

API keys authenticate external tools and scripts connecting to Moltis. Keys can have full access (all scopes) or be restricted to specific scopes for defense-in-depth.

Creating API Keys

Web UI: Settings > Security > API Keys

  1. Enter a label describing the key’s purpose
  2. Choose “Full access” or select specific scopes
  3. Click “Generate key”
  4. Copy the key immediately — it’s only shown once

CLI:

# Full access key
moltis auth create-api-key --label "CI pipeline"

# Scoped key (comma-separated scopes)
moltis auth create-api-key --label "Monitor" --scopes "operator.read"
moltis auth create-api-key --label "Automation" --scopes "operator.read,operator.write"

Using API Keys

Pass the key in the connect handshake over WebSocket:

{
  "method": "connect",
  "params": {
    "client": { "id": "my-tool", "version": "1.0.0" },
    "auth": { "api_key": "mk_abc123..." }
  }
}

Or use Bearer authentication for REST API calls:

Authorization: Bearer mk_abc123...

Scope Recommendations

| Use Case | Recommended Scopes |
|---|---|
| Read-only monitoring | operator.read |
| Automated workflows | operator.read, operator.write |
| Approval handling | operator.read, operator.approvals |
| Full automation | Full access (no scope restrictions) |

Best practice: Use the minimum necessary scopes. If a key only needs to read status and logs, don’t grant operator.write.

Backward Compatibility

Existing API keys (created before scopes were added) have full access. Newly created keys without explicit scopes also have full access.

Network Security

TLS Encryption

HTTPS is enabled by default with auto-generated certificates:

[tls]
enabled = true
auto_generate = true

For production, use certificates from a trusted CA or configure custom certificates.

Origin Validation

WebSocket connections validate the Origin header to prevent cross-site WebSocket hijacking (CSWSH). Connections from untrusted origins are rejected.

SSRF Protection

The web_fetch tool resolves DNS and blocks requests to private IP ranges (loopback, RFC 1918, link-local, CGNAT). This prevents server-side request forgery attacks.

Production Recommendations

1. Enable Authentication

By default, Moltis requires a password when accessed from non-localhost:

[auth]
disabled = false  # keep this false in production

2. Use Sandbox Isolation

Always run with sandbox enabled in production:

[tools.exec.sandbox]
enabled = true
backend = "auto"  # uses strongest available

3. Tighten Rate Limits

Tighten rate limits for untrusted environments:

[cron]
rate_limit_max = 5
rate_limit_window_secs = 300  # 5 per 5 minutes

4. Review Channel Senders

Regularly audit approved senders and revoke access for unknown parties.

5. Monitor Events

Watch for these suspicious patterns:

  • Rapid cron job creation
  • Identity modification attempts
  • Unusual command patterns in approval requests
  • New channel senders from unexpected sources

6. Network Segmentation

Run Moltis on a private network or behind a reverse proxy with:

  • IP allowlisting
  • Rate limiting
  • Web Application Firewall (WAF) rules

7. Keep Software Updated

Subscribe to security advisories and update promptly when vulnerabilities are disclosed.

Reporting Security Issues

Report security vulnerabilities privately to the maintainers. Do not open public issues for security bugs.

See the repository’s SECURITY.md for contact information.

Running Moltis in Docker

Moltis is available as a multi-architecture Docker image supporting both linux/amd64 and linux/arm64. The image is published to GitHub Container Registry on every release.

Quick Start

docker run -d \
  --name moltis \
  -p 13131:13131 \
  -v moltis-config:/home/moltis/.config/moltis \
  -v moltis-data:/home/moltis/.moltis \
  -v /var/run/docker.sock:/var/run/docker.sock \
  ghcr.io/penso/moltis:latest

Open http://localhost:13131 in your browser and configure your LLM provider to start chatting.

Note

When accessing from localhost, no authentication is required. If you access Moltis from a different machine (e.g., over the network), a setup code is printed to the container logs for authentication setup:

docker logs moltis

Volume Mounts

Moltis uses two directories that should be persisted:

| Path | Contents |
|---|---|
| /home/moltis/.config/moltis | Configuration files: moltis.toml, credentials.json, mcp-servers.json |
| /home/moltis/.moltis | Runtime data: databases, sessions, memory files, logs |

You can use named volumes (as shown above) or bind mounts to local directories for easier access to configuration files:

docker run -d \
  --name moltis \
  -p 13131:13131 \
  -v ./config:/home/moltis/.config/moltis \
  -v ./data:/home/moltis/.moltis \
  -v /var/run/docker.sock:/var/run/docker.sock \
  ghcr.io/penso/moltis:latest

With bind mounts, you can edit config/moltis.toml directly on the host.

Docker Socket (Sandbox Execution)

Moltis runs LLM-generated shell commands inside isolated containers for security. When Moltis itself runs in a container, it needs access to the host’s container runtime to create these sandbox containers.

Without the socket mount, sandbox execution is disabled. The agent will still work for chat-only interactions, but any tool that runs shell commands will fail.

# Required for sandbox execution
-v /var/run/docker.sock:/var/run/docker.sock

Security Consideration

Mounting the Docker socket gives the container full access to the Docker daemon. This is equivalent to root access on the host for practical purposes. Only run Moltis containers from trusted sources (official images from ghcr.io/penso/moltis).

If you cannot mount the Docker socket, Moltis will run in “no sandbox” mode — commands execute directly inside the Moltis container itself, which provides no isolation.

Docker Compose

See examples/docker-compose.yml for a complete example:

services:
  moltis:
    image: ghcr.io/penso/moltis:latest
    container_name: moltis
    restart: unless-stopped
    ports:
      - "13131:13131"
    volumes:
      - ./config:/home/moltis/.config/moltis
      - ./data:/home/moltis/.moltis
      - /var/run/docker.sock:/var/run/docker.sock

Start with:

docker compose up -d
docker compose logs -f moltis  # watch for startup messages

Podman Support

Moltis works with Podman using its Docker-compatible API. Mount the Podman socket instead of the Docker socket:

# Podman rootless
podman run -d \
  --name moltis \
  -p 13131:13131 \
  -v moltis-config:/home/moltis/.config/moltis \
  -v moltis-data:/home/moltis/.moltis \
  -v /run/user/$(id -u)/podman/podman.sock:/var/run/docker.sock \
  ghcr.io/penso/moltis:latest

# Podman rootful
podman run -d \
  --name moltis \
  -p 13131:13131 \
  -v moltis-config:/home/moltis/.config/moltis \
  -v moltis-data:/home/moltis/.moltis \
  -v /run/podman/podman.sock:/var/run/docker.sock \
  ghcr.io/penso/moltis:latest

You may need to enable the Podman socket service first:

# Rootless
systemctl --user enable --now podman.socket

# Rootful
sudo systemctl enable --now podman.socket

Environment Variables

| Variable | Description |
|---|---|
| MOLTIS_CONFIG_DIR | Override config directory (default: ~/.config/moltis) |
| MOLTIS_DATA_DIR | Override data directory (default: ~/.moltis) |

Example:

docker run -d \
  --name moltis \
  -p 13131:13131 \
  -e MOLTIS_CONFIG_DIR=/config \
  -e MOLTIS_DATA_DIR=/data \
  -v ./config:/config \
  -v ./data:/data \
  -v /var/run/docker.sock:/var/run/docker.sock \
  ghcr.io/penso/moltis:latest

Building Locally

To build the Docker image from source:

# Single architecture (current platform)
docker build -t moltis:local .

# Multi-architecture (requires buildx)
docker buildx build --platform linux/amd64,linux/arm64 -t moltis:local .

OrbStack

OrbStack on macOS works identically to Docker — use the same socket path (/var/run/docker.sock). OrbStack’s lightweight Linux VM provides good isolation with lower resource usage than Docker Desktop.

Troubleshooting

“Cannot connect to Docker daemon”

The Docker socket is not mounted or the Moltis user doesn’t have permission to access it. Verify:

docker exec moltis ls -la /var/run/docker.sock

Setup code not appearing in logs (for network access)

The setup code only appears when accessing from a non-localhost address. If you’re accessing from the same machine via localhost, no setup code is needed. For network access, wait a few seconds for the gateway to start, then check logs:

docker logs moltis 2>&1 | grep -i setup

Permission denied on bind mounts

When using bind mounts, ensure the directories exist and are writable:

mkdir -p ./config ./data
chmod 755 ./config ./data

The container runs as user moltis (UID 1000). If you see permission errors, you may need to adjust ownership:

sudo chown -R 1000:1000 ./config ./data

Streaming Architecture

This document explains how streaming responses work in Moltis, from the LLM provider through to the web UI.

Overview

Moltis supports real-time token streaming for LLM responses, providing a much better user experience than waiting for the complete response. Streaming works even when tools are enabled, allowing users to see text as it arrives while tool calls are accumulated and executed.

Components

1. StreamEvent Enum (crates/agents/src/model.rs)

The StreamEvent enum defines all events that can occur during a streaming LLM response:

pub enum StreamEvent {
    /// Text content delta - a chunk of text from the LLM.
    Delta(String),

    /// A tool call has started (for providers with native tool support).
    ToolCallStart { id: String, name: String, index: usize },

    /// Streaming delta for tool call arguments (JSON fragment).
    ToolCallArgumentsDelta { index: usize, delta: String },

    /// A tool call's arguments are complete.
    ToolCallComplete { index: usize },

    /// Stream completed successfully with token usage.
    Done(Usage),

    /// An error occurred.
    Error(String),
}

2. LlmProvider Trait (crates/agents/src/model.rs)

The LlmProvider trait defines two streaming methods:

  • stream() - Basic streaming without tool support
  • stream_with_tools() - Streaming with tool schemas passed to the API

Providers that support streaming with tools (like Anthropic) override stream_with_tools(). Others fall back to stream() which ignores the tools parameter.
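The fallback can be expressed as a default trait method. Signatures below are simplified for illustration — the real trait in crates/agents/src/model.rs is async and returns a stream of StreamEvent:

```rust
// Simplified sketch: stream_with_tools() is a default method that ignores
// the tool schemas and delegates to stream().
trait LlmProvider {
    fn stream(&self, messages: Vec<String>) -> Vec<String>;

    // Providers without native tool streaming inherit this default.
    fn stream_with_tools(&self, messages: Vec<String>, _tools: Vec<String>) -> Vec<String> {
        self.stream(messages)
    }
}

// A provider that only implements stream() still accepts a tools argument
// via the default method.
struct Echo;
impl LlmProvider for Echo {
    fn stream(&self, messages: Vec<String>) -> Vec<String> {
        messages
    }
}
```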

3. Anthropic Provider (crates/agents/src/providers/anthropic.rs)

The Anthropic provider implements streaming by:

  1. Making a POST request to /v1/messages with "stream": true
  2. Reading Server-Sent Events (SSE) from the response
  3. Parsing events and yielding appropriate StreamEvent variants:
| SSE Event Type | StreamEvent |
|---|---|
| content_block_start (text) | (none, just tracking) |
| content_block_start (tool_use) | ToolCallStart |
| content_block_delta (text_delta) | Delta |
| content_block_delta (input_json_delta) | ToolCallArgumentsDelta |
| content_block_stop | ToolCallComplete (for tool blocks) |
| message_delta | (usage tracking) |
| message_stop | Done |
| error | Error |

4. Agent Runner (crates/agents/src/runner.rs)

The run_agent_loop_streaming() function orchestrates the streaming agent loop:

┌─────────────────────────────────────────────────────────┐
│                    Agent Loop                           │
│                                                         │
│  1. Call provider.stream_with_tools()                   │
│                                                         │
│  2. While stream has events:                            │
│     ├─ Delta(text) → emit RunnerEvent::TextDelta        │
│     ├─ ToolCallStart → accumulate tool call             │
│     ├─ ToolCallArgumentsDelta → accumulate args         │
│     ├─ ToolCallComplete → finalize args                 │
│     ├─ Done → record usage                              │
│     └─ Error → return error                             │
│                                                         │
│  3. If no tool calls → return accumulated text          │
│                                                         │
│  4. Execute tool calls concurrently                     │
│     ├─ Emit ToolCallStart events                        │
│     ├─ Run tools in parallel                            │
│     └─ Emit ToolCallEnd events                          │
│                                                         │
│  5. Append tool results to messages                     │
│                                                         │
│  6. Loop back to step 1                                 │
└─────────────────────────────────────────────────────────┘

5. Gateway (crates/gateway/src/chat.rs)

The gateway’s run_with_tools() function:

  1. Sets up an event callback that broadcasts RunnerEvents via WebSocket
  2. Calls run_agent_loop_streaming()
  3. Broadcasts events to connected clients as JSON frames

Event types broadcast to the UI:

| RunnerEvent | WebSocket State |
|---|---|
| Thinking | thinking |
| ThinkingDone | thinking_done |
| TextDelta(text) | delta with text field |
| ToolCallStart | tool_call_start |
| ToolCallEnd | tool_call_end |
| Iteration(n) | iteration |

6. Frontend (crates/gateway/src/assets/js/)

The JavaScript frontend handles streaming via WebSocket:

  1. websocket.js - Receives WebSocket frames and dispatches to handlers
  2. events.js - Event bus for distributing events to components
  3. state.js - Manages streaming state (streamText, streamEl)

When a delta event arrives:

function handleChatDelta(p, isActive, isChatPage) {
  if (!(p.text && isActive && isChatPage)) return;
  removeThinking();
  if (!S.streamEl) {
    S.setStreamText("");
    S.setStreamEl(document.createElement("div"));
    S.streamEl.className = "msg assistant";
    S.chatMsgBox.appendChild(S.streamEl);
  }
  S.setStreamText(S.streamText + p.text);
  setSafeMarkdownHtml(S.streamEl, S.streamText);
  S.chatMsgBox.scrollTop = S.chatMsgBox.scrollHeight;
}

Data Flow

┌──────────────┐     SSE      ┌──────────────┐   StreamEvent   ┌──────────────┐
│   Anthropic  │─────────────▶│   Provider   │────────────────▶│    Runner    │
│     API      │              │              │                 │              │
└──────────────┘              └──────────────┘                 └──────┬───────┘
                                                                      │
                                                               RunnerEvent
                                                                      │
                                                                      ▼
┌──────────────┐   WebSocket  ┌──────────────┐    Callback     ┌──────────────┐
│   Browser    │◀─────────────│   Gateway    │◀────────────────│   Callback   │
│              │              │              │                 │   (on_event) │
└──────────────┘              └──────────────┘                 └──────────────┘

Adding Streaming to New Providers

To add streaming support for a new LLM provider:

  1. Implement the stream() method (basic streaming)
  2. If the provider supports tools in streaming mode, override stream_with_tools()
  3. Parse the provider’s streaming format and yield appropriate StreamEvent variants
  4. Handle errors gracefully with StreamEvent::Error
  5. Always emit StreamEvent::Done with usage statistics when complete

Example skeleton:

fn stream_with_tools(
    &self,
    messages: Vec<serde_json::Value>,
    tools: Vec<serde_json::Value>,
) -> Pin<Box<dyn Stream<Item = StreamEvent> + Send + '_>> {
    Box::pin(async_stream::stream! {
        // Make the streaming request to the provider API.
        // Note: `?` is not available here because the stream yields
        // StreamEvent, not Result, so errors are surfaced as events.
        let resp = match self.client.post(...).json(&body).send().await {
            Ok(resp) => resp,
            Err(e) => {
                yield StreamEvent::Error(e.to_string());
                return;
            }
        };

        // Read SSE or streaming response
        let mut byte_stream = resp.bytes_stream();

        while let Some(chunk) = byte_stream.next().await {
            // Parse chunk and yield events
            match parse_event(&chunk) {
                TextDelta(text) => yield StreamEvent::Delta(text),
                ToolStart { id, name, idx } => {
                    yield StreamEvent::ToolCallStart { id, name, index: idx }
                }
                // ... handle other event types
            }
        }

        yield StreamEvent::Done(usage);
    })
}

Performance Considerations

  • Unbounded channels: WebSocket send channels are unbounded, so slow clients can accumulate messages in memory
  • Markdown re-rendering: The frontend re-renders full markdown on each delta, which is O(n) work per delta. For very long responses, this can cause UI lag
  • Concurrent tool execution: Multiple tool calls are executed in parallel using futures::join_all(), improving throughput when the LLM requests several tools at once
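One common mitigation for the per-delta re-rendering cost is to coalesce deltas and render at most once per animation frame. A sketch — illustrative only, not the actual Moltis frontend code (`render` stands in for setSafeMarkdownHtml):

```javascript
// Coalesce streaming deltas so the expensive markdown render runs at most
// once per frame, not once per token. Names here are illustrative.
function createDeltaBatcher(render) {
  let pending = "";
  let scheduled = false;
  // requestAnimationFrame in the browser; setTimeout(fn, 0) as a fallback.
  const schedule =
    typeof requestAnimationFrame === "function"
      ? requestAnimationFrame
      : (fn) => setTimeout(fn, 0);
  return function push(text) {
    pending += text;
    if (scheduled) return;
    scheduled = true;
    schedule(() => {
      scheduled = false;
      render(pending); // one render for however many deltas arrived
    });
  };
}
```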

SQLite Database Migrations

Moltis uses sqlx for database access and its built-in migration system for schema management. Each crate owns its migrations, keeping schema definitions close to the code that uses them.

Architecture

Each crate that uses SQLite has its own migrations/ directory and exposes a run_migrations() function. The gateway orchestrates running all migrations at startup in the correct dependency order.

crates/
├── projects/
│   ├── migrations/
│   │   └── 20240205100000_init.sql   # projects table
│   └── src/lib.rs                     # run_migrations()
├── sessions/
│   ├── migrations/
│   │   └── 20240205100001_init.sql   # sessions, channel_sessions
│   └── src/lib.rs                     # run_migrations()
├── cron/
│   ├── migrations/
│   │   └── 20240205100002_init.sql   # cron_jobs, cron_runs
│   └── src/lib.rs                     # run_migrations()
├── gateway/
│   ├── migrations/
│   │   └── 20240205100003_init.sql   # auth, message_log, channels
│   └── src/server.rs                  # orchestrates moltis.db migrations
└── memory/
    ├── migrations/
    │   └── 20240205100004_init.sql   # files, chunks, embedding_cache, FTS
    └── src/lib.rs                     # run_migrations() (separate memory.db)

How It Works

Migration Ownership

Each crate is autonomous and owns its schema:

| Crate | Database | Tables | Migration File |
|---|---|---|---|
| moltis-projects | moltis.db | projects | 20240205100000_init.sql |
| moltis-sessions | moltis.db | sessions, channel_sessions | 20240205100001_init.sql |
| moltis-cron | moltis.db | cron_jobs, cron_runs | 20240205100002_init.sql |
| moltis-gateway | moltis.db | auth_*, passkeys, api_keys, env_variables, message_log, channels | 20240205100003_init.sql |
| moltis-memory | memory.db | files, chunks, embedding_cache, chunks_fts | 20240205100004_init.sql |

Startup Sequence

The gateway runs migrations in dependency order:

// server.rs
moltis_projects::run_migrations(&db_pool).await?;   // 1. projects first
moltis_sessions::run_migrations(&db_pool).await?;   // 2. sessions (FK → projects)
moltis_cron::run_migrations(&db_pool).await?;       // 3. cron (independent)
sqlx::migrate!("./migrations").run(&db_pool).await?; // 4. gateway tables

Sessions depends on projects due to a foreign key (sessions.project_id references projects.id), so projects must migrate first.

Version Tracking

sqlx tracks applied migrations in the _sqlx_migrations table:

SELECT version, description, installed_on, success FROM _sqlx_migrations;

Migrations are identified by their timestamp prefix (e.g., 20240205100000), which must be globally unique across all crates.

Database Files

| Database | Location | Crates |
|---|---|---|
| moltis.db | ~/.moltis/moltis.db | projects, sessions, cron, gateway |
| memory.db | ~/.moltis/memory.db | memory (separate, managed internally) |

Adding New Migrations

Adding a Column to an Existing Table

  1. Create a new migration file in the owning crate:
# Example: adding tags to sessions
touch crates/sessions/migrations/20240301120000_add_tags.sql
  2. Write the migration SQL:
-- 20240301120000_add_tags.sql
ALTER TABLE sessions ADD COLUMN tags TEXT;
CREATE INDEX IF NOT EXISTS idx_sessions_tags ON sessions(tags);
  3. Rebuild to embed the migration:
cargo build

Adding a New Table to an Existing Crate

  1. Create the migration file with a new timestamp:
touch crates/sessions/migrations/20240302100000_session_bookmarks.sql
  2. Write the CREATE TABLE statement:
-- 20240302100000_session_bookmarks.sql
CREATE TABLE IF NOT EXISTS session_bookmarks (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    session_key TEXT NOT NULL,
    name       TEXT NOT NULL,
    message_id INTEGER NOT NULL,
    created_at INTEGER NOT NULL
);

Adding Tables to a New Crate

  1. Create the migrations directory:
mkdir -p crates/new-feature/migrations
  2. Create the migration file with a globally unique timestamp:
touch crates/new-feature/migrations/20240401100000_init.sql
  3. Add run_migrations() to the crate’s lib.rs:
pub async fn run_migrations(pool: &sqlx::SqlitePool) -> anyhow::Result<()> {
    sqlx::migrate!("./migrations").run(pool).await?;
    Ok(())
}
  4. Call it from server.rs in the appropriate order:
moltis_new_feature::run_migrations(&db_pool).await?;

Timestamp Convention

Use YYYYMMDDHHMMSS format for migration filenames:

  • YYYY - 4-digit year
  • MM - 2-digit month
  • DD - 2-digit day
  • HH - 2-digit hour (24h)
  • MM - 2-digit minute
  • SS - 2-digit second

This ensures global uniqueness across crates. When adding migrations, use the current timestamp to avoid collisions.
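A shell one-liner produces a filename-ready timestamp in this format:

```shell
# Print the current time as YYYYMMDDHHMMSS, suitable for a migration filename.
date +%Y%m%d%H%M%S
# e.g. touch "crates/sessions/migrations/$(date +%Y%m%d%H%M%S)_add_tags.sql"
```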

SQLite Limitations

ALTER TABLE

SQLite has limited ALTER TABLE support:

  • ADD COLUMN: Supported ✓
  • DROP COLUMN: SQLite 3.35+ only
  • Rename column: Requires table recreation
  • Change column type: Requires table recreation

For complex schema changes, use the table recreation pattern:

-- Create new table with desired schema
CREATE TABLE sessions_new (
    -- new schema
);

-- Copy data (map old columns to new)
INSERT INTO sessions_new SELECT ... FROM sessions;

-- Swap tables
DROP TABLE sessions;
ALTER TABLE sessions_new RENAME TO sessions;

-- Recreate indexes
CREATE INDEX idx_sessions_created_at ON sessions(created_at);

Foreign Keys

SQLite foreign keys are checked at insert/update time, not migration time. Ensure migrations run in dependency order (parent table first).
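Note also that SQLite itself ships with foreign-key enforcement disabled; it must be enabled per connection (whether your driver does this automatically depends on its connection options — check them rather than assuming):

```sql
-- Enable foreign-key enforcement for this connection.
PRAGMA foreign_keys = ON;
```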

Testing

Unit tests use in-memory databases with the crate’s init() method:

#[tokio::test]
async fn test_session_operations() {
    let pool = SqlitePool::connect("sqlite::memory:").await.unwrap();

    // Create schema for tests (init() retained for this purpose)
    SqliteSessionMetadata::init(&pool).await.unwrap();

    let meta = SqliteSessionMetadata::new(pool);
    // ... test code
}

The init() methods are retained (marked #[doc(hidden)]) specifically for tests. In production, migrations handle schema creation.

Troubleshooting

“failed to run migrations”

  1. Check file permissions on ~/.moltis/
  2. Ensure the database file isn’t locked by another process
  3. Check for syntax errors in migration SQL files

Migration Order Issues

If you see foreign key errors, verify the migration order in server.rs. Parent tables must be created before child tables with FK references.

Checking Migration Status

sqlite3 ~/.moltis/moltis.db "SELECT version, description, success FROM _sqlx_migrations ORDER BY version"

Resetting Migrations (Development Only)

# Backup first!
rm ~/.moltis/moltis.db
cargo run  # Creates fresh database with all migrations

Best Practices

DO

  • Use timestamp-based version numbers for global uniqueness
  • Keep each crate’s migrations in its own directory
  • Use IF NOT EXISTS for idempotent initial migrations
  • Test migrations on a copy of production data before deploying
  • Keep migrations small and focused

DON’T

  • Modify existing migration files after deployment
  • Reuse timestamps across crates
  • Put multiple crates’ tables in one migration file
  • Skip the dependency order in server.rs

Metrics and Tracing

Moltis includes comprehensive observability support through Prometheus metrics and tracing integration. This document explains how to enable, configure, and use these features.

Overview

The metrics system is built on the metrics crate facade, which provides a unified interface similar to the log crate. When the prometheus feature is enabled, metrics are exported in Prometheus text format for scraping by Grafana, Prometheus, or other monitoring tools.

All metrics are feature-gated — they add zero overhead when disabled.

Feature Flags

Metrics are controlled by two feature flags:

| Feature | Description | Default |
|---|---|---|
| metrics | Enables metrics collection and the /api/metrics JSON API | Enabled |
| prometheus | Enables the /metrics Prometheus endpoint (requires metrics) | Enabled |

Compile-Time Configuration

# Enable only metrics collection (no Prometheus endpoint)
moltis-gateway = { version = "0.1", features = ["metrics"] }

# Enable metrics with Prometheus export (default)
moltis-gateway = { version = "0.1", features = ["metrics", "prometheus"] }

# Enable metrics for specific crates
moltis-agents = { version = "0.1", features = ["metrics"] }
moltis-cron = { version = "0.1", features = ["metrics"] }

To build without metrics entirely:

cargo build --release --no-default-features --features "file-watcher,tailscale,tls,web-ui"

Prometheus Endpoint

When the prometheus feature is enabled, the gateway exposes a /metrics endpoint:

GET http://localhost:18789/metrics

This endpoint is unauthenticated to allow Prometheus scrapers to access it. It returns metrics in Prometheus text format:

# HELP moltis_http_requests_total Total number of HTTP requests handled
# TYPE moltis_http_requests_total counter
moltis_http_requests_total{method="GET",status="200",endpoint="/api/chat"} 42

# HELP moltis_llm_completion_duration_seconds Duration of LLM completion requests
# TYPE moltis_llm_completion_duration_seconds histogram
moltis_llm_completion_duration_seconds_bucket{provider="anthropic",model="claude-3-opus",le="1.0"} 5

Grafana Integration

To scrape metrics with Prometheus and visualize in Grafana:

  1. Add moltis to your prometheus.yml:
scrape_configs:
  - job_name: 'moltis'
    static_configs:
      - targets: ['localhost:18789']
    metrics_path: /metrics
    scrape_interval: 15s
  2. Import or create Grafana dashboards using the moltis_* metrics.

JSON API Endpoints

For the web UI dashboard and programmatic access, authenticated JSON endpoints are available:

| Endpoint | Description |
|---|---|
| GET /api/metrics | Full metrics snapshot with aggregates and per-provider breakdown |
| GET /api/metrics/summary | Lightweight counts for navigation badges |
| GET /api/metrics/history | Time-series data points for charts (last hour, 10s intervals) |

History Endpoint

The /api/metrics/history endpoint returns historical metrics data for rendering time-series charts:

{
  "enabled": true,
  "interval_seconds": 10,
  "max_points": 60480,
  "points": [
    {
      "timestamp": 1706832000000,
      "llm_completions": 42,
      "llm_input_tokens": 15000,
      "llm_output_tokens": 8000,
      "http_requests": 150,
      "ws_active": 3,
      "tool_executions": 25,
      "mcp_calls": 12,
      "active_sessions": 2
    }
  ]
}

Metrics Persistence

Metrics history is persisted to SQLite, so historical data survives server restarts. The database is stored at ~/.moltis/metrics.db (or the configured data directory).

Key features:

  • 7-day retention: History is kept for 7 days (60,480 data points at 10-second intervals)
  • Automatic cleanup: Old data is automatically removed hourly
  • Startup recovery: History is loaded from the database when the server starts

The storage backend uses a trait-based design (MetricsStore), allowing alternative implementations (e.g., TimescaleDB) for larger deployments.

Storage Architecture

// The MetricsStore trait defines the storage interface
#[async_trait]
pub trait MetricsStore: Send + Sync {
    async fn save_point(&self, point: &MetricsHistoryPoint) -> Result<()>;
    async fn load_history(&self, since: u64, limit: usize) -> Result<Vec<MetricsHistoryPoint>>;
    async fn cleanup_before(&self, before: u64) -> Result<u64>;
    async fn latest_point(&self) -> Result<Option<MetricsHistoryPoint>>;
}

The default SqliteMetricsStore implementation stores data in a single table with an index on the timestamp column for efficient range queries.
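A plausible shape for that table — hypothetical, with column names mirroring the MetricsHistoryPoint fields shown in the history endpoint example; the real schema may differ:

```sql
-- Hypothetical sketch of the single-table layout SqliteMetricsStore implies.
CREATE TABLE IF NOT EXISTS metrics_history (
    timestamp         INTEGER NOT NULL,  -- unix millis, indexed for range scans
    llm_completions   INTEGER NOT NULL,
    llm_input_tokens  INTEGER NOT NULL,
    llm_output_tokens INTEGER NOT NULL,
    http_requests     INTEGER NOT NULL,
    ws_active         INTEGER NOT NULL,
    tool_executions   INTEGER NOT NULL,
    mcp_calls         INTEGER NOT NULL,
    active_sessions   INTEGER NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_metrics_history_timestamp
    ON metrics_history(timestamp);
```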

Web UI Dashboard

The gateway includes a built-in metrics dashboard at /monitoring in the web UI. This page displays:

Overview Tab:

  • System metrics (uptime, connected clients, active sessions)
  • LLM usage (completions, tokens, cache statistics)
  • Tool execution statistics
  • MCP server status
  • Provider breakdown table
  • Prometheus endpoint (with copy button)

Charts Tab:

  • Token usage over time (input/output)
  • HTTP requests and LLM completions
  • WebSocket connections and active sessions
  • Tool executions and MCP calls

The dashboard uses uPlot for lightweight, high-performance time-series charts. Data updates every 10 seconds for current metrics and every 30 seconds for history.

Available Metrics

HTTP Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| moltis_http_requests_total | Counter | method, status, endpoint | Total HTTP requests |
| moltis_http_request_duration_seconds | Histogram | method, status, endpoint | Request latency |
| moltis_http_requests_in_flight | Gauge | | Currently processing requests |

LLM/Agent Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| moltis_llm_completions_total | Counter | provider, model | Total completions requested |
| moltis_llm_completion_duration_seconds | Histogram | provider, model | Completion latency |
| moltis_llm_input_tokens_total | Counter | provider, model | Input tokens processed |
| moltis_llm_output_tokens_total | Counter | provider, model | Output tokens generated |
| moltis_llm_completion_errors_total | Counter | provider, model, error_type | Completion failures |
| moltis_llm_time_to_first_token_seconds | Histogram | provider, model | Streaming TTFT |

Provider Aliases

When you have multiple instances of the same provider type (e.g., separate API keys for work and personal use), you can use the alias configuration option to differentiate them in metrics:

[providers.anthropic]
api_key = "sk-work-..."
alias = "anthropic-work"

# Note: You would need separate config sections for multiple instances
# of the same provider. This is a placeholder for future functionality.

The alias appears in the provider label of all LLM metrics:

moltis_llm_input_tokens_total{provider="anthropic-work", model="claude-3-opus"} 5000
moltis_llm_input_tokens_total{provider="anthropic-personal", model="claude-3-opus"} 3000

This allows you to:

  • Track token usage separately for billing purposes
  • Create separate Grafana dashboards per provider instance
  • Monitor rate limits and quotas independently

MCP (Model Context Protocol) Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| moltis_mcp_tool_calls_total | Counter | server, tool | Tool invocations |
| moltis_mcp_tool_call_duration_seconds | Histogram | server, tool | Tool call latency |
| moltis_mcp_tool_call_errors_total | Counter | server, tool, error_type | Tool call failures |
| moltis_mcp_servers_connected | Gauge | | Active MCP server connections |

Tool Execution Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| moltis_tool_executions_total | Counter | tool | Tool executions |
| moltis_tool_execution_duration_seconds | Histogram | tool | Execution time |
| moltis_sandbox_command_executions_total | Counter | | Sandbox commands run |

Session Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| moltis_sessions_created_total | Counter | | Sessions created |
| moltis_sessions_active | Gauge | | Currently active sessions |
| moltis_session_messages_total | Counter | role | Messages by role |

Cron Job Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| moltis_cron_jobs_scheduled | Gauge | | Number of scheduled jobs |
| moltis_cron_executions_total | Counter | | Job executions |
| moltis_cron_execution_duration_seconds | Histogram | | Job duration |
| moltis_cron_errors_total | Counter | | Failed jobs |
| moltis_cron_stuck_jobs_cleared_total | Counter | | Jobs exceeding 2h timeout |
| moltis_cron_input_tokens_total | Counter | | Input tokens from cron runs |
| moltis_cron_output_tokens_total | Counter | | Output tokens from cron runs |

Memory/Search Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| moltis_memory_searches_total | Counter | search_type | Searches performed |
| moltis_memory_search_duration_seconds | Histogram | search_type | Search latency |
| moltis_memory_embeddings_generated_total | Counter | provider | Embeddings created |

Channel Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| moltis_channels_active | Gauge | | Loaded channel plugins |
| moltis_channel_messages_received_total | Counter | channel | Inbound messages |
| moltis_channel_messages_sent_total | Counter | channel | Outbound messages |

Telegram-Specific Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| moltis_telegram_messages_received_total | Counter | | Messages from Telegram |
| moltis_telegram_access_control_denials_total | Counter | | Access denied events |
| moltis_telegram_polling_duration_seconds | Histogram | | Message handling time |

OAuth Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| moltis_oauth_flow_starts_total | Counter | | OAuth flows initiated |
| moltis_oauth_flow_completions_total | Counter | | Successful completions |
| moltis_oauth_token_refresh_total | Counter | | Token refreshes |
| moltis_oauth_token_refresh_failures_total | Counter | | Refresh failures |

Skills Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| moltis_skills_installation_attempts_total | Counter | | Installation attempts |
| moltis_skills_installation_duration_seconds | Histogram | | Installation time |
| moltis_skills_git_clone_total | Counter | | Successful git clones |
| moltis_skills_git_clone_fallback_total | Counter | | Fallbacks to HTTP tarball |

Tracing Integration

The moltis-metrics crate includes optional tracing integration via the tracing feature. This allows span context to propagate to metric labels.

Enabling Tracing

moltis-metrics = { version = "0.1", features = ["prometheus", "tracing"] }

Initialization

use moltis_metrics::tracing_integration::init_tracing;

fn main() {
    // Initialize tracing with metrics context propagation
    init_tracing();

    // Now spans will add labels to metrics
}

How It Works

When tracing is enabled, span fields are automatically added as metric labels:

use moltis_metrics::counter;
use tracing::instrument;

#[instrument(fields(operation = "fetch_user", component = "api"))]
async fn fetch_user(id: u64) -> User {
    // Metrics recorded here will include:
    // - operation="fetch_user"
    // - component="api"
    counter!("api_calls_total").increment(1);
}

Span Labels

The following span fields are propagated to metrics:

| Field | Description |
|---|---|
| operation | The operation being performed |
| component | The component/module name |
| span.name | The span’s target/name |

Adding Custom Metrics

In Your Code

Use the metrics macros re-exported from moltis-metrics:

use moltis_metrics::{counter, gauge, histogram, labels};

// Simple counter
counter!("my_custom_requests_total").increment(1);

// Counter with labels
counter!(
    "my_custom_requests_total",
    labels::ENDPOINT => "/api/users",
    labels::METHOD => "GET"
).increment(1);

// Gauge (current value)
gauge!("my_queue_size").set(42.0);

// Histogram (distribution)
histogram!("my_operation_duration_seconds").record(0.123);

Feature-Gating

Always gate metrics code to avoid overhead when disabled:

#[cfg(feature = "metrics")]
use moltis_metrics::{counter, histogram};

pub async fn my_function() {
    #[cfg(feature = "metrics")]
    let start = std::time::Instant::now();

    // ... do work ...

    #[cfg(feature = "metrics")]
    {
        counter!("my_operations_total").increment(1);
        histogram!("my_operation_duration_seconds")
            .record(start.elapsed().as_secs_f64());
    }
}

Adding New Metric Definitions

For consistency, add metric name constants to crates/metrics/src/definitions.rs:

/// My feature metrics
pub mod my_feature {
    /// Total operations performed
    pub const OPERATIONS_TOTAL: &str = "moltis_my_feature_operations_total";
    /// Operation duration in seconds
    pub const OPERATION_DURATION_SECONDS: &str = "moltis_my_feature_operation_duration_seconds";
}

Then use them:

#![allow(unused)]
fn main() {
use moltis_metrics::{counter, my_feature};

counter!(my_feature::OPERATIONS_TOTAL).increment(1);
}

Configuration

Metrics configuration in moltis.toml:

[metrics]
enabled = true              # Enable metrics collection (default: true)
prometheus_endpoint = true  # Expose /metrics endpoint (default: true)
labels = { env = "prod" }   # Add custom labels to all metrics

Environment variables:

  • RUST_LOG=moltis_metrics=debug — Enable debug logging for metrics initialization
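If Moltis applies the same double-underscore environment override convention used for other sections (seen with MOLTIS_TLS__HTTP_REDIRECT_PORT), the [metrics] settings could presumably be overridden the same way. The variable name below is an assumption based on that convention, not a documented flag:

```shell
# Assumed naming, following the MOLTIS_<SECTION>__<KEY> convention:
MOLTIS_METRICS__ENABLED=false moltis
```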

Best Practices

  1. Use consistent naming: Follow the pattern moltis_<subsystem>_<metric>_<unit>
  2. Add units to names: _total for counters, _seconds for durations, _bytes for sizes
  3. Keep cardinality low: Avoid high-cardinality labels (like user IDs or request IDs)
  4. Feature-gate everything: Use #[cfg(feature = "metrics")] to ensure zero overhead when disabled
  5. Use predefined buckets: The buckets module has standard histogram buckets for common metric types
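Cardinality multiplies across labels: each unique label combination becomes its own time series that the backend must store. A quick back-of-envelope sketch (plain Rust, no metrics crate involved; the counts are illustrative):

```rust
fn main() {
    // Each label multiplies the number of time series.
    let endpoints = 50; // distinct API routes
    let methods = 4;    // GET, POST, PUT, DELETE
    let statuses = 10;  // distinct status codes observed
    let series = endpoints * methods * statuses;
    println!("bounded labels: {series} series"); // 2000: manageable

    // Adding a user-ID label with 100k active users multiplies the
    // series count by five orders of magnitude.
    let users: u64 = 100_000;
    println!("with user_id label: {} series", series as u64 * users);
}
```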

Troubleshooting

Metrics not appearing

  1. Verify the metrics feature is enabled at compile time
  2. Check that the metrics recorder is initialized (happens automatically in gateway)
  3. Ensure you’re hitting the correct /metrics endpoint
  4. Check moltis.toml has [metrics] enabled = true

Prometheus endpoint not available

  1. Ensure the prometheus feature is enabled (it’s separate from metrics)
  2. Check your build: cargo build --features prometheus

High memory usage

  • Check for high-cardinality labels (many unique label combinations)
  • Consider reducing histogram bucket counts

Missing labels

  • Ensure labels are passed consistently across all metric recordings
  • Check that tracing spans include the expected fields

Tool Registry

The tool registry manages all tools available to the agent during a conversation. It tracks where each tool comes from and supports filtering by source.

Tool Sources

Every registered tool has a ToolSource that identifies its origin:

  • Builtin — tools shipped with the binary (exec, web_fetch, etc.)
  • Mcp { server } — tools provided by an MCP server, tagged with the server name

This replaces the previous convention of identifying MCP tools by their mcp__ name prefix, providing type-safe filtering instead of string matching.

Registration

#![allow(unused)]
fn main() {
// Built-in tool
registry.register(Box::new(MyTool::new()));

// MCP tool — tagged with server name
registry.register_mcp(Box::new(adapter), "github".to_string());
}

Filtering

When MCP tools are disabled for a session, the registry can produce a filtered copy:

#![allow(unused)]
fn main() {
// Type-safe: filters by ToolSource::Mcp variant
let no_mcp = registry.clone_without_mcp();

// Remove all MCP tools in-place (used during sync)
let removed_count = registry.unregister_mcp();
}

Schema Output

list_schemas() includes source metadata in every tool schema:

{
  "name": "exec",
  "description": "Execute a command",
  "parameters": { ... },
  "source": "builtin"
}
{
  "name": "mcp__github__search",
  "description": "Search GitHub",
  "parameters": { ... },
  "source": "mcp",
  "mcpServer": "github"
}

The source and mcpServer fields are available to the UI for rendering tools grouped by origin.

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

[0.1.4] - 2026-02-06

Added

  • Config Check Command: moltis config check validates the configuration file, detects unknown/misspelled fields with Levenshtein-based suggestions, warns about security misconfigurations, and checks file references

  • Memory Usage Indicator: Display process RSS and system free memory in the header bar, updated every 30 seconds via the tick WebSocket broadcast

  • QMD Backend Support: Optional QMD (Query Memory Daemon) backend for hybrid search with BM25 + vector + LLM reranking

    • Gated behind qmd feature flag (enabled by default)
    • Web UI shows installation instructions and QMD status
    • Comparison table between built-in SQLite and QMD backends
  • Citations: Configurable citation mode (on/off/auto) for memory search results

    • Auto mode includes citations when results span multiple files
  • Session Export: Option to export session transcripts to memory for future reference

  • LLM Reranking: Use LLM to rerank search results for improved relevance (requires QMD)

  • Memory Documentation: Added docs/src/memory.md with comprehensive memory system documentation

  • Mobile PWA Support: Install moltis as a Progressive Web App on iOS, Android, and desktop

    • Standalone mode with full-screen experience
    • Custom app icon (crab mascot)
    • Service worker for offline support and caching
    • Safe area support for notched devices
  • Push Notifications: Receive alerts when the LLM responds

    • VAPID key generation and storage for Web Push API
    • Subscribe/unsubscribe toggle in Settings > Notifications
    • Subscription management UI showing device name, IP address, and date
    • Remove any subscription from any device
    • Real-time subscription updates via WebSocket
    • Client IP detection from X-Forwarded-For, X-Real-IP, CF-Connecting-IP headers
    • Notifications sent for both streaming and agent (tool-using) chat modes
  • Safari/iOS PWA Detection: Show “Add to Dock” instructions when push notifications require PWA installation (Safari doesn’t support push in browser mode)

  • Session state store: per-session key-value persistence scoped by namespace, backed by SQLite (session_state tool).

  • Session branching: branch_session tool forks a conversation at any message index into an independent copy.

  • Session fork from UI: Fork button in the chat header and sidebar action buttons let users fork sessions without asking the LLM. Forked sessions appear indented under their parent with a branch icon.

  • Skill self-extension: create_skill, update_skill, delete_skill tools let the agent manage project-local skills at runtime.

  • Skill hot-reload: filesystem watcher on skill directories emits skills.changed events via WebSocket when SKILL.md files change.

  • Typed tool sources: ToolSource enum (Builtin / Mcp { server }) replaces string-prefix identification of MCP tools in the tool registry.

  • Tool registry metadata: list_schemas() now includes source and mcpServer fields so the UI can group tools by origin.

  • Per-session MCP toggle: sessions store an mcp_disabled flag; the chat header exposes a toggle button to enable/disable MCP tools per session.

  • Debug panel convergence: the debug side-panel now renders the same seven sections as the /context slash command, eliminating duplicated rendering logic.

  • Documentation pages for session state, session branching, skill self-extension, and the tool registry architecture.

Changed

  • Memory settings UI enhanced with backend comparison and feature explanations

  • Added memory.qmd.status RPC method for checking QMD availability

  • Extended memory.config.get to include qmd_feature_enabled flag

  • Push notifications feature is now enabled by default in the CLI

  • TLS HTTP redirect port now defaults to gateway_port + 1 instead of the hardcoded port 18790. This makes the Dockerfile simpler (both ports are adjacent) and avoids collisions when running multiple instances. Override via [tls] http_redirect_port in moltis.toml or the MOLTIS_TLS__HTTP_REDIRECT_PORT environment variable.

  • TLS certificates use moltis.localhost domain. Auto-generated server certs now include moltis.localhost, *.moltis.localhost, localhost, 127.0.0.1, and ::1 as SANs. Banner and redirect URLs use https://moltis.localhost:<port> when bound to loopback, so the cert matches the displayed URL. Existing certs are automatically regenerated on next startup.

  • Certificate validity uses dynamic dates. Cert notBefore/notAfter are now computed from the current system time instead of being hardcoded. CA certs are valid for 10 years, server certs for 1 year from generation.

  • McpToolBridge now stores and exposes server_name() for typed registration.

  • mcp_service::sync_mcp_tools() uses unregister_mcp() / register_mcp() instead of scanning tool names by prefix.

  • chat.rs uses clone_without_mcp() instead of clone_without_prefix("mcp__") in all three call sites.

Fixed

  • Push notifications not sending when chat uses agent mode (run_with_tools)
  • Missing space in Safari install instructions (“usingFile” → “using File”)
  • WebSocket origin validation now treats .localhost subdomains (e.g. moltis.localhost) as loopback equivalents per RFC 6761.
  • Fork/branch icon in session sidebar now renders cleanly at 16px (replaced complex git-branch SVG with simple trunk+branch path).
  • Deleting a forked session now navigates to the parent session instead of an unrelated sibling.
  • Streaming tool calls for non-Anthropic providers: OpenAiProvider, GitHubCopilotProvider, KimiCodeProvider, OpenAiCodexProvider, and ProviderChain now implement stream_with_tools() so tool schemas are sent in the streaming API request and tool-call events are properly parsed. Previously only AnthropicProvider supported streaming tool calls; all other providers silently dropped the tools parameter, causing the LLM to emit tool invocations as plain text instead of structured function calls.
  • Streaming tool call arguments dropped when index ≠ 0: When a provider (e.g. GitHub Copilot proxying Claude) emits a text content block at streaming index 0 and a tool_use block at index 1, the runner’s argument finalization used the streaming index as the vector position directly. Since tool_calls has only 1 element at position 0, the condition 1 < 1 was false and arguments were silently dropped (empty {}). Fixed by mapping streaming indices to vector positions via a HashMap.
  • Skill tools wrote to wrong directory: create_skill, update_skill, and delete_skill used std::env::current_dir() captured at gateway startup, writing skills to <cwd>/.moltis/skills/ instead of ~/.moltis/skills/. Skills now write to <data_dir>/skills/ (Personal source), which is always discovered regardless of where the gateway was started.
  • Skills page missing personal/project skills: The /api/skills endpoint only returned manifest-based registry skills. Personal and project-local skills were never shown in the navigation or skills page. The endpoint now discovers and includes them alongside registry skills.

Documentation

  • Added mobile-pwa.md with PWA installation and push notification documentation
  • Updated CLAUDE.md with cargo feature policy (features enabled by default)
  • Rewrote session-branching.md with accurate fork details, UI methods, RPC API, inheritance table, and deletion behavior.