Metrics and Tracing

Moltis includes comprehensive observability support through Prometheus metrics and tracing integration. This document explains how to enable, configure, and use these features.

Overview

The metrics system is built on the metrics crate facade, which provides a unified interface similar to the log crate. When the prometheus feature is enabled, metrics are exported in Prometheus text format, so Prometheus (or another compatible collector) can scrape them and Grafana can visualize them.

All metrics are feature-gated: when the features are disabled at compile time, the metrics code is compiled out and adds no runtime overhead.

Feature Flags

Metrics are controlled by two feature flags:

Feature	Description	Default
`metrics`	Enables metrics collection and the `/api/metrics` JSON API	Enabled
`prometheus`	Enables the `/metrics` Prometheus endpoint (requires `metrics`)	Enabled

Compile-Time Configuration

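Select features in your Cargo.toml (use one entry per crate):

# Enable only metrics collection (no Prometheus endpoint).
# prometheus is a default feature, so default features must be turned off;
# re-add any other default features you still need.
moltis-gateway = { version = "0.1", default-features = false, features = ["metrics"] }

# Enable metrics with Prometheus export (default)
moltis-gateway = { version = "0.1", features = ["metrics", "prometheus"] }

# Enable metrics for specific crates
moltis-agents = { version = "0.1", features = ["metrics"] }
moltis-cron = { version = "0.1", features = ["metrics"] }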

To build without metrics entirely, disable the default features and re-enable only the non-metrics features you need:

cargo build --release --no-default-features --features "file-watcher,tailscale,tls,web-ui"

Prometheus Endpoint

When the prometheus feature is enabled, the gateway exposes a /metrics endpoint:

GET http://localhost:18789/metrics

This endpoint is unauthenticated to allow Prometheus scrapers to access it. It returns metrics in Prometheus text format:

# HELP moltis_http_requests_total Total number of HTTP requests handled
# TYPE moltis_http_requests_total counter
moltis_http_requests_total{method="GET",status="200",endpoint="/api/chat"} 42

# HELP moltis_llm_completion_duration_seconds Duration of LLM completion requests
# TYPE moltis_llm_completion_duration_seconds histogram
moltis_llm_completion_duration_seconds_bucket{provider="anthropic",model="claude-3-opus",le="1.0"} 5
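Histogram buckets in this format are cumulative: the le="1.0" sample counts every completion that finished in 1.0 second or less, including those that also fall into smaller buckets, and a final le="+Inf" bucket equals the total count. A minimal sketch of that bucketing in plain Rust follows; the function name, bounds, and latencies are illustrative and are not Moltis's actual bucket configuration.

```rust
/// Computes Prometheus-style cumulative histogram buckets: for each upper
/// bound `le`, the count of observations less than or equal to `le`.
/// A final +Inf bound is appended, so its count equals the total.
fn cumulative_buckets(bounds: &[f64], observations: &[f64]) -> Vec<(f64, u64)> {
    bounds
        .iter()
        .copied()
        .chain(std::iter::once(f64::INFINITY))
        .map(|le| {
            let count = observations.iter().filter(|&&v| v <= le).count() as u64;
            (le, count)
        })
        .collect()
}

fn main() {
    // Illustrative bucket bounds and completion latencies, in seconds.
    let bounds = [0.5, 1.0, 5.0];
    let latencies = [0.4, 0.9, 1.0, 2.5, 7.0];

    // Prints counts 1, 3, 4 and 5 for le = 0.5, 1.0, 5.0 and +Inf.
    for (le, count) in cumulative_buckets(&bounds, &latencies) {
        let le = if le.is_infinite() { "+Inf".to_string() } else { format!("{le:.1}") };
        println!("completion_duration_seconds_bucket{{le=\"{le}\"}} {count}");
    }
}
```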

Grafana Integration

To scrape metrics with Prometheus and visualize in Grafana:

Add moltis to your prometheus.yml:

scrape_configs:
  - job_name: 'moltis'
    static_configs:
      - targets: ['localhost:18789']
    metrics_path: /metrics
    scrape_interval: 15s

In Grafana, add your Prometheus server as a data source, then import or create dashboards using the moltis_* metrics.

JSON API Endpoints

For the web UI dashboard and programmatic access, authenticated JSON endpoints are available:

Endpoint	Description
`GET /api/metrics`	Full metrics snapshot with aggregates and per-provider breakdown
`GET /api/metrics/summary`	Lightweight counts for navigation badges
`GET /api/metrics/history`	Time-series data points for charts (last hour, 10s intervals)

History Endpoint

The /api/metrics/history endpoint returns historical metrics data for rendering time-series charts:

{
  "enabled": true,
  "interval_seconds": 10,
  "max_points": 60480,
  "points": [
    {
      "timestamp": 1706832000000,
      "llm_completions": 42,
      "llm_input_tokens": 15000,
      "llm_output_tokens": 8000,
      "http_requests": 150,
      "ws_active": 3,
      "tool_executions": 25,
      "mcp_calls": 12,
      "active_sessions": 2
    }
  ]
}

Metrics Persistence

Metrics history is persisted to SQLite, so historical data survives server restarts. The database is stored at ~/.moltis/metrics.db (or the configured data directory).

Key features:

7-day retention: History is kept for 7 days (60,480 data points at 10-second intervals)
Automatic cleanup: An hourly task removes data older than the retention window
Startup recovery: History is loaded from the database when the server starts
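The 60,480 figure follows from the retention window: 7 days × 24 h × 3,600 s = 604,800 s, divided by the 10-second interval. The sketch below reproduces that arithmetic and computes a cleanup cutoff in milliseconds, matching the millisecond timestamps in the history response. The constant and function names are hypothetical, and the way Moltis actually computes its cutoff is an assumption.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// History sampling interval stated in this document (10 seconds).
const INTERVAL_SECS: u64 = 10;
/// Retention window stated in this document (7 days).
const RETENTION_SECS: u64 = 7 * 24 * 60 * 60;

/// Maximum number of retained points: 604_800 s / 10 s = 60_480.
const fn max_points() -> u64 {
    RETENTION_SECS / INTERVAL_SECS
}

/// Hypothetical cutoff: points with a millisecond timestamp below this
/// value are older than the retention window and may be deleted.
fn cleanup_cutoff_ms(now_ms: u64) -> u64 {
    now_ms.saturating_sub(RETENTION_SECS * 1000)
}

fn main() {
    let now_ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before 1970")
        .as_millis() as u64;
    println!("max points: {}", max_points());
    println!("delete points older than: {}", cleanup_cutoff_ms(now_ms));
}
```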

The storage backend uses a trait-based design (MetricsStore), allowing alternative implementations (e.g., TimescaleDB) for larger deployments.

Storage Architecture

use async_trait::async_trait;

// The MetricsStore trait defines the storage interface.
// `Result` and `MetricsHistoryPoint` are defined elsewhere in the crate.
#[async_trait]
pub trait MetricsStore: Send + Sync {
    async fn save_point(&self, point: &MetricsHistoryPoint) -> Result<()>;
    async fn load_history(&self, since: u64, limit: usize) -> Result<Vec<MetricsHistoryPoint>>;
    async fn cleanup_before(&self, before: u64) -> Result<u64>;
    async fn latest_point(&self) -> Result<Option<MetricsHistoryPoint>>;
}

The default SqliteMetricsStore implementation stores data in a single table with an index on the timestamp column for efficient range queries.
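To illustrate how the trait's four operations fit together, here is a simplified, synchronous in-memory stand-in. It is a sketch, not part of Moltis: the real trait is async and returns Result, the Point type models only two fields of MetricsHistoryPoint, and the semantics assumed for load_history (timestamps >= since, oldest first, at most limit points) are a guess. A BTreeMap keyed by timestamp plays the role of the SQLite timestamp index, so range queries and cleanup stay ordered.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for MetricsHistoryPoint (only two fields modeled).
#[derive(Clone, Debug, PartialEq)]
struct Point {
    timestamp: u64, // milliseconds since the Unix epoch
    llm_completions: u64,
}

/// Hypothetical synchronous, in-memory store mirroring MetricsStore.
/// The BTreeMap keeps points sorted by timestamp, like an index would.
#[derive(Default)]
struct InMemoryStore {
    points: BTreeMap<u64, Point>,
}

impl InMemoryStore {
    /// Inserts a point, replacing any existing point with the same timestamp.
    fn save_point(&mut self, point: &Point) {
        self.points.insert(point.timestamp, point.clone());
    }

    /// Returns up to `limit` points with timestamp >= `since`, oldest first.
    fn load_history(&self, since: u64, limit: usize) -> Vec<Point> {
        self.points
            .range(since..)
            .take(limit)
            .map(|(_, point)| point.clone())
            .collect()
    }

    /// Deletes points with timestamp < `before` and returns how many were removed.
    fn cleanup_before(&mut self, before: u64) -> u64 {
        // split_off leaves keys < before in self and returns keys >= before.
        let kept = self.points.split_off(&before);
        let removed = self.points.len() as u64;
        self.points = kept;
        removed
    }

    /// Returns the most recent point, if any.
    fn latest_point(&self) -> Option<Point> {
        self.points.values().next_back().cloned()
    }
}

fn main() {
    let mut store = InMemoryStore::default();
    for (i, ts) in [1_000u64, 11_000, 21_000].into_iter().enumerate() {
        store.save_point(&Point { timestamp: ts, llm_completions: i as u64 });
    }
    let removed = store.cleanup_before(11_000);
    let latest = store.latest_point().expect("store is not empty");
    println!(
        "removed {removed}, kept {}, latest has {} completions",
        store.load_history(0, 100).len(),
        latest.llm_completions
    );
}
```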

Web UI Dashboard

The gateway includes a built-in metrics dashboard at /monitoring in the web UI. This page displays:

Overview Tab:

System metrics (uptime, connected clients, active sessions)
LLM usage (completions, tokens, cache statistics)
Tool execution statistics
MCP server status
Provider breakdown table
Prometheus endpoint (with copy button)

Charts Tab:

Token usage over time (input/output)
HTTP requests and LLM completions
WebSocket connections and active sessions
Tool executions and MCP calls

The dashboard uses uPlot for lightweight, high-performance time-series charts. Data updates every 10 seconds for current metrics and every 30 seconds for history.

Available Metrics

HTTP Metrics

Metric	Type	Labels	Description
`moltis_http_requests_total`	Counter	method, status, endpoint	Total HTTP requests
`moltis_http_request_duration_seconds`	Histogram	method, status, endpoint	Request latency
`moltis_http_requests_in_flight`	Gauge	—	Currently processing requests

LLM/Agent Metrics

Metric	Type	Labels	Description
`moltis_llm_completions_total`	Counter	provider, model	Total completions requested
`moltis_llm_completion_duration_seconds`	Histogram	provider, model	Completion latency
`moltis_llm_input_tokens_total`	Counter	provider, model	Input tokens processed
`moltis_llm_output_tokens_total`	Counter	provider, model	Output tokens generated
`moltis_llm_completion_errors_total`	Counter	provider, model, error_type	Completion failures
`moltis_llm_time_to_first_token_seconds`	Histogram	provider, model	Streaming TTFT

Provider Aliases

When you have multiple instances of the same provider type (e.g., separate API keys for work and personal use), you can use the alias configuration option to differentiate them in metrics:

[providers.anthropic]
api_key = "sk-work-..."
alias = "anthropic-work"

# Note: You would need separate config sections for multiple instances
# of the same provider. This is a placeholder for future functionality.

The alias appears in the provider label of all LLM metrics:

moltis_llm_input_tokens_total{provider="anthropic-work",model="claude-3-opus"} 5000
moltis_llm_input_tokens_total{provider="anthropic-personal",model="claude-3-opus"} 3000

This allows you to:

Track token usage separately for billing purposes
Create separate Grafana dashboards per provider instance
Monitor rate limits and quotas independently
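Assuming the alias simply replaces the provider type name in the provider label, and that the plain type name (such as anthropic) is used when no alias is set, label selection reduces to a one-line fallback. ProviderConfig and metrics_label are hypothetical names used only for this sketch:

```rust
/// Hypothetical provider entry; `alias` mirrors the alias key in moltis.toml.
struct ProviderConfig {
    name: String,
    alias: Option<String>,
}

impl ProviderConfig {
    /// Value for the `provider` metric label: the alias if one is set,
    /// otherwise the provider type name (assumed fallback).
    fn metrics_label(&self) -> &str {
        self.alias.as_deref().unwrap_or(self.name.as_str())
    }
}

fn main() {
    let work = ProviderConfig {
        name: "anthropic".to_string(),
        alias: Some("anthropic-work".to_string()),
    };
    let plain = ProviderConfig {
        name: "anthropic".to_string(),
        alias: None,
    };
    println!("{} {}", work.metrics_label(), plain.metrics_label());
}
```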

MCP (Model Context Protocol) Metrics

Metric	Type	Labels	Description
`moltis_mcp_tool_calls_total`	Counter	server, tool	Tool invocations
`moltis_mcp_tool_call_duration_seconds`	Histogram	server, tool	Tool call latency
`moltis_mcp_tool_call_errors_total`	Counter	server, tool, error_type	Tool call failures
`moltis_mcp_servers_connected`	Gauge	—	Active MCP server connections

Tool Execution Metrics

Metric	Type	Labels	Description
`moltis_tool_executions_total`	Counter	tool	Tool executions
`moltis_tool_execution_duration_seconds`	Histogram	tool	Execution time
`moltis_sandbox_command_executions_total`	Counter	—	Sandbox commands run

Session Metrics

Metric	Type	Labels	Description
`moltis_sessions_created_total`	Counter	—	Sessions created
`moltis_sessions_active`	Gauge	—	Currently active sessions
`moltis_session_messages_total`	Counter	role	Messages by role

Cron Job Metrics

Metric	Type	Labels	Description
`moltis_cron_jobs_scheduled`	Gauge	—	Number of scheduled jobs
`moltis_cron_executions_total`	Counter	—	Job executions
`moltis_cron_execution_duration_seconds`	Histogram	—	Job duration
`moltis_cron_errors_total`	Counter	—	Failed jobs
`moltis_cron_stuck_jobs_cleared_total`	Counter	—	Stuck jobs cleared after exceeding the 2h timeout
`moltis_cron_input_tokens_total`	Counter	—	Input tokens from cron runs
`moltis_cron_output_tokens_total`	Counter	—	Output tokens from cron runs
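The stuck-job counter implies a check along these lines: any job still running after 2 hours is treated as stuck and cleared. This document does not describe how Moltis detects stuck jobs, so the following is only a sketch; RunningJob, stuck_jobs, and the use of a monotonic Instant are assumptions.

```rust
use std::time::{Duration, Instant};

/// Stuck-job threshold from the cron metrics table (2 hours).
const STUCK_JOB_TIMEOUT: Duration = Duration::from_secs(2 * 60 * 60);

/// Hypothetical record of a cron job that is currently running.
struct RunningJob {
    id: u32,
    started_at: Instant,
}

/// Returns the IDs of jobs running longer than STUCK_JOB_TIMEOUT at `now`.
/// Each such job would be cleared and counted once in
/// moltis_cron_stuck_jobs_cleared_total.
fn stuck_jobs(jobs: &[RunningJob], now: Instant) -> Vec<u32> {
    jobs.iter()
        .filter(|job| now.saturating_duration_since(job.started_at) > STUCK_JOB_TIMEOUT)
        .map(|job| job.id)
        .collect()
}

fn main() {
    let start = Instant::now();
    let jobs = [
        RunningJob { id: 1, started_at: start },
        RunningJob { id: 2, started_at: start + Duration::from_secs(2 * 60 * 60) },
    ];
    // Evaluate as if 3 hours have passed since `start`.
    let now = start + Duration::from_secs(3 * 60 * 60);
    println!("stuck: {:?}", stuck_jobs(&jobs, now));
}
```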

Memory/Search Metrics

Metric	Type	Labels	Description
`moltis_memory_searches_total`	Counter	search_type	Searches performed
`moltis_memory_search_duration_seconds`	Histogram	search_type	Search latency
`moltis_memory_embeddings_generated_total`	Counter	provider	Embeddings created

Channel Metrics

Metric	Type	Labels	Description
`moltis_channels_active`	Gauge	—	Loaded channel plugins
`moltis_channel_messages_received_total`	Counter	channel	Inbound messages
`moltis_channel_messages_sent_total`	Counter	channel	Outbound messages

Telegram-Specific Metrics

Metric	Type	Labels	Description
`moltis_telegram_messages_received_total`	Counter	—	Messages from Telegram
`moltis_telegram_access_control_denials_total`	Counter	—	Access denied events
`moltis_telegram_polling_duration_seconds`	Histogram	—	Message handling time

OAuth Metrics

Metric	Type	Labels	Description
`moltis_oauth_flow_starts_total`	Counter	—	OAuth flows initiated
`moltis_oauth_flow_completions_total`	Counter	—	Successful completions
`moltis_oauth_token_refresh_total`	Counter	—	Token refreshes
`moltis_oauth_token_refresh_failures_total`	Counter	—	Refresh failures

Skills Metrics

Metric	Type	Labels	Description
`moltis_skills_installation_attempts_total`	Counter	—	Installation attempts
`moltis_skills_installation_duration_seconds`	Histogram	—	Installation time
`moltis_skills_git_clone_total`	Counter	—	Successful git clones
`moltis_skills_git_clone_fallback_total`	Counter	—	Fallbacks to HTTP tarball

Tracing Integration

The moltis-metrics crate includes optional tracing integration via the tracing feature. This allows span context to propagate to metric labels.

Enabling Tracing

moltis-metrics = { version = "0.1", features = ["prometheus", "tracing"] }

Initialization

use moltis_metrics::tracing_integration::init_tracing;

fn main() {
    // Initialize tracing with metrics context propagation
    init_tracing();

    // Now spans will add labels to metrics
}

How It Works

When tracing is enabled, the span fields listed under Span Labels are added as labels to metrics recorded inside that span:

use moltis_metrics::counter;
use tracing::instrument;

// Placeholder type for the example.
struct User {
    id: u64,
}

#[instrument(fields(operation = "fetch_user", component = "api"))]
async fn fetch_user(id: u64) -> User {
    // Metrics recorded here will include:
    // - operation="fetch_user"
    // - component="api"
    counter!("api_calls_total").increment(1);

    User { id }
}

Span Labels

The following span fields are propagated to metrics:

Field	Description
`operation`	The operation being performed
`component`	The component/module name
`span.name`	The span’s target/name
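As a rough model of the filtering described in the table, the sketch below keeps only the operation and component fields of a span and adds the span name under span.name; any other span field (such as a function argument recorded by #[instrument]) is dropped. The actual propagation happens inside the tracing integration of moltis-metrics; SpanInfo and span_labels are hypothetical and exist only to illustrate the mapping.

```rust
use std::collections::BTreeMap;

/// Span fields listed in the Span Labels table as propagated to metrics.
const PROPAGATED_FIELDS: [&str; 2] = ["operation", "component"];

/// Hypothetical view of the current span: its name and recorded fields.
struct SpanInfo<'a> {
    name: &'a str,
    fields: &'a [(&'a str, &'a str)],
}

/// Builds the extra labels for a metric recorded inside `span`:
/// the allowed fields, plus the span name under "span.name".
fn span_labels(span: &SpanInfo) -> BTreeMap<String, String> {
    let mut labels: BTreeMap<String, String> = span
        .fields
        .iter()
        .filter(|(key, _)| PROPAGATED_FIELDS.contains(key))
        .map(|(key, value)| (key.to_string(), value.to_string()))
        .collect();
    labels.insert("span.name".to_string(), span.name.to_string());
    labels
}

fn main() {
    // Fields as #[instrument] on fetch_user would record them, including the `id` argument.
    let fields = [("operation", "fetch_user"), ("component", "api"), ("id", "7")];
    let span = SpanInfo { name: "fetch_user", fields: &fields };
    println!("{:?}", span_labels(&span));
}
```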

Adding Custom Metrics

In Your Code

Use the metrics macros re-exported from moltis-metrics:

use moltis_metrics::{counter, gauge, histogram, labels};

fn main() {
    // Without an installed recorder (the gateway installs one), these calls are no-ops.

    // Simple counter
    counter!("my_custom_requests_total").increment(1);

    // Counter with labels
    counter!(
        "my_custom_requests_total",
        labels::ENDPOINT => "/api/users",
        labels::METHOD => "GET"
    )
    .increment(1);

    // Gauge (current value)
    gauge!("my_queue_size").set(42.0);

    // Histogram (distribution)
    histogram!("my_operation_duration_seconds").record(0.123);
}

Feature-Gating

Always gate metrics code to avoid overhead when disabled:

#[cfg(feature = "metrics")]
use moltis_metrics::{counter, histogram};

pub async fn my_function() {
    #[cfg(feature = "metrics")]
    let start = std::time::Instant::now();

    // ... do work ...

    #[cfg(feature = "metrics")]
    {
        counter!("my_operations_total").increment(1);
        histogram!("my_operation_duration_seconds")
            .record(start.elapsed().as_secs_f64());
    }
}

Adding New Metric Definitions

For consistency, add metric name constants to crates/metrics/src/definitions.rs:

/// My feature metrics
pub mod my_feature {
    /// Total operations performed
    pub const OPERATIONS_TOTAL: &str = "moltis_my_feature_operations_total";
    /// Operation duration in seconds
    pub const OPERATION_DURATION_SECONDS: &str = "moltis_my_feature_operation_duration_seconds";
}

Then use them:

use moltis_metrics::{counter, my_feature};

fn main() {
    counter!(my_feature::OPERATIONS_TOTAL).increment(1);
}

Configuration

Metrics configuration in moltis.toml:

[metrics]
enabled = true              # Enable metrics collection (default: true)
prometheus_endpoint = true  # Expose /metrics endpoint (default: true)
labels = { env = "prod" }   # Add custom labels to all metrics

Environment variables:

RUST_LOG=moltis_metrics=debug: enables debug logging for metrics initialization

Best Practices

Use consistent naming: Follow the pattern moltis_<subsystem>_<metric>_<unit>
Add suffixes to names: _total for counters, and base units such as _seconds for durations and _bytes for sizes
Keep cardinality low: Avoid high-cardinality labels (like user IDs or request IDs)
Feature-gate everything: Use #[cfg(feature = "metrics")] to ensure zero overhead when disabled
Use predefined buckets: The buckets module has standard histogram buckets for common metric types
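The naming rules above can be checked mechanically. The following hypothetical lint (not a Moltis tool) checks the moltis_ prefix, requires _total on counters, and requires _seconds or _bytes on histograms; restricting histograms to those two units is an assumption made to keep the sketch short.

```rust
/// Kinds of metrics covered by this hypothetical naming lint.
#[derive(Clone, Copy, Debug)]
enum MetricKind {
    Counter,
    Gauge,
    Histogram,
}

/// Checks a metric name against the conventions: `moltis_` prefix,
/// `_total` suffix for counters, and a unit suffix for histograms.
fn check_metric_name(name: &str, kind: MetricKind) -> Result<(), String> {
    if !name.starts_with("moltis_") {
        return Err(format!("{name}: missing the moltis_ prefix"));
    }
    match kind {
        MetricKind::Counter if !name.ends_with("_total") => {
            Err(format!("{name}: counters should end in _total"))
        }
        MetricKind::Histogram if !(name.ends_with("_seconds") || name.ends_with("_bytes")) => {
            Err(format!("{name}: histograms should end in _seconds or _bytes"))
        }
        _ => Ok(()),
    }
}

fn main() {
    let names = [
        ("moltis_http_requests_total", MetricKind::Counter),
        ("moltis_sessions_active", MetricKind::Gauge),
        ("moltis_http_request_duration_seconds", MetricKind::Histogram),
        ("moltis_cron_errors", MetricKind::Counter),
    ];
    for (name, kind) in names {
        match check_metric_name(name, kind) {
            Ok(()) => println!("ok: {name} ({kind:?})"),
            Err(problem) => println!("warning: {problem}"),
        }
    }
}
```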

Troubleshooting

Metrics not appearing

Verify the metrics feature is enabled at compile time
Check that the metrics recorder is initialized (happens automatically in gateway)
Ensure you’re hitting the correct /metrics endpoint
Check that moltis.toml sets enabled = true in the [metrics] section

Prometheus endpoint not available

Ensure the prometheus feature is enabled (it’s separate from metrics)
Check your build: cargo build --features prometheus

High memory usage

Check for high-cardinality labels (many unique label combinations)
Consider reducing histogram bucket counts
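Label values multiply: a metric can produce up to one time series per combination of label values, and each histogram combination further expands into one _bucket series per bound (plus +Inf) along with _sum and _count. With an illustrative 4 methods, 8 status codes, and 40 endpoints, a single counter can reach 4 × 8 × 40 = 1,280 series, and a histogram with 10 bounds can reach 1,280 × 13 = 16,640. A sketch of that arithmetic:

```rust
/// Upper bound on time series for one metric: the product of the number
/// of distinct values of each label. No labels means a single series.
fn max_series(distinct_values_per_label: &[u64]) -> u64 {
    distinct_values_per_label.iter().product()
}

/// Upper bound for a histogram: each label combination yields one _bucket
/// series per finite bound, one for +Inf, plus _sum and _count.
fn histogram_series(distinct_values_per_label: &[u64], finite_bounds: u64) -> u64 {
    max_series(distinct_values_per_label) * (finite_bounds + 1 + 2)
}

fn main() {
    // Illustrative: 4 methods, 8 status codes, 40 endpoints.
    let http_labels = [4, 8, 40];
    println!("counter: {}", max_series(&http_labels));
    println!("histogram with 10 bounds: {}", histogram_series(&http_labels, 10));
}
```

Adding a user ID label with 10,000 distinct values would multiply each of these figures by 10,000, which is why high-cardinality labels are a common cause of memory growth.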

Missing labels

Ensure labels are passed consistently across all metric recordings
Check that tracing spans include the expected fields
