Building an AI agent that works in demos is easy. Building one that works in production is a completely different problem. After spending weeks on observability for my own agent (ZeroClaw), here's what actually matters.
The Debugging Nightmare
Most agent demos hide a dirty secret: when something goes wrong, you have no idea what happened. The LLM decided to do something, used a tool, and... then what? Why did it pick that tool? What was in its context? What did the tool return?
Without observability, you're flying blind.
What Actually Matters
After building this twice (once wrong, once right), here's the hierarchy:
1. Structured Tracing (Non-Negotiable)
Don't just log strings. Use structured logging with context:
```rust
tracing::info!(
    agent_id = %self.id,
    tool = %tool_name,
    // as_millis() returns u128, which tracing can't record directly
    decision_latency_ms = start.elapsed().as_millis() as u64,
    "Agent selected tool"
);
```
This lets you filter by agent, by tool, by latency. Regex grep on plain text doesn't scale.
2. Event Sourcing
Your agent's state isn't just "what's the current message." It's a sequence of decisions:
- Thought: What the agent was considering
- Action: What tool it chose
- Observation: What the tool returned
- Result: Final response
Persist this sequence. When your agent goes off the rails at 3am, you can replay exactly what happened.
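A minimal sketch of that event sequence in Rust. The type and field names here are my own illustration, not ZeroClaw's actual API:

```rust
// One entry in the agent's decision log. Serialize and persist each
// event as it happens; replaying the slice reconstructs the run.
#[derive(Debug, Clone, PartialEq)]
enum AgentEvent {
    Thought { content: String },
    Action { tool: String, input: String },
    Observation { tool: String, output: String },
    Result { response: String },
}

// Replay a persisted sequence into a human-readable transcript,
// e.g. for inspecting a failed run after the fact.
fn replay(events: &[AgentEvent]) -> Vec<String> {
    events
        .iter()
        .map(|e| match e {
            AgentEvent::Thought { content } => format!("thought: {content}"),
            AgentEvent::Action { tool, input } => format!("action: {tool}({input})"),
            AgentEvent::Observation { tool, output } => format!("observation: {tool} -> {output}"),
            AgentEvent::Result { response } => format!("result: {response}"),
        })
        .collect()
}

fn main() {
    let run = vec![
        AgentEvent::Thought { content: "need the weather".into() },
        AgentEvent::Action { tool: "weather".into(), input: "Berlin".into() },
        AgentEvent::Observation { tool: "weather".into(), output: "12C, rain".into() },
        AgentEvent::Result { response: "It's 12C and raining in Berlin.".into() },
    ];
    for line in replay(&run) {
        println!("{line}");
    }
}
```

The enum is deliberately dumb: no behavior, just data. That's what makes it safe to serialize, store, and replay later.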
3. Spans, Not Just Spaghetti
A single agent request might touch:
- Memory retrieval
- Context building
- LLM API call
- Tool selection
- Tool execution
- Response formatting
Each of these should be a span with proper nesting. When something is slow, you want to know which step was slow, not just that "the request was slow."
The Rust Ecosystem
For ZeroClaw, I settled on:
- tracing for structured logging and spans
- tracing-subscriber for output control
- SQLite for event persistence (simple, embedded, no separate service)
The key insight: don't over-engineer. A simple events table with timestamp, agent_id, event_type, and payload columns gives you 80% of what you need.
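As a sketch, that table could look like this. The column types and index are my choice, not ZeroClaw's actual schema:

```sql
-- One row per agent event; payload holds the event-specific JSON.
CREATE TABLE IF NOT EXISTS events (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp  TEXT    NOT NULL DEFAULT (datetime('now')),
    agent_id   TEXT    NOT NULL,
    event_type TEXT    NOT NULL,
    payload    TEXT    NOT NULL
);

-- Debugging queries mostly filter by agent and time window.
CREATE INDEX IF NOT EXISTS idx_events_agent_time
    ON events (agent_id, timestamp);
```

Keeping the payload as opaque JSON means new event types don't require migrations; you only pay for structure where you query it.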
What I Built
My observability module now tracks:
- Every tool call with input/output
- Every LLM request with tokens used
- Every memory retrieval with what was fetched
- Latency percentiles per operation
- Error rates by type
The result: when something breaks, I can answer "what happened" in seconds instead of hours.
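Latency percentiles don't need a metrics stack to start with; a buffer of samples per operation is enough. A minimal std-only sketch using the nearest-rank method (function and variable names are mine):

```rust
// Nearest-rank percentile over a set of latency samples (ms).
// Good enough for per-operation p50/p95/p99 when debugging.
fn percentile(samples: &[u64], p: f64) -> Option<u64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Nearest-rank: ceil(p/100 * n), clamped to a valid index.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.saturating_sub(1).min(sorted.len() - 1)])
}

fn main() {
    let llm_call_ms = vec![120, 95, 310, 101, 98, 1450, 110, 105];
    println!("p50 = {:?}", percentile(&llm_call_ms, 50.0)); // Some(105)
    println!("p95 = {:?}", percentile(&llm_call_ms, 95.0)); // Some(1450)
}
```

The p95 here is dominated by the single 1450ms outlier, which is exactly why percentiles beat averages for spotting tail latency.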
The Hard Part
The hardest part isn't the instrumentation. It's deciding what to instrument. Instrument everything and you have noise. Instrument too little and you can't debug.
My rule: if you'd want to know it when debugging a 3am incident, log it. If you're only logging it for fun, skip it.
ZeroClaw is my Rust-based agent daemon. This observability work is what let me actually ship it to production.