How to Monitor Your LLM App in Production (Step-by-Step)
You shipped your LLM app. Users are using it. But do you actually know what's happening? Most teams have zero visibility into their AI app once it's live. This guide shows you how to fix that in under 30 minutes.
Why Traditional Monitoring Isn't Enough for LLM Apps
Your existing monitoring stack (Datadog, New Relic, Prometheus) tracks the basics: uptime, HTTP response codes, server CPU. But AI applications need fundamentally different monitoring, because a 200 OK response doesn't mean your AI gave a good answer. You need observability that understands AI-specific signals. Here's how to set it up.
Step 1: Instrument Your LLM Application
The foundation of LLM monitoring is event logging. Every time your app makes an LLM call, you need to capture the input, output, and relevant metadata.
```python
import phospho
import openai

# Initialize phospho (get your API key from the dashboard)
phospho.init(api_key="ph_your_key")

def handle_user_message(user_input, session_id):
    # Your existing LLM call
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[{...}]
    )
    output = response.choices[0].message.content

    # Log to phospho — this is the key line
    phospho.log(
        input=user_input,
        output=output,
        session_id=session_id,
        metadata={
            "model": "gpt-4",
            "tokens_in": response.usage.prompt_tokens,
            "tokens_out": response.usage.completion_tokens,
        },
    )

    return output
```

That's it for instrumentation. The phospho.log() call captures the full interaction with metadata. It's non-blocking, so it won't slow down your app.
Step 2: Set Up Your Monitoring Dashboard
Once events start flowing in, your Phospho dashboard automatically surfaces the key metrics you need: quality scores, cost, latency, and request volume. No configuration needed. The dashboard is ready the moment your first event arrives.
Set up LLM monitoring in 5 minutes
Stop guessing what's happening in your AI app. Get real-time visibility with Phospho.
Get Phospho Pro — $49/mo
Step 3: Configure Alerts and Thresholds
Proactive monitoring means catching issues before your users report them. Set up alerts for:
| Alert Type | Threshold | Why It Matters |
|---|---|---|
| Quality drop | < 0.7 score (1h avg) | AI is giving worse answers |
| Cost spike | > 2x daily average | Runaway costs or abuse |
| Latency increase | > 5s p95 response time | Users waiting too long |
| Error rate | > 5% of requests | LLM API issues |
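If you want to wire these thresholds into your own alerting pipeline, the checks are a few lines of plain Python. This is a hedged sketch: `cost_spike_alert` and `latency_alert` are hypothetical helpers operating on numbers you export yourself, not Phospho APIs.

```python
# Hypothetical alert checks matching the thresholds in the table above.
# You supply the aggregates (daily spend, p95 latency) from your own logs.

def cost_spike_alert(daily_costs, today_cost, factor=2.0):
    """Flag when today's spend exceeds `factor` x the trailing daily average.

    daily_costs: recent daily spend in USD, excluding today.
    """
    if not daily_costs:
        return False
    avg = sum(daily_costs) / len(daily_costs)
    return today_cost > factor * avg

def latency_alert(p95_seconds, threshold=5.0):
    """Flag when p95 response time crosses the threshold (5s in the table)."""
    return p95_seconds > threshold
```

Run these on a schedule (a cron job or a small worker) and route any `True` result to your existing paging or chat tooling.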
Step 4: Analyze Sessions and Improve
The real power of LLM monitoring comes from turning data into action. Here's the workflow teams use daily:
1. Review daily metrics — Check the dashboard for quality trends, cost patterns, and volume changes.
2. Investigate low-quality sessions — Use session replay to understand exactly where the AI went wrong.
3. Identify patterns — Look for common failure modes: specific query types, user segments, or times of day.
4. Iterate on prompts — Use insights to improve system prompts, add guardrails, or adjust model selection.
5. Verify improvements — Compare quality scores before and after changes to prove impact.
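Step 5 can be as simple as comparing mean quality scores across two windows. A minimal sketch, assuming you export scores for sessions before and after a prompt change; `quality_delta` is a hypothetical helper and the numbers are illustrative, not real data.

```python
def quality_delta(before, after):
    """Difference in mean quality score after a change minus before.

    before, after: lists of per-session quality scores (e.g. 0.0 to 1.0).
    A positive result suggests the change improved quality.
    """
    mean_before = sum(before) / len(before)
    mean_after = sum(after) / len(after)
    return mean_after - mean_before

# Illustrative scores, not real data
before = [0.62, 0.71, 0.65, 0.68]
after = [0.78, 0.81, 0.74, 0.80]
print(f"Quality changed by {quality_delta(before, after):+.2f}")
```

For a production comparison you would also want enough sessions in each window for the difference to be meaningful, not just a handful.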
The ROI of LLM Monitoring
Teams using production monitoring for their LLM applications consistently report measurable returns.
At $49/month, Phospho pays for itself after finding a single cost optimization or catching one quality issue before users do.
Common Mistakes to Avoid
- ✕ Only monitoring latency and ignoring response quality
- ✕ Waiting for user complaints instead of proactively detecting issues
- ✕ Not tracking costs at the per-user and per-feature level
- ✕ Ignoring session-level context (individual messages don't tell the full story)
- ✕ Using generic APM tools that don't understand LLM-specific signals
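The per-user cost blind spot is easy to close because you are already logging token counts in the metadata above. A hedged sketch: `cost_per_user` is a hypothetical helper, and the per-1K-token prices are illustrative assumptions, so check your provider's current pricing before relying on the numbers.

```python
from collections import defaultdict

# Illustrative prices per 1K tokens; verify against your provider's pricing.
PRICE_PER_1K_IN = 0.03
PRICE_PER_1K_OUT = 0.06

def cost_per_user(events):
    """Aggregate estimated spend per user from logged events.

    events: dicts with "user_id", "tokens_in", "tokens_out" keys,
    mirroring the metadata captured in the instrumentation step.
    """
    totals = defaultdict(float)
    for e in events:
        totals[e["user_id"]] += (
            e["tokens_in"] / 1000 * PRICE_PER_1K_IN
            + e["tokens_out"] / 1000 * PRICE_PER_1K_OUT
        )
    return dict(totals)
```

Sorting the result by value immediately shows which users or features drive your spend, which is exactly the breakdown generic APM tools won't give you.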
Don't wait for users to complain
Get visibility into your LLM application today. Set up monitoring in under 5 minutes and start shipping with confidence.
Get Phospho Pro — $49/mo (Early Access)
Founding member pricing locked in forever.