Performance Dashboard
The AICOS Performance Dashboard provides real-time visibility into execution costs, response times, tool usage, and quality metrics. Use it to monitor the impact of execution profile optimization, identify anomalies, and make data-driven configuration decisions.
Accessing the Performance Dashboard
- Sign in to Control Bridge
- Navigate to Monitor > Dashboard > Boardroom
- Click AICOS Settings in the sidebar
- Select the Performance tab
The Performance tab sits alongside the existing Configuration tab in AICOS Settings, so you can see the impact of settings changes immediately.
Summary Cards
At the top of the Performance tab, four summary cards provide an at-a-glance health overview:
| Card | Description | Example |
|---|---|---|
| Total Executions | Number of AICOS executions in the selected period | 342 (+12% vs prior period) |
| Average Cost/Execution | Mean cost in USD per execution | $0.038 (-24% vs prior period) |
| Chat p95 Latency | 95th percentile response time for quick chat | 2.8s (target: < 5s) |
| Cache Hit Rate | Average prompt cache hit rate across all profiles | 71% (target: > 60%) |
Each card shows the current value plus a percentage change compared to the prior equivalent period, helping you identify trends.
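The period-over-period figure on each card can be reproduced with a simple relative change. The following is a minimal sketch, not the dashboard's actual implementation; the function names `percent_change` and `format_card` are hypothetical, and rounding to whole percentages is an assumption based on the examples in the table above.

```python
from typing import Optional


def percent_change(current: float, prior: float) -> Optional[float]:
    """Relative change from the prior period to the current one, in percent.

    Returns None when the prior period has no baseline (value of 0).
    """
    if prior == 0:
        return None
    return (current - prior) / prior * 100.0


def format_card(label: str, current: float, prior: float) -> str:
    """Render a card value with its change versus the prior equivalent period."""
    change = percent_change(current, prior)
    if change is None:
        return f"{label} (no prior data)"
    # "+" forces an explicit sign; ".0f" rounds to a whole percentage (assumed)
    return f"{label} ({change:+.0f}% vs prior period)"


print(format_card("$0.038", 0.038, 0.050))  # $0.038 (-24% vs prior period)
```

A negative change is good news for cost and latency cards but bad news for the cache hit rate, so the sign alone does not indicate health.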
Response Time by Conversation Type
A line chart showing response times for each execution profile over the selected time period.
- X-axis: Time (hourly buckets by default)
- Y-axis: Response time in milliseconds
- Series: One line per profile category (long_processing, email_research, quick_chat, reactive_event)
- Metrics: p50 (solid line) and p95 (dashed line) for each profile
Reading the Chart
| Profile | Healthy p95 | Warning p95 | Critical p95 |
|---|---|---|---|
| Quick Chat | < 5 seconds | 5-15 seconds | > 15 seconds |
| Reactive Event | < 30 seconds | 30-60 seconds | > 60 seconds |
| Email Research | < 2 minutes | 2-5 minutes | > 5 minutes |
| Long Processing | < 10 minutes | 10-15 minutes | > 15 minutes |
Hover over any data point to see exact values. Click a time window to drill into executions from that period.
If the quick chat p95 line consistently exceeds 5 seconds, consider configuring a dedicated chat model optimized for low latency. See Settings & Customization for model configuration.
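To make the p50/p95 series and the threshold table concrete, here is a small sketch that computes both percentiles from raw response times and classifies the p95 value. The threshold values come from the table above; the profile keys match the series names, but the function names and the choice of the "inclusive" percentile method are assumptions, since the dashboard's exact percentile calculation is not documented here.

```python
from statistics import quantiles

# p95 thresholds in seconds: (upper bound of healthy, upper bound of warning)
P95_THRESHOLDS_S = {
    "quick_chat": (5, 15),
    "reactive_event": (30, 60),
    "email_research": (120, 300),
    "long_processing": (600, 900),
}


def p50_p95(durations_ms):
    """Return (p50, p95) in milliseconds. Needs at least two data points."""
    cuts = quantiles(durations_ms, n=100, method="inclusive")  # 99 cut points
    return cuts[49], cuts[94]


def classify_p95(profile, p95_ms):
    """Map a p95 response time to healthy / warning / critical for a profile."""
    healthy_max, warning_max = P95_THRESHOLDS_S[profile]
    p95_s = p95_ms / 1000
    if p95_s < healthy_max:
        return "healthy"
    if p95_s <= warning_max:
        return "warning"
    return "critical"


p50, p95 = p50_p95([900, 1200, 1500, 2100, 2800, 3400])
print(classify_p95("quick_chat", p95))
```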
Token Cost Breakdown
A pie chart showing how AICOS costs are distributed across execution profiles.
- Segments: One per profile category
- Values: Total estimated cost in USD for the selected period
- Labels: Profile name, percentage of total cost, absolute cost
- Tooltip: Execution count, average cost per execution, total tokens consumed
Typical Distribution
For most organizations, cost distribution follows a predictable pattern:
| Profile | Expected Cost Share | Reason |
|---|---|---|
| Long Processing | 50-70% | Longest runs, most tool calls, deepest reasoning |
| Email Research | 15-30% | Medium-length runs with knowledge retrieval |
| Quick Chat | 5-15% | Short runs with compact prompts |
| Reactive Event | 5-10% | Brief event processing |
If quick chat exceeds 20% of total cost, it may indicate that chat messages are being handled by the wrong profile or that the tool call limit is too high for conversational interactions.
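The cost-share check described above can be sketched as follows. The cost figures are invented for illustration, and `cost_shares` and `quick_chat_over_share` are hypothetical helper names; only the 20% limit comes from the text.

```python
def cost_shares(cost_by_profile):
    """Return each profile's share of total cost, in percent."""
    total = sum(cost_by_profile.values())
    if total == 0:
        return {profile: 0.0 for profile in cost_by_profile}
    return {profile: 100 * cost / total for profile, cost in cost_by_profile.items()}


def quick_chat_over_share(cost_by_profile, limit_pct=20.0):
    """True when quick chat accounts for more than limit_pct of total cost."""
    return cost_shares(cost_by_profile).get("quick_chat", 0.0) > limit_pct


# Illustrative USD totals for one period (not real data)
costs = {
    "long_processing": 7.80,
    "email_research": 2.60,
    "quick_chat": 1.95,
    "reactive_event": 0.65,
}
for profile, share in cost_shares(costs).items():
    print(f"{profile}: {share:.0f}%")
print("quick chat over 20%:", quick_chat_over_share(costs))
```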
Tool Usage Heatmap
A grid visualization showing which tools are used most frequently across each profile.
- Rows: Tool names (top 20 by frequency)
- Columns: Profile categories
- Color: Intensity based on call count (white = unused, deep blue = most used)
- Click: Drill into any cell to see average execution time, failure rate, and result size for that tool in that profile
What to Look For
| Pattern | Meaning | Action |
|---|---|---|
| A tool is hot in quick_chat but cold in long_processing | Chat is relying heavily on a specific tool | Verify this tool is in the chat tool subset |
| A tool shows high failure rate (visible on drill-down) | Tool may have a configuration issue | Check the tool's service connectivity |
| Many tools are hot across all profiles | No profile specialization | Review tool subset configuration |
| A tool expected in email_research shows zero usage | Tool may be excluded from the profile subset | Check the email_research tool list |
Cache Efficiency Over Time
An area chart showing prompt cache performance.
- X-axis: Time (hourly buckets by default)
- Y-axis: Percentage (0-100%)
- Areas: Cache hit rate stacked with cache miss rate
- Overlay: Dotted line at the 60% healthy threshold
Understanding Cache Performance
Prompt caching is most effective during long processing runs, where the system prompt remains stable across many LLM calls within a single execution. High cache hit rates (> 60%) indicate that prompt caching is working well and reducing token costs.
| Cache Hit Rate | Status | Meaning |
|---|---|---|
| > 60% | Healthy | Caching is working effectively |
| 30-60% | Warning | Caching is partially effective, may be improvable |
| < 30% | Attention needed | Caching is not delivering expected savings |
Low cache hit rates may indicate that system prompts are changing frequently between LLM calls, or that the LLM provider's cache TTL is too short for the interval between calls.
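As a rough illustration of the status bands, the sketch below classifies a hit rate using the thresholds from the table. How the dashboard derives the hit rate itself is not specified here; computing it as cached input tokens divided by total input tokens is an assumption, and both function names are hypothetical.

```python
def cache_hit_rate(cached_input_tokens, total_input_tokens):
    """Hit rate in percent, assuming it is measured over input tokens."""
    if total_input_tokens == 0:
        return 0.0
    return 100 * cached_input_tokens / total_input_tokens


def cache_status(hit_rate_pct):
    """Map a hit rate to the status bands from the table above."""
    if hit_rate_pct > 60:
        return "Healthy"
    if hit_rate_pct >= 30:
        return "Warning"
    return "Attention needed"


rate = cache_hit_rate(cached_input_tokens=71_000, total_input_tokens=100_000)
print(f"{rate:.0f}% -> {cache_status(rate)}")
```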
Health Metrics
The Performance dashboard includes a health status indicator based on key operational metrics. The overall status is derived from the worst-performing metric.
Health Status Levels
| Status | Meaning |
|---|---|
| Healthy | All metrics are within normal ranges |
| Warning | One or more metrics are in warning range - monitor closely |
| Critical | One or more metrics are in critical range - investigate immediately |
Metric Thresholds
| Metric | Healthy | Warning | Critical |
|---|---|---|---|
| Chat response time (p95) | < 5s | 5-15s | > 15s |
| Long run compaction count | 0-2 | 3-5 | > 5 |
| System prompt ratio | < 40% | 40-60% | > 60% |
| Cache hit rate | > 60% | 30-60% | < 30% |
| Tool call failure rate | < 2% | 2-10% | > 10% |
| Fallback activation rate | < 1% | 1-5% | > 5% |
| Context exhaustion rate | < 2% | 2-10% | > 10% |
| Execution timeout rate | < 1% | 1-5% | > 5% |
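The "worst metric wins" rule can be sketched with the thresholds above. This is an illustrative model only: the metric keys, the `Status` enum, and the treatment of values that sit exactly on a boundary (assigned to the warning band, as the ranges in the table suggest) are assumptions.

```python
from enum import IntEnum


class Status(IntEnum):
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


# metric key: (healthy bound, critical bound, higher_is_better)
THRESHOLDS = {
    "chat_p95_seconds": (5, 15, False),
    "compaction_count": (3, 5, False),  # healthy 0-2, warning 3-5, critical > 5
    "system_prompt_ratio_pct": (40, 60, False),
    "cache_hit_rate_pct": (60, 30, True),
    "tool_failure_rate_pct": (2, 10, False),
    "fallback_rate_pct": (1, 5, False),
    "context_exhaustion_rate_pct": (2, 10, False),
    "timeout_rate_pct": (1, 5, False),
}


def metric_status(name, value):
    """Classify one metric against its healthy and critical bounds."""
    healthy_bound, critical_bound, higher_is_better = THRESHOLDS[name]
    if higher_is_better:
        if value > healthy_bound:
            return Status.HEALTHY
        return Status.WARNING if value >= critical_bound else Status.CRITICAL
    if value < healthy_bound:
        return Status.HEALTHY
    return Status.WARNING if value <= critical_bound else Status.CRITICAL


def overall_status(metrics):
    """Overall health is the worst status among the supplied metrics."""
    return max((metric_status(n, v) for n, v in metrics.items()), default=Status.HEALTHY)


sample = {"chat_p95_seconds": 2.8, "cache_hit_rate_pct": 71, "tool_failure_rate_pct": 4}
print(overall_status(sample).name)  # WARNING
```

Because `Status` is an ordered integer enum, taking the maximum yields the most severe status without any special-case logic.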
Understanding Key Metrics
System prompt ratio measures what percentage of the context window is consumed by the system prompt. A high ratio (> 40%) means less room for conversation history and tool results. The compact prompt tier used by quick chat is designed to keep this ratio low.
Compaction count tracks how many times the conversation history had to be summarized during a single execution. Frequent compaction in long processing runs is expected; frequent compaction in email research or reactive events may indicate that tool results are larger than expected.
Fallback activation rate measures how often AICOS falls back from the primary model to the fallback model due to errors or unavailability. A non-zero rate is expected during provider outages, but a persistent rate above 1% may indicate API key or quota issues.
Context exhaustion rate measures how often long processing runs fail because the context window fills up despite compaction. The v2 progressive compaction strategy aims to keep this below 2%.
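The ratio and rate definitions above reduce to simple arithmetic over execution records. The sketch below assumes a hypothetical `Execution` record with illustrative field names; the real execution metadata schema may differ.

```python
from dataclasses import dataclass


@dataclass
class Execution:
    # Hypothetical fields, named for illustration only
    system_prompt_tokens: int
    context_window_tokens: int
    used_fallback: bool
    context_exhausted: bool


def system_prompt_ratio(execution):
    """Percentage of the context window consumed by the system prompt."""
    return 100 * execution.system_prompt_tokens / execution.context_window_tokens


def rate(executions, predicate):
    """Percentage of executions for which predicate(execution) is true."""
    if not executions:
        return 0.0
    return 100 * sum(1 for e in executions if predicate(e)) / len(executions)


runs = [Execution(8_000, 200_000, False, False) for _ in range(49)]
runs.append(Execution(8_000, 200_000, True, False))

print(system_prompt_ratio(runs[0]))           # 4.0
print(rate(runs, lambda e: e.used_fallback))  # 2.0
```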
Quality Feedback Ratings
The Performance dashboard includes a quality feedback section that shows Business Owner satisfaction with AICOS responses.
How Ratings Work
When the Business Owner views AICOS chat responses, they can rate each message with a thumbs up or thumbs down. These ratings are correlated with execution context (profile, model, tool calls, duration) to identify which configurations produce the best outcomes.
Rating Metrics Displayed
| Metric | Description |
|---|---|
| Overall satisfaction | Percentage of positively-rated messages |
| Satisfaction by profile | Breakdown by execution profile type |
| Satisfaction by model | Breakdown by LLM model used |
| Trend | Satisfaction percentage over time |
Quality feedback is optional and does not affect AICOS behavior. It provides data to help administrators make informed configuration decisions. The feedback system does not automatically adjust settings.
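The satisfaction breakdowns amount to grouping ratings by an execution attribute and taking the share of thumbs-up votes. A minimal sketch follows, assuming each rating is a dictionary with hypothetical `profile`, `model`, and `thumbs_up` keys; the model names are placeholders.

```python
from collections import defaultdict


def satisfaction_by(ratings, key):
    """Percentage of positively rated messages, grouped by the given key."""
    positive = defaultdict(int)
    total = defaultdict(int)
    for rating in ratings:
        group = rating[key]
        total[group] += 1
        if rating["thumbs_up"]:
            positive[group] += 1
    return {group: 100 * positive[group] / total[group] for group in total}


# Illustrative ratings (not real data)
ratings = [
    {"profile": "quick_chat", "model": "model-a", "thumbs_up": True},
    {"profile": "quick_chat", "model": "model-a", "thumbs_up": True},
    {"profile": "quick_chat", "model": "model-b", "thumbs_up": False},
    {"profile": "email_research", "model": "model-b", "thumbs_up": True},
]

print(satisfaction_by(ratings, "profile"))
print(satisfaction_by(ratings, "model"))
```

Small groups produce noisy percentages, so it is worth checking the number of ratings behind a breakdown before acting on it.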
Filtering and Time Ranges
Time Period Selection
| Period | Description | Default Granularity |
|---|---|---|
| 24 hours | Most recent day | 15-minute buckets |
| 7 days | Last week (default) | 1-hour buckets |
| 30 days | Last month | 1-day buckets |
Select the time period from the dropdown at the top of the Performance tab. All charts and summary cards update to reflect the selected period.
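The relationship between period and granularity can be sketched as a lookup plus a timestamp floor. The period keys (`24h`, `7d`, `30d`) and the function name `bucket_start` are hypothetical, and aligning buckets to the Unix epoch with naive timestamps is an assumption about how the server groups data.

```python
from datetime import datetime, timedelta

# Default bucket size per period, from the table above
GRANULARITY = {
    "24h": timedelta(minutes=15),
    "7d": timedelta(hours=1),
    "30d": timedelta(days=1),
}
PERIOD_LENGTH = {
    "24h": timedelta(hours=24),
    "7d": timedelta(days=7),
    "30d": timedelta(days=30),
}
EPOCH = datetime(1970, 1, 1)


def bucket_start(timestamp, period):
    """Floor a naive timestamp to the start of its bucket."""
    step = GRANULARITY[period]
    return EPOCH + ((timestamp - EPOCH) // step) * step


def bucket_count(period):
    """Number of data points each chart series holds for a period."""
    return PERIOD_LENGTH[period] // GRANULARITY[period]


ts = datetime(2024, 5, 1, 10, 37, 12)
print(bucket_start(ts, "24h"))  # 2024-05-01 10:30:00
print(bucket_count("7d"))       # 168
```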
Data Source
Performance metrics are collected from execution metadata written during every AICOS execution. Each execution records its profile category, trigger type, model used, token counts, cache statistics, tool call counts, duration, and estimated cost. This data is aggregated server-side and returned as pre-computed chart data.
Performance data is retained for 90 days; older data is not available in the dashboard. For long-term trend analysis, use the Daily Activity Summary email reports.
Real-Time Progress Indicators
During active AICOS executions, the Performance dashboard and the AICOS chat interface show real-time progress via SignalR events.
Tool Progress
For long-running executions (daily wake-ups, email research), a progress indicator shows:
- Current tool number out of the profile's maximum (e.g., "Working... tool 23/100")
- Name of the tool currently executing
- Duration so far
Streaming Chat Responses
When the quick chat profile is active and streaming is enabled, chat responses appear token-by-token in the chat interface, similar to how other AI chat applications deliver responses. This reduces perceived latency: users see the first tokens within 1-2 seconds rather than waiting for the complete response.
Execution Summaries
At the end of every execution, a summary event is broadcast containing:
- Profile used
- Total duration
- Tool calls made
- Tokens consumed
- Estimated cost
- Outcome (completed, timeout, error, budget exceeded)
The AICOS Dashboard auto-refreshes its summary cards when this event arrives.
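For client code that consumes the summary event, it can help to model the payload explicitly. The sketch below is an assumed shape only: the field names, the JSON-style keys, and the `budget_exceeded` spelling are hypothetical, and the actual SignalR event contract should be checked before relying on it.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    COMPLETED = "completed"
    TIMEOUT = "timeout"
    ERROR = "error"
    BUDGET_EXCEEDED = "budget_exceeded"  # assumed wire spelling


@dataclass(frozen=True)
class ExecutionSummary:
    profile: str
    duration_seconds: float
    tool_calls: int
    tokens_consumed: int
    estimated_cost_usd: float
    outcome: Outcome


def parse_summary(payload):
    """Build a typed summary from a decoded event payload (a dict).

    Raises KeyError for a missing field and ValueError for an unknown outcome.
    """
    return ExecutionSummary(
        profile=payload["profile"],
        duration_seconds=float(payload["duration_seconds"]),
        tool_calls=int(payload["tool_calls"]),
        tokens_consumed=int(payload["tokens_consumed"]),
        estimated_cost_usd=float(payload["estimated_cost_usd"]),
        outcome=Outcome(payload["outcome"]),
    )


summary = parse_summary({
    "profile": "quick_chat",
    "duration_seconds": 2.4,
    "tool_calls": 3,
    "tokens_consumed": 5200,
    "estimated_cost_usd": 0.012,
    "outcome": "completed",
})
print(summary.profile, summary.outcome.name)  # quick_chat COMPLETED
```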
Troubleshooting
No Data in Performance Dashboard
- Verify that AICOS has executed at least once in the selected time period
- Check that the AICOS agent status is `active`
- Ensure your account has the `caioo:read` permission
- Try selecting a longer time period (30 days)
Unexpected Cost Spikes
- Check the Token Cost Breakdown for which profile category is responsible
- Review the Tool Usage Heatmap for unusually high tool call counts
- Check whether a model change increased per-token pricing
- Look for an increase in execution frequency (more daily wake-ups, more emails)
High Fallback Activation Rate
- Verify that the primary model's API key is valid and has available quota
- Check the LLM provider's status page for outages
- Review execution logs for specific error messages
- Consider switching to a different primary model if the issue persists
Cache Hit Rate Declining
- Check if system prompt customizations were recently changed (custom prompts reduce cache effectiveness)
- Verify that the LLM provider supports prompt caching for the configured model
- Review whether execution frequency has decreased (longer gaps between executions give cached prompts more time to expire)
Related Topics
- Execution Profiles - Understanding the four execution profiles
- Settings & Customization - Model selection and budget configuration
- Execution Cycle - How AICOS operates daily
- Usage Dashboard - Organization-wide usage analytics