Performance Dashboard

The AICOS Performance Dashboard provides real-time visibility into execution costs, response times, tool usage, and quality metrics. Use it to monitor the impact of execution profile optimization, identify anomalies, and make data-driven configuration decisions.

Accessing the Performance Dashboard

  1. Sign in to Control Bridge
  2. Navigate to Monitor > Dashboard > Boardroom
  3. Click AICOS Settings in the sidebar
  4. Select the Performance tab

The Performance tab sits alongside the existing Configuration tab in AICOS Settings, so you can see the impact of settings changes immediately.

Summary Cards

At the top of the Performance tab, four summary cards provide an at-a-glance health overview:

Card | Description | Example
Total Executions | Number of AICOS executions in the selected period | 342 (+12% vs prior period)
Average Cost/Execution | Mean cost in USD per execution | $0.038 (-24% vs prior period)
Chat p95 Latency | 95th percentile response time for quick chat | 2.8s (target: < 5s)
Cache Hit Rate | Average prompt cache hit rate across all profiles | 71% (target: > 60%)

Each card shows the current value alongside either the percentage change from the prior equivalent period or, for latency and cache hit rate, the target value, so you can spot trends and missed targets at a glance.
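The percentage change on a card is ordinary period-over-period arithmetic. The sketch below shows one way to compute and format it; the function names and the handling of an empty prior period are assumptions for illustration, not the dashboard's actual implementation.

```python
from typing import Optional


def percent_change(current: float, prior: float) -> Optional[float]:
    """Percentage change from prior to current, or None when prior is zero."""
    if prior == 0:
        return None  # no baseline to compare against
    return (current - prior) / prior * 100.0


def format_card(value: str, current: float, prior: float) -> str:
    """Render a card value with its change against the prior period."""
    change = percent_change(current, prior)
    if change is None:
        return f"{value} (no prior data)"  # hypothetical wording
    return f"{value} ({change:+.0f}% vs prior period)"


print(format_card("342", 342, 305))
print(format_card("$0.038", 0.038, 0.05))
```

The prior-period values (305 executions, $0.05) are made up so that the output matches the examples in the table.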

Response Time by Conversation Type

A line chart showing response times for each execution profile over the selected time period.

  • X-axis: Time (hourly buckets by default)
  • Y-axis: Response time in milliseconds
  • Series: One line per profile category (long_processing, email_research, quick_chat, reactive_event)
  • Metrics: p50 (solid line) and p95 (dashed line) for each profile
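To make the p50 and p95 series concrete, here is a minimal sketch that computes both values per profile for the samples in one time bucket. It uses the nearest-rank percentile definition, which is an assumption; the dashboard may use a different interpolation method.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of
    samples at or below it."""
    if not values:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def p50_p95_by_profile(samples):
    """samples: iterable of (profile, response_time_ms) for one time bucket."""
    grouped = defaultdict(list)
    for profile, ms in samples:
        grouped[profile].append(ms)
    return {p: (percentile(v, 50), percentile(v, 95)) for p, v in grouped.items()}


# Illustrative response times in milliseconds, not real measurements.
times = [1200, 900, 2800, 1500, 700, 4100, 1100, 1300, 1000, 950]
print(p50_p95_by_profile(("quick_chat", ms) for ms in times))
```

A p95 well above the p50, as in this sample, means a few slow executions dominate the tail while the typical response stays fast.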

Reading the Chart

Profile | Healthy p95 | Warning p95 | Critical p95
Quick Chat | < 5 seconds | 5-15 seconds | > 15 seconds
Reactive Event | < 30 seconds | 30-60 seconds | > 60 seconds
Email Research | < 2 minutes | 2-5 minutes | > 5 minutes
Long Processing | < 10 minutes | 10-15 minutes | > 15 minutes

Hover over any data point to see exact values. Click a time window to drill into executions from that period.
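The per-profile p95 bands can be expressed as a small lookup. This is a sketch only: the thresholds come from the table in this section, while the dictionary layout and the function name are assumptions for illustration.

```python
# (healthy_below_seconds, critical_above_seconds) per profile.
P95_BANDS_SECONDS = {
    "quick_chat": (5, 15),
    "reactive_event": (30, 60),
    "email_research": (2 * 60, 5 * 60),
    "long_processing": (10 * 60, 15 * 60),
}


def classify_p95(profile: str, p95_seconds: float) -> str:
    """Map a profile's p95 response time to healthy, warning, or critical."""
    healthy_below, critical_above = P95_BANDS_SECONDS[profile]
    if p95_seconds < healthy_below:
        return "healthy"
    if p95_seconds <= critical_above:
        return "warning"
    return "critical"


print(classify_p95("quick_chat", 2.8))
print(classify_p95("long_processing", 12 * 60))
```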

Tip: If the quick chat p95 line consistently exceeds 5 seconds, consider configuring a dedicated chat model optimized for low latency. See Settings & Customization for model configuration.

Token Cost Breakdown

A pie chart showing how AICOS costs are distributed across execution profiles.

  • Segments: One per profile category
  • Values: Total estimated cost in USD for the selected period
  • Labels: Profile name, percentage of total cost, absolute cost
  • Tooltip: Execution count, average cost per execution, total tokens consumed

Typical Distribution

For most organizations, cost distribution follows a predictable pattern:

Profile | Expected Cost Share | Reason
Long Processing | 50-70% | Longest runs, most tool calls, deepest reasoning
Email Research | 15-30% | Medium-length runs with knowledge retrieval
Quick Chat | 5-15% | Short runs with compact prompts
Reactive Event | 5-10% | Brief event processing

If quick chat exceeds 20% of total cost, it may indicate that chat messages are being handled by the wrong profile or that the tool call limit is too high for conversational interactions.
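The 20% check on quick chat is a share-of-total calculation. A minimal sketch, assuming each execution is available as a (profile, cost) pair; the function names and sample costs are hypothetical.

```python
from collections import defaultdict


def cost_shares(executions):
    """executions: iterable of (profile, cost_usd). Returns {profile: percent}."""
    totals = defaultdict(float)
    for profile, cost in executions:
        totals[profile] += cost
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {p: 100 * c / grand_total for p, c in totals.items()}


def quick_chat_share_too_high(shares, limit_percent=20.0):
    """True when quick chat exceeds the 20% share mentioned above."""
    return shares.get("quick_chat", 0.0) > limit_percent


# Illustrative costs in USD, not real data.
sample = [
    ("long_processing", 6.0),
    ("email_research", 2.0),
    ("quick_chat", 1.5),
    ("reactive_event", 0.5),
]
shares = cost_shares(sample)
print(shares, quick_chat_share_too_high(shares))
```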

Tool Usage Heatmap

A grid visualization showing which tools are used most frequently across each profile.

  • Rows: Tool names (top 20 by frequency)
  • Columns: Profile categories
  • Color: Intensity based on call count (white = unused, deep blue = most used)
  • Click: Drill into any cell to see average execution time, failure rate, and result size for that tool in that profile

What to Look For

Pattern | Meaning | Action
A tool is hot in quick_chat but cold in long_processing | Chat is relying heavily on a specific tool | Verify this tool is in the chat tool subset
A tool shows high failure rate (visible on drill-down) | Tool may have a configuration issue | Check the tool's service connectivity
Many tools are hot across all profiles | No profile specialization | Review tool subset configuration
A tool expected in email_research shows zero usage | Tool may be excluded from the profile subset | Check the email_research tool list
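For readers who want to reproduce the heatmap from raw tool-call records, the grid is a count per (tool, profile) pair with rows limited to the most-used tools. The sketch below assumes a simple list of (tool, profile) pairs; the tool names and the linear color scale are hypothetical.

```python
from collections import Counter


def heatmap_counts(tool_calls, top_n=20):
    """tool_calls: iterable of (tool_name, profile).
    Returns (row tool names ordered by total calls, per-cell call counts)."""
    cells = Counter(tool_calls)
    per_tool = Counter()
    for (tool, _profile), count in cells.items():
        per_tool[tool] += count
    rows = [tool for tool, _ in per_tool.most_common(top_n)]
    return rows, cells


def intensity(count, max_count):
    """0.0 maps to white (unused), 1.0 to deep blue (most used)."""
    return 0.0 if max_count == 0 else count / max_count


calls = (
    [("search_knowledge", "quick_chat")] * 3
    + [("search_knowledge", "long_processing")]
    + [("send_email", "email_research")] * 2
)
rows, cells = heatmap_counts(calls)
print(rows, cells[("search_knowledge", "quick_chat")])
```

A cell that was never called reads as zero because `Counter` returns 0 for missing keys, which corresponds to a white cell.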

Cache Efficiency Over Time

An area chart showing prompt cache performance.

  • X-axis: Time (hourly buckets by default)
  • Y-axis: Percentage (0-100%)
  • Areas: Cache hit rate stacked with cache miss rate
  • Overlay: Dotted line at the 60% healthy threshold

Understanding Cache Performance

Prompt caching is most effective during long processing runs, where the system prompt remains stable across many LLM calls within a single execution. High cache hit rates (> 60%) indicate that prompt caching is working well and reducing token costs.

Cache Hit Rate | Status | Meaning
> 60% | Healthy | Caching is working effectively
30-60% | Warning | Caching is partially effective, may be improvable
< 30% | Attention needed | Caching is not delivering expected savings

Low cache hit rates may indicate that system prompts are changing frequently between LLM calls, or that the LLM provider's cache TTL is too short for the interval between calls.
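This page does not define how the cache hit rate is calculated. One plausible definition, used here purely as an assumption, is the share of prompt tokens served from the provider's cache. The sketch pairs that calculation with the status bands from the table above.

```python
def cache_hit_rate(cached_prompt_tokens: int, total_prompt_tokens: int) -> float:
    """ASSUMED definition: percentage of prompt tokens read from cache."""
    if total_prompt_tokens == 0:
        return 0.0
    return 100 * cached_prompt_tokens / total_prompt_tokens


def cache_status(rate_percent: float) -> str:
    """Status bands from the table: > 60 healthy, 30-60 warning, < 30 attention."""
    if rate_percent > 60:
        return "Healthy"
    if rate_percent >= 30:
        return "Warning"
    return "Attention needed"


rate = cache_hit_rate(7_100, 10_000)  # illustrative token counts
print(rate, cache_status(rate))
```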

Health Metrics

The Performance dashboard includes a health status indicator based on key operational metrics. The overall status is derived from the worst-performing metric.

Health Status Levels

Status | Meaning
Healthy | All metrics are within normal ranges
Warning | One or more metrics are in warning range - monitor closely
Critical | One or more metrics are in critical range - investigate immediately

Metric Thresholds

Metric | Healthy | Warning | Critical
Chat response time (p95) | < 5s | 5-15s | > 15s
Long run compaction count | 0-2 | 3-5 | > 5
System prompt ratio | < 40% | 40-60% | > 60%
Cache hit rate | > 60% | 30-60% | < 30%
Tool call failure rate | < 2% | 2-10% | > 10%
Fallback activation rate | < 1% | 1-5% | > 5%
Context exhaustion rate | < 2% | 2-10% | > 10%
Execution timeout rate | < 1% | 1-5% | > 5%
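Because the overall status is derived from the worst-performing metric, the rule can be sketched as "classify each metric, then take the most severe result". The thresholds below are taken from the table; the metric keys and function names are hypothetical.

```python
SEVERITY = {"Healthy": 0, "Warning": 1, "Critical": 2}

# metric key -> (warning boundary, critical boundary, higher_is_better)
THRESHOLDS = {
    "chat_p95_seconds": (5, 15, False),
    "compaction_count": (3, 5, False),        # 0-2 healthy, 3-5 warning
    "system_prompt_ratio_pct": (40, 60, False),
    "cache_hit_rate_pct": (60, 30, True),     # higher is better
    "tool_failure_rate_pct": (2, 10, False),
    "fallback_rate_pct": (1, 5, False),
    "context_exhaustion_rate_pct": (2, 10, False),
    "timeout_rate_pct": (1, 5, False),
}


def metric_status(name: str, value: float) -> str:
    """Classify one metric against its healthy/warning/critical bands."""
    warn, crit, higher_is_better = THRESHOLDS[name]
    if higher_is_better:
        if value > warn:
            return "Healthy"
        return "Warning" if value >= crit else "Critical"
    if value < warn:
        return "Healthy"
    return "Warning" if value <= crit else "Critical"


def overall_status(metrics: dict) -> str:
    """The overall status is the most severe status among all metrics."""
    statuses = [metric_status(name, value) for name, value in metrics.items()]
    return max(statuses, key=SEVERITY.__getitem__, default="Healthy")


print(overall_status({"chat_p95_seconds": 2.8, "tool_failure_rate_pct": 4}))
```

In this example a single metric in the warning band is enough to move the overall indicator to Warning, even though chat latency is healthy.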

Understanding Key Metrics

System prompt ratio measures what percentage of the context window is consumed by the system prompt. A high ratio (> 40%) means less room for conversation history and tool results. The compact prompt tier used by quick chat is designed to keep this ratio low.

Compaction count tracks how many times the conversation history had to be summarized during a single execution. Frequent compaction in long processing runs is expected; frequent compaction in email research or reactive events may indicate that tool results are larger than expected.

Fallback activation rate measures how often AICOS falls back from the primary model to the fallback model due to errors or unavailability. A non-zero rate is expected during provider outages, but a persistent rate above 1% may indicate API key or quota issues.

Context exhaustion rate measures how often long processing runs fail because the context window fills up despite compaction. The v2 progressive compaction strategy aims to keep this below 2%.
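The ratio and rate metrics described above reduce to simple percentages. A minimal sketch, assuming token counts and per-execution flags are available; the field name `used_fallback` and the sample numbers are hypothetical.

```python
def system_prompt_ratio(system_prompt_tokens: int, context_window_tokens: int) -> float:
    """Percentage of the context window consumed by the system prompt."""
    return 100 * system_prompt_tokens / context_window_tokens


def rate(executions, predicate) -> float:
    """Percentage of executions for which predicate(execution) is true."""
    executions = list(executions)
    if not executions:
        return 0.0
    return 100 * sum(1 for e in executions if predicate(e)) / len(executions)


# Illustrative numbers: a 12,000-token system prompt in a 128,000-token window.
print(system_prompt_ratio(12_000, 128_000))

# Illustrative run history: 2 of 100 executions used the fallback model.
runs = [{"used_fallback": False}] * 98 + [{"used_fallback": True}] * 2
print(rate(runs, lambda e: e["used_fallback"]))
```

The same `rate` helper would apply to context exhaustion or timeouts by swapping the predicate.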

Quality Feedback Ratings

The Performance dashboard includes a quality feedback section that shows Business Owner satisfaction with AICOS responses.

How Ratings Work

When the Business Owner views AICOS chat responses, they can rate each message with a thumbs up or thumbs down. These ratings are correlated with execution context (profile, model, tool calls, duration) to identify which configurations produce the best outcomes.

Rating Metrics Displayed

Metric | Description
Overall satisfaction | Percentage of positively-rated messages
Satisfaction by profile | Breakdown by execution profile type
Satisfaction by model | Breakdown by LLM model used
Trend | Satisfaction percentage over time

Info: Quality feedback is optional and does not affect AICOS behavior. It provides data to help administrators make informed configuration decisions. The feedback system does not automatically adjust settings.
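The satisfaction breakdowns amount to grouping thumbs-up and thumbs-down ratings by a context field. A sketch under the assumption that each rating is stored with its profile and model; the field names and model labels are hypothetical.

```python
from collections import defaultdict


def satisfaction_by(ratings, key):
    """ratings: iterable of dicts with a boolean 'thumbs_up' plus context fields.
    Returns {group: percentage of positively rated messages}."""
    up = defaultdict(int)
    total = defaultdict(int)
    for rating in ratings:
        group = rating[key]
        total[group] += 1
        if rating["thumbs_up"]:
            up[group] += 1
    return {group: 100 * up[group] / total[group] for group in total}


ratings = [
    {"profile": "quick_chat", "model": "model-a", "thumbs_up": True},
    {"profile": "quick_chat", "model": "model-b", "thumbs_up": False},
    {"profile": "email_research", "model": "model-a", "thumbs_up": True},
]
print(satisfaction_by(ratings, "profile"))
print(satisfaction_by(ratings, "model"))
```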

Filtering and Time Ranges

Time Period Selection

Period | Description | Default Granularity
24 hours | Most recent day | 15-minute buckets
7 days | Last week (default) | 1-hour buckets
30 days | Last month | 1-day buckets

Select the time period from the dropdown at the top of the Performance tab. All charts and summary cards update to reflect the selected period.
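The granularity for each period determines which bucket a given execution falls into. The sketch below floors a UTC timestamp to its bucket start; the period keys and the choice to align buckets to the Unix epoch are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Default granularity per period, from the table above (keys are hypothetical).
BUCKET_SIZE = {
    "24h": timedelta(minutes=15),
    "7d": timedelta(hours=1),
    "30d": timedelta(days=1),
}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(timestamp: datetime, period: str) -> datetime:
    """Floor a timezone-aware UTC timestamp to the start of its bucket."""
    size = BUCKET_SIZE[period]
    return timestamp - ((timestamp - EPOCH) % size)


ts = datetime(2024, 5, 3, 14, 37, 12, tzinfo=timezone.utc)
for period in ("24h", "7d", "30d"):
    print(period, bucket_start(ts, period).isoformat())
```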

Data Source

Performance metrics are collected from execution metadata written during every AICOS execution. Each execution records its profile category, trigger type, model used, token counts, cache statistics, tool call counts, duration, and estimated cost. This data is aggregated server-side and returned as pre-computed chart data.
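As a rough picture of the data involved, one execution record and a server-side aggregation over it might look like the following. The field names and the `summarize` function are assumptions based on the fields listed above, not the actual AICOS schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionRecord:
    """Hypothetical shape of the metadata written for one execution."""
    profile: str
    trigger_type: str
    model: str
    input_tokens: int
    output_tokens: int
    cached_tokens: int
    tool_calls: int
    duration_ms: int
    estimated_cost_usd: float


def summarize(records):
    """Aggregate records into the kind of totals the summary cards display."""
    records = list(records)
    count = len(records)
    total_cost = sum(r.estimated_cost_usd for r in records)
    return {
        "total_executions": count,
        "average_cost_usd": total_cost / count if count else 0.0,
        "total_tokens": sum(r.input_tokens + r.output_tokens for r in records),
    }


# Illustrative records; trigger and model names are placeholders.
records = [
    ExecutionRecord("quick_chat", "chat", "model-a", 900, 100, 600, 2, 2800, 0.02),
    ExecutionRecord("email_research", "email", "model-a", 4000, 500, 2500, 12, 45000, 0.06),
]
print(summarize(records))
```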

Info: Performance data has a retention period of 90 days. Historical data beyond 90 days is not available in the dashboard. For long-term trend analysis, use the Daily Activity Summary email reports.

Real-Time Progress Indicators

During active AICOS executions, the Performance dashboard and the AICOS chat interface show real-time progress via SignalR events.

Tool Progress

For long-running executions (daily wake-ups, email research), a progress indicator shows:

  • Current tool number out of the profile's maximum (e.g., "Working... tool 23/100")
  • Name of the tool currently executing
  • Duration so far
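The three items above can be combined into a single status line. This is a sketch of one possible rendering: only the "Working... tool 23/100" fragment comes from this page, while the rest of the format and the tool name are assumptions.

```python
def format_tool_progress(current_tool: int, max_tools: int,
                         tool_name: str, elapsed_seconds: float) -> str:
    """Build a progress line from tool position, tool name, and elapsed time."""
    minutes, seconds = divmod(int(elapsed_seconds), 60)
    return (
        f"Working... tool {current_tool}/{max_tools} "
        f"({tool_name}, {minutes}m {seconds:02d}s)"
    )


print(format_tool_progress(23, 100, "search_knowledge", 95))
```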

Streaming Chat Responses

When the quick chat profile is active and streaming is enabled, chat responses appear token-by-token in the chat interface, as in other AI chat applications. This reduces perceived latency: users see the first tokens within 1-2 seconds instead of waiting for the complete response.

Execution Summaries

At the end of every execution, a summary event is broadcast containing:

  • Profile used
  • Total duration
  • Tool calls made
  • Tokens consumed
  • Estimated cost
  • Outcome (completed, timeout, error, budget exceeded)

The AICOS Dashboard auto-refreshes its summary cards when this event arrives.
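A consumer of the summary event could validate the payload into a typed object before refreshing anything. The sketch below assumes a dictionary payload; the key names and the wire values for each outcome are hypothetical, since this page lists the fields but not their format.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    # Wire values are assumptions; the page only names the four outcomes.
    COMPLETED = "completed"
    TIMEOUT = "timeout"
    ERROR = "error"
    BUDGET_EXCEEDED = "budget_exceeded"


@dataclass(frozen=True)
class ExecutionSummary:
    profile: str
    duration_ms: int
    tool_calls: int
    tokens: int
    estimated_cost_usd: float
    outcome: Outcome


def parse_summary(payload: dict) -> ExecutionSummary:
    """Convert a raw event payload into a typed summary.
    Raises KeyError for a missing field and ValueError for an unknown outcome."""
    return ExecutionSummary(
        profile=payload["profile"],
        duration_ms=int(payload["duration_ms"]),
        tool_calls=int(payload["tool_calls"]),
        tokens=int(payload["tokens"]),
        estimated_cost_usd=float(payload["estimated_cost_usd"]),
        outcome=Outcome(payload["outcome"]),
    )


event = {
    "profile": "quick_chat", "duration_ms": 2800, "tool_calls": 2,
    "tokens": 1000, "estimated_cost_usd": 0.02, "outcome": "completed",
}
print(parse_summary(event))
```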

Troubleshooting

No Data in Performance Dashboard

  1. Verify that AICOS has executed at least once in the selected time period
  2. Check that the AICOS agent status is active
  3. Ensure your account has the caioo:read permission
  4. Try selecting a longer time period (30 days)

Unexpected Cost Spikes

  1. Check the Token Cost Breakdown for which profile category is responsible
  2. Review the Tool Usage Heatmap for unusually high tool call counts
  3. Check whether a model change increased per-token pricing
  4. Look for an increase in execution frequency (more daily wake-ups, more emails)

High Fallback Activation Rate

  1. Verify that the primary model's API key is valid and has available quota
  2. Check the LLM provider's status page for outages
  3. Review execution logs for specific error messages
  4. Consider switching to a different primary model if the issue persists

Cache Hit Rate Declining

  1. Check if system prompt customizations were recently changed (custom prompts reduce cache effectiveness)
  2. Verify that the LLM provider supports prompt caching for the configured model
  3. Review whether execution frequency has decreased (with longer gaps between calls, cached prompts are more likely to expire before they are reused)