Performance Dashboard

The AICOS Performance Dashboard provides real-time visibility into execution costs, response times, tool usage, and quality metrics. Use it to monitor the impact of execution profile optimization, identify anomalies, and make data-driven configuration decisions.

Accessing the Performance Dashboard

Sign in to Control Bridge
Navigate to Monitor > Dashboard > Boardroom
Click AICOS Settings in the sidebar
Select the Performance tab

The Performance tab sits alongside the existing Configuration tab in AICOS Settings, so you can see the impact of settings changes immediately.

Summary Cards

At the top of the Performance tab, four summary cards provide an at-a-glance health overview:

Card	Description	Example
Total Executions	Number of AICOS executions in the selected period	342 (+12% vs prior period)
Average Cost/Execution	Mean cost in USD per execution	$0.038 (-24% vs prior period)
Chat p95 Latency	95th percentile response time for quick chat	2.8s (target: < 5s)
Cache Hit Rate	Average prompt cache hit rate across all profiles	71% (target: > 60%)

Each card shows the current value plus a percentage change compared to the prior equivalent period, helping you identify trends.
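The period-over-period comparison is a simple relative change. The sketch below shows the arithmetic. The prior-period values (305 executions, $0.050 per execution) are hypothetical numbers chosen to reproduce the example percentages in the table. Returning no value when the prior period is empty is an assumption, not documented dashboard behavior.

```python
from typing import Optional


def percent_change(current: float, prior: float) -> Optional[float]:
    """Percentage change of `current` relative to the prior equivalent period.

    Returns None when the prior value is zero, because a percentage change
    is undefined there (assumed handling; not documented by AICOS).
    """
    if prior == 0:
        return None
    return (current - prior) / prior * 100.0


# Hypothetical prior-period values chosen to match the example cards.
print(f"Total Executions: {percent_change(342, 305):+.0f}%")            # +12%
print(f"Average Cost/Execution: {percent_change(0.038, 0.050):+.0f}%")  # -24%
```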

Response Time by Conversation Type

A line chart showing response times for each execution profile over the selected time period.

X-axis: Time (hourly buckets by default)
Y-axis: Response time in milliseconds
Series: One line per profile category (long_processing, email_research, quick_chat, reactive_event)
Metrics: p50 (solid line) and p95 (dashed line) for each profile
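To make the p50 and p95 series concrete, here is a minimal sketch that groups execution durations into hourly buckets per profile and computes both percentiles. It uses the nearest-rank percentile method as an assumption; the page does not state which percentile definition the dashboard uses, and the input tuple shape is hypothetical.

```python
import math
from collections import defaultdict
from datetime import datetime


def nearest_rank_percentile(values: list[float], p: float) -> float:
    """Smallest sample such that at least p% of samples are <= it (nearest-rank)."""
    if not values:
        raise ValueError("percentile of an empty sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


def latency_series(samples):
    """Group (timestamp, profile, duration_ms) samples into hourly buckets.

    Returns {(profile, hour_start): (p50_ms, p95_ms)}: one point on the
    solid (p50) and dashed (p95) line for each profile and hour.
    """
    buckets = defaultdict(list)
    for ts, profile, duration_ms in samples:
        hour_start = ts.replace(minute=0, second=0, microsecond=0)
        buckets[(profile, hour_start)].append(duration_ms)
    return {
        key: (nearest_rank_percentile(v, 50), nearest_rank_percentile(v, 95))
        for key, v in buckets.items()
    }
```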

Reading the Chart

Profile	Healthy p95	Warning p95	Critical p95
Quick Chat	< 5 seconds	5-15 seconds	> 15 seconds
Reactive Event	< 30 seconds	30-60 seconds	> 60 seconds
Email Research	< 2 minutes	2-5 minutes	> 5 minutes
Long Processing	< 10 minutes	10-15 minutes	> 15 minutes

Hover over any data point to see exact values. Click a time window to drill into executions from that period.
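The health bands in the table above can be expressed as a small lookup. This sketch converts the chart's millisecond values to seconds and classifies a profile's p95. The dictionary and function names are hypothetical; boundary values (for example exactly 15 seconds for quick chat) fall into the warning band, matching the "5-15 seconds" wording.

```python
# (healthy_below_s, critical_above_s) per profile, taken from the table above.
P95_THRESHOLDS_S = {
    "quick_chat": (5, 15),
    "reactive_event": (30, 60),
    "email_research": (120, 300),     # 2 and 5 minutes
    "long_processing": (600, 900),    # 10 and 15 minutes
}


def classify_p95(profile: str, p95_ms: float) -> str:
    """Return 'healthy', 'warning', or 'critical' for a profile's p95 latency."""
    healthy_below, critical_above = P95_THRESHOLDS_S[profile]
    p95_s = p95_ms / 1000  # the chart's Y-axis is in milliseconds
    if p95_s < healthy_below:
        return "healthy"
    if p95_s > critical_above:
        return "critical"
    return "warning"
```

For example, the 2.8-second quick chat p95 from the summary card classifies as healthy.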

tip

If the quick chat p95 line consistently exceeds 5 seconds, consider configuring a dedicated chat model optimized for low latency. See Settings & Customization for model configuration.

Token Cost Breakdown

A pie chart showing how AICOS costs are distributed across execution profiles.

Segments: One per profile category
Values: Total estimated cost in USD for the selected period
Labels: Profile name, percentage of total cost, absolute cost
Tooltip: Execution count, average cost per execution, total tokens consumed
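The segment values, labels, and tooltip fields described above can all be derived from per-execution cost and token totals. The following is a minimal sketch of that aggregation; the input tuple shape and the `CostSegment` type are hypothetical, and the real aggregation happens server-side.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CostSegment:
    profile: str
    total_cost_usd: float   # segment value
    share_pct: float        # label: percentage of total cost
    executions: int         # tooltip: execution count
    avg_cost_usd: float     # tooltip: average cost per execution
    total_tokens: int       # tooltip: total tokens consumed


def cost_breakdown(executions):
    """executions: iterable of (profile, estimated_cost_usd, total_tokens)."""
    cost = defaultdict(float)
    count = defaultdict(int)
    tokens = defaultdict(int)
    for profile, cost_usd, total_tokens in executions:
        cost[profile] += cost_usd
        count[profile] += 1
        tokens[profile] += total_tokens
    grand_total = sum(cost.values())
    segments = [
        CostSegment(
            profile=p,
            total_cost_usd=cost[p],
            share_pct=cost[p] / grand_total * 100 if grand_total else 0.0,
            executions=count[p],
            avg_cost_usd=cost[p] / count[p],
            total_tokens=tokens[p],
        )
        for p in cost
    ]
    # Largest slice first, as pie charts are usually drawn.
    return sorted(segments, key=lambda s: s.total_cost_usd, reverse=True)
```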

Typical Distribution

For most organizations, cost distribution follows a predictable pattern:

Profile	Expected Cost Share	Reason
Long Processing	50-70%	Longest runs, most tool calls, deepest reasoning
Email Research	15-30%	Medium-length runs with knowledge retrieval
Quick Chat	5-15%	Short runs with compact prompts
Reactive Event	5-10%	Brief event processing

If quick chat exceeds 20% of total cost, it may indicate that chat messages are being handled by the wrong profile or that the tool call limit is too high for conversational interactions.
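A quick way to compare your own breakdown against the typical distribution is to compute each profile's share and flag anything outside its expected range, plus the 20% quick chat rule. This is an illustrative sketch; the function name and message wording are hypothetical, and the ranges are guidance rather than hard limits.

```python
# Expected cost share ranges (percent) from the Typical Distribution table.
EXPECTED_SHARE_PCT = {
    "long_processing": (50, 70),
    "email_research": (15, 30),
    "quick_chat": (5, 15),
    "reactive_event": (5, 10),
}
QUICK_CHAT_ALERT_PCT = 20


def review_cost_shares(costs_usd: dict[str, float]) -> list[str]:
    """Return human-readable notes for shares outside the typical ranges."""
    total = sum(costs_usd.values())
    if total == 0:
        return []
    notes = []
    for profile, (low, high) in EXPECTED_SHARE_PCT.items():
        share = costs_usd.get(profile, 0.0) / total * 100
        if not low <= share <= high:
            notes.append(f"{profile}: {share:.0f}% (typical {low}-{high}%)")
    quick_share = costs_usd.get("quick_chat", 0.0) / total * 100
    if quick_share > QUICK_CHAT_ALERT_PCT:
        notes.append("quick_chat above 20%: check profile routing and the chat tool call limit")
    return notes
```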

Tool Usage Heatmap

A grid visualization showing which tools are used most frequently across each profile.

Rows: Tool names (top 20 by frequency)
Columns: Profile categories
Color: Intensity based on call count (white = unused, deep blue = most used)
Click: Drill into any cell to see average execution time, failure rate, and result size for that tool in that profile
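The heatmap's rows, columns, and color intensity can be sketched from a flat list of tool calls. This minimal version picks the top tools by total call count and scales each cell against the busiest cell (0.0 maps to white, 1.0 to the deepest blue). The tool names in the usage are hypothetical, and linear scaling is an assumption about how intensity is computed.

```python
from collections import Counter

# Column order: the four profile categories named on this page.
PROFILES = ["long_processing", "email_research", "quick_chat", "reactive_event"]


def tool_heatmap(calls, top_n=20):
    """calls: iterable of (tool_name, profile) pairs, one per tool call.

    Returns (rows, matrix, intensity): the top_n tools by total calls,
    a call-count matrix (rows x PROFILES), and counts scaled to 0.0-1.0.
    """
    per_cell = Counter(calls)
    per_tool = Counter()
    for (tool, _profile), n in per_cell.items():
        per_tool[tool] += n
    rows = [tool for tool, _ in per_tool.most_common(top_n)]
    matrix = [[per_cell[(tool, p)] for p in PROFILES] for tool in rows]
    peak = max((n for row in matrix for n in row), default=0)
    intensity = [[n / peak if peak else 0.0 for n in row] for row in matrix]
    return rows, matrix, intensity


# Hypothetical tool names, for illustration only.
calls = [("search_email", "email_research")] * 3 + [("get_calendar", "quick_chat")] * 2
rows, matrix, intensity = tool_heatmap(calls)
```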

What to Look For

Pattern	Meaning	Action
A tool is hot in quick_chat but cold in long_processing	Chat is relying heavily on a specific tool	Verify this tool is in the chat tool subset
A tool shows high failure rate (visible on drill-down)	Tool may have a configuration issue	Check the tool's service connectivity
Many tools are hot across all profiles	No profile specialization	Review tool subset configuration
A tool expected in email_research shows zero usage	Tool may be excluded from the profile subset	Check the email_research tool list

Cache Efficiency Over Time

An area chart showing prompt cache performance.

X-axis: Time (hourly buckets by default)
Y-axis: Percentage (0-100%)
Areas: Cache hit rate stacked with cache miss rate
Overlay: Dotted line at the 60% healthy threshold
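One way to produce the stacked areas is sketched below. It assumes, without confirmation from this page, that the hit rate for a bucket is cached prompt tokens divided by total prompt tokens; the miss area is the remainder, so the two areas always stack to 100%. Buckets with no LLM calls are skipped, which would leave a gap in the chart.

```python
HEALTHY_THRESHOLD_PCT = 60.0  # the dotted overlay line


def cache_area_series(buckets):
    """buckets: {bucket_start: (cached_prompt_tokens, total_prompt_tokens)}.

    Returns [(bucket_start, hit_pct, miss_pct, below_threshold)] in time order.
    """
    series = []
    for start in sorted(buckets):
        cached, total = buckets[start]
        if total == 0:
            continue  # no LLM calls in this bucket: leave a gap
        hit_pct = cached / total * 100
        series.append((start, hit_pct, 100.0 - hit_pct, hit_pct < HEALTHY_THRESHOLD_PCT))
    return series
```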

Understanding Cache Performance

Prompt caching is most effective during long processing runs, where the system prompt remains stable across many LLM calls within a single execution. High cache hit rates (> 60%) indicate that prompt caching is working well and reducing token costs.

Cache Hit Rate	Status	Meaning
> 60%	Healthy	Caching is working effectively
30-60%	Warning	Caching is partially effective, may be improvable
< 30%	Attention needed	Caching is not delivering expected savings

Low cache hit rates may indicate that system prompts are changing frequently between LLM calls, or that the LLM provider's cache TTL is too short for the interval between calls.

Health Metrics

The Performance dashboard includes a health status indicator based on key operational metrics. The overall status is derived from the worst-performing metric.

Health Status Levels

Status	Meaning
Healthy	All metrics are within normal ranges
Warning	One or more metrics are in warning range - monitor closely
Critical	One or more metrics are in critical range - investigate immediately

Metric Thresholds

Metric	Healthy	Warning	Critical
Chat response time (p95)	< 5s	5-15s	> 15s
Long run compaction count	0-2	3-5	> 5
System prompt ratio	< 40%	40-60%	> 60%
Cache hit rate	> 60%	30-60%	< 30%
Tool call failure rate	< 2%	2-10%	> 10%
Fallback activation rate	< 1%	1-5%	> 5%
Context exhaustion rate	< 2%	2-10%	> 10%
Execution timeout rate	< 1%	1-5%	> 5%
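The thresholds above, combined with the rule that the overall status comes from the worst-performing metric, can be sketched as follows. The metric keys are hypothetical names for the rows in the table. Cache hit rate is the only metric where higher is better, and the compaction count uses 3 as its healthy bound because counts are integers (0-2 is healthy).

```python
from enum import IntEnum


class Status(IntEnum):
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


# metric: (healthy_bound, critical_bound, higher_is_better)
THRESHOLDS = {
    "chat_p95_s": (5, 15, False),
    "long_run_compactions": (3, 5, False),  # integer count: 0-2 healthy, 3-5 warning
    "system_prompt_ratio_pct": (40, 60, False),
    "cache_hit_rate_pct": (60, 30, True),
    "tool_failure_rate_pct": (2, 10, False),
    "fallback_rate_pct": (1, 5, False),
    "context_exhaustion_rate_pct": (2, 10, False),
    "timeout_rate_pct": (1, 5, False),
}


def classify(metric: str, value: float) -> Status:
    healthy, critical, higher_is_better = THRESHOLDS[metric]
    if higher_is_better:
        if value > healthy:
            return Status.HEALTHY
        if value < critical:
            return Status.CRITICAL
    else:
        if value < healthy:
            return Status.HEALTHY
        if value > critical:
            return Status.CRITICAL
    return Status.WARNING


def overall_status(metrics: dict[str, float]) -> Status:
    """The overall status is the worst individual metric status."""
    return max((classify(m, v) for m, v in metrics.items()), default=Status.HEALTHY)
```

For example, a healthy chat p95 and cache hit rate combined with a 4% tool call failure rate yields an overall Warning.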

Understanding Key Metrics

System prompt ratio measures what percentage of the context window is consumed by the system prompt. A high ratio (> 40%) means less room for conversation history and tool results. The compact prompt tier used by quick chat is designed to keep this ratio low.
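As a worked illustration of the ratio, the sketch below divides system prompt tokens by the model's context window and reports how much room is left for history and tool results. The 128,000-token window and the two prompt sizes are hypothetical numbers, not AICOS defaults.

```python
def system_prompt_ratio_pct(system_prompt_tokens: int, context_window_tokens: int) -> float:
    """Share of the context window consumed by the system prompt, in percent."""
    if context_window_tokens <= 0:
        raise ValueError("context window must be positive")
    return system_prompt_tokens / context_window_tokens * 100


def remaining_for_history(system_prompt_tokens: int, context_window_tokens: int) -> int:
    """Tokens left for conversation history and tool results."""
    return max(0, context_window_tokens - system_prompt_tokens)


# Hypothetical sizes: a 128k-token window with a full and a compact prompt.
for label, prompt_tokens in [("full prompt", 64_000), ("compact prompt", 12_000)]:
    ratio = system_prompt_ratio_pct(prompt_tokens, 128_000)
    left = remaining_for_history(prompt_tokens, 128_000)
    print(f"{label}: {ratio:.1f}% of context, {left} tokens left")
```

Here the full prompt lands at 50% (Warning band), while the compact prompt stays well under the 40% healthy limit.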

Compaction count tracks how many times the conversation history had to be summarized during a single execution. Some compaction in long processing runs is expected (up to 2 per run is considered healthy); frequent compaction in email research or reactive events may indicate that tool results are larger than expected.

Fallback activation rate measures how often AICOS falls back from the primary model to the fallback model due to errors or unavailability. A non-zero rate is expected during provider outages, but a persistent rate above 1% may indicate API key or quota issues.

Context exhaustion rate measures how often long processing runs fail because the context window fills up despite compaction. The v2 progressive compaction strategy aims to keep this below 2%.

Quality Feedback Ratings

The Performance dashboard includes a quality feedback section that shows Business Owner satisfaction with AICOS responses.

How Ratings Work

When the Business Owner views AICOS chat responses, they can rate each message with a thumbs up or thumbs down. These ratings are correlated with execution context (profile, model, tool calls, duration) to identify which configurations produce the best outcomes.

Rating Metrics Displayed

Metric	Description
Overall satisfaction	Percentage of positively-rated messages
Satisfaction by profile	Breakdown by execution profile type
Satisfaction by model	Breakdown by LLM model used
Trend	Satisfaction percentage over time
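The satisfaction figures are percentages of thumbs-up ratings, optionally grouped by execution context. A minimal sketch follows; the record fields (`thumbs_up`, `profile`, `model`) and model names are hypothetical, and it assumes unrated messages are excluded from the denominator.

```python
from collections import defaultdict
from typing import Optional


def satisfaction_pct(ratings) -> Optional[float]:
    """ratings: iterable of booleans (True = thumbs up). None if nothing rated."""
    ratings = list(ratings)
    return sum(ratings) / len(ratings) * 100 if ratings else None


def satisfaction_by(rated, key: str) -> dict:
    """Group rated messages by a context field such as 'profile' or 'model'."""
    groups = defaultdict(list)
    for r in rated:
        groups[r[key]].append(r["thumbs_up"])
    return {k: satisfaction_pct(v) for k, v in groups.items()}


rated = [
    {"profile": "quick_chat", "model": "model-a", "thumbs_up": True},
    {"profile": "quick_chat", "model": "model-b", "thumbs_up": False},
    {"profile": "email_research", "model": "model-a", "thumbs_up": True},
    {"profile": "email_research", "model": "model-a", "thumbs_up": True},
]
overall = satisfaction_pct(r["thumbs_up"] for r in rated)
by_profile = satisfaction_by(rated, "profile")
```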

info

Quality feedback is optional and does not affect AICOS behavior. It provides data to help administrators make informed configuration decisions. The feedback system does not automatically adjust settings.

Filtering and Time Ranges

Time Period Selection

Period	Description	Default Granularity
24 hours	Most recent day	15-minute buckets
7 days	Last week (default)	1-hour buckets
30 days	Last month	1-day buckets
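Each period maps to a fixed bucket size, so the number of points per chart follows directly (96, 168, and 30 buckets). The sketch below floors a timestamp to the start of its bucket. Aligning buckets to the Unix epoch in UTC is an assumption; the dashboard may align days to your local time zone instead.

```python
from datetime import datetime, timedelta, timezone

# period: (period length, bucket size), from the table above
TIME_RANGES = {
    "24h": (timedelta(hours=24), timedelta(minutes=15)),
    "7d": (timedelta(days=7), timedelta(hours=1)),
    "30d": (timedelta(days=30), timedelta(days=1)),
}
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts: datetime, size: timedelta) -> datetime:
    """Floor a timezone-aware timestamp to the start of its bucket (UTC-aligned)."""
    return EPOCH + ((ts - EPOCH) // size) * size


def bucket_count(period: str) -> int:
    length, size = TIME_RANGES[period]
    return length // size
```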

Select the time period from the dropdown at the top of the Performance tab. All charts and summary cards update to reflect the selected period.

Data Source

Performance metrics are collected from execution metadata written during every AICOS execution. Each execution records its profile category, trigger type, model used, token counts, cache statistics, tool call counts, duration, and estimated cost. This data is aggregated server-side and returned as pre-computed chart data.
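To show how the recorded fields feed the summary cards, here is a hypothetical shape for one execution record and a simple aggregation. The field names are illustrative, not the actual AICOS schema, and the token-weighted cache hit rate is an assumption about how "average" is computed.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ExecutionRecord:
    # Hypothetical field names for the metadata listed above.
    profile_category: str
    trigger_type: str
    model: str
    prompt_tokens: int
    completion_tokens: int
    cached_prompt_tokens: int
    tool_calls: int
    duration_ms: int
    estimated_cost_usd: float
    started_at: datetime


def summary_cards(records) -> dict:
    """Aggregate records into card values (token-weighted cache hit rate)."""
    records = list(records)
    n = len(records)
    total_cost = sum(r.estimated_cost_usd for r in records)
    prompt = sum(r.prompt_tokens for r in records)
    cached = sum(r.cached_prompt_tokens for r in records)
    return {
        "total_executions": n,
        "avg_cost_per_execution_usd": total_cost / n if n else 0.0,
        "cache_hit_rate_pct": cached / prompt * 100 if prompt else 0.0,
    }
```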

info

Performance data has a retention period of 90 days. Historical data beyond 90 days is not available in the dashboard. For long-term trend analysis, use the Daily Activity Summary email reports.

Real-Time Progress Indicators

During active AICOS executions, the Performance dashboard and the AICOS chat interface show real-time progress via SignalR events.

Tool Progress

For long-running executions (daily wake-ups, email research), a progress indicator shows:

Current tool number out of the profile's maximum (e.g., "Working... tool 23/100")
Name of the tool currently executing
Duration so far

Streaming Chat Responses

When the quick chat profile is active and streaming is enabled, chat responses appear token-by-token in the chat interface, similar to how other AI chat applications deliver responses. This dramatically improves perceived latency, as users see the first tokens within 1-2 seconds rather than waiting for the complete response.

Execution Summaries

At the end of every execution, a summary event is broadcast containing:

Profile used
Total duration
Tool calls made
Tokens consumed
Estimated cost
Outcome (completed, timeout, error, budget exceeded)
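A client consuming this event needs to map the outcome to one of the four listed values. The sketch below parses a JSON payload into a typed summary. The JSON keys and the `budget_exceeded` wire value are hypothetical; this page does not document the SignalR event name or payload format.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    COMPLETED = "completed"
    TIMEOUT = "timeout"
    ERROR = "error"
    BUDGET_EXCEEDED = "budget_exceeded"  # hypothetical wire value


@dataclass(frozen=True)
class ExecutionSummary:
    profile: str
    duration_ms: int
    tool_calls: int
    tokens: int
    estimated_cost_usd: float
    outcome: Outcome


def parse_summary(payload: str) -> ExecutionSummary:
    """Parse a summary event payload (hypothetical JSON keys)."""
    data = json.loads(payload)
    return ExecutionSummary(
        profile=data["profile"],
        duration_ms=int(data["durationMs"]),
        tool_calls=int(data["toolCalls"]),
        tokens=int(data["tokens"]),
        estimated_cost_usd=float(data["estimatedCostUsd"]),
        outcome=Outcome(data["outcome"]),  # raises ValueError on unknown outcomes
    )
```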

The Performance dashboard auto-refreshes its summary cards when this event arrives.

Troubleshooting

No Data in Performance Dashboard

Verify that AICOS has executed at least once in the selected time period
Check that the AICOS agent status is active
Ensure your account has the caioo:read permission
Try selecting a longer time period (30 days)

Unexpected Cost Spikes

Check the Token Cost Breakdown for which profile category is responsible
Review the Tool Usage Heatmap for unusually high tool call counts
Check whether a model change increased per-token pricing
Look for an increase in execution frequency (more daily wake-ups, more emails)

High Fallback Activation Rate

Verify that the primary model's API key is valid and has available quota
Check the LLM provider's status page for outages
Review execution logs for specific error messages
Consider switching to a different primary model if the issue persists

Cache Hit Rate Declining

Check if system prompt customizations were recently changed (custom prompts reduce cache effectiveness)
Verify that the LLM provider supports prompt caching for the configured model
Review whether execution frequency has decreased (with fewer executions, more time passes between LLM calls, so cached prompts are more likely to expire)

Related Topics

Execution Profiles - Understanding the four execution profiles
Settings & Customization - Model selection and budget configuration
Execution Cycle - How AICOS operates daily
Usage Dashboard - Organization-wide usage analytics
