Usage Dashboard

Comprehensive cost, performance, and health analytics for all AI agent activity.

Overview

The Usage Dashboard provides deep visibility into how your AI agents are performing, what they're costing, and where potential issues exist. Instead of analyzing individual executions one at a time, the dashboard aggregates data across all agent activity to reveal patterns, trends, and opportunities for optimization.

Use the dashboard to:

Track spending across agents, models, and operations
Monitor execution health and catch degrading performance early
Understand which tools and models are most used
Compare Code Mode vs standard execution metrics
Analyze costs per mailbox for capacity planning
Inspect pre-execution classifier shadow analytics (divergence rate, estimated savings, latency)

The dashboard surfaces insights that would be difficult to spot in raw execution logs, helping you make data-driven decisions about agent configuration, model selection, and resource allocation.

Dashboard Tabs

The Usage Dashboard is organized into the following tabs:

Tab	Purpose
Overview	Cost, health, model usage, and mailbox analytics for all agent activity
Classifier (Shadow)	Pre-execution classifier telemetry while operating in shadow mode
Variants	Per-variant CCS hit rates, nightly generation cost, and cost guardrail status

Use the tab bar at the top of the page to switch between views. The period selector applies to all tabs.

Accessing the Usage Dashboard

Navigate to Manage > Account > Usage to view the dashboard.

Period Selection

All dashboard sections share a unified period selector at the top of the page:

Period	Description
7 Days	Most recent week of activity (default)
30 Days	Last month for trend analysis
90 Days	Quarterly view for long-term patterns

Changing the period updates all sections simultaneously. Use shorter periods for recent debugging and longer periods for strategic planning.

Overview Tab Sections

1. Daily Cost Trend

Track spending patterns over time with visual correlation between costs and execution volume.

Components:

Stacked Area Chart — Shows cost by source type (agents, scans, AICOS) over time
Daily Execution Volume Bars — Overlay showing execution count per day
KPI Summary Cards:
- Total Cost — Sum of all costs in the selected period
- Total Executions — Count of all agent runs
- Avg Daily Cost — Daily spending average
- Avg Cost/Execution — Per-execution cost average
Token Usage Indicators — Request tokens, response tokens, cache read tokens, cache write tokens, and average tokens per execution
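The relationship between the four KPI cards can be sketched as follows. This is a minimal illustration, assuming a hypothetical list of per-day rows; the `DailyRow` type and its field names are invented for the example and are not the dashboard's actual schema.

```python
from dataclasses import dataclass


@dataclass
class DailyRow:
    """One day of aggregated activity (hypothetical shape)."""
    cost: float
    executions: int


def summarize(rows: list[DailyRow]) -> dict[str, float]:
    """Derive the four KPI card values from per-day rows."""
    total_cost = sum(r.cost for r in rows)
    total_executions = sum(r.executions for r in rows)
    days = len(rows)
    return {
        "total_cost": total_cost,
        "total_executions": total_executions,
        # Guard against empty periods and periods with no executions.
        "avg_daily_cost": total_cost / days if days else 0.0,
        "avg_cost_per_execution": (
            total_cost / total_executions if total_executions else 0.0
        ),
    }


# Second day costs twice as much at the same execution volume.
kpis = summarize([DailyRow(10.0, 100), DailyRow(20.0, 100)])
```

In this sample the second day doubles in cost while volume stays flat, which is the pattern the chart overlay is meant to expose.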

What to look for:

Spikes in cost without corresponding execution increases (indicates more expensive operations)
Days with no activity (may indicate configuration issues)
Steady growth trends (for capacity planning)

Example insights:

"Cost doubled on Tuesday despite similar execution volume" → Investigate if agents switched to more expensive models or used more tools
"Weekend costs are near zero" → Adjust scheduled tasks or mailbox scan frequency for off-hours

2. Per-Agent Cost Breakdown

Understand which agents drive the most spending and their performance characteristics.

Components:

Agent Cost Ranking Table with columns:
- Agent Name — Display name of the agent
- Executions — Total runs in the period
- LLM Calls — Number of model API calls made
- Tool Calls — Number of tool invocations
- Avg Duration — Average execution time
- Success Rate — Percentage of successful completions
- Tokens — Total tokens consumed
- Cost — Total spending for this agent
Horizontal Bar Chart — Visual cost distribution across agents
Knowledge Source Scans — Separate summary row for mailbox scanning operations

What to look for:

High-cost agents with low execution counts (each run is expensive)
Low success rates (may need instruction improvements)
Long average durations (performance optimization opportunities)

Example insights:

"Sales Assistant has 95% success rate but Finance Agent is at 65%" → Review Finance Agent instructions and tool availability
"Customer Support Agent accounts for 60% of costs" → Consider using a smaller model or optimizing prompts

Note on success rate: Escalations are treated as successful outcomes since the agent correctly identified the need for human input.

3. Execution Health & Error Analysis

Monitor system health with composite scoring and detailed error categorization.

Components:

Composite Health Score (0-100) — Weighted calculation:
- 50%: Completion rate (successful executions / total)
- 30%: Tool success rate (successful tool calls / total)
- 20%: Error-free rate (executions without errors / total)
Status Distribution Donut Chart — Breakdown by completed, failed, escalated, in_progress
Duration Metrics with Histogram:
- Average duration
- Median duration
- Minimum duration
- Maximum duration
- 95th percentile (p95)
Tool Reliability Panel — Success rate for each tool with recent failure indicators
Error Categorization — Groups errors into:
- Tool-only failures (tool errors but execution completed)
- Execution errors (LLM or system errors)
- Combined failures (both tool and execution issues)
Stale Execution Detection — Alerts for executions in progress >10 minutes
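The weighting above can be expressed as a short function. This is a sketch under the stated 50/30/20 weights; the function name, the use of fractions between 0.0 and 1.0 as inputs, and the rounding to one decimal place are assumptions made for illustration.

```python
def health_score(completion_rate: float,
                 tool_success_rate: float,
                 error_free_rate: float) -> float:
    """Composite health score on a 0-100 scale.

    Each argument is a fraction between 0.0 and 1.0.
    """
    for rate in (completion_rate, tool_success_rate, error_free_rate):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rates must be between 0.0 and 1.0")
    weighted = (0.50 * completion_rate
                + 0.30 * tool_success_rate
                + 0.20 * error_free_rate)
    # Rounding to one decimal is an illustrative choice.
    return round(100 * weighted, 1)
```

For example, `health_score(1.0, 1.0, 0.9)` returns `98.0`: a perfect completion and tool record with errors in 10% of executions costs only two points, because the error-free component carries the smallest weight.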

What to look for:

Health scores below 80 (indicates systemic issues)
Tool reliability below 90% (specific tool may need debugging)
Stale executions (may indicate timeout or hang conditions)
P95 duration significantly higher than average (outlier performance issues)
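To see why a p95 far above the average points at outliers, the duration metrics can be reproduced from raw samples with the Python standard library. This is a sketch only: the dashboard's own percentile method is not documented here, and the `factor` threshold of 3 is an assumed value, not a product setting.

```python
import statistics


def duration_summary(durations_s: list[float]) -> dict[str, float]:
    """Summarize execution durations in seconds; needs at least two samples."""
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    p95 = statistics.quantiles(durations_s, n=20)[-1]
    return {
        "avg": statistics.mean(durations_s),
        "median": statistics.median(durations_s),
        "min": min(durations_s),
        "max": max(durations_s),
        "p95": p95,
    }


def has_tail_outliers(summary: dict[str, float], factor: float = 3.0) -> bool:
    """Flag when p95 is at least `factor` times the average (assumed threshold)."""
    return summary["p95"] >= factor * summary["avg"]
```

A set of nineteen one-second executions and a single 100-second execution keeps the median at one second while pulling the p95 far above the average, so `has_tail_outliers` flags it.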

Example insights:

"Health score dropped from 95 to 75 over the past week" → Recent configuration change or external service degradation
"search_knowledge_base tool has 60% success rate" → Check Azure AI Search service health or index configuration

4. LLM Model & Tool Analytics

Understand which models and tools are used most, their costs, and efficiency metrics.

Components:

Model Distribution Chart — Pie chart showing proportion of calls per model
Model Cost & Token Breakdown Table:
- Model name
- LLM calls count
- Total tokens (including cache read and write tokens)
- Cache read tokens - tokens served from provider cache at a discounted rate
- Cache write tokens - tokens written to provider cache (Anthropic only)
- Total cost
- Token efficiency ratio (request-to-response ratio)
Tool Usage Frequency Bar Chart — Tools ranked by usage with category-based coloring
Agent Tool Profiles — Which agents use which tools
Tool Name Normalization — Dynamic tool IDs (like GKI and Team Messaging tools) are normalized for cleaner reporting

What to look for:

Token efficiency ratios below 0.5 (responses are more than twice the size of requests, so output may be unnecessarily verbose)
Underutilized tools (available but rarely called)
Model diversity (are you using the right model for each task?)
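The token efficiency check can be sketched as below, taking the ratio as request tokens divided by response tokens as described in the table above. The function names are hypothetical, and how the dashboard treats a model with zero response tokens is not documented, so returning `None` in that case is an assumption.

```python
from typing import Optional

VERBOSE_THRESHOLD = 0.5  # ratios below this suggest verbose responses


def token_efficiency_ratio(request_tokens: int,
                           response_tokens: int) -> Optional[float]:
    """Request-to-response token ratio; None when there are no response tokens."""
    if response_tokens == 0:
        return None
    return request_tokens / response_tokens


def is_verbose(request_tokens: int, response_tokens: int) -> bool:
    """True when the ratio falls below the 0.5 guideline."""
    ratio = token_efficiency_ratio(request_tokens, response_tokens)
    return ratio is not None and ratio < VERBOSE_THRESHOLD
```

A model that receives 400 request tokens and returns 1,000 response tokens has a ratio of 0.4 and would be flagged.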

Example insights:

"Claude Opus 4 is 80% of calls but only 20% of executions" → Certain agents make many iterative LLM calls
"search_emails tool has 500 calls but search_knowledge_base only 50" → Email search may be overused vs knowledge sources

5. Per-Mailbox Usage

Track costs and activity per monitored mailbox for capacity planning and optimization.

Components:

Horizontal Bar Chart — Cost per mailbox
Mailbox Usage Table:
- Mailbox address
- Scans (active scan count)
- LLM Calls
- Cost
- Avg Cost/Active Scan
- Last Scan timestamp
Active vs Idle Scans — Distinguishes between scans that processed emails vs scans with no new messages
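The distinction between active and idle scans matters for the Avg Cost/Active Scan column. The sketch below assumes that the column divides a mailbox's total cost by its active scan count only; that formula, the `MailboxUsage` type, and its field names are illustrative assumptions rather than the dashboard's confirmed implementation.

```python
from dataclasses import dataclass


@dataclass
class MailboxUsage:
    """Hypothetical per-mailbox totals for the selected period."""
    address: str
    active_scans: int  # scans that processed at least one email
    idle_scans: int    # scans that found no new messages
    cost: float

    @property
    def avg_cost_per_active_scan(self) -> float:
        # Idle scans are excluded from the denominator.
        return self.cost / self.active_scans if self.active_scans else 0.0


usage = MailboxUsage("support@company.com", active_scans=20, idle_scans=80, cost=50.0)
```

Here the mailbox is scanned 100 times but only 20 scans find mail, so the per-active-scan average is 2.5 rather than 0.5.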

Visibility: This section is automatically hidden when no mailbox scan data exists.

What to look for:

Mailboxes with high costs but few active scans (may be scanning too frequently for little benefit)
Mailboxes with no recent scans (may indicate subscription or configuration issues)
Cost-per-active-scan outliers (certain mailboxes may receive complex emails)

Example insights:

"support@company.com costs $50/day but sales@company.com costs $5/day with similar scan frequency" → Support emails may be more complex or trigger more tool use
"Mailbox hasn't scanned in 3 days" → Check Microsoft Graph subscription status

6. Code Mode Analytics

Evaluate the experimental Code Mode feature's adoption, performance, and token savings.

Components:

Adoption Summary:
- Code Mode executions count
- Adoption rate percentage
- Code blocks executed
Token Savings Highlight Card — Primary value proposition showing cumulative token reduction
Side-by-Side Comparison Table — Code Mode vs Standard execution metrics:
- Average tokens per execution
- Average cost per execution
- Success rate
- Average duration
Error Analysis by Code Mode Error Types:
- Timeout (code execution exceeded time limit)
- Circuit breaker (safety mechanism triggered)
- Max turns (exceeded iteration limit)
- Exception (code runtime error)
Per-Agent Code Mode Adoption Table — Which agents are using Code Mode and their performance

Visibility: This section is automatically hidden until Code Mode is actually used (experimental feature).

What to look for:

Token savings >30% (strong indicator Code Mode is beneficial for your use cases)
Error rates higher than standard execution (may need code generation improvements)
Which agents benefit most from Code Mode (for targeted rollout)
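The 30% guideline can be checked against the comparison table with a one-line calculation. This sketch assumes savings are measured relative to the standard-execution average tokens per execution; the exact formula behind the Token Savings card is not stated here, and the function name is hypothetical.

```python
def token_savings_pct(standard_avg_tokens: float,
                      code_mode_avg_tokens: float) -> float:
    """Percentage token reduction of Code Mode relative to standard execution."""
    if standard_avg_tokens <= 0:
        raise ValueError("standard_avg_tokens must be positive")
    saved = standard_avg_tokens - code_mode_avg_tokens
    return 100.0 * saved / standard_avg_tokens
```

With 10,000 average tokens for standard execution and 5,500 for Code Mode, `token_savings_pct(10_000, 5_500)` returns `45.0`, comfortably above the 30% guideline. A negative result means Code Mode used more tokens than standard execution.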

Example insights:

"Code Mode saves 45% tokens on average with 92% success rate" → Consider enabling for more agents
"Timeout errors common on Finance Agent Code Mode executions" → Code is too complex or queries are slow

7. Trigger Type Breakdown

Understand costs across different execution triggers to optimize workflows.

Components:

Pie Chart — Cost distribution by trigger type
Details Table:
- Trigger type (friendly label)
- Executions count
- LLM Calls
- Tool Calls
- Cost

Trigger Types:

Inbound Email — Email monitoring and processing
Chat — Test Chat and agent conversations
Scheduled — Timer-based executions
CAIOO Wake-Up — AI Chief of Staff daily cycle
CAIOO Goal Management — AICOS working on goals
CAIOO Project Management — AICOS managing projects
CAIOO Conversation — AICOS email interactions
Long-Running Resume — Resumption of paused executions
And 5 more variants

What to look for:

High costs for scheduled tasks (may be running too frequently)
CAIOO variants consuming unexpected budget (tune AICOS settings)
Trigger types with low execution counts but high costs (expensive edge cases)

Example insights:

"Scheduled tasks cost $100/day but only 20 executions" → Review scheduled reminder frequency
"CAIOO Project Management is 40% of costs" → AICOS is very active, consider adjusting wake-up frequency or project scope

Classifier (Shadow) Tab

The Classifier (Shadow) tab surfaces analytics for the pre-execution classifier while it runs in shadow mode. In shadow mode the classifier evaluates each execution and records its routing decision, but does not act on it — agents continue to run as configured. This lets you measure accuracy and potential cost impact before enabling live routing.

Shadow Mode

Classifier decisions are observed and recorded but not applied. No agent behavior changes while the classifier is in shadow mode.

KPI Cards

Metric	Description
Classified Executions	Number of executions the classifier evaluated in the selected period
Divergence Rate	Execution-weighted rate at which the classifier's decision differed from the static routing rule
Est. Daily Savings	Projected cost reduction if classifier decisions were applied (also shown as weekly)
Timeout Rate	Proportion of classifier calls that exceeded the 1,000 ms budget (rates above 0.5% are highlighted in red)
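"Execution-weighted" means the divergence rate is computed over all classified executions together, not as a simple average of per-group rates. The sketch below illustrates that reading along with the 0.5% timeout highlight; the function names and the tuple layout are assumptions made for the example.

```python
TIMEOUT_HIGHLIGHT_THRESHOLD = 0.005  # 0.5%, per the Timeout Rate card


def divergence_rate(groups: list[tuple[int, int]]) -> float:
    """Execution-weighted divergence across groups.

    Each tuple is (classified_executions, divergent_executions) for one
    hypothetical group, such as a trigger type.
    """
    classified = sum(c for c, _ in groups)
    divergent = sum(d for _, d in groups)
    return divergent / classified if classified else 0.0


def timeout_is_highlighted(timeouts: int, calls: int) -> bool:
    """True when the share of calls over the 1,000 ms budget exceeds 0.5%."""
    return calls > 0 and timeouts / calls > TIMEOUT_HIGHLIGHT_THRESHOLD
```

For two groups with 900 executions at 10% divergence and 100 executions at 50% divergence, the weighted rate is 0.14, while a plain mean of the two group rates would report 0.30.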

Classifier Latency

Latency percentiles for the classifier evaluation step:

P50 — Median evaluation time
P95 — Typical worst-case latency; values above 1,000 ms are highlighted in amber
P99 — Tail latency; values above 1,000 ms are highlighted in red

A healthy classifier should keep P95 well under the 1,000 ms budget to avoid impacting execution start time.

Divergence by Trigger Type

A horizontal bar chart showing divergence rate broken down by execution trigger type (Inbound Email, Chat, Scheduled, CAIOO variants, etc.). Trigger types with the highest divergence appear first.

Use this view to identify which workflows benefit most from classifier-based routing.

Confidence Distribution

Shows what fraction of classifier decisions fell into each confidence band:

Band	Interpretation
< 60%	Low confidence — classifier is uncertain about the routing decision
60% – 80%	Moderate confidence
≥ 80%	High confidence — classifier is certain about the routing decision

A healthy distribution skews toward the ≥ 80% band. A large proportion of low-confidence decisions may indicate the classifier needs additional tuning for your workload.
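The band boundaries in the table can be sketched as a small bucketing function. The band labels follow the table; the function names and the use of fractions between 0.0 and 1.0 for confidence values are assumptions.

```python
from collections import Counter

BANDS = ("< 60%", "60% - 80%", ">= 80%")


def confidence_band(confidence: float) -> str:
    """Map a confidence value (0.0 to 1.0) to its band."""
    if confidence < 0.60:
        return BANDS[0]
    if confidence < 0.80:
        return BANDS[1]
    return BANDS[2]


def band_distribution(confidences: list[float]) -> dict[str, float]:
    """Fraction of decisions in each band; empty input yields all zeros."""
    counts = Counter(confidence_band(c) for c in confidences)
    total = len(confidences)
    return {band: (counts[band] / total if total else 0.0) for band in BANDS}
```

Note the boundary handling: a decision at exactly 60% lands in the moderate band, and one at exactly 80% lands in the high band.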

No-Data State

If no classifier telemetry has been recorded for the selected period, the tab displays an informational message. The classifier emits data to ExecutionSummaries once executions carry classifier dimensions; new tenants or tenants that recently enabled the classifier may see this state initially.

Variants Tab

The Variants tab shows observability data for Contextual Cognitive State (CCS) variants - lean, focused prompt packages that the pre-execution classifier loads instead of the full CCS. This tab is tagged Phase 3 and becomes meaningful once the classifier begins selecting non-full variants.

KPI Cards

Metric	Description
Non-Full Loads	Number of times the classifier selected a variant (email, planning, chat, or reactive) instead of the full CCS
Active Variants	Count of variants with at least one load in the period, out of all known variants
Nightly Dream Cost	Total cost for the DreamState nightly generation cycles that produce updated variant CCS files
Cost Guardrail	Current status of the BR-027 cost guardrail (Healthy or Degraded)

Per-Variant Hit Rate

A horizontal bar chart ranking each variant by load count, with its hit rate shown as a percentage of all non-full variant loads. A hit is recorded each time the classifier successfully reads a variant's CCS file from blob storage. Fallbacks to the full CCS are not counted.
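The ranking and percentage described here can be sketched as follows, assuming a hypothetical mapping from variant name to successful load count (fallbacks already excluded). The function name and return shape are illustrative.

```python
def hit_rates(loads: dict[str, int]) -> list[tuple[str, int, float]]:
    """Rank variants by load count and attach each one's share of all loads.

    Returns (variant, load_count, hit_rate_percent) tuples, highest first.
    """
    total = sum(loads.values())
    ranked = sorted(loads.items(), key=lambda item: item[1], reverse=True)
    return [
        (name, count, 100.0 * count / total if total else 0.0)
        for name, count in ranked
    ]


ranking = hit_rates({"chat": 30, "email": 60, "planning": 10, "reactive": 0})
```

In this sample the email variant ranks first with 60 of 100 loads, and the reactive variant still appears in the ranking with a 0% share.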

Variants that are registered but have not been loaded in the selected period still appear in a Known Variants chip list in the no-data state.

Unknown variants (loaded but not in the registered list) are flagged with an unknown badge.

Per-Variant Token and Bloat Metrics

Three metric cards display token injection efficiency and prompt-bloat reduction:

Metric	Description
Avg Tokens Injected	Average tokens added to the prompt when this variant loads (pending persistence)
Fallback Rate	Share of variant load attempts that fell back to the full CCS (pending persistence)
Prompt Bloat vs 1.10 Baseline	Percentage reduction in prompt size compared to the Story 1.10 baseline (baseline pending)

These metrics show dash placeholders until the underlying data points are persisted to ExecutionSummaries.

Nightly Generation Cost (Per Agent)

A table listing each agent that completed at least one DreamState cycle in the period:

Column	Description
Agent	Friendly name and internal agent ID
Cycles	Number of nightly generation cycles completed
Total Cost	Cumulative LLM cost for variant generation
Avg / Cycle	Average cost per generation cycle

Use this table to identify which agents incur the most nightly overhead and to detect unexpectedly high generation costs.

Cost Guardrail (BR-027)

The guardrail monitors whether the Stage 2 nightly generation cost exceeds 1.20 times the Stage 1 baseline on consecutive nights.

Healthy state: All variants are generating and the nightly cost is within the 1.20x threshold. A recent-breach count and timestamp are shown if the guardrail was tripped in the past but has since recovered.

Degraded state: The guardrail has tripped. The panel shows:

How many consecutive nights the threshold was breached
When the guardrail tripped
Which variants are still generating in degraded mode (limited subset)

Only a manual clear by a platform administrator can exit degraded mode.
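The trip condition can be illustrated with a short sketch. The 1.20 multiplier comes from the description above; the number of consecutive nights required to trip is not specified here, so `required_nights=2` is purely an assumed default, and the function names are hypothetical. The sketch also does not model the sticky degraded state that requires a manual clear.

```python
THRESHOLD_MULTIPLIER = 1.20


def consecutive_breaches(nightly_costs: list[float], baseline: float) -> int:
    """Count the most recent run of nights above 1.20x the baseline.

    `nightly_costs` is ordered oldest to newest.
    """
    streak = 0
    for cost in reversed(nightly_costs):
        if cost > THRESHOLD_MULTIPLIER * baseline:
            streak += 1
        else:
            break
    return streak


def guardrail_tripped(nightly_costs: list[float],
                      baseline: float,
                      required_nights: int = 2) -> bool:
    """True when the breach streak reaches the assumed night count."""
    return consecutive_breaches(nightly_costs, baseline) >= required_nights
```

With a baseline of 10, the sequence 10, 13, 12.5 ends in a two-night streak above 12 and trips under the assumed default, while 13, 11, 13 does not, because the breaches are not consecutive.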

No-Data State

If no variant loads or dream cycles are recorded for the selected period, the tab shows the list of known variants as chips and explains that variants populate as the classifier selects non-full lenses and the nightly DreamState cycle generates them. Try a longer period or confirm with your platform administrator that Phase 3 is active.

How Data is Collected

The Usage Dashboard aggregates data from Azure Table Storage's ExecutionSummaries entity, which captures:

Every agent execution (email, chat, scheduled)
All mailbox scan operations
AICOS activity across all modes
LLM model calls, token usage (including cache read and write tokens), and costs
Tool invocations and their results

Nine dedicated API endpoints serve different analytical views, including a dedicated /api/console/usage-stats/classifier-shadow endpoint for the Classifier tab. The frontend renders interactive charts using the Recharts library with a unified period selector and responsive layouts optimized for both desktop and mobile.

All data is tenant-scoped and loaded in parallel for fast page rendering. No personally identifiable information (PII) from email content is stored in analytics tables.

Data Volume Limit and Truncation Warning

Each dashboard endpoint scans up to 10,000 execution rows for the selected period. When your tenant has more than 10,000 executions in the chosen window, any affected dashboard tab displays a yellow warning banner:

Showing aggregates over the most recent 10,000 executions. Older rows in this period were not included.

This means aggregated metrics - divergence rate, estimated savings, per-routing-class volumes, costs, and health scores - reflect only the most recent 10,000 executions, not the full period. High-volume tenants (typically 50+ mailboxes at moderate scan frequency) are most likely to encounter this limit on 30-day or 90-day views.
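The effect of the row limit can be sketched as a "keep the newest rows" selection. This is an illustration only: the `timestamp` field name and the function are hypothetical, and the real endpoints query Azure Table Storage directly.

```python
ROW_LIMIT = 10_000


def select_rows(rows: list[dict], limit: int = ROW_LIMIT) -> tuple[list[dict], bool]:
    """Keep the most recent `limit` rows and report whether any were dropped."""
    newest_first = sorted(rows, key=lambda r: r["timestamp"], reverse=True)
    truncated = len(rows) > limit
    return newest_first[:limit], truncated


# 10,005 sample rows: the five oldest fall outside the limit.
sample = [{"timestamp": i} for i in range(10_005)]
kept, truncated = select_rows(sample)
```

When `truncated` is true, any aggregate computed over `kept` describes only the newest part of the period, which is the situation the warning banner signals.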

Mitigations:

Use a shorter period (7 days) to stay within the limit.
Contact your platform administrator if you regularly see this warning - pre-aggregated roll-ups are on the product roadmap for high-volume tenants.

Best Practices

Regular Monitoring

Daily: Check health score and error analysis for immediate issues
Weekly: Review per-agent costs and look for optimization opportunities
Monthly: Analyze 30- or 90-day trends for capacity planning

Cost Optimization Strategies

High-cost agents: Consider smaller models or prompt optimization
Underused tools: Remove from agents to reduce prompt token overhead
Expensive models for simple tasks: Use GPT-4o-mini or Claude Haiku where appropriate
Frequent mailbox scans with few active scans: Reduce scan frequency

Performance Optimization

Low health scores: Investigate recent configuration changes or external service issues
Tool reliability <90%: Debug specific tools or replace with alternatives
P95 duration outliers: Find and optimize slow executions
Stale executions: Check for timeout issues or long-running processes

Model Selection

Use the Model Analytics section to:

Compare token efficiency ratios across models
Identify tasks that would benefit from smaller/faster models
Understand which agents drive most LLM costs
Evaluate if premium models (Opus, GPT-4) are necessary for all use cases

Code Mode Evaluation

If experimenting with Code Mode:

Target >30% token savings for meaningful ROI
Monitor error rates closely (should be comparable to standard execution)
Start with agents that have well-defined, structured tasks
Review per-agent adoption to find best-fit use cases

Troubleshooting

Dashboard Shows a Truncation Warning Banner

A yellow banner reading "Showing aggregates over the most recent 10,000 executions" appears on one or more tabs when the selected period contains more than 10,000 executions.

Impact: Metrics on that tab (costs, health scores, divergence rate, estimated savings, per-class volumes) are computed from the most recent 10,000 executions only. The banner is informational - data is still shown, but may understate activity for the oldest portion of the period.

Solution: Switch to a shorter period (7 days) to stay under the limit. If this warning appears consistently and accurate full-period aggregation is important for your workflows, contact your platform administrator.

Dashboard Shows No Data

This can occur when:

No agent executions have completed in the selected period
The tenant is newly created
Execution timestamps fall outside the selected period because of system clock issues (check execution timestamps)

Solution: Verify agents are configured and triggered at least once. Check Monitor > Activity > Agent Activity for recent executions.

Costs Don't Match Provider Billing

The dashboard calculates costs based on:

Model pricing configured in Build > AI Configuration > LLM Models
Token counts from API responses
Provider fees, taxes, and enterprise discounts are not included

Solution: Use the dashboard for relative cost tracking and trends. Compare with provider billing statements monthly to ensure pricing configuration accuracy.
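A discrepancy is easier to reason about with the calculation written out. The sketch below assumes per-million-token pricing with separate request and response rates; the actual pricing fields under Build > AI Configuration > LLM Models are not described here, cache read and write tokens are left out for brevity, and the prices in the example are made-up numbers.

```python
def execution_cost(request_tokens: int,
                   response_tokens: int,
                   request_price_per_million: float,
                   response_price_per_million: float) -> float:
    """Estimated cost from token counts and configured prices (assumed units)."""
    return (
        request_tokens * request_price_per_million
        + response_tokens * response_price_per_million
    ) / 1_000_000


# Hypothetical prices: 3.00 per million request tokens, 15.00 per million response tokens.
estimate = execution_cost(1_000_000, 500_000, 3.0, 15.0)
```

The estimate here is 10.5. If the configured prices differ from what the provider actually charges, every aggregate on the dashboard inherits that error, which is why the monthly comparison against billing statements is worthwhile.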

Health Score Seems Wrong

The composite health score weights completion rate highest (50%), tool success rate next (30%), and error-free rate last (20%). This means:

An agent with 90% completion, 50% tool success, and (for example) a 60% error-free rate scores about 72 (45 + 15 + 12)
An agent with 100% completion and tool success but non-fatal errors in 10% of executions scores 98 (50 + 30 + 18)

Solution: Review individual components (status distribution, tool reliability, error categories) for root cause instead of relying solely on the composite score.

Classifier Tab Shows "No Data"

The Classifier (Shadow) tab displays a no-data message when no classifier telemetry has been recorded for the selected period.

Possible causes:

The pre-execution classifier is not yet enabled for your tenant
The selected period predates classifier activation
Executions have not yet carried classifier dimensions to ExecutionSummaries

Solution: Confirm with your platform administrator that the pre-execution classifier is active. Try a longer period (30 or 90 days) to capture historical data. Once at least one execution records classifier dimensions, the tab will show data.

Classifier Timeout Rate Is High

A timeout rate above 0.5% means the classifier is frequently exceeding its 1,000 ms budget, which may add latency to execution starts.

Solution: Contact support. High timeout rates may indicate infrastructure resource pressure or a classifier configuration issue.

Variants Tab Shows No Data

The Variants tab shows a no-data state when no CCS variant loads or DreamState cycles are recorded for the selected period.

Possible causes:

The pre-execution classifier is not yet selecting non-full variants (Phase 3 feature)
The selected period predates Phase 3 activation
No nightly DreamState cycles have run yet for any agent

Solution: Confirm with your platform administrator that Phase 3 is active. Try a longer period (30 or 90 days). Once the classifier picks at least one non-full variant or a DreamState cycle completes, the tab shows data.

Cost Guardrail Shows Degraded

The BR-027 guardrail trips when the Stage 2 nightly DreamState generation cost exceeds 1.20 times the Stage 1 baseline on consecutive nights. In degraded mode only a limited subset of variants is generated.

Solution: Contact your platform administrator. Manual intervention is the only way to clear the degraded state. Until cleared, the Variants tab KPI card shows Degraded in red.

Code Mode Section Not Visible

Code Mode Analytics is automatically hidden until at least one execution uses Code Mode.

Solution: Enable Code Mode on an agent (experimental feature) and trigger at least one execution. The section will appear after data is available.

Related Topics

Agent Executions — Real-time execution activity and filtering
Execution Details — Deep dive into specific executions
Daily Activity Summary — Automated email reports for daily monitoring
LLM Models — Configure model pricing for accurate cost tracking
Agents — Agent configuration and optimization
