Email Indexing for Search
Email Indexing for Search transforms your email archives into a searchable knowledge base that AI agents can query instantly. This feature automatically identifies valuable emails, extracts key insights, and indexes them for semantic search.
Overview
The Email Indexing for Search feature provides:
- Mining of valuable emails from shared mailboxes or user mailboxes
- AI-powered evaluation that automatically assesses each email's knowledge value
- Topic classification that categorizes emails into your defined taxonomy
- A human approval workflow that ensures quality control before permanent indexing
- Incremental sync that uses Microsoft Graph delta queries for efficient updates
How It Works
1. Configure Knowledge Source
Point to a mailbox with filtering criteria (topics, keywords, date range)
2. Trigger Scan
A manual button click or the automatic daily schedule starts a scan, which processes emails via the Microsoft Graph API
3. AI Evaluation
Each email receives a value score (0-1), summary, and topic classification
4. Approval Queue
High-value emails enter the approval queue for human review (if enabled)
5. Index to Search
Approved emails are indexed to Azure AI Search with metadata
6. Agent Access
AI agents search the knowledge base to answer questions with citations
Setting Up Email Indexing
Prerequisites
Before configuring email indexing, ensure you have:
- Email History Index - A Grounded Knowledge Index of type "Email Archive" must be provisioned. See Grounded Knowledge Index Setup Wizard.
- Microsoft 365 Integration - Your organization must be connected via Build > Connections > Microsoft 365.
- Topic Taxonomy - Optionally, configure a topic taxonomy for classification. See Topic Taxonomy.
Step 1: Navigate to Knowledge Sources
- Go to Build > Knowledge > Knowledge Sources
- Click Add Knowledge Source to create a new source
Step 2: Configure the Knowledge Source
Configure the following settings:
| Setting | Description |
|---|---|
| Mailbox Address | The shared mailbox or user mailbox to scan |
| Source Type | Shared Mailbox, User Mailbox, or Specific Folder |
| Scan Mode | Manual Only or Daily (every 24 hours) |
| Date Range | Oldest email date to limit historical scans |
| Topic Filters | Only index emails matching specific topics |
| Include Keywords | Keywords that must appear in the email |
| Exclude Keywords | Keywords that disqualify an email from indexing |
| Minimum Value Score | AI assessment threshold (0.0-1.0, default: 0.7) |
| Require Approval | Enable human review before indexing |
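As a rough illustration of how these settings could interact, the sketch below models a knowledge source and the decision made for each email. It is not the product's implementation: the class, field, and function names are hypothetical, and it assumes that every include keyword must appear, that keyword matching is case-insensitive, and that a score equal to the threshold passes. Only the 0.7 default for the minimum value score comes from the table above; the other defaults are placeholders.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class KnowledgeSource:
    """Hypothetical mirror of the settings table; names are illustrative."""
    mailbox_address: str
    source_type: str = "Shared Mailbox"      # placeholder default
    scan_mode: str = "Manual Only"           # placeholder default
    oldest_email_date: Optional[date] = None
    topic_filters: list[str] = field(default_factory=list)
    include_keywords: list[str] = field(default_factory=list)
    exclude_keywords: list[str] = field(default_factory=list)
    minimum_value_score: float = 0.7         # documented default
    require_approval: bool = True            # placeholder default

    def __post_init__(self) -> None:
        if not 0.0 <= self.minimum_value_score <= 1.0:
            raise ValueError("minimum_value_score must be between 0.0 and 1.0")


def passes_filters(source: KnowledgeSource, body: str,
                   received: date, topic: Optional[str]) -> bool:
    """Apply date, topic, and keyword filters (assumed semantics)."""
    text = body.lower()
    if source.oldest_email_date and received < source.oldest_email_date:
        return False
    if source.topic_filters and topic not in source.topic_filters:
        return False
    if any(k.lower() in text for k in source.exclude_keywords):
        return False
    # Assumption: all include keywords must be present.
    return all(k.lower() in text for k in source.include_keywords)


def route(source: KnowledgeSource, value_score: float) -> str:
    """Return 'skip', 'approval_queue', or 'index' for an assessed email."""
    if value_score < source.minimum_value_score:
        return "skip"
    return "approval_queue" if source.require_approval else "index"
```

For example, with the default threshold of 0.7 and approval enabled, an email scored 0.9 would be routed to the approval queue, while one scored 0.5 would be skipped.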
Step 3: Trigger a Scan
After saving the knowledge source:
- Click Scan Now to trigger an immediate scan
- Monitor progress in Build > Knowledge > Scan Activity
- Scans process emails in a streaming fashion with automatic checkpointing, so an interrupted scan can resume where it left off
Value Assessment
When processing emails, the AI evaluates each one on multiple dimensions:
| Dimension | Description |
|---|---|
| Value Score | 0.0 to 1.0 rating of the email's knowledge value |
| Content Summary | AI-generated summary of the email's key points |
| Key Insights | Bullet-point extraction of actionable information |
| Keywords | Relevant terms for search and filtering |
| Topic Assignment | Classification into your topic taxonomy |
| Value Reason | Explanation of why the email received its score |
Value Score Thresholds
- 0.8 - 1.0 (Excellent): Authoritative documents, policies, detailed procedures
- 0.6 - 0.8 (Good): Useful information, specific answers, context-rich content
- 0.4 - 0.6 (Moderate): General information, may need review
- 0.0 - 0.4 (Low): Routine communications, unlikely to be useful
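The bands above share their boundary values (0.8 appears in both Excellent and Good, for instance). The sketch below shows one way to map a score to a band, assuming each band includes its lower bound; that tie-breaking rule and the function name are assumptions, not documented behavior.

```python
def value_band(score: float) -> str:
    """Map a 0.0-1.0 value score to its band label.

    Assumption: a score on a boundary belongs to the higher band.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score >= 0.8:
        return "Excellent"
    if score >= 0.6:
        return "Good"
    if score >= 0.4:
        return "Moderate"
    return "Low"
```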
Approval Queue
When Require Approval is enabled on a knowledge source, high-value emails enter the approval queue before indexing.
Accessing the Approval Queue
Navigate to Monitor > Review Queues > Pending Actions to view items awaiting approval.
Approval Queue Features
| Feature | Description |
|---|---|
| Filtering | Filter by status, topic, agent, or date range |
| Bulk Actions | Approve or deny multiple items at once |
| Detail View | See full email content, AI assessment, and suggested topic |
| Edit Before Approve | Modify the email summary or topic assignment |
| Ignore Menu | Block sender or domain from future indexing |
Approval Workflow
- Review the email content, value assessment, and suggested topic
- Approve to index the email to the knowledge base
- Deny to reject the email (with optional feedback)
- Edit to modify the summary or topic before approving
Ignore Rules
Ignore rules prevent specific senders or domains from being indexed, useful for filtering out automated notifications, newsletters, or sensitive communications.
Creating Ignore Rules
- From the approval queue, click the Ignore Menu (three dots) on any item
- Choose Ignore Sender or Ignore Domain
- Optionally provide a reason for the rule
- Click Confirm to create the rule
Alternatively, manage all rules from the Ignore Rules tab under Build > Knowledge > Knowledge Sources.
Ignore Rule Behavior
- Sender rules block emails from a specific email address
- Domain rules block emails from any address at that domain
- Creating a rule automatically denies all pending emails from that sender/domain
- Rules can be deleted to resume indexing from that source
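The rule behavior described above can be sketched as follows. This is an illustrative model only: the `IgnoreRule` type, the dictionary shape of pending items, and the choice to match domains exactly (without covering subdomains) are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IgnoreRule:
    kind: str          # "sender" or "domain" (hypothetical values)
    value: str         # an email address or a domain name
    reason: str = ""   # optional explanation


def is_ignored(sender: str, rules: list[IgnoreRule]) -> bool:
    """Return True if the sender matches any sender or domain rule."""
    address = sender.strip().lower()
    domain = address.rpartition("@")[2]
    for rule in rules:
        target = rule.value.strip().lower()
        if rule.kind == "sender" and address == target:
            return True
        # Assumption: exact domain match; subdomains are not covered.
        if rule.kind == "domain" and domain == target:
            return True
    return False


def apply_new_rule(rule: IgnoreRule, pending: list[dict]):
    """Split pending items into (remaining, denied) when a rule is created."""
    remaining, denied = [], []
    for item in pending:
        bucket = denied if is_ignored(item["sender"], [rule]) else remaining
        bucket.append(item)
    return remaining, denied
```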
Folder-Based Delta Sync
Email indexing uses Microsoft Graph folder-based delta queries for efficient synchronization:
| Feature | Benefit |
|---|---|
| Per-folder sync tokens | Each folder maintains its own delta token for independent syncing |
| Incremental updates | Only new or changed emails are processed after initial scan |
| Streaming architecture | Constant memory footprint regardless of mailbox size |
| Automatic checkpointing | Resume from last successful page if scan is interrupted |
| Excluded folders | Junk, Drafts, Notes, Archive, and system folders are automatically skipped |
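To make the per-folder token and checkpointing ideas concrete, here is a minimal sketch of a delta loop for one folder. Microsoft Graph delta responses carry an `@odata.nextLink` while more pages remain and an `@odata.deltaLink` on the final page; the rest (the `fetch_page` and `save_checkpoint` callables, and how the product actually stores its tokens) is assumed for illustration. HTTP and authentication are left to the caller.

```python
from typing import Callable, Iterator, Optional

GRAPH = "https://graph.microsoft.com/v1.0"


def initial_delta_url(mailbox: str, folder_id: str) -> str:
    """Build the first delta request URL for one mail folder."""
    return f"{GRAPH}/users/{mailbox}/mailFolders/{folder_id}/messages/delta"


def sync_folder(
    start_url: str,
    fetch_page: Callable[[str], dict],                 # hypothetical: GET url -> parsed JSON
    save_checkpoint: Callable[[Optional[str]], None],  # hypothetical: persist resume point
) -> Iterator[dict]:
    """Yield messages page by page, checkpointing after each page.

    Only one page is held in memory at a time. The last value passed to
    save_checkpoint is the delta link to use for the next incremental scan.
    """
    url: Optional[str] = start_url
    while url:
        page = fetch_page(url)
        yield from page.get("value", [])
        next_link = page.get("@odata.nextLink")
        delta_link = page.get("@odata.deltaLink")
        # The checkpoint is written only after the page has been consumed.
        save_checkpoint(next_link or delta_link)
        url = next_link
```

Because `sync_folder` is a generator, the caller controls the pace: a page's checkpoint is written only once all of its messages have been handed over, which is what allows a restart from the last completed page.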
Scheduled Scanning
When a knowledge source is set to Daily scan mode, a background timer automatically handles scanning without any manual intervention.
How It Works
- A background timer runs every 30 minutes to check for sources that are due for scanning
- A source is considered due when its last scan completed more than 24 hours ago, or when it has never been scanned
- Scheduled scans are always incremental (delta): only emails that are new or changed since the last scan are processed
- Only one scan runs at a time per source, preventing duplicate processing
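The due-check described above amounts to a small predicate that the timer could evaluate for each source on every cycle. The following is a sketch under that reading; the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

SCAN_INTERVAL = timedelta(hours=24)


def is_due(
    last_scan_completed: Optional[datetime],
    now: datetime,
    scan_running: bool = False,
) -> bool:
    """Decide whether a Daily-mode source should be scanned this cycle."""
    if scan_running:
        # Only one scan per source at a time.
        return False
    if last_scan_completed is None:
        # Never scanned: due immediately.
        return True
    return now - last_scan_completed > SCAN_INTERVAL
```

With a 30-minute timer, a source whose last scan finished 24 hours and 10 minutes ago would be picked up on the current cycle, while one that finished 23 hours ago would wait for a later cycle.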
What You Need to Know
- No manual action is required once daily mode is enabled
- The system automatically picks up new sources and begins scanning them at the next timer cycle
- If a scheduled scan fails, it will be retried at the next timer cycle (within 30 minutes)
When you first enable daily scanning, the first scan begins at the next timer cycle (within 30 minutes). You do not need to trigger a manual scan.
Scanned vs. Excluded Folders
Scanned by Default
- Inbox
- Sent Items
- Deleted Items
- Custom user folders
Excluded by Default
- Junk Email / Spam
- Drafts
- Notes
- Archive
- Conversation History
- RSS Feeds
- Search Folders (virtual)
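A folder filter matching the two lists above might look like the sketch below. It compares display names case-insensitively, which is an assumption made for illustration; the product may identify folders differently (for example by folder ID), and display names can vary by mailbox language.

```python
# Display names taken from the "Excluded by Default" list above.
EXCLUDED_FOLDERS = {
    "junk email",
    "spam",
    "drafts",
    "notes",
    "archive",
    "conversation history",
    "rss feeds",
    "search folders",
}


def should_scan(folder_display_name: str) -> bool:
    """Return True for folders that are scanned by default."""
    return folder_display_name.strip().lower() not in EXCLUDED_FOLDERS
```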
Best Practices
Start Conservative
- Begin with a high value threshold (0.8+) to capture only the best content
- Enable Require Approval on all knowledge sources initially
- Limit initial scans to recent emails (last 6-12 months)
- Expand criteria gradually as confidence grows
Topic Design
- Create a topic taxonomy before your first scan
- Use specific, descriptive topic names (e.g., "Billing Disputes" not "Money")
- Limit hierarchy depth to 2-3 levels
- Ensure all expected email types have a home
Privacy Considerations
- Create ignore rules for HR, Legal, and other sensitive senders
- Review approval queue items before indexing sensitive content
- Configure PII redaction if handling personal data
- Use the Minimum Value Score setting to filter out routine communications
Troubleshooting
Emails Not Appearing in Approval Queue
- Verify the knowledge source is active
- Check that the scan has completed (see Scan Activity)
- Ensure emails meet the value threshold
- Verify emails are not blocked by ignore rules
Slow Scan Performance
- Narrow the date range to reduce the number of historical emails scanned
- Add keyword filters to focus on relevant content
- Check if the mailbox has large folders (10,000+ emails)
- Review scan activity for rate limiting or errors
Missing Topics in Classification
- Verify topics exist in the taxonomy
- Check that topic descriptions are specific enough
- Review classification on a sample of approved emails
- Consider adding sub-topics for better precision
Delta Token Expired
If a scan fails with "delta token expired":
- The system automatically triggers a full rescan
- This may take longer than normal incremental scans
- No data is lost; the rescan only requires additional processing time
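The fallback behavior above can be pictured as a try-incremental-then-full pattern. The sketch below is illustrative only: `DeltaTokenExpired` is a hypothetical exception standing in for however the scanner detects the expired token, and the two scan callables are placeholders.

```python
from typing import Callable, Optional, Tuple


class DeltaTokenExpired(Exception):
    """Hypothetical signal that a stored delta token is no longer valid."""


def scan_with_fallback(
    delta_token: Optional[str],
    run_incremental: Callable[[str], int],  # hypothetical: returns emails processed
    run_full: Callable[[], int],            # hypothetical: returns emails processed
) -> Tuple[str, int]:
    """Run an incremental scan, falling back to a full rescan if needed."""
    if delta_token is None:
        return "full", run_full()
    try:
        return "incremental", run_incremental(delta_token)
    except DeltaTokenExpired:
        # The stale token is discarded; a full rescan issues a fresh one.
        return "full", run_full()
```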
Agent-Driven Email Mining
While administrators configure knowledge sources through the Admin Console, AI agents can also be empowered to create and manage email mining operations autonomously using the Mine Mailbox Tool.
This is useful when:
- Agents identify knowledge gaps during their operations
- SME agents need to build domain-specific knowledge for research
- Orchestrator agents set up knowledge sources for sub-agents or new team members
See Mine Mailbox Tool for details on enabling agent-driven email mining.