Email Indexing for Search

Email Indexing for Search transforms your email archives into a searchable knowledge base that AI agents can query instantly. This feature automatically identifies valuable emails, extracts key insights, and indexes them for semantic search.

Overview

The Email Indexing for Search feature enables you to:

  • Mine valuable emails from shared mailboxes or user mailboxes
  • Assess each email's knowledge value automatically with AI-powered evaluation
  • Classify emails into your defined topic taxonomy
  • Enforce quality control with a human approval workflow before permanent indexing
  • Sync incrementally using Microsoft Graph delta queries for efficient updates

How It Works

1. Configure Knowledge Source
Point to a mailbox with filtering criteria (topics, keywords, date range)

2. Trigger Scan
Start a scan manually with a button click, or let the automatic daily schedule start it; either way, emails are processed via the Microsoft Graph API

3. AI Evaluation
Each email receives a value score (0-1), summary, and topic classification

4. Approval Queue
High-value emails enter the approval queue for human review (if approval is enabled)

5. Index to Search
Approved emails are indexed to Azure AI Search with metadata

6. Agent Access
AI agents search the knowledge base to answer questions with citations
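
The routing between steps 3 and 5 can be sketched as a small decision function. This is a minimal illustration, not the product's implementation: the function name `route_email` and its return labels are hypothetical, the 0.7 default is taken from the Minimum Value Score setting, and it assumes that the threshold is inclusive and that emails scoring below it are skipped.

```python
def route_email(value_score: float,
                minimum_value_score: float = 0.7,
                require_approval: bool = True) -> str:
    """Return the next stage for an evaluated email.

    Hypothetical labels: 'skipped', 'approval_queue', or 'indexed'.
    """
    if not 0.0 <= value_score <= 1.0:
        raise ValueError("value_score must be between 0.0 and 1.0")
    # Assumption: emails below the threshold never reach the queue or the index.
    if value_score < minimum_value_score:
        return "skipped"
    # With approval enabled, a human decides before anything is indexed.
    if require_approval:
        return "approval_queue"
    return "indexed"
```

For example, `route_email(0.9)` yields `"approval_queue"`, while `route_email(0.9, require_approval=False)` yields `"indexed"`.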

Setting Up Email Indexing

Prerequisites

Before configuring email indexing, ensure you have:

  1. Email History Index - A Grounded Knowledge Index of type "Email Archive" must be provisioned. See Grounded Knowledge Index Setup Wizard.
  2. Microsoft 365 Integration - Your organization must be connected via Build > Connections > Microsoft 365.
  3. Topic Taxonomy - Optionally, configure a topic taxonomy for classification. See Topic Taxonomy.

Step 1: Navigate to Knowledge Sources

  1. Go to Build > Knowledge > Knowledge Sources
  2. Click Add Knowledge Source to create a new source

Step 2: Configure the Knowledge Source

Configure the following settings:

| Setting | Description |
| --- | --- |
| Mailbox Address | The shared mailbox or user mailbox to scan |
| Source Type | Shared Mailbox, User Mailbox, or Specific Folder |
| Scan Mode | Manual Only or Daily (every 24 hours) |
| Date Range | Oldest email date to limit historical scans |
| Topic Filters | Only index emails matching specific topics |
| Include Keywords | Keywords that must appear in the email |
| Exclude Keywords | Keywords that disqualify an email from indexing |
| Minimum Value Score | AI assessment threshold (0.0-1.0, default: 0.7) |
| Require Approval | Enable human review before indexing |
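
To make the settings concrete, here is a minimal sketch of how they might be modeled and validated in code. The class `KnowledgeSourceConfig`, its field names, and the helper `passes_keyword_filters` are hypothetical and do not reflect the product's actual schema. Only the 0.7 default comes from the table above; the other defaults are chosen for the sketch. The keyword semantics (case-insensitive substring match, all include keywords required) are an assumption.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

SOURCE_TYPES = {"Shared Mailbox", "User Mailbox", "Specific Folder"}
SCAN_MODES = {"Manual Only", "Daily"}


@dataclass
class KnowledgeSourceConfig:
    """Hypothetical model of the settings table; not the product's real schema."""
    mailbox_address: str
    source_type: str = "Shared Mailbox"   # sketch default
    scan_mode: str = "Manual Only"        # sketch default
    oldest_email_date: Optional[date] = None
    topic_filters: List[str] = field(default_factory=list)
    include_keywords: List[str] = field(default_factory=list)
    exclude_keywords: List[str] = field(default_factory=list)
    minimum_value_score: float = 0.7      # documented default
    require_approval: bool = True         # sketch default

    def __post_init__(self) -> None:
        if "@" not in self.mailbox_address:
            raise ValueError("mailbox_address must be an email address")
        if self.source_type not in SOURCE_TYPES:
            raise ValueError(f"unknown source_type: {self.source_type}")
        if self.scan_mode not in SCAN_MODES:
            raise ValueError(f"unknown scan_mode: {self.scan_mode}")
        if not 0.0 <= self.minimum_value_score <= 1.0:
            raise ValueError("minimum_value_score must be between 0.0 and 1.0")


def passes_keyword_filters(config: KnowledgeSourceConfig, text: str) -> bool:
    """Assumed semantics: case-insensitive substring match; every include
    keyword must appear and no exclude keyword may appear."""
    haystack = text.lower()
    if any(k.lower() in haystack for k in config.exclude_keywords):
        return False
    return all(k.lower() in haystack for k in config.include_keywords)
```

Validating in `__post_init__` rejects an out-of-range threshold or an unknown scan mode when the configuration is created rather than partway through a scan.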

Step 3: Trigger a Scan

After saving the knowledge source:

  1. Click Scan Now to trigger an immediate scan
  2. Monitor progress in Build > Knowledge > Scan Activity
  3. Scans process emails as a stream with automatic checkpointing, so an interrupted scan resumes from the last successful page

Value Assessment

When processing emails, the AI evaluates each one on multiple dimensions:

| Dimension | Description |
| --- | --- |
| Value Score | 0.0 to 1.0 rating of the email's knowledge value |
| Content Summary | AI-generated summary of the email's key points |
| Key Insights | Bullet-point extraction of actionable information |
| Keywords | Relevant terms for search and filtering |
| Topic Assignment | Classification into your topic taxonomy |
| Value Reason | Explanation of why the email received its score |

Value Score Thresholds

  • 0.8 - 1.0 (Excellent): Authoritative documents, policies, detailed procedures
  • 0.6 - 0.8 (Good): Useful information, specific answers, context-rich content
  • 0.4 - 0.6 (Moderate): General information, may need review
  • 0.0 - 0.4 (Low): Routine communications, unlikely to be useful
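
The bands above share their boundary values (0.8, 0.6, 0.4), so any code that maps a score to a band has to pick a side. The sketch below assumes the lower bound of each band is inclusive, so a score of exactly 0.8 counts as Excellent; that choice and the function name `value_band` are illustrative assumptions, not documented behavior.

```python
def value_band(score: float) -> str:
    """Map a value score to its band, treating each lower bound as inclusive."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score >= 0.8:
        return "Excellent"
    if score >= 0.6:
        return "Good"
    if score >= 0.4:
        return "Moderate"
    return "Low"
```

Under this assumption, the default Minimum Value Score of 0.7 falls in the middle of the Good band.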

Approval Queue

When Require Approval is enabled on a knowledge source, emails that meet the source's Minimum Value Score enter the approval queue and are indexed only after a reviewer approves them.

Accessing the Approval Queue

Navigate to Monitor > Review Queues > Pending Actions to view items awaiting approval.

Approval Queue Features

| Feature | Description |
| --- | --- |
| Filtering | Filter by status, topic, agent, or date range |
| Bulk Actions | Approve or deny multiple items at once |
| Detail View | See full email content, AI assessment, and suggested topic |
| Edit Before Approve | Modify the email summary or topic assignment |
| Ignore Menu | Block sender or domain from future indexing |

Approval Workflow

  1. Review the email content, value assessment, and suggested topic
  2. Approve to index the email to the knowledge base
  3. Deny to reject the email (with optional feedback)
  4. Edit to modify the summary or topic before approving

Ignore Rules

Ignore rules prevent emails from specific senders or domains from being indexed. They are useful for filtering out automated notifications, newsletters, or sensitive communications.

Creating Ignore Rules

  1. From the approval queue, click the Ignore Menu (three dots) on any item
  2. Choose Ignore Sender or Ignore Domain
  3. Optionally provide a reason for the rule
  4. Click Confirm to create the rule

Alternatively, manage all rules from the Ignore Rules tab under Build > Knowledge > Knowledge Sources.

Ignore Rule Behavior

  • Sender rules block emails from a specific email address
  • Domain rules block emails from any address at that domain
  • Creating a rule automatically denies all pending emails from that sender/domain
  • Rules can be deleted to resume indexing from that source
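
The rule behavior above can be sketched as follows. The names `IgnoreRule`, `is_ignored`, and `split_pending` are hypothetical, pending items are modeled as plain dictionaries with a `sender` key, and matching is assumed to be case-insensitive. The sketch also assumes a domain rule matches that exact domain only, not its subdomains.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class IgnoreRule:
    kind: str          # "sender" or "domain"
    value: str         # an email address or a domain name
    reason: str = ""   # optional note explaining the rule


def is_ignored(sender: str, rules: Iterable[IgnoreRule]) -> bool:
    """Return True if any rule blocks this sender address."""
    address = sender.strip().lower()
    domain = address.rpartition("@")[2]
    for rule in rules:
        target = rule.value.strip().lower()
        if rule.kind == "sender" and address == target:
            return True
        if rule.kind == "domain" and domain == target:
            return True
    return False


def split_pending(pending: List[Dict[str, str]],
                  rules: Iterable[IgnoreRule]) -> Tuple[list, list]:
    """Split pending queue items into (kept, denied) after a rule change."""
    rules = list(rules)
    kept, denied = [], []
    for item in pending:
        (denied if is_ignored(item["sender"], rules) else kept).append(item)
    return kept, denied
```

Calling `split_pending` right after a rule is created mirrors the documented behavior of automatically denying pending emails from the newly blocked sender or domain.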

Folder-Based Delta Sync

Email indexing uses Microsoft Graph folder-based delta queries for efficient synchronization:

| Feature | Benefit |
| --- | --- |
| Per-folder sync tokens | Each folder maintains its own delta token for independent syncing |
| Incremental updates | Only new or changed emails are processed after initial scan |
| Streaming architecture | Constant memory footprint regardless of mailbox size |
| Automatic checkpointing | Resume from last successful page if scan is interrupted |
| Excluded folders | Junk, Drafts, Notes, Archive, and system folders are automatically skipped |
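
As a rough illustration of per-folder tokens, streaming, and checkpointing, the sketch below walks a Microsoft Graph message delta feed one page at a time. The `messages/delta` endpoint and the `@odata.nextLink` and `@odata.deltaLink` response properties are part of Microsoft Graph. Everything else is an assumption made for the example: `sync_folder`, the `state` dictionary used as a checkpoint store, and the injected `fetch_page` callable, which is expected to perform an authenticated GET and return the parsed JSON body. It is not the product's actual implementation.

```python
from typing import Any, Callable, Dict

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def sync_folder(mailbox: str,
                folder_id: str,
                fetch_page: Callable[[str], Dict[str, Any]],
                state: Dict[str, str],
                handle_message: Callable[[Dict[str, Any]], None]) -> int:
    """Process one folder's delta feed page by page; return messages handled.

    `state` maps folder_id -> the link to request next (hypothetical store).
    """
    # Resume from the saved link if one exists, otherwise start a full sync.
    url = state.get(folder_id) or (
        f"{GRAPH_BASE}/users/{mailbox}/mailFolders/{folder_id}/messages/delta"
    )
    processed = 0
    while url:
        page = fetch_page(url)
        # Only one page is held in memory at a time.
        for message in page.get("value", []):
            handle_message(message)
            processed += 1
        next_link = page.get("@odata.nextLink")
        # Checkpoint after each page: mid-scan this is the next page link,
        # at the end it is the delta link used by the next incremental scan.
        state[folder_id] = next_link or page["@odata.deltaLink"]
        url = next_link
    return processed
```

Because the checkpoint is written only after a page has been handled, an interruption causes at most one page to be processed again on resume.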

Scheduled Scanning

When a knowledge source is set to Daily scan mode, a background timer automatically handles scanning without any manual intervention.

How It Works

  • A background timer runs every 30 minutes to check for sources that are due for scanning
  • A source is considered due when its last scan completed more than 24 hours ago, or when it has never been scanned
  • Scheduled scans are always incremental (delta): only emails that are new or changed since the last scan are processed
  • Only one scan runs at a time per source, preventing duplicate processing
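
The due check described above can be sketched as a small predicate that the 30-minute timer would evaluate for each Daily source. The function name `is_due` and its parameters are hypothetical; the 24-hour interval and the "never scanned" and "one scan at a time" rules come from the list above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

SCAN_INTERVAL = timedelta(hours=24)


def is_due(last_completed: Optional[datetime],
           now: datetime,
           scan_running: bool = False) -> bool:
    """Decide whether a Daily source should be scanned on this timer cycle."""
    # Only one scan may run at a time per source.
    if scan_running:
        return False
    # A source that has never been scanned is always due.
    if last_completed is None:
        return True
    # Due once more than 24 hours have passed since the last completed scan.
    return now - last_completed > SCAN_INTERVAL


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
print(is_due(None, now))                        # never scanned
print(is_due(now - timedelta(hours=25), now))   # last scan 25 hours ago
print(is_due(now - timedelta(hours=3), now))    # last scan 3 hours ago
```

Since the timer runs every 30 minutes, a source under this logic would start its next scan between 24 and roughly 24.5 hours after the previous one completed.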

What You Need to Know

  • No manual action is required once daily mode is enabled
  • The system automatically picks up new sources and begins scanning them at the next timer cycle
  • If a scheduled scan fails, it will be retried at the next timer cycle (within 30 minutes)

Tip: When you first enable daily scanning, the first scan begins at the next timer cycle (within 30 minutes). You do not need to trigger a manual scan.

Scanned vs. Excluded Folders

Scanned by Default

  • Inbox
  • Sent Items
  • Deleted Items
  • Custom user folders

Excluded by Default

  • Junk Email / Spam
  • Drafts
  • Notes
  • Archive
  • Conversation History
  • RSS Feeds
  • Search Folders (virtual)
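
A folder filter following the default lists above might look like the sketch below. Matching on English display names is an assumption made for illustration, since a real implementation would more likely rely on folder identifiers that do not vary by language. The names `EXCLUDED_FOLDER_NAMES` and `should_scan_folder` are hypothetical.

```python
# Display names excluded by default, compared case-insensitively.
EXCLUDED_FOLDER_NAMES = {
    "junk email",
    "spam",
    "drafts",
    "notes",
    "archive",
    "conversation history",
    "rss feeds",
}


def should_scan_folder(display_name: str, is_search_folder: bool = False) -> bool:
    """Return True for folders scanned by default, including custom folders."""
    # Search folders are virtual views over other folders, so skip them.
    if is_search_folder:
        return False
    return display_name.strip().casefold() not in EXCLUDED_FOLDER_NAMES
```

Any folder not on the exclusion list, such as a custom "Customer Escalations" folder, is scanned.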

Best Practices

Start Conservative

  1. Begin with a high value threshold (0.8+) to capture only the best content
  2. Enable Require Approval on all knowledge sources initially
  3. Limit initial scans to recent emails (last 6-12 months)
  4. Expand criteria gradually as confidence grows

Topic Design

  1. Create a topic taxonomy before your first scan
  2. Use specific, descriptive topic names (e.g., "Billing Disputes" not "Money")
  3. Limit hierarchy depth to 2-3 levels
  4. Ensure all expected email types have a home

Privacy Considerations

  1. Create ignore rules for HR, Legal, and other sensitive senders
  2. Review approval queue items before indexing sensitive content
  3. Configure PII redaction if handling personal data
  4. Use the minimum value threshold to filter routine communications

Troubleshooting

Emails Not Appearing in Approval Queue

  1. Verify the knowledge source is active
  2. Check that the scan has completed (see Scan Activity)
  3. Ensure emails meet the value threshold
  4. Verify emails are not blocked by ignore rules

Slow Scan Performance

  1. Narrow the date range to reduce historical emails
  2. Add keyword filters to focus on relevant content
  3. Check if the mailbox has large folders (10,000+ emails)
  4. Review scan activity for rate limiting or errors

Missing Topics in Classification

  1. Verify topics exist in the taxonomy
  2. Check that topic descriptions are specific enough
  3. Review classification on a sample of approved emails
  4. Consider adding sub-topics for better precision

Delta Token Expired

If a scan fails with "delta token expired":

  1. The system automatically triggers a full rescan
  2. This may take longer than normal incremental scans
  3. No data is lost; the rescan only requires additional processing time
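
The fallback can be pictured as the sketch below. All names are hypothetical: `DeltaTokenExpired` stands in for however the expiry surfaces in the scanner, `tokens` is a per-folder token store, and `run_scan` is an injected callable that performs the scan and returns the new token. Passing `None` as the token is assumed to mean a full scan.

```python
from typing import Callable, Dict, Optional


class DeltaTokenExpired(Exception):
    """Hypothetical error raised when a stored delta token is rejected."""


def scan_with_fallback(folder_id: str,
                       tokens: Dict[str, str],
                       run_scan: Callable[[str, Optional[str]], str]) -> str:
    """Run an incremental scan, falling back to a full rescan if the token expired.

    Returns the mode that actually completed: 'incremental' or 'full'.
    """
    token = tokens.get(folder_id)
    mode = "incremental" if token else "full"
    try:
        new_token = run_scan(folder_id, token)
    except DeltaTokenExpired:
        # Discard the stale token and start over from scratch.
        tokens.pop(folder_id, None)
        new_token = run_scan(folder_id, None)
        mode = "full"
    tokens[folder_id] = new_token
    return mode
```

The full rescan ends by storing a fresh token, so the next scheduled scan is incremental again.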

Agent-Driven Email Mining

While administrators configure knowledge sources through the Admin Console, AI agents can also be empowered to create and manage email mining operations autonomously using the Mine Mailbox Tool.

This is useful when:

  • Agents identify knowledge gaps during their operations
  • SME agents need to build domain-specific knowledge for research
  • Orchestrator agents set up knowledge sources for sub-agents or new team members

See Mine Mailbox Tool for details on enabling agent-driven email mining.