SharePoint Knowledge Sources
SharePoint Knowledge Sources enable you to connect your organization's SharePoint document libraries to Outermind, making corporate documents searchable by AI agents. Documents are automatically indexed on a configurable schedule, allowing agents to find and cite information from policies, procedures, reports, and other organizational content.
Overview
SharePoint Knowledge Sources provide:
- Multi-site indexing - Connect multiple SharePoint sites and document libraries to a single knowledge index
- Automatic synchronization - Documents are indexed on a configurable schedule using efficient delta sync
- AI-powered summaries - Each document is automatically summarized to improve search relevance
- Multi-format support - Extracts text from Word, Excel, PowerPoint, PDF, and other common formats
- Secure read-only access - Integration is completely read-only and uses your existing Microsoft 365 authentication
How It Works
1. Create Knowledge Source - Select SharePoint sites and document libraries to index
2. Configure Sync - Set sync frequency (hourly, every 6 hours, or daily) and file filters
3. Initial Sync - System downloads and indexes all matching documents
4. Delta Sync - Subsequent syncs only process new or changed documents
5. Agent Access - AI agents search the indexed documents through Grounded Knowledge Index tools
Prerequisites
Before configuring SharePoint Knowledge Sources, ensure you have:
- Microsoft 365 Integration - Your organization must be connected via Build > Connections > Microsoft 365
- SharePoint Permissions - Admin consent must be granted for SharePoint permissions (Sites.Read.All, Files.Read.All)
- Grounded Knowledge Index - A GKI of type "SharePoint" must be provisioned (the wizard will create one if needed)
- Azure AI Search - A search provider must be configured. See Azure AI Search Setup
Setting Up a SharePoint Source
Step 1: Navigate to SharePoint Sources
- Go to Build > Knowledge > SharePoint Sources
- Click Add Source to launch the creation wizard
Step 2: Name and Select Index
Configure the basic settings:
| Setting | Description |
|---|---|
| Display Name | A friendly name for this knowledge source (e.g., "Company Policies") |
| Description | Optional description of what documents are included |
| Target Index | The Grounded Knowledge Index to populate (or create a new one) |
Step 3: Select SharePoint Sites
Search for and select the SharePoint sites containing your documents:
- Enter a search term to find sites accessible to your organization
- Click on a site to select it
- You can add multiple sites to a single knowledge source
Step 4: Select Document Libraries
For each selected site, choose which document libraries to index:
- Documents - The default document library
- Shared Documents - Common in team sites
- Custom libraries specific to your organization
You don't need to select every library. Focus on those containing valuable reference content.
Step 5: Configure Sync Settings
| Setting | Description |
|---|---|
| Sync Mode | Manual (on-demand only) or Scheduled (automatic) |
| Sync Frequency | How often to check for changes: Hourly, Every 6 Hours, or Daily |
| File Types | Which document types to index (docx, xlsx, pdf, pptx, txt, md, html) |
| Max File Size | Documents larger than 5 MB are skipped by default |
| Generate Summaries | Enable AI summarization for better search relevance |
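As a rough illustration of how the File Types and Max File Size settings interact, the sketch below filters candidate files the way the table describes. `SyncSettings` and `should_index` are hypothetical names invented for this example, not part of Outermind's API, and treating 5 MB as 5 x 1024 x 1024 bytes is an assumption.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Assumption: "5 MB" is interpreted as 5 * 1024 * 1024 bytes.
MAX_FILE_SIZE_BYTES = 5 * 1024 * 1024


@dataclass
class SyncSettings:
    """Hypothetical container mirroring the settings table above."""
    sync_mode: str = "scheduled"       # "manual" or "scheduled"
    sync_frequency: str = "daily"      # "hourly", "every_6_hours", or "daily"
    file_types: frozenset = frozenset(
        {"docx", "xlsx", "pdf", "pptx", "txt", "md", "html"}
    )
    max_file_size_bytes: int = MAX_FILE_SIZE_BYTES
    generate_summaries: bool = True


def should_index(name: str, size_bytes: int, settings: SyncSettings) -> bool:
    """Return True when a file passes both the type filter and the size limit."""
    # Compare extensions case-insensitively and without the leading dot.
    extension = PurePosixPath(name).suffix.lower().lstrip(".")
    if extension not in settings.file_types:
        return False
    # Only files strictly larger than the limit are skipped.
    return size_bytes <= settings.max_file_size_bytes
```

For example, `should_index("Policies/Leave Policy.DOCX", 1024, SyncSettings())` passes both checks, while a 6 MB PDF or a `.csv` file is skipped.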
Step 6: Review and Create
Review your configuration and click Create Source to begin indexing.
Supported File Types
| Format | Extension | Notes |
|---|---|---|
| Word | .docx, .doc | Full text extraction including headers and footers |
| Excel | .xlsx, .xls | Extracts content from all sheets |
| PowerPoint | .pptx | Extracts slide text and speaker notes |
| PDF | .pdf | Text-based PDFs only (scanned PDFs not supported) |
| Text | .txt | Plain text files |
| Markdown | .md | Markdown files |
| HTML | .html, .htm | Extracts body content, removes scripts and navigation |
Not supported: Scanned PDFs (require OCR), images, videos, OneNote notebooks.
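To make the HTML row of the table concrete, here is a minimal sketch of body-text extraction that drops scripts and navigation, using only Python's standard `html.parser`. It is an illustration of the idea, not the extractor Outermind actually uses; the class name, the function name, and the choice of skipped tags (`script`, `style`, `nav`) are assumptions.

```python
from html.parser import HTMLParser

# Assumption: these are the elements treated as non-content.
SKIPPED_TAGS = {"script", "style", "nav"}


class BodyTextExtractor(HTMLParser):
    """Collects text inside <body>, ignoring skipped elements."""

    def __init__(self):
        super().__init__()
        self._in_body = False
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self._in_body = True
        elif tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is in the body and not skipped.
        if self._in_body and self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self) -> str:
        return " ".join(self._chunks)


def extract_html_text(html: str) -> str:
    """Return the visible body text of an HTML document as one string."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```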
Managing SharePoint Sources
Viewing Source Details
From Build > Knowledge > SharePoint Sources, click on a source to view:
- Overview - Status, statistics, and recent sync activity
- Libraries - Connected document libraries with individual sync status
- Documents - Paginated list of indexed documents with search and filters
- Sync History - Log of past sync operations
- Settings - Edit configuration or delete the source
Triggering a Manual Sync
- Open the source detail page
- Click Sync Now in the header
- Choose sync type:
  - Delta - Process only new and changed documents (faster)
  - Full - Re-process all documents (use when troubleshooting)
Pausing and Resuming Sync
To temporarily stop scheduled syncs:
- Open the source detail page
- Click Pause to stop scheduled syncs
- Click Resume to restart scheduled syncs
Pausing does not affect the indexed documents or delete any data.
Viewing Indexed Documents
The Documents tab shows all indexed documents with:
| Column | Description |
|---|---|
| File Name | Document name with link to SharePoint |
| Site/Library | Where the document is stored |
| File Type | Document format |
| Author | Who created the document |
| Modified | When the document was last changed |
| Status | Index status (indexed, pending, failed) |
Use filters to find specific documents by library, file type, or status.
Re-indexing a Document
If a document's index entry seems outdated or incorrect:
- Find the document in the Documents tab
- Click the document row to open details
- Click Reindex to queue the document for re-processing
AI-Powered Document Summaries
When Generate Summaries is enabled, each document is processed by an AI model to generate:
| Field | Description |
|---|---|
| Summary | 2-3 sentence description of the document's content |
| Keywords | 5-10 relevant search terms |
| Category | Classification (policy, procedure, report, template, reference, presentation, spreadsheet, other) |
These AI-generated fields significantly improve search relevance, especially for long documents where full-text search may return poor matches.
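The three generated fields can be pictured as a small record. The sketch below shows one way such a record might be normalized before it is written to the index; `DocumentSummary` and `normalize_summary` are hypothetical names, and the clean-up rules (lowercasing, de-duplicating keywords, capping at 10, falling back to "other") are assumptions rather than documented behavior.

```python
from dataclasses import dataclass

# Category values taken from the table above.
CATEGORIES = (
    "policy", "procedure", "report", "template",
    "reference", "presentation", "spreadsheet", "other",
)
MAX_KEYWORDS = 10  # upper end of the 5-10 range in the table


@dataclass(frozen=True)
class DocumentSummary:
    summary: str
    keywords: tuple
    category: str


def normalize_summary(raw: dict) -> DocumentSummary:
    """Coerce a raw model response into the three documented fields."""
    category = str(raw.get("category", "")).strip().lower()
    if category not in CATEGORIES:
        category = "other"  # assumption: unknown labels fall back to "other"

    # Lowercase keywords, drop blanks and duplicates, keep original order.
    keywords = []
    for keyword in raw.get("keywords", []):
        cleaned = str(keyword).strip().lower()
        if cleaned and cleaned not in keywords:
            keywords.append(cleaned)

    return DocumentSummary(
        summary=str(raw.get("summary", "")).strip(),
        keywords=tuple(keywords[:MAX_KEYWORDS]),
        category=category,
    )
```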
Sync Frequencies
| Frequency | When It Runs | Best For |
|---|---|---|
| Hourly | Every hour | Frequently updated document libraries |
| Every 6 Hours | 4 times per day | Moderately active libraries |
| Daily | Once per day (2 AM UTC) | Stable reference documentation |
Start with Daily sync and increase frequency only if agents report stale results.
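The schedule in the table can be sketched as a small "next run" calculation. Only the daily 2 AM UTC slot comes from the table; aligning hourly runs to the top of the hour and 6-hour runs to 00:00, 06:00, 12:00, and 18:00 UTC is an assumption, and `next_run` is a hypothetical helper rather than an Outermind function.

```python
from datetime import datetime, timedelta, timezone


def next_run(frequency: str, now: datetime) -> datetime:
    """Return the first scheduled run strictly after `now`, in UTC."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now = now.astimezone(timezone.utc)
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)

    if frequency == "hourly":
        # Assumption: hourly runs start at the top of each hour.
        return top_of_hour + timedelta(hours=1)

    if frequency == "every_6_hours":
        # Assumption: slots at 00:00, 06:00, 12:00, and 18:00 UTC.
        hours_to_next_slot = 6 - (top_of_hour.hour % 6)
        return top_of_hour + timedelta(hours=hours_to_next_slot)

    if frequency == "daily":
        # From the table: once per day at 2 AM UTC.
        candidate = top_of_hour.replace(hour=2)
        if candidate <= now:
            candidate += timedelta(days=1)
        return candidate

    raise ValueError(f"unknown frequency: {frequency!r}")
```

Under these assumptions, a source checked at 14:30 UTC would next run at 15:00 (hourly), 18:00 (every 6 hours), or 02:00 the following day (daily).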
Delta Sync Architecture
SharePoint Knowledge Sources use Microsoft Graph delta queries for efficient synchronization:
- Initial sync downloads and processes all matching documents
- Subsequent syncs only process documents that have changed since the last sync
- Per-library tokens track changes independently for each document library
- Automatic retry handles transient failures with exponential backoff
This approach minimizes API calls and processing time while keeping your knowledge base current.
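The per-library token idea can be sketched against the Microsoft Graph driveItem delta endpoint, which returns pages linked by `@odata.nextLink` and ends with an `@odata.deltaLink` to use on the next sync. This is a minimal sketch, not Outermind's implementation: `sync_library`, the `delta_links` dictionary, and the injected `get_json` callable (which would perform an authenticated GET and return parsed JSON) are assumptions made so the example runs without network access.

```python
from typing import Callable, Dict, List, Tuple

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def sync_library(
    drive_id: str,
    delta_links: Dict[str, str],
    get_json: Callable[[str], dict],
) -> Tuple[List[dict], List[str]]:
    """Fetch changes for one document library (drive) and store its new token.

    Returns (changed_files, deleted_item_ids).
    """
    # First sync: start from the delta root. Later syncs: resume from the
    # delta link saved for this specific library.
    url = delta_links.get(drive_id) or f"{GRAPH_BASE}/drives/{drive_id}/root/delta"

    changed: List[dict] = []
    deleted: List[str] = []
    while url:
        page = get_json(url)
        for item in page.get("value", []):
            if "deleted" in item:
                deleted.append(item["id"])   # removed from SharePoint
            elif "file" in item:
                changed.append(item)         # new or modified file
            # Folders and other item types are ignored in this sketch.

        next_link = page.get("@odata.nextLink")
        if next_link is None:
            # Last page: remember where to resume on the next sync.
            delta_links[drive_id] = page["@odata.deltaLink"]
        url = next_link

    return changed, deleted
```

Saving the delta link only after the final page means an interrupted sync resumes from the previous token instead of silently skipping changes.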
Best Practices
Start Focused
- Begin with 1-2 high-value libraries (e.g., HR Policies, Product Documentation)
- Enable AI summaries to maximize search quality
- Use Daily sync to minimize processing load
- Expand to additional libraries as you confirm value
Organize for Discoverability
- Ensure documents have meaningful titles (not "Document1.docx")
- Store related documents in dedicated libraries with clear names
- Use folders to organize within libraries (paths are indexed)
- Keep files under the Max File Size limit (5 MB by default) so they are not skipped
Content Quality
- Focus on authoritative, reference-quality documents
- Avoid indexing drafts, personal notes, or obsolete content
- Archive outdated documents to excluded folders
- Consider separate sources for different content types
Security Considerations
- SharePoint access uses application-level permissions, not user delegation
- All indexed content is accessible to AI agents across your entire organization
- Do not index document libraries containing sensitive or restricted content
- Review indexed documents periodically to ensure appropriateness
Monitoring and Troubleshooting
Sync Activity
Monitor sync operations from Build > Knowledge > Scan Activity:
- View running, completed, and failed syncs
- See document counts (added, updated, deleted, failed)
- Review error messages for failed operations
Common Issues
Documents Not Being Indexed
- Verify the file type is in the configured filter
- Check that the file size does not exceed the Max File Size limit (5 MB by default)
- Ensure the document library is selected
- Review the sync history for errors
Slow Sync Performance
- Large initial syncs take longer because every matching document is downloaded and indexed; subsequent delta syncs process only new or changed documents and are faster
- Reduce the number of libraries if processing is taking too long
- Check for rate limiting in sync history (Graph API has limits)
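When Microsoft Graph throttles a client it responds with HTTP 429 and usually a `Retry-After` header, which is what typically shows up as rate limiting in the sync history. The sketch below shows the general shape of retrying with exponential backoff while honoring that header. `ThrottledError` and `call_with_backoff` are hypothetical names, and the attempt count and delay values are assumptions, not Outermind's actual settings.

```python
import time
from typing import Callable, Optional


class ThrottledError(Exception):
    """Raised by the caller's request code on an HTTP 429 or transient error."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("request was throttled")
        self.retry_after = retry_after  # seconds from Retry-After, if present


def call_with_backoff(
    operation: Callable[[], object],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
):
    """Run `operation`, retrying throttled calls with growing delays."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure
            if err.retry_after is not None:
                delay = err.retry_after  # the server told us how long to wait
            else:
                # Exponential backoff: 1s, 2s, 4s, ... capped at max_delay.
                delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay)
```

The `sleep` parameter is injectable so the retry logic can be exercised in tests without real waiting.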
Search Results Not Finding Documents
- Verify the sync has completed successfully
- Check that the document's status is "indexed"
- Allow a few minutes for search index propagation
- Try searching with different terms from the document title or content
AI Summary Not Generated
- Confirm Generate Summaries is enabled in settings
- Very short documents may not have meaningful summaries
- Documents with extraction errors skip summarization
Agent Integration
AI agents access SharePoint documents through the Grounded Knowledge Index tool infrastructure. When an agent needs to search corporate documents:
- Agent uses the search_knowledge_index tool (or similar GKI tool)
- Query is executed against Azure AI Search
- Relevant documents are returned with:
  - Title and summary
  - Relevance score
  - Direct link to SharePoint document
- Agent cites sources in its response
No additional configuration is needed. Once documents are indexed, they're automatically available to all agents with access to the target GKI.
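For readers curious about what the search step might look like underneath, here is a hedged sketch of building an Azure AI Search query body and turning the response into citable results. The `search`, `top`, and `select` request properties and the `value` and `@search.score` response properties belong to the Azure AI Search REST API; the index field names (`title`, `summary`, `web_url`) and the helper names are assumptions, since the actual GKI schema is not described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Citation:
    title: str
    summary: str
    score: float
    url: str


def build_search_body(query: str, top: int = 5) -> dict:
    """Build the JSON body for a Search Documents request."""
    # Assumption: the index exposes fields named title, summary, and web_url.
    return {"search": query, "top": top, "select": "title,summary,web_url"}


def parse_search_response(payload: dict) -> List[Citation]:
    """Convert a Search Documents response into citations, best match first."""
    hits = [
        Citation(
            title=doc.get("title", ""),
            summary=doc.get("summary", ""),
            score=float(doc.get("@search.score", 0.0)),
            url=doc.get("web_url", ""),
        )
        for doc in payload.get("value", [])
    ]
    return sorted(hits, key=lambda hit: hit.score, reverse=True)
```

An agent tool built along these lines would send the body to the index's search endpoint and cite each returned `url` in its answer.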