
SharePoint Knowledge Sources

SharePoint Knowledge Sources enable you to connect your organization's SharePoint document libraries to Outermind, making corporate documents searchable by AI agents. Documents are automatically indexed on a configurable schedule, allowing agents to find and cite information from policies, procedures, reports, and other organizational content.

Overview

SharePoint Knowledge Sources provide:

  • Multi-site indexing - Connect multiple SharePoint sites and document libraries to a single knowledge index
  • Automatic synchronization - Documents are indexed on a configurable schedule using efficient delta sync
  • AI-powered summaries - Each document is automatically summarized to improve search relevance
  • Multi-format support - Extracts text from Word, Excel, PowerPoint, PDF, and other common formats
  • Secure read-only access - Integration is completely read-only and uses your organization's existing Microsoft 365 connection

How It Works

1. Create Knowledge Source
Select SharePoint sites and document libraries to index

2. Configure Sync
Set sync frequency (hourly, every 6 hours, or daily) and file filters

3. Initial Sync
The system downloads and indexes all matching documents

4. Delta Sync
Subsequent syncs only process new or changed documents

5. Agent Access
AI agents search the indexed documents through Grounded Knowledge Index tools

Prerequisites

Before configuring SharePoint Knowledge Sources, ensure you have:

  1. Microsoft 365 Integration - Your organization must be connected via Build > Connections > Microsoft 365
  2. SharePoint Permissions - Admin consent must be granted for SharePoint permissions (Sites.Read.All, Files.Read.All)
  3. Grounded Knowledge Index - A GKI of type "SharePoint" must be provisioned (the wizard will create one if needed)
  4. Azure AI Search - A search provider must be configured. See Azure AI Search Setup

Setting Up a SharePoint Source

Step 1: Navigate to SharePoint Sources

  1. Go to Build > Knowledge > SharePoint Sources
  2. Click Add Source to launch the creation wizard

Step 2: Name and Select Index

Configure the basic settings:

| Setting | Description |
|---|---|
| Display Name | A friendly name for this knowledge source (e.g., "Company Policies") |
| Description | Optional description of what documents are included |
| Target Index | The Grounded Knowledge Index to populate (or create a new one) |

Step 3: Select SharePoint Sites

Search for and select the SharePoint sites containing your documents:

  1. Enter a search term to find sites accessible to your organization
  2. Click on a site to select it
  3. You can add multiple sites to a single knowledge source

Step 4: Select Document Libraries

For each selected site, choose which document libraries to index:

  • Documents - The default document library
  • Shared Documents - Common in team sites
  • Custom libraries specific to your organization

You don't need to select every library. Focus on those containing valuable reference content.

Step 5: Configure Sync Settings

| Setting | Description |
|---|---|
| Sync Mode | Manual (on-demand only) or Scheduled (automatic) |
| Sync Frequency | How often to check for changes: Hourly, Every 6 Hours, or Daily |
| File Types | Which document types to index (docx, xlsx, pdf, pptx, txt, md, html) |
| Max File Size | Documents larger than 5 MB are skipped by default |
| Generate Summaries | Enable AI summarization for better search relevance |
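The File Types and Max File Size settings act as a pre-filter on each file before it is downloaded and indexed. The following is a minimal Python sketch of that check, useful for reasoning about why a given file was skipped. It assumes the default values from the table above and treats 5 MB as 5 × 1024 × 1024 bytes, which the settings do not specify; `FileFilter` and `skip_reason` are hypothetical names, not part of Outermind.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional

# Defaults mirror the settings table; the exact byte limit is an assumption.
DEFAULT_FILE_TYPES = frozenset({"docx", "xlsx", "pdf", "pptx", "txt", "md", "html"})
DEFAULT_MAX_BYTES = 5 * 1024 * 1024  # "5 MB", assuming binary megabytes


@dataclass(frozen=True)
class FileFilter:
    """Hypothetical model of the File Types and Max File Size settings."""
    file_types: frozenset = DEFAULT_FILE_TYPES
    max_bytes: int = DEFAULT_MAX_BYTES

    def skip_reason(self, name: str, size_bytes: int) -> Optional[str]:
        """Return why a file would be skipped, or None if it passes the filter."""
        # Compare extensions case-insensitively: "Policy.DOCX" counts as docx.
        ext = PurePosixPath(name).suffix.lower().lstrip(".")
        if ext not in self.file_types:
            return f"file type '{ext or '(none)'}' not in filter"
        if size_bytes > self.max_bytes:
            return f"size {size_bytes} bytes exceeds {self.max_bytes} bytes"
        return None


if __name__ == "__main__":
    f = FileFilter()
    print(f.skip_reason("Travel Policy.DOCX", 120_000))
    print(f.skip_reason("notes.one", 10_000))
    print(f.skip_reason("Annual Report.pdf", 8 * 1024 * 1024))
```

Returning a reason string rather than a boolean mirrors the first two checks under "Documents Not Being Indexed" in the troubleshooting section.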

Step 6: Review and Create

Review your configuration and click Create Source to begin indexing.

Supported File Types

| Format | Extension | Notes |
|---|---|---|
| Word | .docx, .doc | Full text extraction including headers and footers |
| Excel | .xlsx, .xls | Extracts content from all sheets |
| PowerPoint | .pptx | Extracts slide text and speaker notes |
| PDF | .pdf | Text-based PDFs only (scanned PDFs not supported) |
| Text | .txt | Plain text files |
| Markdown | .md | Markdown files |
| HTML | .html, .htm | Extracts body content, removes scripts and navigation |

Not supported: Scanned PDFs (require OCR), images, videos, OneNote notebooks.

Managing SharePoint Sources

Viewing Source Details

From Build > Knowledge > SharePoint Sources, click on a source to view:

  • Overview - Status, statistics, and recent sync activity
  • Libraries - Connected document libraries with individual sync status
  • Documents - Paginated list of indexed documents with search and filters
  • Sync History - Log of past sync operations
  • Settings - Edit configuration or delete the source

Triggering a Manual Sync

  1. Open the source detail page
  2. Click Sync Now in the header
  3. Choose sync type:
    • Delta - Process only new and changed documents (faster)
    • Full - Re-process all documents (use when troubleshooting)

Pausing and Resuming Sync

To temporarily stop scheduled syncs:

  1. Open the source detail page
  2. Click Pause to stop scheduled syncs
  3. Click Resume to restart scheduled syncs

Pausing does not affect the indexed documents or delete any data.

Viewing Indexed Documents

The Documents tab shows all indexed documents with:

| Column | Description |
|---|---|
| File Name | Document name with link to SharePoint |
| Site/Library | Where the document is stored |
| File Type | Document format |
| Author | Who created the document |
| Modified | When the document was last changed |
| Status | Index status (indexed, pending, failed) |

Use filters to find specific documents by library, file type, or status.

Re-indexing a Document

If a document's index entry seems outdated or incorrect:

  1. Find the document in the Documents tab
  2. Click the document row to open details
  3. Click Reindex to queue the document for re-processing

AI-Powered Document Summaries

When Generate Summaries is enabled, each document is processed by an AI model to generate:

| Field | Description |
|---|---|
| Summary | 2-3 sentence description of the document's content |
| Keywords | 5-10 relevant search terms |
| Category | Classification (policy, procedure, report, template, reference, presentation, spreadsheet, other) |

These AI-generated fields improve search relevance, especially for long documents where full-text search alone may return poor matches.
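To make the three fields concrete, here is a minimal Python sketch of how a model's response could be parsed and checked against the constraints in the table. It assumes the model returns JSON with `summary`, `keywords`, and `category` keys; that response shape, the normalization rules (unknown categories fall back to "other", duplicate keywords are dropped, the list is capped at 10), and the names `DocumentSummary` and `parse_summary` are all hypothetical, not Outermind's actual implementation.

```python
import json
from dataclasses import dataclass
from typing import List

# Category values as listed in the table above.
CATEGORIES = ("policy", "procedure", "report", "template",
              "reference", "presentation", "spreadsheet", "other")


@dataclass
class DocumentSummary:
    summary: str
    keywords: List[str]
    category: str


def parse_summary(raw_json: str) -> DocumentSummary:
    """Parse an (assumed) JSON model response into the three summary fields."""
    data = json.loads(raw_json)

    summary = str(data.get("summary", "")).strip()
    if not summary:
        raise ValueError("empty summary")

    # De-duplicate keywords case-insensitively, keeping first-seen order.
    keywords: List[str] = []
    seen = set()
    for kw in data.get("keywords", []):
        kw = str(kw).strip()
        if kw and kw.lower() not in seen:
            seen.add(kw.lower())
            keywords.append(kw)
    keywords = keywords[:10]  # cap at the documented maximum of 10
    if len(keywords) < 5:
        raise ValueError(f"expected 5-10 keywords, got {len(keywords)}")

    # Map anything outside the documented set to "other".
    category = str(data.get("category", "")).strip().lower()
    if category not in CATEGORIES:
        category = "other"

    return DocumentSummary(summary, keywords, category)
```

Rejecting responses with too few keywords, rather than padding them, keeps low-quality summaries out of the index; this matches the note under "AI Summary Not Generated" that some documents simply skip summarization.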

Sync Frequencies

| Frequency | When It Runs | Best For |
|---|---|---|
| Hourly | Every hour | Frequently updated document libraries |
| Every 6 Hours | 4 times per day | Moderately active libraries |
| Daily | Once per day (2 AM UTC) | Stable reference documentation |

Start with Daily sync and increase frequency only if agents report stale results.
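The schedule in the table can be expressed as a "next run time" calculation. The Python sketch below uses the documented 2 AM UTC slot for Daily. Aligning Hourly runs to the top of the hour and Every 6 Hours runs to 00:00, 06:00, 12:00, and 18:00 UTC is an assumption, since the table only gives the run count; `next_run` and the frequency strings are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def next_run(frequency: str, now: datetime) -> datetime:
    """Return the next scheduled sync time strictly after `now`, in UTC.

    Daily at 02:00 UTC is documented; the hourly and 6-hourly alignments
    used here are assumptions for illustration.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now = now.astimezone(timezone.utc)
    hour_start = now.replace(minute=0, second=0, microsecond=0)

    if frequency == "hourly":
        return hour_start + timedelta(hours=1)
    if frequency == "every_6_hours":
        # Round down to the current 6-hour block, then step to the next one.
        block_start = hour_start.replace(hour=hour_start.hour - hour_start.hour % 6)
        return block_start + timedelta(hours=6)
    if frequency == "daily":
        today_2am = hour_start.replace(hour=2)
        return today_2am if now < today_2am else today_2am + timedelta(days=1)
    raise ValueError(f"unknown frequency: {frequency!r}")
```

For example, a check at 01:30 UTC schedules the next Daily sync for 02:00 the same day, while a check at exactly 02:00 schedules it for 02:00 the following day.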

Delta Sync Architecture

SharePoint Knowledge Sources use Microsoft Graph delta queries for efficient synchronization:

  • Initial sync downloads and processes all matching documents
  • Subsequent syncs only process documents that have changed since the last sync
  • Per-library tokens track changes independently for each document library
  • Automatic retry handles transient failures with exponential backoff

This approach minimizes API calls and processing time while keeping your knowledge base current.
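The per-library token pattern can be sketched against the Microsoft Graph `driveItem` delta response shape: each page carries a `value` list plus either an `@odata.nextLink` (more pages) or an `@odata.deltaLink` (the token for the next sync), and removed items carry a `deleted` facet. The Python sketch below is a simplified illustration, not Outermind's implementation: `sync_library` is a hypothetical name, the page fetcher is injected so the example runs without network access, and authentication is omitted.

```python
from typing import Callable, Dict, List

# A fetcher takes a URL and returns one Graph-style JSON page as a dict.
FetchPage = Callable[[str], dict]

GRAPH = "https://graph.microsoft.com/v1.0"


def sync_library(drive_id: str, tokens: Dict[str, str], fetch: FetchPage,
                 full: bool = False) -> Dict[str, List[str]]:
    """Run one delta (or full) sync for a single document library (drive).

    `tokens` maps drive id -> last deltaLink. It is updated only after the
    final page, so an interrupted sync resumes from the previous token.
    """
    # A full sync, or a library with no token yet, starts from scratch.
    if full or drive_id not in tokens:
        url = f"{GRAPH}/drives/{drive_id}/root/delta"
    else:
        url = tokens[drive_id]

    changes: Dict[str, List[str]] = {"upsert": [], "delete": []}
    while True:
        page = fetch(url)
        for item in page.get("value", []):
            if "deleted" in item:
                changes["delete"].append(item["id"])
            elif "file" in item:  # folders have a "folder" facet instead
                changes["upsert"].append(item["id"])
        if "@odata.nextLink" in page:
            url = page["@odata.nextLink"]  # more pages in this round
        else:
            tokens[drive_id] = page["@odata.deltaLink"]  # save for next sync
            return changes
```

Calling `sync_library` once per selected library with a shared `tokens` dictionary gives each library its own independent token. Note that a full re-sync from scratch reports only items that currently exist, so removing index entries for documents deleted since the last sync would need a separate reconciliation step.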

Best Practices

Start Focused

  1. Begin with 1-2 high-value libraries (e.g., HR Policies, Product Documentation)
  2. Enable AI summaries to maximize search quality
  3. Use Daily sync to minimize processing load
  4. Expand to additional libraries as you confirm value

Organize for Discoverability

  1. Ensure documents have meaningful titles (not "Document1.docx")
  2. Store related documents in dedicated libraries with clear names
  3. Use folders to organize within libraries (paths are indexed)
  4. Keep file sizes reasonable (under 5 MB)

Content Quality

  1. Focus on authoritative, reference-quality documents
  2. Avoid indexing drafts, personal notes, or obsolete content
  3. Archive outdated documents to excluded folders
  4. Consider separate sources for different content types

Security Considerations

  1. SharePoint access uses application-level permissions, not user delegation
  2. All indexed content is accessible to AI agents for your entire organization
  3. Do not index document libraries containing sensitive or restricted content
  4. Review indexed documents periodically to ensure appropriateness

Monitoring and Troubleshooting

Sync Activity

Monitor sync operations from Build > Knowledge > Scan Activity:

  • View running, completed, and failed syncs
  • See document counts (added, updated, deleted, failed)
  • Review error messages for failed operations

Common Issues

Documents Not Being Indexed

  1. Verify the file type is in the configured filter
  2. Check the file size is under 5 MB
  3. Ensure the document library is selected
  4. Review the sync history for errors

Slow Sync Performance

  1. Large initial syncs may take time; subsequent delta syncs are faster
  2. Reduce the number of libraries if processing is taking too long
  3. Check the sync history for rate-limiting (throttling) errors; Microsoft Graph throttles clients that exceed its request limits
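When Microsoft Graph throttles a client it responds with HTTP 429 and usually a `Retry-After` header, and the recommended response is to wait that long before retrying. The Python sketch below shows retry with exponential backoff and jitter that prefers the server's hint when present. `TransientError` and `with_backoff` are hypothetical names, and the attempt count and delays are illustrative defaults, not Outermind's actual settings.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error for throttling (HTTP 429) or 5xx responses."""

    def __init__(self, status: int, retry_after: Optional[float] = None):
        super().__init__(f"HTTP {status}")
        self.status = status
        self.retry_after = retry_after  # seconds, from a Retry-After header


def with_backoff(call: Callable[[], T], max_attempts: int = 5,
                 base_delay: float = 1.0, max_delay: float = 60.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `call`, retrying transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientError as err:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error in sync history
            if err.retry_after is not None:
                delay = err.retry_after  # honor the server's hint
            else:
                # 1s, 2s, 4s, ... capped at max_delay, with random jitter
                delay = min(max_delay, base_delay * 2 ** (attempt - 1))
                delay *= random.uniform(0.5, 1.0)
            sleep(delay)
    raise RuntimeError("unreachable")
```

Injecting `sleep` keeps the function testable, and the jitter spreads out retries so that many libraries syncing at once do not hit the API in lockstep.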

Search Results Not Finding Documents

  1. Verify the sync has completed successfully
  2. Check the document is in the "indexed" status
  3. Allow a few minutes for search index propagation
  4. Try searching with different terms from the document title or content

AI Summary Not Generated

  1. Confirm Generate Summaries is enabled in settings
  2. Very short documents may not have meaningful summaries
  3. Documents with extraction errors skip summarization

Agent Integration

AI agents access SharePoint documents through the Grounded Knowledge Index tool infrastructure. When an agent needs to search corporate documents:

  1. The agent calls the search_knowledge_index tool (or a similar GKI tool)
  2. The query is executed against Azure AI Search
  3. Relevant documents are returned with:
    • Title and summary
    • Relevance score
    • Direct link to SharePoint document
  4. The agent cites sources in its response

No additional configuration is needed. Once documents are indexed, they're automatically available to all agents with access to the target GKI.
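For readers curious about what the search step looks like underneath, the Python sketch below builds a request to the Azure AI Search "Search Documents" REST endpoint and turns the results into citation records. This is an illustration only: how Outermind's GKI tools actually query the index is not documented here, and the service name, index name, and field names (`title`, `summary`, `webUrl`) are hypothetical. The request is built but not sent, so the example runs offline.

```python
import json
import urllib.request
from typing import Dict, List

API_VERSION = "2023-11-01"  # a GA Azure AI Search REST API version


def build_search_request(service: str, index: str, api_key: str,
                         query: str, top: int = 5) -> urllib.request.Request:
    """Build (but do not send) a POST /docs/search request."""
    url = (f"https://{service}.search.windows.net/indexes/{index}"
           f"/docs/search?api-version={API_VERSION}")
    # Field names in "select" are hypothetical; they depend on the index schema.
    body = {"search": query, "top": top, "select": "title,summary,webUrl"}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "api-key": api_key},
    )


def to_citations(response: dict) -> List[Dict[str, object]]:
    """Map search hits to title, summary, relevance score, and link."""
    return [
        {
            "title": doc.get("title"),
            "summary": doc.get("summary"),
            "score": doc.get("@search.score"),
            "url": doc.get("webUrl"),
        }
        for doc in response.get("value", [])
    ]
```

Sending the request with `urllib.request.urlopen(req)` and passing the parsed JSON body to `to_citations` yields the title, summary, relevance score, and link that an agent can cite.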