Safety Gateway Overview
The Safety Gateway is an LLM-powered security layer that analyzes all outbound communications from AI agents before they are sent. It helps prevent sensitive information leakage, inappropriate responses, and other risks associated with automated outbound communications such as email and SMS.
Purpose
When AI agents process emails and generate responses, there's always a risk that:
- Sensitive data (credit cards, SSNs, bank accounts) could be accidentally included
- Confidential business information could be shared with external parties
- Responses could be inappropriate for the recipient or context
- PII (Personally Identifiable Information) could leak to unauthorized recipients
The Safety Gateway addresses these risks through:
- PII Detection - Regex-based scanning for credit cards, SSNs, bank accounts, DOBs, passports, and driver's licenses
- LLM Analysis - AI-powered content analysis to detect subtle risks that pattern matching misses
- Custom Instructions - Organization-specific policies and guidelines for the safety analysis
- Recipient Classification - Automatic identification of internal vs. external recipients
- Human Review Queue - Suspicious messages are held for human approval before sending
- Analytics Dashboard - Interactive charts showing decision trends, top flags, and classification breakdowns
How It Works
The Safety Gateway intercepts outbound communications at the tool execution layer:
- Agent generates response - The AI agent decides to send an email, SMS, or other communication
- Tool execution intercepted - Before `reply_email`, `send_sms`, etc. execute, the Safety Gateway analyzes the content
- Safety analysis performed:
  - PII patterns scanned (credit cards, SSNs, etc.)
  - LLM analyzes content for sensitive information
  - Recipients classified as internal or external
  - Risk score calculated (0.0 - 1.0)
- Decision made:
  - Allow - Low risk, tool executes immediately
  - Hold - Medium risk, queued for human review
  - Deny - High risk, tool execution blocked
       Agent Generates Response
                  |
                  v
            Tool Execution
          (reply_email, etc.)
                  |
                  v
       +---------------------+
       |   Safety Gateway    |
       |                     |
       |   * PII Scanning    |
       |   * LLM Analysis    |
       |   * Risk Scoring    |
       +---------------------+
                  |
                  v
            Risk Decision
              /       \
             v         v
          Allow     Hold/Deny
            |           |
            v           v
         Execute    Queue for
          Tool       Review
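The flow above can be sketched as a wrapper around tool execution. This is an illustrative sketch only, not the gateway's actual implementation: `Verdict`, `guarded_call`, the `analyze` callback, and the in-memory `review_queue` list are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Verdict:
    decision: str      # "allow", "hold", or "deny"
    risk_score: float  # 0.0 - 1.0


def guarded_call(
    tool_name: str,
    payload: Dict[str, Any],
    tools: Dict[str, Callable[[Dict[str, Any]], Any]],
    analyze: Callable[[str, Dict[str, Any]], Verdict],
    review_queue: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Analyze the outbound payload first; run the tool only on 'allow'."""
    verdict = analyze(tool_name, payload)
    if verdict.decision == "allow":
        # Low risk: the tool executes immediately.
        return {"status": "sent", "result": tools[tool_name](payload)}
    if verdict.decision == "hold":
        # Medium risk: park the call for a human reviewer (hypothetical queue shape).
        review_queue.append(
            {"tool": tool_name, "payload": payload, "risk_score": verdict.risk_score}
        )
        return {"status": "held"}
    # High risk: the tool is never invoked.
    return {"status": "denied"}
```

The point of the shape is that the tool callable is reached on one branch only, so a held or denied message cannot be sent by accident.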
Decision Types
The Safety Gateway makes one of four decisions for each outbound communication:
| Decision | Action | When Used |
|---|---|---|
| Allow | Message is sent immediately | Low risk score, no PII detected, internal recipients |
| Hold | Queued for human review | Medium risk score, uncertain content, external recipients |
| Deny | Message is blocked | High risk score, clear policy violation |
| Allow (Dry Run) | Message sent, logged for review | Gateway in observation mode |
Risk Scoring
Each message receives a danger score from 0.0 to 1.0:
- 0.0 - 0.3: Low risk - typically auto-approved
- 0.3 - 0.6: Medium risk - may require review depending on recipient type
- 0.6 - 0.8: High risk - typically held for review
- 0.8 - 1.0: Critical risk - may be auto-denied
Risk thresholds are configurable separately for:
- Internal recipients (within your organization)
- External recipients (outside your organization)
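As a rough illustration of per-recipient thresholds, the sketch below maps a score to a decision. The `Thresholds` type, the `decide` function, and the specific numbers are assumptions chosen for the example; they are not the product's defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    hold_at: float  # scores at or above this are held for review
    deny_at: float  # scores at or above this are denied


# Hypothetical values: external recipients get stricter (lower) thresholds.
INTERNAL = Thresholds(hold_at=0.6, deny_at=0.9)
EXTERNAL = Thresholds(hold_at=0.3, deny_at=0.8)


def decide(score: float, is_external: bool) -> str:
    """Return 'allow', 'hold', or 'deny' for a score in [0.0, 1.0]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    limits = EXTERNAL if is_external else INTERNAL
    if score >= limits.deny_at:
        return "deny"
    if score >= limits.hold_at:
        return "hold"
    return "allow"
```

With these example values, a score of 0.4 is allowed for an internal recipient but held for an external one.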
Gateway Modes
The Safety Gateway operates in one of three modes:
Disabled
- Safety Gateway is completely bypassed
- All outbound communications execute immediately
- Use only for testing or when confident in agent behavior
Dry Run (Observation)
- All communications are allowed through
- Every message is analyzed and logged
- Helps tune thresholds before enabling enforcement
- Shows "simulated decision" in audit log
Enforced
- Full protection enabled
- High-risk messages are held or denied
- Human review required for held items
- Recommended for production use
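One way to picture how the three modes differ is to separate the analyzed decision from the decision that is actually applied. The `Mode` enum and `apply_mode` function below are hypothetical names for this sketch; the real gateway's internals may differ.

```python
from enum import Enum
from typing import Callable, Optional, Tuple


class Mode(Enum):
    DISABLED = "disabled"
    DRY_RUN = "dry_run"
    ENFORCED = "enforced"


def apply_mode(mode: Mode, analyze: Callable[[], str]) -> Tuple[str, Optional[str]]:
    """Return (effective_decision, simulated_decision_for_audit_log)."""
    if mode is Mode.DISABLED:
        # Gateway bypassed: analysis is not run at all.
        return ("allow", None)
    analyzed = analyze()
    if mode is Mode.DRY_RUN:
        # Message goes through; the analyzed decision is only logged.
        return ("allow", analyzed)
    # Enforced: the analyzed decision is what actually happens.
    return (analyzed, None)
```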
Pages
The Safety Gateway section of the Admin Console includes three pages, all accessible under Build > Governance > Safety Gateway:
- Safety Settings - Configure thresholds, modes, custom instructions, and LLM analysis options
- Review Queue - Approve, deny, or edit held messages with re-scan validation
- Audit Log - View history of all gateway decisions with analytics dashboard
Best Practices
Initial Setup
- Start with Dry Run mode to observe what would be flagged
- Review the audit log to understand your traffic patterns
- Adjust thresholds based on your risk tolerance
- Enable Enforced mode when confident in configuration
Threshold Configuration
- Conservative (Lower thresholds): More messages held for review, higher overhead but safer
- Permissive (Higher thresholds): Fewer interruptions, faster processing but higher risk
- Asymmetric: Lower thresholds for external recipients, higher for internal
Queue Management
- Review pending items promptly to avoid blocking agent workflows
- Use bulk actions for similar items
- Add reviewer notes to build institutional knowledge
- Consider time-sensitive flags for urgent communications
Technical Details
Intercepted Tools
The Safety Gateway intercepts these outbound communication tools:
- `reply_email` - Email replies
- `reply_all_email` - Reply-all emails
- `forward_email` - Email forwarding
- `create_draft_reply` - Draft email creation
- `send_sms` - SMS/text messages
- `notify_user` - User notifications
- `schedule_meeting` - Meeting invitations
- `email_manager_and_wait` - Manager escalations
Recipient Classification
The Safety Gateway automatically classifies recipients as internal or external using:
- Company Domains - Primary lookup against registered company domains in the Companies table (including alternate domains)
- Employee Directory - Secondary lookup if domain not found; requires 2+ employees with matching email domain
- Domain Heuristics - Falls back to treating unknown domains as external
To ensure accurate internal classification:
- Register all company domains in the Companies settings
- Include alternate domains (e.g., subsidiary domains, legacy domains)
- Sync your employee directory regularly
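The lookup order described above can be sketched as follows. This is a simplified illustration: `classify_recipient` and its parameters are hypothetical, and the in-memory set and list stand in for the Companies table and the employee directory.

```python
from typing import Iterable, Set


def classify_recipient(
    email: str,
    company_domains: Set[str],       # assumed to be lowercased, incl. alternate domains
    employee_emails: Iterable[str],  # stand-in for the employee directory
    min_employees: int = 2,
) -> str:
    """Return 'internal' or 'external' for a single recipient address."""
    if "@" not in email:
        return "external"  # malformed address: treat as unknown
    domain = email.rsplit("@", 1)[1].lower()

    # 1. Primary lookup: registered company domains.
    if domain in company_domains:
        return "internal"

    # 2. Secondary lookup: at least `min_employees` employees share the domain.
    suffix = "@" + domain
    matches = sum(1 for e in employee_emails if e.lower().endswith(suffix))
    if matches >= min_employees:
        return "internal"

    # 3. Fallback: unknown domains are external.
    return "external"
```

A domain used by a single employee still falls through to external, which is why registering alternate domains explicitly is the more reliable route.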
PII Types Detected
| Type | Examples |
|---|---|
| Credit Card | 4111-1111-1111-1111, 5500 0000 0000 0004 |
| SSN | 123-45-6789, 123 45 6789 |
| Bank Account | Routing + account number patterns |
| Date of Birth | DOB: 01/15/1990, born on 1990-01-15 |
| Passport | Passport: 123456789, Passport#AB1234567 |
| Driver's License | DL: D12345678, License: ABC12345 |
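To give a feel for regex-based detection, here is a minimal sketch covering two of the types in the table. The patterns and the `scan_pii` function are assumptions written for this example, not the gateway's actual rules. The Luhn checksum step is an addition of this sketch, a common way to cut false positives on 16-digit numbers.

```python
import re
from typing import List, Tuple

# Hypothetical patterns; the gateway's real patterns are not documented here.
PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}[- ]\d{2}[- ]\d{4}\b"),
}


def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum over a string of digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_pii(text: str) -> List[Tuple[str, str]]:
    """Return (pii_type, matched_text) pairs found in the text."""
    hits = []
    for pii_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            value = match.group(0)
            if pii_type == "credit_card":
                if not luhn_valid(re.sub(r"\D", "", value)):
                    continue  # 16 digits that fail the checksum are skipped
            hits.append((pii_type, value))
    return hits
```

Other types in the table, such as dates of birth or passport numbers, would follow the same shape with a keyword prefix in the pattern.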
PII Masking
When PII masking is enabled (default), detected PII is automatically masked before sending:
- Credit cards: `****-****-****-1234` (last 4 digits preserved)
- SSNs: `***-**-6789` (last 4 digits preserved)
- Bank accounts: Masked with partial visibility
- Other PII: Replaced with `[REDACTED]`
This adds an extra layer of protection even if a message is approved for sending.
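The masking formats above could be produced with a substitution pass like the one below. The regular expressions and the `mask_pii` function are hypothetical and cover only credit cards and SSNs; they are meant to show the last-4-digits idea, not the gateway's real masking code.

```python
import re

# Hypothetical patterns; the final group captures the last four digits.
CARD_RE = re.compile(r"\b(?:\d{4}[- ]?){3}(\d{4})\b")
SSN_RE = re.compile(r"\b\d{3}[- ]\d{2}[- ](\d{4})\b")


def mask_pii(text: str) -> str:
    """Mask card numbers and SSNs, keeping only the last four digits."""
    text = CARD_RE.sub(lambda m: "****-****-****-" + m.group(1), text)
    text = SSN_RE.sub(lambda m: "***-**-" + m.group(1), text)
    return text
```

For example, `mask_pii("Card 4111-1111-1111-1234, SSN 123-45-6789")` yields `"Card ****-****-****-1234, SSN ***-**-6789"`.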
LLM Analysis
The Safety Gateway uses an LLM to analyze message content for risks that pattern matching cannot detect:
- Confidential business information
- Inappropriate tone or language
- Context-inappropriate disclosures
- Policy violations
- Potential phishing or social engineering
- Unusual requests or instructions
The LLM returns:
- Danger Score (0.0 - 1.0)
- Analysis Summary explaining the reasoning
- Flags indicating specific risk types detected
You can configure which model is used for analysis in Safety Settings.
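As an illustration of consuming the three fields the LLM returns, the sketch below parses and validates a response. The JSON shape and the key names (`danger_score`, `summary`, `flags`) are assumptions made for this example; the actual response format is not specified in this document.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class LlmAnalysis:
    danger_score: float  # 0.0 - 1.0
    summary: str         # reasoning behind the score
    flags: List[str]     # specific risk types detected


def parse_analysis(raw: str) -> LlmAnalysis:
    """Parse a JSON analysis result, rejecting out-of-range scores."""
    data = json.loads(raw)
    score = float(data["danger_score"])
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"danger_score out of range: {score}")
    return LlmAnalysis(
        danger_score=score,
        summary=str(data.get("summary", "")),
        flags=[str(flag) for flag in data.get("flags", [])],
    )
```

Validating the score before it reaches the threshold logic keeps a malformed model response from silently becoming an Allow decision.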
Storage
- Review Queue: SQL Server (SafetyReviewQueue table)
- Audit Log: Azure Table Storage (high-volume, cost-effective)
- Settings: SQL Server (TenantSafetySettings table)
Emergency Override
In urgent situations, administrators can activate an Emergency Override to temporarily disable the Safety Gateway:
- Requires `safety:emergency:override` permission
- Must provide a reason (minimum 20 characters)
- Duration: 15-240 minutes
- Automatically reverts to previous mode when expired
- All overrides are logged for compliance
- Rate-limited to 3 activations per hour per user
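The constraints listed above can be expressed as a validation routine. This is a sketch under stated assumptions: `request_override`, its parameters, and the in-memory `history` dictionary are hypothetical, and a sliding one-hour window is only one possible reading of "per hour".

```python
from datetime import datetime, timedelta
from typing import Dict, List, Set

REQUIRED_PERMISSION = "safety:emergency:override"
MIN_REASON_CHARS = 20
MIN_MINUTES, MAX_MINUTES = 15, 240
MAX_ACTIVATIONS_PER_HOUR = 3


def request_override(
    user: str,
    permissions: Set[str],
    reason: str,
    minutes: int,
    now: datetime,
    history: Dict[str, List[datetime]],  # hypothetical per-user activation log
) -> datetime:
    """Validate an override request and return the time at which it expires."""
    if REQUIRED_PERMISSION not in permissions:
        raise PermissionError("missing " + REQUIRED_PERMISSION)
    if len(reason.strip()) < MIN_REASON_CHARS:
        raise ValueError("reason must be at least 20 characters")
    if not MIN_MINUTES <= minutes <= MAX_MINUTES:
        raise ValueError("duration must be 15-240 minutes")

    # Assumed sliding window: count activations in the last 60 minutes.
    recent = [t for t in history.get(user, []) if now - t < timedelta(hours=1)]
    if len(recent) >= MAX_ACTIVATIONS_PER_HOUR:
        raise RuntimeError("rate limit: 3 activations per hour per user")

    history[user] = recent + [now]
    return now + timedelta(minutes=minutes)
```

The returned timestamp is when the gateway would revert to its previous mode.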
Related Topics
- Safety Settings - Configure the gateway
- Review Queue - Handle held messages
- Audit Log - View historical decisions
- Agent Configuration - Create and manage agents