Safety Gateway Overview

The Safety Gateway is an LLM-powered security layer that analyzes all outbound communications from AI agents before they are sent. It helps prevent sensitive information leakage, inappropriate responses, and other risks associated with automated outbound communications such as email and SMS.

Purpose

When AI agents process emails and generate responses, there's always a risk that:

Sensitive data (credit cards, SSNs, bank accounts) could be accidentally included
Confidential business information could be shared with external parties
Responses could be inappropriate for the recipient or context
PII (Personally Identifiable Information) could leak to unauthorized recipients

The Safety Gateway provides a layered defense against these risks through:

PII Detection - Regex-based scanning for credit cards, SSNs, bank accounts, DOBs, passports, and driver's licenses
LLM Analysis - AI-powered content analysis to detect subtle risks that pattern matching misses
Custom Instructions - Organization-specific policies and guidelines for the safety analysis
Recipient Classification - Automatic identification of internal vs. external recipients
Human Review Queue - Suspicious messages are held for human approval before sending
Analytics Dashboard - Interactive charts showing decision trends, top flags, and classification breakdowns

How It Works

The Safety Gateway intercepts outbound communications at the tool execution layer:

1. Agent generates response - The AI agent decides to send an email, SMS, or other communication
2. Tool execution intercepted - Before reply_email, send_sms, etc. execute, the Safety Gateway analyzes the content
3. Safety analysis performed:
   - PII patterns scanned (credit cards, SSNs, etc.)
   - LLM analyzes content for sensitive information
   - Recipients classified as internal or external
   - Risk score calculated (0.0 - 1.0)
4. Decision made:
   - Allow - Low risk, tool executes immediately
   - Hold - Medium risk, queued for human review
   - Deny - High risk, tool execution blocked

Agent Generates Response
         |
         v
    Tool Execution
   (reply_email, etc.)
         |
         v
+---------------------+
|   Safety Gateway    |
|                     |
|  * PII Scanning     |
|  * LLM Analysis     |
|  * Risk Scoring     |
+---------------------+
         |
         v
    Risk Decision
    /    |    \
   v     v     v
 Allow  Hold  Deny
   |     |     |
   v     v     v
Execute Queue  Block
 Tool    for   Tool
        Review
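
The interception flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the gateway's actual implementation: the tool names come from the Intercepted Tools list in this document, while `ToolInterceptor`, `GatewayResult`, the `analyze` callback, and the in-memory review queue are hypothetical names invented for the example. Treating any unrecognized decision as a block is also an assumption (a fail-closed choice).

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple

# Tool names from the "Intercepted Tools" list; all other names are hypothetical.
INTERCEPTED_TOOLS = {
    "reply_email", "reply_all_email", "forward_email", "create_draft_reply",
    "send_sms", "notify_user", "schedule_meeting", "email_manager_and_wait",
}


@dataclass
class GatewayResult:
    status: str              # "executed", "queued", or "blocked"
    tool_output: Any = None  # set only when the tool actually ran


@dataclass
class ToolInterceptor:
    # analyze(tool_name, args) returns "allow", "hold", or "deny" (assumed contract)
    analyze: Callable[[str, dict], str]
    review_queue: List[Tuple[str, dict]] = field(default_factory=list)

    def run(self, tool_name: str, args: dict, tool: Callable[..., Any]) -> GatewayResult:
        # Tools that do not send anything outbound are not analyzed.
        if tool_name not in INTERCEPTED_TOOLS:
            return GatewayResult("executed", tool(**args))

        decision = self.analyze(tool_name, args)
        if decision == "allow":
            return GatewayResult("executed", tool(**args))
        if decision == "hold":
            # Keep the call so a human reviewer can approve it later.
            self.review_queue.append((tool_name, args))
            return GatewayResult("queued")
        # "deny" and anything unexpected: the tool never runs.
        return GatewayResult("blocked")
```

The point of the sketch is that the tool callable is invoked only on the Allow path; on Hold the call is stored rather than executed, and on Deny it is dropped.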

Decision Types

The Safety Gateway makes one of four decisions for each outbound communication:

Decision	Action	When Used
Allow	Message is sent immediately	Low risk score, no PII detected, internal recipients
Hold	Queued for human review	Medium risk score, uncertain content, external recipients
Deny	Message is blocked	High risk score, clear policy violation
Allow (Dry Run)	Message sent, logged for review	Gateway in observation mode

Risk Scoring

Each message receives a risk score (shown as the danger score) from 0.0 to 1.0:

0.0 - 0.3: Low risk - typically auto-approved
0.3 - 0.6: Medium risk - may require review depending on recipient type
0.6 - 0.8: High risk - typically held for review
0.8 - 1.0: Critical risk - may be auto-denied

Risk thresholds are configurable separately for:

Internal recipients (within your organization)
External recipients (outside your organization)
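
As a rough illustration of separate thresholds per recipient type, the sketch below maps a score to a decision. The `Thresholds` class, the `decide` function, and every numeric value are assumptions chosen for the example; they are not documented defaults, and the real gateway may combine other signals (such as PII hits) with the score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    hold_at: float  # scores at or above this value are held for review
    deny_at: float  # scores at or above this value are denied


# Illustrative values only: stricter for external recipients than for internal ones.
INTERNAL = Thresholds(hold_at=0.6, deny_at=0.9)
EXTERNAL = Thresholds(hold_at=0.3, deny_at=0.8)


def decide(score: float, is_external: bool) -> str:
    """Map a 0.0-1.0 risk score to "allow", "hold", or "deny"."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    limits = EXTERNAL if is_external else INTERNAL
    if score >= limits.deny_at:
        return "deny"
    if score >= limits.hold_at:
        return "hold"
    return "allow"
```

With these example values, a score of 0.45 is allowed for an internal recipient but held for an external one, which is the asymmetry that separate thresholds make possible.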

Gateway Modes

The Safety Gateway operates in one of three modes:

Disabled

Safety Gateway is completely bypassed
All outbound communications execute immediately
Use only for testing or when confident in agent behavior

Dry Run (Observation)

All communications are allowed through
Every message is analyzed and logged
Helps tune thresholds before enabling enforcement
Shows "simulated decision" in audit log

Enforced

Full protection enabled
High-risk messages are held or denied
Human review required for held items
Recommended for production use
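
One way to picture how the mode changes the outcome is the sketch below. `GatewayMode`, `resolve`, and the `"allow_dry_run"` label are hypothetical names for this example; the only behavior taken from this document is that Disabled bypasses analysis, Dry Run analyzes and logs but always lets the message through, and Enforced applies the analyzed decision.

```python
from enum import Enum
from typing import Callable, Optional, Tuple


class GatewayMode(Enum):
    DISABLED = "disabled"
    DRY_RUN = "dry_run"
    ENFORCED = "enforced"


def resolve(mode: GatewayMode, analyze: Callable[[], str]) -> Tuple[str, Optional[str]]:
    """Return (decision applied, simulated decision to record in the audit log).

    `analyze` is passed as a callable so that Disabled mode can skip it entirely.
    """
    if mode is GatewayMode.DISABLED:
        # Gateway bypassed: no analysis is run and nothing is simulated.
        return ("allow", None)
    analyzed = analyze()
    if mode is GatewayMode.DRY_RUN:
        # The message goes through; the would-be decision is kept for the audit log.
        return ("allow_dry_run", analyzed)
    # Enforced: the analyzed decision is the one that takes effect.
    return (analyzed, None)
```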

Best Practices

Initial Setup

Start with Dry Run mode to observe what would be flagged
Review the audit log to understand your traffic patterns
Adjust thresholds based on your risk tolerance
Enable Enforced mode when confident in configuration

Threshold Configuration

Conservative (Lower thresholds): More messages held for review, higher overhead but safer
Permissive (Higher thresholds): Fewer interruptions, faster processing but higher risk
Asymmetric: Lower thresholds for external recipients, higher for internal

Queue Management

Review pending items promptly to avoid blocking agent workflows
Use bulk actions for similar items
Add reviewer notes to build institutional knowledge
Consider time-sensitive flags for urgent communications

Technical Details

Intercepted Tools

The Safety Gateway intercepts these outbound communication tools:

reply_email - Email replies
reply_all_email - Reply-all emails
forward_email - Email forwarding
create_draft_reply - Draft email creation
send_sms - SMS/text messages
notify_user - User notifications
schedule_meeting - Meeting invitations
email_manager_and_wait - Manager escalations

Recipient Classification

The Safety Gateway automatically classifies recipients as internal or external using:

Company Domains - Primary lookup against registered company domains in the Companies table (including alternate domains)
Employee Directory - Secondary lookup if the domain is not registered; the domain is treated as internal only when at least 2 employees have a matching email domain
Domain Heuristics - Falls back to treating unknown domains as external

To ensure accurate internal classification:

Register all company domains in the Companies settings
Include alternate domains (e.g., subsidiary domains, legacy domains)
Sync your employee directory regularly
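
The lookup order described in this section could look roughly like the following. This is a sketch, not the product's code: `classify_recipient` and its parameters are hypothetical, the company domains and employee emails are passed in as plain Python collections rather than read from the Companies table or a directory sync, and lowercasing the domain before comparison is an assumption.

```python
from collections import Counter
from typing import Iterable, Set


def classify_recipient(
    email: str,
    company_domains: Set[str],      # registered and alternate domains, lowercased
    employee_emails: Iterable[str],
    min_employees: int = 2,         # "2+ employees" rule from the text above
) -> str:
    """Return "internal" or "external" for a single recipient address."""
    if "@" not in email:
        return "external"  # malformed address: treat as unknown
    domain = email.rsplit("@", 1)[1].strip().lower()

    # 1. Primary lookup: registered company domains.
    if domain in company_domains:
        return "internal"

    # 2. Secondary lookup: enough employees share this domain.
    counts = Counter(
        e.rsplit("@", 1)[1].strip().lower() for e in employee_emails if "@" in e
    )
    if counts[domain] >= min_employees:
        return "internal"

    # 3. Fallback: unknown domains are external.
    return "external"
```

Because the fallback is "external", a missing domain registration makes the gateway stricter rather than looser, which is why registering every domain mainly reduces unnecessary holds.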

PII Types Detected

Type	Examples
Credit Card	4111-1111-1111-1111, 5500 0000 0000 0004
SSN	123-45-6789, 123 45 6789
Bank Account	Routing + account number patterns
Date of Birth	DOB: 01/15/1990, born on 1990-01-15
Passport	Passport: 123456789, Passport#AB1234567
Driver's License	DL: D12345678, License: ABC12345
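
To make the regex-based scanning concrete, here is a minimal sketch covering only two of the types in the table. The patterns are illustrative assumptions written to match the example formats shown above (16-digit cards in groups of four, SSNs separated by hyphens or spaces); the gateway's real patterns are not documented here and are likely more thorough. `PiiMatch`, `PII_PATTERNS`, and `scan_pii` are hypothetical names.

```python
import re
from typing import List, NamedTuple


class PiiMatch(NamedTuple):
    kind: str   # which pattern matched
    text: str   # the matched substring
    start: int  # offset of the match in the scanned text


# Illustrative patterns only; real-world detection needs more formats and validation.
PII_PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}[- ]\d{2}[- ]\d{4}\b"),
}


def scan_pii(text: str) -> List[PiiMatch]:
    """Return every pattern match in `text`, ordered by position."""
    found = [
        PiiMatch(kind, match.group(), match.start())
        for kind, pattern in PII_PATTERNS.items()
        for match in pattern.finditer(text)
    ]
    return sorted(found, key=lambda m: m.start)
```

Pattern matching of this kind is fast and predictable but blind to context, which is the gap the LLM analysis step is meant to cover.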

PII Masking

When PII masking is enabled (default), detected PII is automatically masked before sending:

Credit cards: ****-****-****-1234 (last 4 digits preserved)
SSNs: ***-**-6789 (last 4 digits preserved)
Bank accounts: Masked with partial visibility
Other PII: Replaced with [REDACTED]

This adds an extra layer of protection even if a message is approved for sending.
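
The masked formats listed above can be reproduced with a short substitution sketch. As before, the patterns and the `mask_pii` function are assumptions for illustration and handle only credit cards and SSNs; bank accounts and the `[REDACTED]` replacement for other types are left out.

```python
import re

# Capture only the last four digits so they can be preserved in the mask.
_CARD = re.compile(r"\b(?:\d{4}[- ]?){3}(\d{4})\b")
_SSN = re.compile(r"\b\d{3}[- ]\d{2}[- ](\d{4})\b")


def mask_pii(text: str) -> str:
    """Mask card numbers and SSNs, keeping the last four digits visible."""
    text = _CARD.sub(lambda m: "****-****-****-" + m.group(1), text)
    text = _SSN.sub(lambda m: "***-**-" + m.group(1), text)
    return text


print(mask_pii("Card 4111-1111-1111-1234, SSN 123-45-6789"))
# prints: Card ****-****-****-1234, SSN ***-**-6789
```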

LLM Analysis

The Safety Gateway uses an LLM to analyze message content for risks that pattern matching cannot detect:

Confidential business information
Inappropriate tone or language
Context-inappropriate disclosures
Policy violations
Potential phishing or social engineering
Unusual requests or instructions

The LLM returns:

Danger Score (0.0 - 1.0)
Analysis Summary explaining the reasoning
Flags indicating specific risk types detected

You can configure which model is used for analysis in Safety Settings.
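
If the three returned values were delivered as JSON, they might be parsed and validated as in the sketch below. The wire format and the field names `danger_score`, `summary`, and `flags` are assumptions made for this example, as are `LlmAnalysis` and `parse_analysis`; only the three concepts themselves come from this document.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LlmAnalysis:
    danger_score: float  # 0.0 (safe) to 1.0 (critical)
    summary: str         # reasoning behind the score
    flags: List[str]     # specific risk types detected


def parse_analysis(raw: str) -> LlmAnalysis:
    """Parse a JSON analysis result and reject out-of-range scores."""
    data = json.loads(raw)
    score = float(data["danger_score"])
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"danger_score out of range: {score}")
    return LlmAnalysis(
        danger_score=score,
        summary=str(data.get("summary", "")),
        flags=[str(flag) for flag in data.get("flags", [])],
    )
```

Validating the score range before use is a defensive choice: model output is free-form text, so a caller should not assume it is always well formed.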

Storage

Review Queue: SQL Server (SafetyReviewQueue table)
Audit Log: Azure Table Storage (high-volume, cost-effective)
Settings: SQL Server (TenantSafetySettings table)

Emergency Override

In urgent situations, administrators can activate an Emergency Override to temporarily disable the Safety Gateway:

Requires safety:emergency:override permission
Must provide a reason (minimum 20 characters)
Duration: 15-240 minutes
Automatically reverts to previous mode when expired
All overrides are logged for compliance
Rate-limited to 3 activations per hour per user
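
The constraints above lend themselves to a simple validation routine, sketched here. The limits and the permission string are taken from the list; `validate_override`, its parameters, the choice to ignore surrounding whitespace when counting reason length, and the use of a rolling 60-minute window for the rate limit are all assumptions.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Set

REQUIRED_PERMISSION = "safety:emergency:override"
MIN_REASON_CHARS = 20
MIN_MINUTES, MAX_MINUTES = 15, 240
MAX_ACTIVATIONS_PER_HOUR = 3


def validate_override(
    permissions: Set[str],
    reason: str,
    minutes: int,
    recent_activations: Iterable[datetime],  # this user's earlier activations
    now: datetime,
) -> List[str]:
    """Return a list of problems; an empty list means the request is acceptable."""
    problems = []
    if REQUIRED_PERMISSION not in permissions:
        problems.append("missing permission")
    if len(reason.strip()) < MIN_REASON_CHARS:
        problems.append("reason too short")
    if not MIN_MINUTES <= minutes <= MAX_MINUTES:
        problems.append("duration out of range")
    # Assumed rolling window: count activations within the last 60 minutes.
    window_start = now - timedelta(hours=1)
    if sum(1 for t in recent_activations if t > window_start) >= MAX_ACTIVATIONS_PER_HOUR:
        problems.append("rate limit exceeded")
    return problems
```

Returning every problem at once, rather than failing on the first, lets an administrator fix the whole request in one pass during an urgent situation.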

Related Topics

Safety Settings - Configure the gateway
Review Queue - Handle held messages
Audit Log - View historical decisions
Agent Configuration - Create and manage agents
