Safety Settings
The Safety Settings page allows you to configure how the Safety Gateway analyzes and handles outbound communications from your AI agents.
Accessing Safety Settings
- Navigate to Build > Governance > Safety Gateway
- Click on the Settings tab
Gateway Mode
The most important setting controls whether the gateway actively protects your communications. The three gateway modes are displayed as selectable cards in a single row. Click a card to activate that mode—the selected mode is highlighted with a blue border.
Disabled
- Gateway is completely bypassed
- All outbound communications execute immediately
- No analysis or logging occurs
- Use case: Initial testing, debugging agent behavior
Dry Run
- All communications are allowed through
- Every message is analyzed and scored
- Decisions are logged but not enforced
- Audit log shows what would have happened
- Use case: Tuning thresholds, understanding traffic patterns
Enforced
- Full protection active
- High-risk messages are held or denied
- Human review required for held items
- Use case: Production environments
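The three modes differ only in what happens after the gateway reaches a decision. The Python sketch below models that behavior as described above. `GatewayMode`, `Outcome`, and `apply_mode` are hypothetical names used for illustration, not part of the product's API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class GatewayMode(Enum):
    DISABLED = "disabled"
    DRY_RUN = "dry_run"
    ENFORCED = "enforced"


@dataclass
class Outcome:
    delivered: bool                  # True if the message is sent immediately
    logged_decision: Optional[str]   # decision written to the audit log, or None if nothing is logged


def apply_mode(mode: GatewayMode, decision: str) -> Outcome:
    """Map a decision ("allow", "hold", or "deny") onto the selected gateway mode."""
    if mode is GatewayMode.DISABLED:
        # Gateway bypassed: no analysis, no logging, so any decision is ignored.
        return Outcome(delivered=True, logged_decision=None)
    if mode is GatewayMode.DRY_RUN:
        # Everything goes out; the log records what would have happened.
        return Outcome(delivered=True, logged_decision=decision)
    # Enforced: held messages wait for review and denied messages are blocked.
    return Outcome(delivered=(decision == "allow"), logged_decision=decision)
```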
Risk Thresholds
Risk thresholds determine when messages are held for review or denied. The hold thresholds are set separately for internal and external recipients, while the deny threshold is a single value.
Internal Recipient Threshold
Messages to recipients within your organization (same email domain, known employees) use this threshold.
- Default: 0.7
- Range: 0.0 - 1.0
- Higher value = More permissive (fewer holds)
- Lower value = More conservative (more holds)
Internal communications typically warrant higher thresholds since:
- Recipients are trusted employees
- Less risk of external data exposure
- Internal context is usually understood
External Recipient Threshold
Messages to recipients outside your organization use this threshold.
- Default: 0.5
- Range: 0.0 - 1.0
- Higher value = More permissive (fewer holds)
- Lower value = More conservative (more holds)
External communications typically warrant lower thresholds since:
- Higher risk of data exposure
- Recipients may not have context
- Regulatory compliance concerns
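A minimal sketch of recipient classification and threshold selection. It assumes a recipient is internal when its email domain matches one of your organization's domains or the address appears in a known-employee list. This page does not say how messages with mixed recipients are handled, so the rule that any external recipient (or no recipients at all) selects the external threshold is an assumption, as are the names `ORG_DOMAINS`, `KNOWN_EMPLOYEES`, `is_internal`, and `select_threshold`.

```python
from typing import Iterable

# Hypothetical organization data; replace with your own.
ORG_DOMAINS = {"example.com"}
KNOWN_EMPLOYEES = {"contractor@partner-mail.net"}

INTERNAL_THRESHOLD = 0.7  # documented default
EXTERNAL_THRESHOLD = 0.5  # documented default


def is_internal(address: str) -> bool:
    """Internal if the domain belongs to the org or the address is a known employee."""
    address = address.strip().lower()
    domain = address.rpartition("@")[2]
    return domain in ORG_DOMAINS or address in KNOWN_EMPLOYEES


def select_threshold(recipients: Iterable[str]) -> tuple[str, float]:
    """Return (classification, hold threshold) for a message's recipients."""
    recipients = list(recipients)
    if recipients and all(is_internal(r) for r in recipients):
        return "internal", INTERNAL_THRESHOLD
    # Any external recipient, or no recipients at all, falls back to the
    # more conservative external threshold (an assumption).
    return "external", EXTERNAL_THRESHOLD
```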
Deny Threshold
Messages with a danger score above this value are automatically denied without human review.
- Default: 0.9
- Range: 0.0 - 1.0
- Use case: Block obviously dangerous content immediately
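Together, the hold and deny thresholds turn a danger score into one of three decisions. The sketch below assumes strict "above" comparisons, so a score exactly equal to a threshold is not held or denied; the product's boundary handling isn't specified on this page, so treat that detail and the `decide` name as assumptions.

```python
def decide(score: float, hold_threshold: float, deny_threshold: float = 0.9) -> str:
    """Return "allow", "hold", or "deny" for a danger score.

    hold_threshold is whichever internal (default 0.7) or external
    (default 0.5) threshold applies to the message's recipients.
    """
    for name, value in (("score", score),
                        ("hold_threshold", hold_threshold),
                        ("deny_threshold", deny_threshold)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    if score > deny_threshold:
        return "deny"   # blocked automatically, no human review
    if score > hold_threshold:
        return "hold"   # sent to the review queue
    return "allow"
```

With the defaults, a message scoring 0.6 is held when the external threshold applies (0.6 is above 0.5) but allowed when the internal threshold applies (0.6 is not above 0.7).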
LLM Analysis Settings
Enable LLM Analysis
Toggle whether the Safety Gateway uses an LLM to analyze message content beyond PII pattern matching.
- Enabled: Full semantic analysis of message content, intent, and appropriateness
- Disabled: Only PII pattern matching (faster but less comprehensive)
LLM Provider
Select which LLM provider to use for safety analysis:
- Anthropic (Claude)
- OpenAI (GPT-4)
- Azure OpenAI
- Azure AI Foundry
LLM Model
Select the specific model for analysis. Faster, cheaper models are recommended because every outbound message is analyzed.
Recommended models:
- Claude 3 Haiku (Anthropic)
- GPT-4o-mini (OpenAI/Azure)
Skip LLM for Internal
When enabled, the LLM analysis step is skipped for messages to internal recipients, and only PII scanning is applied, which speeds up processing.
- Enabled: Faster internal message processing
- Disabled: Full analysis for all messages
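The two LLM toggles combine as sketched below. This assumes the gateway is in Dry Run or Enforced mode, that PII pattern matching always runs, and that "Skip LLM for Internal" only takes effect for messages classified as internal. `analysis_steps` and the step labels are hypothetical.

```python
def analysis_steps(llm_enabled: bool, skip_llm_for_internal: bool, is_internal: bool) -> list[str]:
    """List the analysis steps applied to a single message."""
    steps = ["pii_scan"]  # PII pattern matching runs in every configuration
    if llm_enabled and not (skip_llm_for_internal and is_internal):
        steps.append("llm_analysis")
    return steps
```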
Custom Safety Instructions
Configure custom instructions that guide the LLM's safety analysis. These instructions are included in the prompt sent to the safety analysis LLM, allowing you to define organization-specific policies and guidelines.
Writing Custom Instructions
The custom instructions field accepts free-form text that will be added to the safety analysis prompt. Use it to:
- Define organization-specific sensitivity policies
- Specify approved topics and communication styles
- List known safe patterns that shouldn't trigger flags
- Clarify internal jargon or acronyms that might seem suspicious
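A minimal sketch of how free-form instructions might be combined with a base safety prompt. The section labels and `build_safety_prompt` are hypothetical; to see the prompt the gateway actually sends, use View LLM Details in the test feature.

```python
def build_safety_prompt(base_prompt: str, custom_instructions: str, message: str) -> str:
    """Assemble base rules, optional organization instructions, then the message."""
    parts = [base_prompt.strip()]
    custom = custom_instructions.strip()
    if custom:  # omit the section entirely when no custom instructions are set
        parts.append("Organization-specific instructions:\n" + custom)
    parts.append("Message to analyze:\n" + message)
    return "\n\n".join(parts)
```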
Example Instructions
Our company frequently discusses:
- Patient health data (we are a healthcare company) - flag if sent externally
- Financial projections marked "confidential" - always hold for external recipients
- Project codenames (Project Apollo, Project Zeus) - these are not sensitive
Do not flag:
- Internal scheduling emails between team members
- Meeting confirmations and calendar updates
- Standard HR announcements
Tips
- Be specific about what should and shouldn't trigger holds
- Include examples of legitimate communications that are being incorrectly flagged
- Update instructions as you identify new patterns in the audit log
- Keep instructions concise; overly long instructions may reduce effectiveness
PII Detection Options
Enabled PII Types
Select which PII types to scan for:
- Credit Card Numbers
- Social Security Numbers (SSN)
- Bank Account / Routing Numbers
- Dates of Birth
- Passport Numbers
- Driver's License Numbers
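As a rough illustration of pattern-based detection, the sketch below covers two of these types: credit card numbers (a digit-run regex filtered by the Luhn checksum) and SSNs (the NNN-NN-NNNN format). These are common techniques assumed for illustration; the gateway's actual detectors are not documented on this page, and the patterns are deliberately simplified.

```python
import re

# Illustrative patterns only; production detectors handle more formats.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")  # 13-19 digits, optional space/hyphen separators
SSN_RE = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: filters out digit runs that cannot be card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect_pii(text: str) -> set[str]:
    """Return the PII types found in text."""
    found = set()
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            found.add("credit_card")
    if SSN_RE.search(text):
        found.add("ssn")
    return found
```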
Auto-Mask PII
When enabled, detected PII in held messages is automatically masked in the review queue (e.g., ****-****-****-1234).
- Enabled: Reviewers see masked values
- Disabled: Reviewers see full values
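A minimal sketch of card masking in the ****-****-****-1234 style shown above. It assumes a fixed-width mask regardless of the card's length, which also hides how many digits the original had; `mask_card`, `mask_cards_in_text`, and the simplified card regex are hypothetical.

```python
import re

CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")  # simplified: 13-19 digits


def mask_card(card: str) -> str:
    """Keep only the last four digits of a card number."""
    digits = re.sub(r"\D", "", card)
    if len(digits) < 13:
        raise ValueError("expected a 13-19 digit card number")
    return "****-****-****-" + digits[-4:]


def mask_cards_in_text(text: str) -> str:
    """Replace every card-like number in a message with its masked form."""
    return CARD_RE.sub(lambda m: mask_card(m.group()), text)
```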
Emergency Override
The emergency override allows administrators to temporarily bypass Safety Gateway enforcement for all communications.
Enable Emergency Override
When activated:
- All messages are allowed through immediately
- Gateway continues to log for audit purposes
- Useful for urgent situations where review delays are unacceptable
Override Duration
Set how long the emergency override remains active:
- 1 hour
- 4 hours
- 24 hours
- Until manually disabled
Override Reason
Document why the override was activated. This field is required for audit compliance and must be at least 20 characters long.
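A minimal sketch of validating an override request against the duration options and the 20-character reason minimum. Whether leading and trailing whitespace counts toward the minimum isn't specified, so this sketch trims it; the duration keys and `override_expiry` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Documented duration choices; None means "active until manually disabled".
DURATIONS = {
    "1h": timedelta(hours=1),
    "4h": timedelta(hours=4),
    "24h": timedelta(hours=24),
    "manual": None,
}
MIN_REASON_LENGTH = 20


def override_expiry(duration: str, reason: str, now: Optional[datetime] = None) -> Optional[datetime]:
    """Validate an override request; return its expiry time, or None for manual disable."""
    if duration not in DURATIONS:
        raise ValueError(f"unknown duration {duration!r}")
    if len(reason.strip()) < MIN_REASON_LENGTH:
        raise ValueError(f"reason must be at least {MIN_REASON_LENGTH} characters")
    delta = DURATIONS[duration]
    if delta is None:
        return None
    return (now or datetime.now(timezone.utc)) + delta
```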
Rate Limiting
Emergency override requests are rate-limited to 3 per hour per user to prevent accidental repeated activations. The limit applies to both activation and deactivation requests.
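One common way to enforce a per-user limit like this is a sliding-window log of recent request times, sketched below. This illustrates the technique, not necessarily how the gateway implements its limit, and `SlidingWindowLimiter` is a hypothetical name. Because this page doesn't say whether activations and deactivations share one budget, the sketch simply counts every request it is given.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Allow at most `limit` requests per user within any `window` seconds."""

    def __init__(self, limit: int = 3, window: float = 3600.0) -> None:
        self.limit = limit
        self.window = window
        self._requests: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        timestamps = self._requests[user]
        # Drop requests that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False  # rejected requests are not recorded
        timestamps.append(now)
        return True
```

The test endpoint's limit of 10 requests per minute per user fits the same shape as `SlidingWindowLimiter(limit=10, window=60)`.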
Test Configuration
The Safety Settings page includes a Run Test feature that lets you test your configuration with sample messages before deploying to production.
Using the Test Feature
- Enter a sample message in the test area
- Optionally add recipient email addresses
- Select the communication channel (email, SMS, etc.)
- Click Run Test
Understanding Test Results
The test will show you:
- Decision: What action the gateway would take (Allow, Hold, Deny)
- Danger Score: The calculated risk score (0.0 - 1.0)
- Classification: Whether recipients are internal or external
- Threshold Applied: Which threshold was used for the decision
- Flags: Any risk flags detected by the LLM
- Analysis: The LLM's reasoning for the score
PII Detection in Tests
If PII is detected, the test results show:
- PII Detected: Types of PII found (credit card, SSN, etc.)
- Masked Message: How the message would appear after masking
LLM Details
Click View LLM Details to see:
- The exact prompt sent to the LLM
- The raw LLM response
This helps you debug unexpected scores and refine your custom safety instructions.
Rate Limiting
The test endpoint is rate-limited to 10 requests per minute per user to prevent abuse. If you exceed this limit, wait briefly before running additional tests.
Test Best Practices
- Test with realistic message content
- Include edge cases (borderline content)
- Test both internal and external recipients
- Verify PII detection with sample data
- Document expected vs. actual results
Notification Settings
Queue Alert Threshold
Send notifications when the review queue exceeds this many pending items.
- Default: 10
- Range: 1 - 100
Alert Recipients
Email addresses to notify when the queue alert threshold is exceeded.
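A minimal sketch of the queue alert check, validating the documented 1 to 100 range. This page doesn't say how often alerts repeat while the queue stays large, so the de-duplication here (alert once when the count first exceeds the threshold, then again only after it has dropped back to or below it) is an assumption, and `QueueAlert` is a hypothetical name.

```python
class QueueAlert:
    """Signal an alert when pending items first exceed the threshold."""

    def __init__(self, threshold: int = 10) -> None:
        if not 1 <= threshold <= 100:
            raise ValueError("threshold must be between 1 and 100")
        self.threshold = threshold
        self._alerting = False

    def update(self, pending: int) -> bool:
        """Return True if an alert email should be sent for this queue size."""
        over = pending > self.threshold  # "exceeds": equal to the threshold does not alert
        fire = over and not self._alerting
        self._alerting = over
        return fire
```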
Daily Summary
Send a daily summary email with:
- Total messages processed
- Messages held for review
- Messages denied
- Average danger scores
- Top flags detected
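The summary figures are simple aggregates over a day's decisions. The sketch below assumes each audit record carries a decision, a danger score, and a list of flags; the `AuditRecord` shape, flag names, and `daily_summary` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class AuditRecord:
    decision: str                                    # "allow", "hold", or "deny"
    danger_score: float                              # 0.0 - 1.0
    flags: List[str] = field(default_factory=list)   # risk flags raised for this message


def daily_summary(records: List[AuditRecord], top_n: int = 3) -> dict:
    total = len(records)
    flag_counts = Counter(flag for r in records for flag in r.flags)
    return {
        "total": total,
        "held": sum(r.decision == "hold" for r in records),
        "denied": sum(r.decision == "deny" for r in records),
        "average_score": sum(r.danger_score for r in records) / total if total else 0.0,
        "top_flags": flag_counts.most_common(top_n),
    }
```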
Best Practices
Initial Configuration
- Set gateway mode to Dry Run
- Use default thresholds initially
- Enable LLM analysis with a fast model
- Run for 1-2 weeks to gather data
- Review audit log to understand patterns
- Adjust thresholds based on observed behavior
- Switch to Enforced mode
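To adjust thresholds based on observed behavior, one approach is to replay the danger scores logged during Dry Run against candidate thresholds and compare how many messages each would hold. The sketch assumes you can export scores from the audit log (the export format isn't described here) and that scores above the deny threshold count as denied rather than held; `hold_rates` is a hypothetical name.

```python
from typing import Dict, Iterable, List


def hold_rates(scores: List[float], candidates: Iterable[float],
               deny_threshold: float = 0.9) -> Dict[float, float]:
    """Fraction of logged messages each candidate hold threshold would send to review."""
    if not scores:
        return {t: 0.0 for t in candidates}
    rates = {}
    for t in candidates:
        held = sum(1 for s in scores if t < s <= deny_threshold)
        rates[t] = held / len(scores)
    return rates
```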
Threshold Tuning
| Scenario | Internal Threshold | External Threshold |
|---|---|---|
| High Security | 0.5 | 0.3 |
| Balanced | 0.7 | 0.5 |
| High Throughput | 0.85 | 0.7 |
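To compare the presets, the sketch below applies each row of the table to a single score. The table lists no deny values, so it assumes the default deny threshold of 0.9 for every preset, and it assumes a message is held when its score is strictly above the applicable threshold. `PRESETS` and `preset_decision` are hypothetical names.

```python
PRESETS = {
    "High Security": {"internal": 0.5, "external": 0.3},
    "Balanced": {"internal": 0.7, "external": 0.5},
    "High Throughput": {"internal": 0.85, "external": 0.7},
}
DENY_THRESHOLD = 0.9  # documented default, assumed unchanged across presets


def preset_decision(preset: str, recipient_class: str, score: float) -> str:
    """Decision for a score under a preset; recipient_class is "internal" or "external"."""
    hold_threshold = PRESETS[preset][recipient_class]
    if score > DENY_THRESHOLD:
        return "deny"
    if score > hold_threshold:
        return "hold"
    return "allow"
```

For an external message scoring 0.6, this gives "hold" under High Security and Balanced but "allow" under High Throughput.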
Model Selection
For safety analysis, prioritize:
- Speed - Analysis happens on every outbound message
- Cost - High volume means costs add up
- Reliability - Consistent scoring is important
Fast, affordable models like Claude Haiku or GPT-4o-mini typically perform as well as larger models for this use case.
Troubleshooting
Too Many Messages Held
- Increase internal/external thresholds
- Enable "Skip LLM for Internal"
- Review the most common flags and, if they are false positives, update your custom safety instructions to address them
Messages Not Being Caught
- Lower thresholds
- Ensure LLM analysis is enabled
- Check that relevant PII types are enabled
- Review audit log for patterns
High Latency
- Use a faster LLM model
- Enable "Skip LLM for Internal"
- Check LLM provider status
Related Topics
- Safety Gateway Overview - Understand how the gateway works
- Review Queue - Handle held messages
- Audit Log - View historical decisions