Safety Settings

The Safety Settings page allows you to configure how the Safety Gateway analyzes and handles outbound communications from your AI agents.

Accessing Safety Settings

Navigate to Build > Governance > Safety Gateway
Click on the Settings tab

Gateway Mode

The gateway mode is the most important setting: it controls whether the gateway actively protects your communications. The three modes are displayed as selectable cards in a single row. Click a card to activate that mode; the selected mode is highlighted with a blue border.

Disabled

Gateway is completely bypassed
All outbound communications execute immediately
No analysis or logging occurs
Use case: Initial testing, debugging agent behavior

Dry Run

All communications are allowed through
Every message is analyzed and scored
Decisions are logged but not enforced
Audit log shows what would have happened
Use case: Tuning thresholds, understanding traffic patterns

Enforced

Full protection active
High-risk messages are held or denied
Human review required for held items
Use case: Production environments
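
As a rough illustration of how the three modes differ, the sketch below maps an analyzed decision to what is actually delivered and what is logged. The `GatewayMode` enum and `apply_mode` function are hypothetical names for this illustration, not part of the product's API.

```python
from enum import Enum


class GatewayMode(Enum):
    DISABLED = "disabled"
    DRY_RUN = "dry_run"
    ENFORCED = "enforced"


def apply_mode(mode, analyzed_decision):
    """Return (delivered_action, logged_decision) for one message.

    analyzed_decision is the outcome the analysis would produce:
    "allow", "hold", or "deny". A logged_decision of None means
    nothing is written to the audit log.
    """
    if mode is GatewayMode.DISABLED:
        # Gateway bypassed: message goes out, no analysis, no logging.
        return ("allow", None)
    if mode is GatewayMode.DRY_RUN:
        # Message goes out, but the would-be decision is logged.
        return ("allow", analyzed_decision)
    # Enforced: the analyzed decision is both applied and logged.
    return (analyzed_decision, analyzed_decision)
```

For example, a message scored as "deny" is still delivered in Dry Run, while the audit log records that it would have been denied.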

Risk Thresholds

Risk thresholds determine when messages are held for review or denied. Thresholds are set separately for internal and external recipients.

Internal Recipient Threshold

Messages to recipients within your organization (same email domain, known employees) use this threshold.

Default: 0.7
Range: 0.0 - 1.0
Higher value = More permissive (fewer holds)
Lower value = More conservative (more holds)

Internal communications typically warrant higher thresholds since:

Recipients are trusted employees
Less risk of external data exposure
Internal context is usually understood

External Recipient Threshold

Messages to recipients outside your organization use this threshold.

Default: 0.5
Range: 0.0 - 1.0
Higher value = More permissive
Lower value = More conservative

External communications typically warrant lower thresholds since:

Higher risk of data exposure
Recipients may not have context
Regulatory compliance concerns
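
The internal/external distinction can be sketched as follows. This is an illustration under stated assumptions, not the gateway's actual classifier: it assumes a recipient is internal if the address is a known employee or its domain is one of the organization's domains, and that a message counts as external as soon as any recipient is external. The function name and parameters are hypothetical.

```python
def classify_recipients(recipients, org_domains, known_employees=()):
    """Classify a message as "internal" or "external" (assumed rules).

    A recipient is treated as internal when its address is in
    known_employees or its domain is in org_domains. One external
    recipient makes the whole message external. An empty recipient
    list falls through to "internal", which is an arbitrary choice
    made for this sketch.
    """
    domains = {d.lower() for d in org_domains}
    known = {e.lower() for e in known_employees}
    for address in recipients:
        addr = address.strip().lower()
        # Everything after the last "@"; the whole string if no "@".
        domain = addr.rpartition("@")[2]
        if addr not in known and domain not in domains:
            return "external"
    return "internal"
```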

Deny Threshold

Messages above this score are automatically denied without human review.

Default: 0.9
Range: 0.0 - 1.0
Use case: Block obviously dangerous content immediately
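
The way the three thresholds combine can be sketched as a small decision function. The defaults match the values documented above; the function name, and the choice of a strict "greater than" comparison at the boundary, are assumptions made for this illustration.

```python
def decide(score, is_external,
           internal_threshold=0.7,
           external_threshold=0.5,
           deny_threshold=0.9):
    """Map a danger score to "allow", "hold", or "deny".

    Assumes a score strictly above a threshold triggers it; the
    gateway's real boundary handling may differ.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    # The deny threshold is checked first: no human review.
    if score > deny_threshold:
        return "deny"
    threshold = external_threshold if is_external else internal_threshold
    if score > threshold:
        return "hold"
    return "allow"
```

With the defaults, a score of 0.6 is held for an external recipient but allowed for an internal one, and a score of 0.95 is denied for either.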

LLM Analysis Settings

Enable LLM Analysis

Toggle whether the Safety Gateway uses an LLM to analyze message content beyond PII pattern matching.

Enabled: Full semantic analysis of message content, intent, and appropriateness
Disabled: Only PII pattern matching (faster but less comprehensive)

LLM Provider

Select which LLM provider to use for safety analysis:

Anthropic (Claude)
OpenAI (GPT-4)
Azure OpenAI
Azure AI Foundry

LLM Model

Select the specific model for analysis. Faster, cheaper models are recommended since analysis is high-volume.

Recommended models:

Claude 3 Haiku (Anthropic)
GPT-4o-mini (OpenAI/Azure)

Skip LLM for Internal

When enabled, the gateway skips the LLM analysis step for messages to internal recipients and relies on PII scanning alone, which speeds up processing.

Enabled: Faster internal message processing
Disabled: Full analysis for all messages
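
The interaction between Enable LLM Analysis and Skip LLM for Internal can be summarized in a few lines. This is a minimal sketch; the function and step names are hypothetical labels, not identifiers from the product.

```python
def analysis_steps(llm_enabled, skip_llm_for_internal, is_external):
    """Return the analysis steps that would run for one message."""
    # PII pattern matching runs in every configuration.
    steps = ["pii_scan"]
    # The LLM runs only if enabled, and is skipped for internal
    # recipients when "Skip LLM for Internal" is on.
    if llm_enabled and (is_external or not skip_llm_for_internal):
        steps.append("llm_analysis")
    return steps
```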

Custom Safety Instructions

Configure custom instructions that guide the LLM's safety analysis. These instructions are included in the prompt sent to the safety analysis LLM, allowing you to define organization-specific policies and guidelines.

Writing Custom Instructions

The custom instructions field accepts free-form text that will be added to the safety analysis prompt. Use it to:

Define organization-specific sensitivity policies
Specify approved topics and communication styles
List known safe patterns that shouldn't trigger flags
Clarify internal jargon or acronyms that might seem suspicious

Example Instructions

Our company frequently discusses:
- Patient health data (we are a healthcare company) - flag if sent externally
- Financial projections marked "confidential" - always hold for external recipients
- Project codenames (Project Apollo, Project Zeus) - these are not sensitive

Do not flag:
- Internal scheduling emails between team members
- Meeting confirmations and calendar updates
- Standard HR announcements

Tips

Be specific about what should and shouldn't trigger holds
Include examples of legitimate communications that are falsely flagged
Update instructions as you identify new patterns in the audit log
Keep instructions concise - overly long instructions may reduce effectiveness

PII Detection Options

Enabled PII Types

Select which PII types to scan for:

Credit Card Numbers
Social Security Numbers (SSN)
Bank Account / Routing Numbers
Dates of Birth
Passport Numbers
Driver's License Numbers

Auto-Mask PII

When enabled, detected PII in held messages is automatically masked in the review queue (e.g., ****-****-****-1234).

Enabled: Reviewers see masked values
Disabled: Reviewers see full values
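
To make the masked format concrete, here is a simplified sketch of masking credit card numbers so that only the last four digits remain. The regular expression is an assumption for illustration only: it matches 16-digit numbers in groups of four and does no checksum validation, so it is not the gateway's actual detector.

```python
import re

# Four groups of four digits, optionally separated by "-" or " ".
# The final group is captured so it can be kept visible.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[- ]?){3}(\d{4})\b")


def mask_cards(text):
    """Replace each matched card number with ****-****-****-NNNN."""
    return CARD_PATTERN.sub(
        lambda m: "****-****-****-" + m.group(1), text
    )
```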

Emergency Override

The emergency override allows administrators to temporarily bypass the Safety Gateway for all communications.

Enable Emergency Override

When activated:

All messages are allowed through immediately
Gateway continues to log for audit purposes
Useful for urgent situations where review delays are unacceptable

Override Duration

Set how long the emergency override remains active:

1 hour
4 hours
24 hours
Manual disable

Override Reason

Document why the override was activated. This field is required for audit compliance and must contain at least 20 characters.
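
The duration and reason rules can be sketched as a small validation helper. The names `DURATIONS` and `build_override` are hypothetical, and whether the product trims whitespace before counting characters is an assumption here.

```python
from datetime import datetime, timedelta, timezone

# The duration options listed above; None means the override stays
# active until it is disabled manually.
DURATIONS = {
    "1 hour": timedelta(hours=1),
    "4 hours": timedelta(hours=4),
    "24 hours": timedelta(hours=24),
    "Manual disable": None,
}

MIN_REASON_LENGTH = 20


def build_override(reason, duration, now):
    """Validate an override request and compute its expiry time."""
    reason = reason.strip()  # assumption: surrounding whitespace ignored
    if len(reason) < MIN_REASON_LENGTH:
        raise ValueError("reason must be at least 20 characters")
    if duration not in DURATIONS:
        raise ValueError("unknown duration: " + duration)
    delta = DURATIONS[duration]
    expires_at = None if delta is None else now + delta
    return {"reason": reason, "expires_at": expires_at}
```

For example, a 4 hour override requested at midnight UTC would expire at 04:00 UTC, while "Manual disable" has no expiry.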

Rate Limiting

Emergency override activation is rate-limited to 3 requests per hour per user to prevent accidental repeated activations. This limit applies to both activation and deactivation.
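
A per-user limit like this can be illustrated with a sliding-window counter. Whether the gateway uses a sliding window, a fixed window, or another scheme is not documented here, so treat this as one possible model; the class name is hypothetical.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per user within window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # user -> request timestamps

    def allow(self, user, now):
        """Record a request at time `now` (seconds); return True if allowed."""
        events = self._events[user]
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= self.max_requests:
            return False
        events.append(now)
        return True
```

With `SlidingWindowLimiter(3, 3600)`, a user's fourth request within an hour is rejected, and requests are accepted again once the oldest one is an hour old.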

Test Configuration

The Safety Settings page includes a Run Test feature that lets you test your configuration with sample messages before deploying to production.

Using the Test Feature

Enter a sample message in the test area
Optionally add recipient email addresses
Select the communication channel (email, SMS, etc.)
Click Run Test

Understanding Test Results

The test will show you:

Decision: What action the gateway would take (Allow, Hold, Deny)
Danger Score: The calculated risk score (0.0 - 1.0)
Classification: Whether recipients are internal or external
Threshold Applied: Which threshold was used for the decision
Flags: Any risk flags detected by the LLM
Analysis: The LLM's reasoning for the score

PII Detection in Tests

If PII is detected, the test results show:

PII Detected: Types of PII found (credit card, SSN, etc.)
Masked Message: How the message would appear after masking

LLM Details

Click View LLM Details to see:

The exact prompt sent to the LLM
The raw LLM response

This helps debug unexpected scoring or refine your safety instructions.

Rate Limiting

The test endpoint is rate-limited to 10 requests per minute per user to prevent abuse. If you exceed this limit, wait briefly before running additional tests.

Test Best Practices

Test with realistic message content
Include edge cases (borderline content)
Test both internal and external recipients
Verify PII detection with sample data
Document expected vs. actual results

Notification Settings

Queue Alert Threshold

Send notifications when the review queue exceeds this many pending items.

Default: 10
Range: 1 - 100

Alert Recipients

Email addresses to notify when the queue alert threshold is exceeded.

Daily Summary

Send a daily summary email with:

Total messages processed
Messages held for review
Messages denied
Average danger scores
Top flags detected
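
The summary fields listed above could be derived from audit records along these lines. The record shape (dictionaries with `decision`, `score`, and `flags` keys) is an assumption for this sketch and does not reflect the gateway's actual audit log schema.

```python
from collections import Counter


def daily_summary(records, top_n=3):
    """Aggregate one day of (hypothetical) audit records."""
    total = len(records)
    held = sum(1 for r in records if r["decision"] == "hold")
    denied = sum(1 for r in records if r["decision"] == "deny")
    # Avoid dividing by zero on a day with no traffic.
    average = sum(r["score"] for r in records) / total if total else 0.0
    flag_counts = Counter(f for r in records for f in r["flags"])
    return {
        "total": total,
        "held": held,
        "denied": denied,
        "average_score": round(average, 3),
        "top_flags": [flag for flag, _ in flag_counts.most_common(top_n)],
    }
```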

Best Practices

Initial Configuration

Set gateway mode to Dry Run
Use default thresholds initially
Enable LLM analysis with a fast model
Run for 1-2 weeks to gather data
Review audit log to understand patterns
Adjust thresholds based on observed behavior
Switch to Enforced mode
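
When reviewing Dry Run data, it helps to estimate how many messages a candidate threshold would have held. The sketch below assumes you have a list of danger scores taken from the audit log; how you export them is outside its scope, and the function name is hypothetical.

```python
def hold_rate(scores, threshold):
    """Fraction of scores strictly above the candidate threshold."""
    if not scores:
        return 0.0
    held = sum(1 for s in scores if s > threshold)
    return held / len(scores)
```

For example, with scores `[0.1, 0.4, 0.55, 0.8]`, a threshold of 0.5 would hold half of the messages and a threshold of 0.7 would hold a quarter. Comparing a few candidate values this way gives a sense of reviewer workload before switching to Enforced mode.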

Threshold Tuning

Scenario	Internal Threshold	External Threshold
High Security	0.5	0.3
Balanced	0.7	0.5
High Throughput	0.85	0.7
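
To see how the scenarios in the table differ in practice, the sketch below checks whether the same score would be held under each one. The `PRESETS` dictionary simply restates the table; the names and the strict comparison are assumptions for illustration.

```python
# (internal threshold, external threshold) per scenario, from the table.
PRESETS = {
    "High Security": (0.5, 0.3),
    "Balanced": (0.7, 0.5),
    "High Throughput": (0.85, 0.7),
}


def held_under(preset, score, is_external):
    """Return True if the score exceeds the preset's threshold."""
    internal, external = PRESETS[preset]
    return score > (external if is_external else internal)
```

A score of 0.6 to an external recipient would be held under High Security and Balanced, but allowed under High Throughput.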

Model Selection

For safety analysis, prioritize:

Speed - Analysis happens on every outbound message
Cost - High volume means costs add up
Reliability - Consistent scoring is important

Fast, affordable models like Claude 3 Haiku or GPT-4o-mini typically perform as well as larger models for this use case.

Troubleshooting

Too Many Messages Held

Increase internal/external thresholds
Enable "Skip LLM for Internal"
Review the most common flags and adjust your configuration if they turn out to be false positives

Messages Not Being Caught

Lower thresholds
Ensure LLM analysis is enabled
Check that relevant PII types are enabled
Review audit log for patterns

High Latency

Use a faster LLM model
Enable "Skip LLM for Internal"
Check LLM provider status

Related Topics

Safety Gateway Overview - Understand how the gateway works
Review Queue - Handle held messages
Audit Log - View historical decisions
