Safety Settings

The Safety Settings page allows you to configure how the Safety Gateway analyzes and handles outbound communications from your AI agents.

Accessing Safety Settings

  1. Navigate to Build > Governance > Safety Gateway
  2. Click on the Settings tab

Gateway Mode

Gateway mode is the most important setting: it controls whether the gateway actively protects your communications. The three gateway modes are displayed as selectable cards in a single row. Click a card to activate that mode. The selected mode is highlighted with a blue border.

Disabled

  • Gateway is completely bypassed
  • All outbound communications execute immediately
  • No analysis or logging occurs
  • Use case: Initial testing, debugging agent behavior

Dry Run

  • All communications are allowed through
  • Every message is analyzed and scored
  • Decisions are logged but not enforced
  • Audit log shows what would have happened
  • Use case: Tuning thresholds, understanding traffic patterns

Enforced

  • Full protection active
  • High-risk messages are held or denied
  • Human review required for held items
  • Use case: Production environments
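The three modes can be summarized as a small lookup from mode and analysis decision to outcome. This is a minimal sketch, not the gateway's actual implementation: the names GatewayMode and apply_mode are hypothetical, and it assumes that Enforced mode also writes to the audit log.

```python
from enum import Enum


class GatewayMode(Enum):
    DISABLED = "disabled"
    DRY_RUN = "dry_run"
    ENFORCED = "enforced"


def apply_mode(mode, decision):
    """Describe what happens to a message under a given mode.

    decision is the analysis result: "allow", "hold", or "deny".
    """
    if mode is GatewayMode.DISABLED:
        # Gateway bypassed: nothing is analyzed or logged.
        return {"delivered": True, "analyzed": False, "logged": False}
    if mode is GatewayMode.DRY_RUN:
        # Analyzed and logged, but the decision is not enforced.
        return {"delivered": True, "analyzed": True, "logged": True}
    # Enforced: only "allow" goes out immediately (logging is assumed here).
    return {"delivered": decision == "allow", "analyzed": True, "logged": True}
```

For example, apply_mode(GatewayMode.DRY_RUN, "deny") reports the message as delivered and logged, which is what makes Dry Run useful for tuning.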

Risk Thresholds

Risk thresholds determine when messages are held for review or denied. Thresholds are set separately for internal and external recipients.

Internal Recipient Threshold

Messages to recipients within your organization (same email domain, known employees) use this threshold.

  • Default: 0.7
  • Range: 0.0 - 1.0
  • Higher value = More permissive (fewer holds)
  • Lower value = More conservative (more holds)

Internal communications typically warrant higher thresholds since:

  • Recipients are trusted employees
  • Less risk of external data exposure
  • Internal context is usually understood
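As an illustration of the internal/external split, the sketch below classifies a message from its recipient list. It is an assumption-laden example: classify_recipients, org_domains, and known_employees are hypothetical names, and the rule that a single external recipient makes the whole message external is assumed rather than documented.

```python
def classify_recipients(recipients, org_domains, known_employees=frozenset()):
    """Return "internal" or "external" for a list of email addresses."""
    domains = {d.lower() for d in org_domains}
    known = {e.lower() for e in known_employees}
    for address in recipients:
        addr = address.strip().lower()
        domain = addr.rpartition("@")[2]
        # Assumption: one external recipient makes the whole message external.
        if domain not in domains and addr not in known:
            return "external"
    # Note: an empty recipient list falls through to "internal" in this sketch.
    return "internal"
```

With org_domains={"example.com"}, a message to alice@example.com is internal, while adding bob@partner.example to the same message makes it external.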

External Recipient Threshold

Messages to recipients outside your organization use this threshold.

  • Default: 0.5
  • Range: 0.0 - 1.0
  • Higher value = More permissive
  • Lower value = More conservative

External communications typically warrant lower thresholds since:

  • Higher risk of data exposure
  • Recipients may not have context
  • Regulatory compliance concerns

Deny Threshold

Messages above this score are automatically denied without human review.

  • Default: 0.9
  • Range: 0.0 - 1.0
  • Use case: Block obviously dangerous content immediately
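Taken together, the three thresholds map a danger score to Allow, Hold, or Deny. The following is a minimal sketch using the documented defaults; the function name decide is hypothetical, and it assumes a strict "greater than" comparison for holds, which this page states only for the deny threshold.

```python
def decide(score, is_external,
           internal_threshold=0.7,
           external_threshold=0.5,
           deny_threshold=0.9):
    """Map a danger score in [0.0, 1.0] to "allow", "hold", or "deny"."""
    if score > deny_threshold:
        # Denied outright, without human review.
        return "deny"
    hold_threshold = external_threshold if is_external else internal_threshold
    if score > hold_threshold:
        # Assumption: holds use the same strict comparison as denials.
        return "hold"
    return "allow"
```

With the defaults, a score of 0.6 is held for an external recipient but allowed for an internal one, and a score of 0.95 is denied in both cases.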

LLM Analysis Settings

Enable LLM Analysis

Toggle whether the Safety Gateway uses an LLM to analyze message content beyond PII pattern matching.

  • Enabled: Full semantic analysis of message content, intent, and appropriateness
  • Disabled: Only PII pattern matching (faster but less comprehensive)

LLM Provider

Select which LLM provider to use for safety analysis:

  • Anthropic (Claude)
  • OpenAI (GPT-4)
  • Azure OpenAI
  • Azure AI Foundry

LLM Model

Select the specific model for analysis. Faster, cheaper models are recommended since analysis is high-volume.

Recommended models:

  • Claude 3 Haiku (Anthropic)
  • GPT-4o-mini (OpenAI/Azure)

Skip LLM for Internal

When enabled, the gateway skips the LLM analysis step for messages to internal recipients and relies on PII scanning alone, which speeds up processing.

  • Enabled: Faster internal message processing
  • Disabled: Full analysis for all messages
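The interaction between Enable LLM Analysis and Skip LLM for Internal can be sketched as a function that lists which analysis steps run for a message. The function and step names are hypothetical and serve only to illustrate the two toggles.

```python
def analysis_steps(llm_enabled, skip_llm_for_internal, is_external):
    """Return the analysis steps that would run for one message."""
    steps = ["pii_scan"]  # PII pattern matching runs in every case
    if llm_enabled and (is_external or not skip_llm_for_internal):
        steps.append("llm_analysis")
    return steps
```

With both toggles on, an internal message gets only the PII scan, while an external message still gets the full analysis.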

Custom Safety Instructions

Configure custom instructions that guide the LLM's safety analysis. These instructions are included in the prompt sent to the safety analysis LLM, allowing you to define organization-specific policies and guidelines.

Writing Custom Instructions

The custom instructions field accepts free-form text that will be added to the safety analysis prompt. Use it to:

  • Define organization-specific sensitivity policies
  • Specify approved topics and communication styles
  • List known safe patterns that shouldn't trigger flags
  • Clarify internal jargon or acronyms that might seem suspicious

Example Instructions

Our company frequently discusses:
- Patient health data (we are a healthcare company) - flag if sent externally
- Financial projections marked "confidential" - always hold for external recipients
- Project codenames (Project Apollo, Project Zeus) - these are not sensitive

Do not flag:
- Internal scheduling emails between team members
- Meeting confirmations and calendar updates
- Standard HR announcements

Tips

  • Be specific about what should and shouldn't trigger holds
  • Include examples of legitimate communications that get false-flagged
  • Update instructions as you identify new patterns in the audit log
  • Keep instructions concise - overly long instructions may reduce effectiveness

PII Detection Options

Enabled PII Types

Select which PII types to scan for:

  • Credit Card Numbers
  • Social Security Numbers (SSN)
  • Bank Account / Routing Numbers
  • Dates of Birth
  • Passport Numbers
  • Driver's License Numbers

Auto-Mask PII

When enabled, detected PII in held messages is automatically masked in the review queue (e.g., ****-****-****-1234).

  • Enabled: Reviewers see masked values
  • Disabled: Reviewers see full values
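To illustrate the masked format shown above, here is a minimal sketch that masks 16-digit card numbers and keeps the last four digits. The pattern and function name are assumptions for illustration; the gateway's real detector is not documented here and likely handles more formats and validation.

```python
import re

# Four groups of four digits, optionally separated by a space or hyphen.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[- ]?){3}(\d{4})\b")


def mask_card_numbers(text):
    """Replace each matched card number with ****-****-****-<last four>."""
    return CARD_PATTERN.sub(r"****-****-****-\1", text)
```

For example, mask_card_numbers("Card 4111 1111 1111 1234 on file") returns "Card ****-****-****-1234 on file".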

Emergency Override

The emergency override allows administrators to temporarily bypass the Safety Gateway for all communications.

Enable Emergency Override

When activated:

  • All messages are allowed through immediately
  • Gateway continues to log for audit purposes
  • Useful for urgent situations where review delays are unacceptable

Override Duration

Set how long the emergency override remains active:

  • 1 hour
  • 4 hours
  • 24 hours
  • Manual disable

Override Reason

Document why the override was activated. This field is required for audit compliance and must contain at least 20 characters.
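The duration and reason rules can be checked together before an override request is submitted. This is a sketch under assumptions: validate_override is a hypothetical helper, None stands in for the "Manual disable" option, and surrounding whitespace is assumed not to count toward the 20-character minimum.

```python
# Allowed durations in hours; None represents "Manual disable" (assumption).
ALLOWED_DURATIONS_HOURS = {1, 4, 24, None}
MIN_REASON_LENGTH = 20


def validate_override(duration_hours, reason):
    """Return a list of validation errors (empty if the request is valid)."""
    errors = []
    if duration_hours not in ALLOWED_DURATIONS_HOURS:
        errors.append("duration must be 1, 4, or 24 hours, or manual disable")
    if len(reason.strip()) < MIN_REASON_LENGTH:
        errors.append("reason must be at least 20 characters")
    return errors
```

A request such as validate_override(4, "Outage: review queue blocked") passes, while validate_override(2, "urgent") returns both errors.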

Rate Limiting

Emergency override activation is rate-limited to 3 requests per hour per user to prevent accidental repeated activations. This limit applies to both activation and deactivation.
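A per-user limit like this one is commonly implemented as a sliding window over recent request times. The sketch below shows that technique; whether the gateway uses a sliding or fixed window is not stated on this page, and the class name is hypothetical.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per user within window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # user_id -> recent request times

    def allow(self, user_id, now=None):
        now = time.monotonic() if now is None else now
        events = self._events[user_id]
        # Drop requests that have aged out of the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= self.max_requests:
            return False
        events.append(now)
        return True


# 3 requests per hour per user, matching the limit described above.
override_limiter = SlidingWindowLimiter(max_requests=3, window_seconds=3600)
```

The now parameter exists so the behavior can be exercised with fixed timestamps instead of waiting for real time to pass.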

Test Configuration

The Safety Settings page includes a Run Test feature that lets you test your configuration with sample messages before deploying to production.

Using the Test Feature

  1. Enter a sample message in the test area
  2. Optionally add recipient email addresses
  3. Select the communication channel (email, SMS, etc.)
  4. Click Run Test

Understanding Test Results

The test will show you:

  • Decision: What action the gateway would take (Allow, Hold, Deny)
  • Danger Score: The calculated risk score (0.0 - 1.0)
  • Classification: Whether recipients are internal or external
  • Threshold Applied: Which threshold was used for the decision
  • Flags: Any risk flags detected by the LLM
  • Analysis: The LLM's reasoning for the score

PII Detection in Tests

If PII is detected, the test results show:

  • PII Detected: Types of PII found (credit card, SSN, etc.)
  • Masked Message: How the message would appear after masking

LLM Details

Click View LLM Details to see:

  • The exact prompt sent to the LLM
  • The raw LLM response

This helps debug unexpected scoring or refine your safety instructions.

Rate Limiting

The test endpoint is rate-limited to 10 requests per minute per user to prevent abuse. If you exceed this limit, wait briefly before running additional tests.

Test Best Practices

  1. Test with realistic message content
  2. Include edge cases (borderline content)
  3. Test both internal and external recipients
  4. Verify PII detection with sample data
  5. Document expected vs. actual results
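One way to document expected vs. actual results is to keep test cases in a small table and record the mismatches. The sketch below assumes a hypothetical run_test callable that returns the gateway's decision as a string; fake_run_test is only a stand-in for illustration and does not reflect how the gateway scores messages.

```python
def compare_results(cases, run_test):
    """Run each case through run_test and collect expected-vs-actual mismatches."""
    mismatches = []
    for case in cases:
        actual = run_test(case["message"], case["recipients"])
        if actual != case["expected"]:
            mismatches.append(
                {"name": case["name"], "expected": case["expected"], "actual": actual}
            )
    return mismatches


# Stand-in for the real test feature: holds any message that mentions
# "confidential" and allows everything else.
def fake_run_test(message, recipients):
    return "hold" if "confidential" in message.lower() else "allow"


cases = [
    {"name": "routine internal", "message": "Standup moved to 10am",
     "recipients": ["alice@example.com"], "expected": "allow"},
    {"name": "confidential external", "message": "Confidential projections attached",
     "recipients": ["bob@partner.example"], "expected": "hold"},
    {"name": "borderline", "message": "Draft numbers, please keep private",
     "recipients": ["bob@partner.example"], "expected": "hold"},
]
```

Running compare_results(cases, fake_run_test) reports only the borderline case, which is the kind of gap worth addressing through thresholds or custom safety instructions.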

Notification Settings

Queue Alert Threshold

Send notifications when the review queue exceeds this many pending items.

  • Default: 10
  • Range: 1 - 100

Alert Recipients

Email addresses to notify when the queue alert threshold is exceeded.

Daily Summary

Send a daily summary email with:

  • Total messages processed
  • Messages held for review
  • Messages denied
  • Average danger scores
  • Top flags detected
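The summary fields above can be derived from a day's audit records with a simple aggregation. This is a sketch only: the record fields decision, score, and flags are assumed names, not the gateway's documented audit log schema, and a single overall average is computed.

```python
from collections import Counter
from statistics import mean


def daily_summary(records, top_n=3):
    """Aggregate one day's audit records into summary figures."""
    decisions = Counter(r["decision"] for r in records)
    flags = Counter(flag for r in records for flag in r.get("flags", []))
    scores = [r["score"] for r in records]
    return {
        "total": len(records),
        "held": decisions["hold"],
        "denied": decisions["deny"],
        "average_score": round(mean(scores), 3) if scores else None,
        "top_flags": [flag for flag, _ in flags.most_common(top_n)],
    }
```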

Best Practices

Initial Configuration

  1. Set gateway mode to Dry Run
  2. Use default thresholds initially
  3. Enable LLM analysis with a fast model
  4. Run for 1-2 weeks to gather data
  5. Review audit log to understand patterns
  6. Adjust thresholds based on observed behavior
  7. Switch to Enforced mode
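When reviewing Dry Run data, it can help to estimate how many messages each candidate threshold would have stopped. The sketch below assumes you have exported danger scores from the audit log as a list of floats; the export step and the function name are hypothetical.

```python
def hold_rate(scores, threshold):
    """Fraction of messages whose score exceeds the threshold.

    These are the messages that would have been held or denied.
    """
    if not scores:
        return 0.0
    return sum(1 for s in scores if s > threshold) / len(scores)


# Example scores standing in for a Dry Run export.
scores = [0.1, 0.4, 0.55, 0.6, 0.8, 0.95]
for candidate in (0.3, 0.5, 0.7):
    print(f"threshold {candidate}: {hold_rate(scores, candidate):.0%} stopped")
```

Comparing these rates against how many held messages your reviewers can handle gives a concrete basis for step 6.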

Threshold Tuning

Scenario          Internal Threshold   External Threshold
High Security     0.5                  0.3
Balanced          0.7                  0.5
High Throughput   0.85                 0.7

Model Selection

For safety analysis, prioritize:

  1. Speed - Analysis happens on every outbound message
  2. Cost - High volume means costs add up
  3. Reliability - Consistent scoring is important

Fast, affordable models like Claude Haiku or GPT-4o-mini typically perform as well as larger models for this use case.

Troubleshooting

Too Many Messages Held

  • Increase internal/external thresholds
  • Enable "Skip LLM for Internal"
  • Review the most common flags and, where they are false positives, adjust your custom safety instructions

Messages Not Being Caught

  • Lower thresholds
  • Ensure LLM analysis is enabled
  • Check that relevant PII types are enabled
  • Review audit log for patterns

High Latency

  • Use a faster LLM model
  • Enable "Skip LLM for Internal"
  • Check LLM provider status