DataSanitation Processor
Overview
The DataSanitation processor scrubs sensitive information from incoming requests. It can be configured in one of two ways: by listing the entity types to scrub (blocklisted_entities) or by listing the entity types to leave untouched (ignored_entities). Use this processor to keep sensitive data such as credit card numbers, phone numbers, and email addresses from being passed downstream.
Input and Output
This processor operates on the request stream:
- Input Stream: Request. The processor intercepts and sanitizes the request body before forwarding it to downstream processors.
- Output Stream: Request. After sanitization, the request continues through the processing pipeline.
This ensures sensitive fields are removed or masked before any further processing or logging occurs.
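To make the masking step concrete, here is a minimal Python sketch of sanitizing a request body before it moves downstream. It is not the processor's implementation: the function name sanitize_body, the regular expressions, and the <EntityName> placeholder format are all assumptions made for illustration, and real entity detection is likely more thorough.

```python
import re

# Hypothetical detection patterns, for illustration only. The processor's
# actual detection logic is not documented on this page.
# CreditCard is listed first so long digit runs are masked before Phone runs.
PATTERNS = {
    "CreditCard": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
    "Email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "Phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
}


def sanitize_body(body, entities=("CreditCard", "Email", "Phone")):
    """Replace each detected entity with a placeholder such as <Email>."""
    for name, pattern in PATTERNS.items():
        if name in entities:
            body = pattern.sub(f"<{name}>", body)
    return body


request_body = "Card 4111 1111 1111 1111, mail jane@example.com, call 555-123-4567."
print(sanitize_body(request_body))
# prints: Card <CreditCard>, mail <Email>, call <Phone>.
```

Because the masked body is what gets forwarded, any downstream processor or logger in this sketch only ever sees the placeholders.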
Parameters
blocklisted_entities
Type: list_of_strings
Required: False
Default: [CreditCard, Email, Phone]
A list of specific entity types to be scrubbed from the request. When this parameter is used, only the listed entities will be scrubbed (whitelist behavior).
Example:
- key: blocklisted_entities
  value:
    - CreditCard
    - Email
    - Phone
ignored_entities
Type: list_of_strings
Required: False
A list of entity types that should be excluded from scrubbing. When this parameter is used, all entities except the ones listed here will be scrubbed (blacklist behavior).
Example:
- key: ignored_entities
  value:
    - Phone
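The difference between the two parameters can be sketched as a small selection function in Python. This is an illustrative model only: the function entities_to_scrub and the KNOWN_ENTITIES set are hypothetical, and the set is limited to the entity names that appear on this page. How the processor behaves when both parameters are supplied is not stated here, so the sketch simply rejects that combination.

```python
# Assumed universe of entity types: only the names mentioned on this page.
# The processor may recognize more types than these.
KNOWN_ENTITIES = {"CreditCard", "Email", "Phone", "IPAddress"}

# Documented default for blocklisted_entities.
DEFAULT_BLOCKLIST = ("CreditCard", "Email", "Phone")


def entities_to_scrub(blocklisted_entities=None, ignored_entities=None):
    """Return the set of entity types that would be scrubbed."""
    if blocklisted_entities is not None and ignored_entities is not None:
        # Assumption: combining both parameters is undefined, so fail loudly.
        raise ValueError("set blocklisted_entities or ignored_entities, not both")
    if ignored_entities is not None:
        # Scrub everything except the ignored types.
        return KNOWN_ENTITIES - set(ignored_entities)
    if blocklisted_entities is not None:
        # Scrub only the listed types.
        return set(blocklisted_entities)
    return set(DEFAULT_BLOCKLIST)


print(sorted(entities_to_scrub(blocklisted_entities=["Email"])))
# prints: ['Email']
print(sorted(entities_to_scrub(ignored_entities=["Phone"])))
# prints: ['CreditCard', 'Email', 'IPAddress']
```

Note that with ignored_entities the result depends on the full set of entity types the processor knows about, so scrubbing can widen when new types are supported, whereas blocklisted_entities stays fixed to what is listed.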
Best Practices
- Use blocklisted_entities to explicitly control which entities get sanitized.
- Use ignored_entities when you need to scrub everything except specific entities.
- Always verify the output to ensure no sensitive data is leaked inadvertently.
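One way to verify the output is to scan sanitized text for anything that still looks like a sensitive value. The Python sketch below is a rough, assumed approach: find_leaks and its patterns are hypothetical and independent of the processor, and a clean result only means these particular patterns found nothing.

```python
import re

# Hypothetical residual checks, for illustration only. They are not the
# processor's own detection rules.
RESIDUAL_CHECKS = {
    "CreditCard": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
    "Email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "Phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
}


def find_leaks(sanitized_text):
    """Return the sorted entity types that still appear in the text."""
    return sorted(
        name
        for name, pattern in RESIDUAL_CHECKS.items()
        if pattern.search(sanitized_text)
    )


print(find_leaks("Contact <Email> about card <CreditCard>"))
# prints: []
print(find_leaks("Contact jane@example.com about card <CreditCard>"))
# prints: ['Email']
```

A check like this could run against a sample of sanitized requests in a test suite, failing whenever the returned list is not empty.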
DataSanitation Processor Template
SanitizeRequest:
  processor: DataSanitation
  parameters:
    - key: blocklisted_entities
      value:
        - CreditCard
        - Email
        - Phone
Use Case
You can use the DataSanitation processor to sanitize OpenAI requests. In this configuration, all entities except IPAddress will be sanitized before the request reaches the OpenAI provider.
OpenAISanitization:
  processor: DataSanitation
  parameters:
    - key: ignored_entities
      value:
        - IPAddress