Quick Start

Get up and running with LLM Sanitizer in four steps.

Step 1 — Create an Account

Register for a free account at /signup.html or via the API:

curl -X POST https://api.llmsanitizer.com/api/v1/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email": "you@example.com", "password": "your-password"}'

Step 2 — Get Your API Key

Log in to receive a JWT, then create an API key:

# Login
curl -X POST https://api.llmsanitizer.com/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "you@example.com", "password": "your-password"}'

# Create API key (use JWT from login response)
curl -X POST https://api.llmsanitizer.com/api/v1/keys \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "my-app-key"}'
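The JWT from the login response goes into the Authorization header of the key-creation call. A minimal Python sketch; the "token" field name is an assumption, so check your actual login response:

```python
import json

def bearer_headers(login_response_body: str, token_field: str = "token") -> dict:
    """Build headers for JWT-authenticated calls from a login response body.

    NOTE: the "token" field name is an assumption; inspect your real
    login response to confirm where the JWT is returned.
    """
    jwt = json.loads(login_response_body)[token_field]
    return {"Authorization": f"Bearer {jwt}",
            "Content-Type": "application/json"}

# Hypothetical login response body:
headers = bearer_headers('{"token": "eyJhbGciOiJIUzI1NiJ9.e30.sig"}')
```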

Step 3 — Configure a Policy

Optionally create a custom security policy. The default policy uses balanced mode.

curl -X POST https://api.llmsanitizer.com/api/v1/user-policies \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "strict-no-pii",
    "mode": "strict",
    "categories": {
      "prompt_injection": true,
      "pii": true,
      "profanity": false
    }
  }'
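Before POSTing, a client can pre-validate the payload shape. A small Python sketch (this mirrors, but does not replace, the server-side validation):

```python
VALID_MODES = {"strict", "balanced", "permissive"}

def validate_policy(policy: dict) -> list:
    """Return a list of problems with a custom-policy payload (empty means OK)."""
    problems = []
    if not policy.get("name"):
        problems.append("name is required")
    if policy.get("mode") not in VALID_MODES:
        problems.append("mode must be strict, balanced, or permissive")
    if not all(isinstance(v, bool) for v in policy.get("categories", {}).values()):
        problems.append("category values must be booleans")
    return problems

problems = validate_policy({"name": "strict-no-pii", "mode": "strict",
                            "categories": {"prompt_injection": True,
                                           "pii": True, "profanity": False}})
```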

Step 4 — Sanitize Your First Input

curl -X POST https://api.llmsanitizer.com/proxy/v1/sanitize \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -d '{"input": "Hello, can you help me with my project?"}'
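In application code, the same call can be assembled programmatically. A minimal Python sketch that builds the request pieces (the helper itself performs no network I/O; send the pieces with any HTTP client):

```python
import json

def build_sanitize_request(user_input: str, api_key: str,
                           base_url: str = "https://api.llmsanitizer.com"):
    """Assemble URL, headers, and body for POST /proxy/v1/sanitize."""
    url = f"{base_url}/proxy/v1/sanitize"
    headers = {"X-API-Key": api_key, "Content-Type": "application/json"}
    body = json.dumps({"input": user_input})
    return url, headers, body

url, headers, body = build_sanitize_request(
    "Hello, can you help me with my project?", "YOUR_API_KEY")
```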

Authentication

LLM Sanitizer uses API key authentication for all proxy and sanitization endpoints. Include your API key in the X-API-Key header with every request.

Header Format

X-API-Key: sk_live_abc123def456...

API keys are scoped to your account and can be managed via the dashboard or the /api/v1/keys endpoints. Each key can be named for easy identification and revoked independently.

JWT Authentication

Account management endpoints (creating keys, managing policies) require JWT authentication. Obtain a JWT by logging in via /api/v1/auth/login and pass it as a Bearer token in the Authorization header.

Endpoints

POST /proxy/v1/sanitize

Sanitize user input before sending to your LLM. Returns a risk assessment with category breakdowns.

Request Body:

  • input (string, required)
    The user message to sanitize.
  • policy (string, optional)
    Policy mode: "strict", "balanced", or "permissive". Default: "balanced".
  • policyId (string, optional)
    ID of a custom policy to use instead of a preset mode.

Response:

{
  "allowed": true,
  "risk": "low",
  "score": 0.12,
  "categories": [],
  "sanitizedInput": "Hello, can you help me?",
  "piiDetected": [],
  "processingMs": 3.8
}
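A typical client gates the LLM call on this response, forwarding sanitizedInput rather than the raw user text. A minimal sketch:

```python
import json

# The documented example response from above:
SAMPLE = '''{"allowed": true, "risk": "low", "score": 0.12, "categories": [],
"sanitizedInput": "Hello, can you help me?", "piiDetected": [],
"processingMs": 3.8}'''

def gate(response_body: str):
    """Return the text to forward to the LLM, or None when blocked."""
    resp = json.loads(response_body)
    if not resp["allowed"]:
        return None
    # Forward the redacted text so detected PII never reaches the model.
    return resp.get("sanitizedInput")
```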

POST /proxy/v1/sanitize/output

Validate LLM output before showing to users. Detects system prompt leaks, harmful content, and PII in responses.

  • output (string, required)
    The LLM response to validate.
  • policy (string, optional)
    Policy mode. Default: "balanced".

POST /proxy/v1/chat

Proxy endpoint that sanitizes input, forwards to your LLM provider, then validates the output. A complete round-trip protection pipeline.

  • messages (array, required)
    Chat messages array in OpenAI format.
  • model (string, optional)
    LLM model to use. Default: "gpt-4".
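A request body for this endpoint can be sketched in Python as follows (a client-side convenience, not part of the API itself):

```python
import json

def build_chat_body(messages: list, model: str = "gpt-4") -> str:
    """JSON body for POST /proxy/v1/chat; messages use OpenAI chat format."""
    for m in messages:
        if not {"role", "content"} <= set(m):
            raise ValueError("each message needs role and content")
    return json.dumps({"messages": messages, "model": model})

body = build_chat_body([{"role": "user", "content": "Summarize this document."}])
```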

Policies

Security policies control which threat categories are active and how sensitive detection should be. Three preset modes are available:

  • strict: Maximum protection. All categories enabled, lowest thresholds. Best for customer-facing applications.
  • balanced: Default. Good detection with fewer false positives. Suitable for most applications.
  • permissive: Minimal blocking. Only high-confidence threats are flagged. For internal tools and testing.

Threat Categories

Each category can be individually enabled or disabled in custom policies:

  • prompt_injection: Override instructions, context switching, instruction manipulation
  • jailbreak: DAN prompts, developer mode, character roleplay exploits
  • system_prompt_extraction: Attempts to reveal the system prompt or configuration
  • pii: Emails, SSNs, credit cards, phone numbers, addresses
  • profanity: Obscene language, vulgarities, slurs
  • hate_speech: Discriminatory, racist, or bigoted content
  • threats: Threats of violence, intimidation, harm
  • harassment: Bullying, personal attacks, targeted abuse
  • sexual_content: Explicit sexual material, NSFW content
  • criminal: Illegal activities, drug manufacturing, weapon instructions
  • self_harm: Suicide, self-injury, eating disorder promotion
  • misinformation: Demonstrably false claims, conspiracy theories
  • social_engineering: Manipulation, phishing, confidence tricks
  • encoding_attacks: Base64, hex, Unicode obfuscation of malicious content
  • multilingual_injection: Injection attempts in non-English languages
  • data_exfiltration: Attempts to extract training data or model information
  • toxicity: General toxic or abusive language
  • spam: Repetitive, promotional, or meaningless content

PII Types

LLM Sanitizer detects and optionally redacts the following personally identifiable information:

  • email: Email addresses (user@example.com)
  • ssn: US Social Security numbers (XXX-XX-XXXX)
  • credit_card: Credit/debit card numbers (Visa, Mastercard, Amex, etc.)
  • phone: Phone numbers (US and international formats)
  • api_key: API keys and secrets (AWS, OpenAI, Stripe, etc.)
  • ip_address: IPv4 and IPv6 addresses
  • address: Physical mailing addresses
  • date_of_birth: Birth dates in common formats
  • passport: Passport numbers
  • drivers_license: Driver's license numbers

When PII is detected, the response includes a piiDetected array with type, location, and redacted value. If using the strict policy, inputs containing PII are blocked by default.
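A sketch of applying those redactions client-side, assuming each piiDetected entry carries "start", "end", and "redacted" keys (illustrative names; verify them against a real response):

```python
def apply_redactions(text: str, pii_items: list) -> str:
    """Replace detected PII spans with their redacted values.

    Assumes each entry has "start", "end", and "redacted" keys; these
    names are illustrative, so verify them against a real response.
    """
    # Work right-to-left so earlier offsets stay valid after each splice.
    for item in sorted(pii_items, key=lambda i: i["start"], reverse=True):
        text = text[:item["start"]] + item["redacted"] + text[item["end"]:]
    return text

masked = apply_redactions(
    "Mail me at bob@example.com please",
    [{"type": "email", "start": 11, "end": 26, "redacted": "[EMAIL]"}])
```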

Response Format

All sanitization endpoints return a consistent JSON response:

{
  "allowed": false,
  "risk": "critical",
  "score": 0.94,
  "categories": [
    {
      "name": "prompt_injection",
      "score": 0.94,
      "severity": "critical",
      "details": "Override instruction pattern detected"
    }
  ],
  "sanitizedInput": null,
  "piiDetected": [],
  "message": "Input blocked: prompt injection detected",
  "processingMs": 4.2
}

Field Reference

  • allowed (boolean)
    Whether the input passed all policy checks.
  • risk (string)
    Risk level: "none", "low", "medium", "high", "critical".
  • score (number)
    Overall risk score from 0.0 (safe) to 1.0 (maximum risk).
  • categories (array)
    Detected threat categories with individual scores and details.
  • sanitizedInput (string | null)
    The cleaned input with PII redacted. Null if the input was blocked.
  • piiDetected (array)
    List of PII items found, with type and redacted values.
  • processingMs (number)
    Processing time in milliseconds.

Error Codes

  • 400 Bad Request: Invalid JSON, missing required fields, or malformed input. Check the error message for details.
  • 401 Unauthorized: Missing or invalid API key. Ensure your X-API-Key header is present and correct.
  • 403 Forbidden: Your API key does not have permission for this operation, or your account is suspended.
  • 429 Rate Limited: You have exceeded your plan's request limit. Upgrade your plan or wait for the limit to reset.
  • 500 Internal Server Error: An unexpected error occurred. If this persists, contact support with the request ID from the response headers.

All error responses follow this format:

{
  "error": "Rate Limited",
  "message": "You have exceeded 1000 requests this month. Upgrade to Pro for higher limits.",
  "code": 429
}
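A client can branch on the status code to decide whether a retry makes sense. A sketch (treating 429 and 500 as retryable is a client-side policy choice, not an API guarantee):

```python
import json

RETRYABLE = {429, 500}  # client-side choice, not an API guarantee

def handle_error(status: int, body: str):
    """Map an error response to (should_retry, message_for_logs)."""
    payload = json.loads(body)
    return status in RETRYABLE, payload.get("message", payload.get("error", ""))

retry, msg = handle_error(429, json.dumps({
    "error": "Rate Limited",
    "message": "You have exceeded 1000 requests this month.",
    "code": 429}))
```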