Interface: InjectionPattern
Defined in: guardrails/injection-detection.ts:19
Prompt injection detection: scans input content for manipulation attempts.
Detects patterns in user messages, file contents, web fetches, and knowledge base documents that try to override the agent’s instructions.
Two attack vectors:
- Direct: the user explicitly tries to override the agent’s instructions (“ignore previous instructions”)
- Indirect: the attempt is embedded in external content (files, web pages) that the agent reads
Detection is pattern-based, so it is fast and needs no LLM call. It is not exhaustive, but it catches the obvious attacks with a low false-positive rate.
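As a rough illustration of pattern-based detection, the sketch below redeclares the documented interface so it is self-contained and runs every pattern against a piece of content. The `examplePatterns` list and the `scanForInjection` helper are hypothetical names invented for this example; they are not taken from `guardrails/injection-detection.ts`, and the regexes are assumptions rather than the module's actual patterns.

```typescript
// Mirrors the documented InjectionPattern shape; redeclared so the sketch stands alone.
interface InjectionPattern {
  id: string;
  description: string;
  pattern: RegExp;
  severity: "low" | "medium" | "high";
  category:
    | "instruction-override"
    | "role-hijack"
    | "system-prompt-leak"
    | "data-exfiltration";
}

// Hypothetical example patterns (assumed for illustration, not the module's real list).
// The `i` flag makes them case-insensitive; the `g` flag is deliberately omitted so
// RegExp.prototype.test stays stateless across calls.
const examplePatterns: InjectionPattern[] = [
  {
    id: "ignore-previous",
    description: "Asks the agent to ignore its earlier instructions",
    pattern: /ignore\s+(all\s+)?previous\s+instructions/i,
    severity: "high",
    category: "instruction-override",
  },
  {
    id: "reveal-system-prompt",
    description: "Asks the agent to reveal its system prompt",
    pattern: /(reveal|show|print)\s+(your\s+)?system\s+prompt/i,
    severity: "medium",
    category: "system-prompt-leak",
  },
];

// Hypothetical helper: returns every pattern that matches the given content.
function scanForInjection(
  content: string,
  patterns: InjectionPattern[] = examplePatterns,
): InjectionPattern[] {
  return patterns.filter((p) => p.pattern.test(content));
}
```

The same function could be applied to a user message (direct) or to fetched file or web content (indirect), since both are plain strings by the time they are scanned.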
Properties
category
category:
| "instruction-override"
| "role-hijack"
| "system-prompt-leak"
| "data-exfiltration";
Defined in: guardrails/injection-detection.ts:29
Category of attack
description
description: string;
Defined in: guardrails/injection-detection.ts:23
Human-readable description
id
id: string;
Defined in: guardrails/injection-detection.ts:21
Unique identifier
pattern
pattern: RegExp;
Defined in: guardrails/injection-detection.ts:25
Regex pattern (case-insensitive)
severity
severity: "low" | "medium" | "high";
Defined in: guardrails/injection-detection.ts:27
Severity: low (suspicious), medium (likely), high (definite)
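One way a caller might act on `severity` is to rank the three levels and respond to the worst match. This is a minimal sketch under stated assumptions: `SEVERITY_RANK`, `maxSeverity`, and `decide` are hypothetical helpers, and the block/flag policy is an illustrative choice, not behaviour documented for this module.

```typescript
type Severity = "low" | "medium" | "high";

// Numeric rank so severities can be compared; the ordering follows the documented
// meaning: low (suspicious) < medium (likely) < high (definite).
const SEVERITY_RANK: Record<Severity, number> = { low: 0, medium: 1, high: 2 };

// Hypothetical helper: highest severity among the matches, or null if none matched.
function maxSeverity(matches: { severity: Severity }[]): Severity | null {
  let worst: Severity | null = null;
  for (const m of matches) {
    if (worst === null || SEVERITY_RANK[m.severity] > SEVERITY_RANK[worst]) {
      worst = m.severity;
    }
  }
  return worst;
}

// Hypothetical policy (an assumption): block on "high", flag anything lower for review.
function decide(matches: { severity: Severity }[]): "allow" | "flag" | "block" {
  const worst = maxSeverity(matches);
  if (worst === null) return "allow";
  return worst === "high" ? "block" : "flag";
}
```

Because the helpers only read the `severity` field, an array of matched `InjectionPattern` objects can be passed to them directly.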