Threat model

Be explicit about what AI agent security does and does not cover.

Pre-product security claims must be precise. The first AgentGuard target is the honest hallucinating agent, not a malicious operator with kernel-bypass intent.

Threat tiers

| Tier | Actor | Coverage |
| --- | --- | --- |
| T1 | Honest hallucinating agent: well-intentioned but unsafe. | V1 runtime controls. |
| T2 | Prompt-injected agent following hostile tool output. | Process sandbox, sealed proxy, provenance. |
| T3 | Malicious agent or operator trying to bypass controls. | Kernel or VM isolation and attestation. |

V1 escape vectors to address

  • Direct `.env` reads and indirect reads via shell commands.
  • Subprocess spawn and backgrounded child processes.
  • Network exfiltration to unknown hosts and DNS exfiltration.
  • Secret-containing Git pushes and protected branch pushes.
  • Symlinks or hardlinks that hide protected paths.
  • Encoded or chunked secrets in stdout and audit logs.
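The symlink and hardlink vector above can be reduced by canonicalising every requested path and then comparing device/inode pairs, so a link cannot hide a protected file behind a different name. A minimal sketch; the `is_blocked` helper and `protected` set are illustrative, not AgentGuard API:

```python
import os

def is_blocked(requested_path: str, protected: set[str]) -> bool:
    """Return True if requested_path resolves to a protected file.

    realpath() follows symlinks; the device/inode comparison also
    catches hardlinks, which realpath() alone would miss.
    """
    resolved = os.path.realpath(requested_path)
    if resolved in protected:
        return True
    try:
        st = os.stat(resolved)
    except (FileNotFoundError, NotADirectoryError):
        return False
    for p in protected:
        try:
            ps = os.stat(p)
        except FileNotFoundError:
            continue
        if (st.st_dev, st.st_ino) == (ps.st_dev, ps.st_ino):
            return True
    return False
```

This check runs at interception time; it does not close the TOCTOU window between the check and the agent's actual open(), which is one reason the process wrapper, not a path check, is the primary boundary.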

V1 non-goals

V1 should not claim protection against a malicious agent that compiles a static binary to issue raw syscalls outside the filter, injects code into another process, or exploits kernel-side timing channels; nor does V1 offer full enforcement on Windows. Naming these limits is part of building trust.

Enforcement layers

Layer 0 is cooperative hooks such as pre-tool-use and post-tool-use signals. Layer 1 is the process wrapper and primary enforcement boundary. Layer 2 is the local network proxy. Layer 3 is deferred kernel-level instrumentation.
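Layer 2's core policy can be stated very compactly: the sealed proxy refuses any host not on an explicit allowlist, which addresses both unknown-host exfiltration and most DNS tricks. A sketch under assumed names; `ALLOWED_HOSTS` and `proxy_permits` are hypothetical, not AgentGuard's actual config:

```python
# Hypothetical Layer 2 policy: default-deny egress with an explicit
# host allowlist. Normalising case and trailing dots avoids trivial
# bypasses like "PyPI.org." for "pypi.org".
ALLOWED_HOSTS = {"api.github.com", "pypi.org"}

def proxy_permits(host: str) -> bool:
    return host.lower().rstrip(".") in ALLOWED_HOSTS
```

Default-deny is the important design choice: an allowlist fails closed when a new exfiltration host appears, where a blocklist fails open.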

If cooperative hooks and the process wrapper disagree, the wrapper must win. Hooks are signal; runtime enforcement is the control.
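The precedence rule reduces to a deny-wins merge: hooks can tighten the decision but never loosen it. A minimal sketch, with an assumed `Verdict` enum and `decide` helper that are not part of any real API:

```python
from enum import Enum

class Verdict(Enum):
    ALLOW = 1
    DENY = 2

def decide(hook_verdict: Verdict, wrapper_verdict: Verdict) -> Verdict:
    # Layer 1 (process wrapper) is the enforcement boundary: its DENY is
    # final regardless of what the cooperative hooks (Layer 0) reported.
    if wrapper_verdict is Verdict.DENY:
        return Verdict.DENY
    # Hooks only tighten: a hook DENY still denies even when the
    # wrapper would have allowed the action.
    return hook_verdict
```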

Security buyers respect precise limits.

Read runtime design