Threat model

Be explicit about what AI agent security does and does not cover.

Pre-product security claims must be precise. The first AgentGuard target is the honest hallucinating agent, not a malicious operator with kernel-bypass intent.

Threat tiers

| Tier | Actor | Coverage |
| --- | --- | --- |
| T1 | Honest hallucinating agent: well-intentioned but unsafe. | V1 runtime controls. |
| T2 | Prompt-injected agent following hostile tool output. | Process sandbox, sealed proxy, provenance. |
| T3 | Malicious agent or operator trying to bypass controls. | Kernel or VM isolation and attestation. |

V1 escape vectors to address

  • Direct `.env` reads and indirect reads via shell commands.
  • Subprocess spawn and backgrounded child processes.
  • Network exfiltration to unknown hosts and DNS exfiltration.
  • Secret-containing Git pushes and protected branch pushes.
  • Symlinks or hardlinks that hide protected paths.
  • Encoded or chunked secrets in stdout and audit logs.
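The symlink and hardlink vector above can be reduced by canonicalising every requested path and then comparing device/inode pairs, so a link cannot hide a protected file behind a different name. A minimal sketch; the `is_blocked` helper and `protected` set are illustrative, not AgentGuard API:

```python
import os

def is_blocked(requested_path: str, protected: set[str]) -> bool:
    """Return True if requested_path resolves to a protected file.

    realpath() follows symlinks; the device/inode comparison also
    catches hardlinks, which realpath() alone would miss.
    """
    resolved = os.path.realpath(requested_path)
    if resolved in protected:
        return True
    try:
        st = os.stat(resolved)
    except (FileNotFoundError, NotADirectoryError):
        return False
    for p in protected:
        try:
            ps = os.stat(p)
        except FileNotFoundError:
            continue
        if (st.st_dev, st.st_ino) == (ps.st_dev, ps.st_ino):
            return True
    return False
```

This check runs at interception time; it does not close the TOCTOU window between the check and the agent's actual open(), which is one reason the process wrapper, not a path check, is the primary boundary.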

V1 non-goals

V1 should not claim protection against a malicious agent that compiles a static binary to issue raw syscalls outside the filter, injects code into another process, or exploits kernel-side timing channels; nor does V1 offer full enforcement on Windows. Naming these limits is part of building trust.

Enforcement layers

Layer 0 is cooperative hooks such as pre-tool-use and post-tool-use signals. Layer 1 is the process wrapper and primary enforcement boundary. Layer 2 is the local network proxy. Layer 3 is deferred kernel-level instrumentation.
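Layer 2's core policy can be stated very compactly: the sealed proxy refuses any host not on an explicit allowlist, which addresses both unknown-host exfiltration and most DNS tricks. A sketch under assumed names; `ALLOWED_HOSTS` and `proxy_permits` are hypothetical, not AgentGuard's actual config:

```python
# Hypothetical Layer 2 policy: default-deny egress with an explicit
# host allowlist. Normalising case and trailing dots avoids trivial
# bypasses like "PyPI.org." for "pypi.org".
ALLOWED_HOSTS = {"api.github.com", "pypi.org"}

def proxy_permits(host: str) -> bool:
    return host.lower().rstrip(".") in ALLOWED_HOSTS
```

Default-deny is the important design choice: an allowlist fails closed when a new exfiltration host appears, where a blocklist fails open.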

If cooperative hooks and the process wrapper disagree, the wrapper must win. Hooks are signal; runtime enforcement is the control.
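The precedence rule reduces to a deny-wins merge: hooks can tighten the decision but never loosen it. A minimal sketch, with an assumed `Verdict` enum and `decide` helper that are not part of any real API:

```python
from enum import Enum

class Verdict(Enum):
    ALLOW = 1
    DENY = 2

def decide(hook_verdict: Verdict, wrapper_verdict: Verdict) -> Verdict:
    # Layer 1 (process wrapper) is the enforcement boundary: its DENY is
    # final regardless of what the cooperative hooks (Layer 0) reported.
    if wrapper_verdict is Verdict.DENY:
        return Verdict.DENY
    # Hooks only tighten: a hook DENY still denies even when the
    # wrapper would have allowed the action.
    return hook_verdict
```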

Security buyers respect precise limits.

Read runtime design