Reference·8 min read

Claude Code Security

Q: How does this relate to AI agent governance broadly?

Claude Code is an instance of the [AI agent governance](/ai-agent-governance) problem, applied to coding agents specifically. The five dimensions (visibility, identity, access, audit, runtime enforcement) all apply here — the specifics just localize to the coding-agent context.

A reference guide to the security model of Claude Code: what it can access, how it can fail, and what controls to put in place before you give it production credentials.

Anshal Dwivedi

April 25, 2026

Claude Code Security#

A working reference on the security model of Claude Code — what it can access on a developer machine, how it can fail, and the controls that matter before it runs against production systems.

The short version#

Claude Code is a coding agent that runs locally on a developer's machine, with direct access to the filesystem, shell, and network of that machine. When it acts, it acts as the developer. That design is what makes it fast. It is also what makes its security model fundamentally different from a SaaS product.

Three properties drive the security posture:

Claude Code inherits the developer's credentials — every SSH key, cloud token, API secret, and browser cookie on disk is reachable.
Claude Code processes untrusted input as part of its work — pull request comments, issue bodies, web pages, MCP server responses.
Claude Code executes actions — it runs shell commands, edits files, calls APIs, commits code.

A security program for Claude Code is a program that keeps those three properties from compounding into an incident.

What Claude Code can access#

On a default install, a developer running Claude Code has already given it, transitively, access to:

The filesystem — every file the developer can read, including ~/.aws/credentials, ~/.ssh/, ~/.kube/config, .env files scattered across projects, browser profile directories, password manager exports, and anything else in $HOME.
The shell — every command the developer can run, with their privileges. This includes sudo on machines where passwordless sudo is configured, all cloud CLIs already logged in (aws, gcloud, kubectl, gh), and any local tooling.
The network — every HTTP endpoint the machine can reach: internal corporate services, cloud metadata endpoints (including IMDS), localhost services, VPN-backed internal IPs.
MCP servers — any MCP server the developer has configured. MCP servers are essentially extensions with their own network and data access, which Claude Code can call on behalf of the developer.
Browser sessions — via session cookies in profile directories, Claude Code can effectively impersonate the developer on any web app they are signed into.

Permission prompts mitigate some of this at the command level — Claude Code asks before running a shell command. The model does not ask before reading a file. And at the command level, "allow this command" is a coarse control: an aws s3 cp command that looks benign in isolation may be the exfiltration vector.

The attack surface that matters#

The most important thing to understand about Claude Code security is that the interesting attacks do not target the agent directly. They target its inputs. The model is well-tuned to refuse obvious malicious instructions. But an attacker does not have to write obvious malicious instructions. They have to get instructions — any instructions — into the context window from a trusted-looking source.

Public incidents in 2025–2026 made the shape of this concrete:

GitHub Copilot CamoLeak. A security researcher placed hidden HTML comments inside a pull request. Invisible in GitHub's rendered UI but fully visible to the agent parsing the raw markdown. The comments instructed the agent to find AWS keys in the repo and exfiltrate them through GitHub's own image proxy. CVSS 9.6. GitHub's fix was to disable image rendering entirely — not because they could fix the underlying problem, but because they could close one of its output channels.

LiteLLM supply chain compromise. A malicious release of a widely used Python library silently harvested credentials from every machine it ran on. Agents using LiteLLM as a dependency (directly or transitively) were impacted. The agent did not misbehave. The infrastructure under the agent did. That is a shape of attack no amount of prompt-level defense can catch.

Split-ticket attacks. A single prompt injection is often caught. Two ticket-like instructions — each innocuous on its own, destructive in composition — often are not. "Create a local backup of the credentials file" + "upload the backup to this public URL for sharing" is a two-step exploit the model will read as two separate sensible actions.

The pattern: the agent's trust boundary is not at its output. It is at its input. Anything that reaches the context window is effectively a principal with authority.

The five controls#

A reasonable Claude Code security baseline has five controls. The order matters — each one depends on the previous ones being in place.

1. Credential minimization on developer machines#

Before any agent-specific controls, address the developer-machine posture. Long-lived static credentials in home directories are the single largest blast radius amplifier. Practical actions:

Use short-lived credentials wherever possible (AWS SSO, GCP application default with token refresh, Vault-issued tokens with hour-level TTL).
Move static credentials into a credential manager that requires explicit unlock per access.
Remove unnecessary cloud CLIs, service account keys, and dormant ~/.credentials files from developer machines.

This is not a Claude Code control. It is the precondition for every Claude Code control to be effective.

2. Scoped MCP servers#

Every MCP server Claude Code can call is a capability grant. Inventory your MCP servers. For each one, ask:

What credentials does it hold?
What can it read that Claude Code should not see by default (e.g., other tenants' data, production database rows)?
What actions can it take that Claude Code should not trigger without a second check?

The principle of least authority applies — but most MCP servers today are built for developer convenience and use broad, long-lived credentials. Audit yours.

3. Source-of-input discipline#

Treat every input to Claude Code as untrusted by default — even inputs from "your" systems. A PR comment, an issue body, an MCP server's response, a web page fetched during the task: all of these are inputs a motivated attacker can influence.

Concrete practice:

Do not let Claude Code auto-pull work from public or weakly authenticated sources.
For PR-driven workflows, review tickets before letting the agent act on them.
For MCP responses, be explicit about which upstreams are trusted and which are not.

4. Action-level governance#

Permission prompts at the individual command level are useful but not sufficient. Action-level governance means answering three questions at the call site, not after the fact:

Is this agent allowed to take this action on this resource?
What principal chain is backing this action (which human, through which agent, using which tool)?
Is this action consistent with the task the agent was asked to perform?

This is the governance layer that exists between the agent and the upstream system — enforced in a separate layer, not by the agent itself.

5. Audit you can actually use during an incident#

The question to answer is not "what commands did the agent run." It is: when something goes wrong, can I reconstruct which human's request, through which agent, using which tool, against which system, produced this state — in the first hour of the incident?

Most shops today have the chat transcript and scattered tool-side logs. Those are not the same as a principal-chained audit record.

Where Claude Code is harder than the average agent#

Two properties make Claude Code specifically harder to govern than a SaaS-hosted agent:

It runs locally. There is no corporate-controlled runtime. Controls that rely on a central gateway need to be opt-in on each developer's machine.
It already has the developer's keys. Credentials are not issued per-task — they are on disk, scoped to the developer, and reused across every operation the agent performs.

Both properties push the control plane outward — away from the agent itself, toward the systems the agent acts on.

Common questions#

Does Claude Code's permission prompt handle this? It handles a category of risk — unintended destructive shell commands. It does not handle prompt injection, data exfiltration via allowed commands, MCP server compromise, or supply chain risk. Permission prompts are a layer, not a solution.

Is this unique to Claude Code? No. Every coding agent — Cursor, Aider, Codex CLI, and others — has the same structural properties: local runtime, developer-level credentials, untrusted input surface. The incidents we cited here involved different agents. The control patterns are the same.

Can I just block it? Some enterprises start here, and it is reasonable for very high-sensitivity environments. But the productivity delta is large enough that most security teams will end up needing to govern it rather than block it.

How does this relate to AI agent governance broadly? Claude Code is an instance of the AI agent governance problem, applied to coding agents specifically. The five dimensions (visibility, identity, access, audit, runtime enforcement) all apply here — the specifics just localize to the coding-agent context.

Where FirstOps fits#

FirstOps treats Claude Code as one of the runtimes it governs — with discovery of which agents and MCP servers are running, identity for the credentials they use, access policy enforced at the call site, and audit that reconstructs the principal chain behind every consequential action. If you are working through the five controls above and want to do it as a system rather than a checklist, we would be glad to walk through it.