MCP Security

A reference guide to the security model of the Model Context Protocol: what MCP servers can do, where they fail, and the controls to put in place before you expose production systems through them.

Anshal Dwivedi

April 25, 2026

A working reference on the security model of the Model Context Protocol — what MCP servers actually are, why they are a governance surface most teams have not accounted for, and the controls that matter before they touch production systems.

The short version

The Model Context Protocol (MCP) is a protocol for exposing tools, resources, and prompts to AI agents. An MCP server is a small service, often running locally on a developer's machine and sometimes hosted, that implements a standard interface for the agent to call. Each MCP server brings its own credentials, its own network access, and its own set of capabilities.

MCP made the agent ecosystem composable. It also created a new class of security surface:

Every MCP server is an independent capability grant, with its own credentials and its own reach.
MCP server responses are inputs to the agent — attacker-influenced responses can reshape the agent's behavior.
The agent runtime (Claude Code, Cursor, etc.) is not in a position to enforce controls between itself and an MCP server — that enforcement, if it exists at all, lives elsewhere.

A security program for MCP treats each MCP server as a first-class system, not as configuration.

What an MCP server actually is

Architecturally, an MCP server is closer to a privileged service account than to a browser extension. A typical MCP server:

Holds credentials for one or more upstream services (a GitHub token, a Linear API key, a cloud account, a database connection string).
Runs as a local or remote process that the agent connects to over stdio or HTTP.
Exposes tools (actions the agent can invoke), resources (data the agent can read), and prompts (reusable templates the agent can use).
Is often installed with minimal review — frequently from a package manager, sometimes from a personal GitHub repo.

The important property: the MCP server acts on behalf of the agent, but with its own credentials. When the agent calls "list pull requests," the MCP server calls GitHub using a token that belongs to the MCP server, not to the agent or the user. The upstream system sees the MCP server's identity, not the developer's or the agent's.

That indirection is the source of a governance gap. The identity the upstream sees is not the identity that initiated the action.
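To make the indirection concrete, here is a minimal sketch of the request an agent sends when it invokes a tool. MCP messages are JSON-RPC 2.0 and tool invocation uses the `tools/call` method; the tool name `list_pull_requests` and its `repo` argument are hypothetical, chosen only to match the example above.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


# Hypothetical tool and argument names, for illustration only.
message = build_tool_call(1, "list_pull_requests", {"repo": "acme/web"})
```

As sketched, the message carries a tool name and arguments and nothing that names the developer who prompted the agent. The upstream token never appears in it either: the token lives in the MCP server's environment, so the upstream log attributes the call to whoever owns that token.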

The attack surface that matters

Three shapes of attack are specific to MCP.

1. Compromised MCP servers

An MCP server installed on a developer's machine becomes a tool the agent will call. If that server is compromised (by a malicious update, a malicious maintainer, or a supply chain attack on its dependencies), the agent becomes an instrument of the attacker.

The LiteLLM supply chain compromise in March 2026 is an instance of this shape, applied to a dependency of agents and MCP servers. A malicious update silently harvested credentials from every machine running it. The agent did not misbehave. The infrastructure under it did.

For MCP specifically, the attack surface includes:

The MCP server's direct dependencies (npm, pip, crates).
The MCP server's runtime environment (often a developer's machine, with all the credentials it holds).
The MCP server's upstream service credentials.
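One partial mitigation for the dependency layer is to pin each MCP server artifact to a known digest and refuse to run anything that does not match. The sketch below shows the idea with the standard library only; `verify_pinned_artifact` is a hypothetical helper, and in practice the package manager's own mechanism (for example pip's `--require-hashes` mode) is usually the better place to enforce this.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_pinned_artifact(data: bytes, pinned_sha256: str) -> bool:
    """Return True only if the artifact bytes match the pinned digest."""
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sha256_hex(data), pinned_sha256.lower())
```

A pin only proves the artifact is the one that was reviewed. It says nothing about whether the reviewed version was safe, so it complements review rather than replacing it.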

2. Instruction injection through MCP responses

An MCP server's response is text that enters the agent's context window. An attacker who can influence the upstream system can influence that text — and through it, the agent's next action.

Examples of upstream systems an attacker can often influence:

Public issue trackers (the agent reads issues as part of its work).
Public package registries (the agent reads package metadata).
Web pages (the agent's web-fetch tool returns attacker-controlled HTML).
Public repositories (the agent reads READMEs, comments, and inline instructions).

Each of these is a realistic attacker-controlled channel. The agent has no way to distinguish "instructions the user gave" from "text returned by an MCP server that contains instruction-shaped content."
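The model cannot tell these apart, but the harness around it can record where each piece of context came from. A minimal sketch, assuming a hypothetical harness that assembles the context itself; `Source`, `ContextItem`, and `untrusted_items` are illustrative names, not part of MCP or any agent runtime.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    USER = "user"                  # typed by the human operating the agent
    MCP_RESPONSE = "mcp_response"  # returned by an MCP server


@dataclass(frozen=True)
class ContextItem:
    source: Source
    text: str


def untrusted_items(context: list[ContextItem]) -> list[ContextItem]:
    """Return every context item that did not come directly from the user."""
    return [item for item in context if item.source is not Source.USER]
```

This does not stop the model from following injected text. It gives a policy layer outside the model a signal that the context contains attacker-influenceable content when it decides whether to allow a later action.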

3. Credential and data spillage through the MCP tool

Even a well-behaved MCP server can be used by a misbehaving agent to exfiltrate data. If the MCP server has an action like "send a message" or "create an issue" or "upload a file," it is an output channel. The agent, acting on a malicious instruction, can use that channel to move data from where it should stay to where the attacker can read it.

The developer reviewing the agent's behavior may not see this — the tool call looks normal, the action is one the MCP server is designed to do, and the data is embedded in parameters that do not look out of place.
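Because the data hides in ordinary-looking parameters, one practical check is to scan outbound tool arguments for secret-shaped strings before the call leaves the machine. The sketch below is illustrative only: the three patterns cover a few well-known credential formats, and `find_secrets` is a hypothetical helper, not part of any MCP SDK.

```python
import re
from typing import Any, Iterator

# A deliberately small, illustrative pattern set; real deployments need more.
SECRET_PATTERNS = {
    "github_token": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def iter_strings(value: Any) -> Iterator[str]:
    """Yield every string nested anywhere inside a JSON-like value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from iter_strings(key)
            yield from iter_strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from iter_strings(item)


def find_secrets(arguments: dict) -> list[str]:
    """Return the sorted names of patterns found in the tool arguments."""
    hits = set()
    for text in iter_strings(arguments):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(text):
                hits.add(name)
    return sorted(hits)
```

Pattern matching catches careless leaks, not a determined attacker who encodes or splits the data, so it belongs alongside the boundary controls described later rather than in place of them.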

The five controls

1. Inventory every MCP server

You cannot govern MCP servers you do not know about. Build a real inventory across your developer population:

What MCP servers are configured on each developer's machine?
What credentials does each one hold?
What upstream services can each one reach?

In practice, this inventory does not exist in most organizations today. Building it is the first piece of governance work.
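A first pass at the inventory can be as simple as parsing each client's configuration file. The sketch below assumes a JSON config with a top-level `mcpServers` object whose entries may carry `command`, `url`, and `env` fields; that shape is an assumption about the client, and file locations and field names vary between tools. `list_mcp_servers` is a hypothetical helper.

```python
import json


def list_mcp_servers(config_text: str) -> list[dict]:
    """Summarize the MCP servers declared in one client config file."""
    config = json.loads(config_text)
    servers = config.get("mcpServers") or {}
    inventory = []
    for name, spec in sorted(servers.items()):
        inventory.append({
            "name": name,
            "command": spec.get("command"),  # set for locally launched servers
            "url": spec.get("url"),          # set for remote servers
            # Record credential variable names only, never their values.
            "env_keys": sorted(spec.get("env") or {}),
        })
    return inventory
```

Running this across a fleet answers the first question directly and hints at the second, since the environment variable names usually reveal which credentials a server holds. Which upstream services each server can reach still has to be established separately.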

2. Minimize each MCP server's credentials

The default credential issued to an MCP server is often "whatever the developer had handy" — a personal access token with broad scope, an admin-level API key, a long-lived cloud credential. For each MCP server:

Can the credential be scoped tighter?
Can it be short-lived and re-issued on demand?
Can it be moved to a central broker rather than a config file on disk?

Aggregate across the MCP servers a developer has configured (say, six of them), and the combined privilege is often broader than that granted to any single system in your environment. That aggregate is what matters.
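The aggregate can be computed once the scopes behind each credential are known. A minimal sketch, assuming scopes have already been collected as plain strings; the scope names and the `BROAD_SCOPES` set are hypothetical placeholders for whatever your upstream systems define.

```python
# Hypothetical scope names that a review would flag as too broad.
BROAD_SCOPES = frozenset({"github:admin", "cloud:admin", "db:readwrite"})


def aggregate_scopes(servers: dict[str, set[str]]) -> set[str]:
    """Union the scopes held by every MCP server on one machine."""
    combined: set[str] = set()
    for scopes in servers.values():
        combined |= scopes
    return combined


def broad_grants(servers: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each server to the broad scopes it holds, omitting clean servers."""
    return {
        name: sorted(scopes & BROAD_SCOPES)
        for name, scopes in servers.items()
        if scopes & BROAD_SCOPES
    }
```

The union is the number to compare against your other privileged identities, and the per-server view shows where tightening a single credential would shrink it.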

3. Treat MCP responses as untrusted input

Adopt the mental model that any text returned by an MCP server is attacker-influenced until proven otherwise. This is not paranoia; it is the only posture consistent with how the upstream systems actually work. In practice:

Do not let agents act on MCP-returned instructions without an intermediate step.
For high-consequence actions (pushes to production, data exports, credential changes), require a human confirmation that is not itself influenceable by the MCP response.
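The confirmation requirement can be sketched as a wrapper around tool execution. The prompt shown to the human is built only from the tool name and its literal arguments, never from text the MCP server returned. The tool names in `HIGH_CONSEQUENCE_TOOLS` and the `gated_call` helper are hypothetical.

```python
import json
from typing import Any, Callable

# Hypothetical names for actions that always need a human decision.
HIGH_CONSEQUENCE_TOOLS = frozenset(
    {"deploy_production", "export_data", "rotate_credentials"}
)


def gated_call(
    tool: str,
    arguments: dict,
    confirm: Callable[[str], bool],
    run: Callable[[str, dict], Any],
) -> Any:
    """Run a tool, asking a human first when the tool is high-consequence."""
    if tool in HIGH_CONSEQUENCE_TOOLS:
        # The summary is derived from the call itself, not from model output.
        summary = f"{tool} {json.dumps(arguments, sort_keys=True)}"
        if not confirm(summary):
            raise PermissionError(f"{tool} was not confirmed")
    return run(tool, arguments)
```

Here `confirm` stands in for whatever out-of-band channel presents the summary to a person, and `run` for the code that actually forwards the call to the MCP server. Keeping both outside the agent means an injected instruction cannot answer the prompt on the human's behalf.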

4. Action governance at the MCP boundary

The agent is not a trustworthy enforcement point — its behavior is driven by inputs it cannot fully validate. Enforcement has to live at the boundary between the agent and the MCP server, or between the MCP server and the upstream. Questions to answer at that boundary:

Is this agent, acting on behalf of this human, allowed to invoke this tool?
Are the arguments to this tool within the scope of the task the agent was asked to perform?
Is the principal chain verifiable end-to-end?

Most MCP deployments today have none of this. The agent calls the tool, the tool calls the upstream, the upstream logs the MCP server's identity. No principal chain, no policy, no enforcement.
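As an illustration of what a check at that boundary could look like, the sketch below evaluates a request against an allow-list with a default-deny outcome. The `Request` and `Rule` types, the glob-style matching, and the example names are all assumptions made for this sketch, not a description of any particular product.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Request:
    human: str     # the person the agent is acting for
    agent: str     # the agent runtime making the call
    tool: str      # the MCP tool being invoked
    resource: str  # the upstream object the call targets


@dataclass(frozen=True)
class Rule:
    human: str
    agent: str
    tool: str
    resource: str  # every field is a case-sensitive glob pattern


def is_allowed(request: Request, rules: list[Rule]) -> bool:
    """Allow the request only if some rule matches all four fields."""
    return any(
        fnmatchcase(request.human, rule.human)
        and fnmatchcase(request.agent, rule.agent)
        and fnmatchcase(request.tool, rule.tool)
        and fnmatchcase(request.resource, rule.resource)
        for rule in rules
    )
```

With no rules the function denies everything, which is the safe starting point. Checking whether arguments fit the task at hand, the second question above, needs task context that this sketch does not model.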

5. Audit the full chain

An MCP tool call that is only logged as "MCP server X called upstream Y" is not an audit trail. The audit record has to reconstruct: which human, through which agent, through which MCP server, using which tool, against which resource, with what result. During an incident, the gap between "the upstream log shows the MCP server did this" and "we know which developer's prompt triggered it" is the gap between a five-minute scoped fix and a three-week whole-environment investigation.
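A record that carries the whole chain might look like the sketch below, which emits one JSON line per tool call. The field names follow the list in the paragraph above; the `AuditRecord` type and `audit_line` helper are hypothetical, and where the line is shipped is left open.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    timestamp: str
    human: str
    agent: str
    mcp_server: str
    tool: str
    resource: str
    result: str


def audit_line(
    human: str,
    agent: str,
    mcp_server: str,
    tool: str,
    resource: str,
    result: str,
    now: Optional[datetime] = None,
) -> str:
    """Serialize one tool call as a single JSON line with the full chain."""
    moment = now or datetime.now(timezone.utc)
    record = AuditRecord(
        moment.isoformat(), human, agent, mcp_server, tool, resource, result
    )
    return json.dumps(asdict(record), sort_keys=True)
```

The record is only as trustworthy as the component that writes it, so it should be produced at the enforcement boundary rather than by the agent describing its own actions.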

Common questions

Can I just block MCP servers? In some environments, yes. But the productivity value is large, and the trend is toward more MCP usage, not less. Governance is a more likely answer than prohibition.

Are hosted MCP servers safer than local ones? Differently risky. Hosted MCP servers move the credential handling off developer machines, which is a win. They also expand the attack surface to include the hosting provider and the transport. Each deserves its own review.

Is this just the supply chain problem with extra steps? Partially. MCP inherits every supply chain concern of its dependencies, plus the identity indirection problem, plus the instruction-injection problem. All three have to be treated.

How does this relate to AI agent governance? MCP security is a specific instance of the broader AI agent governance problem — the MCP boundary is one of the places where the governance layer lives. The five dimensions of agent governance (visibility, identity, access, audit, runtime enforcement) map directly onto the MCP surface.

Where FirstOps fits

FirstOps treats MCP servers as first-class entities in the agent runtime. Discovery enumerates every MCP server configured across a developer population. Identity is issued to MCP servers distinctly from developers, so the upstream sees a verifiable principal chain. Policy is evaluated at the MCP call boundary — agent + human + action + resource — and enforcement sits in a layer the agent has to pass through, not inside the agent runtime. If you are doing the work in this post as a security program, we would be glad to walk through it.