ZopNight + Claude via MCP: Policy-Aware AI for Cloud Compliance


By Amanpreet Kaur
Published: May 5, 2026 · 12 min read

ZopNight + Claude via MCP: What Policy-Aware AI Governance Actually Looks Like

A plain AI cloud assistant tells you the S3 bucket is public. ZopNight + Claude via MCP tells you the bucket is public AND violates policy 47, which requires EU-only buckets for any object tagged PII, AND was created by the data team after the policy went live, AND the same team has 12 other buckets that comply.

The difference is not capability. It is context.

Plain AI cloud assistants see cloud state. They call aws:describe_buckets, get a list, and produce a natural-language answer about that list. The answer is correct. It is also, in most governance situations, useless. The question a platform team or a security engineer actually asks is not “what is the state of this resource.” It is “what should the state be, given who owns it, what policy applies, and what we have promised our auditors.”

ZopNight is a governance system that holds the answers to those second questions in a structured graph. When you connect it to Claude via the Model Context Protocol, the AI no longer has to guess at policy intent or ask a human “is this allowed?” It looks up the answer. The piece that closes the governance loop is policy state in the AI context window, exposed through MCP exactly the way read-only cloud MCP servers expose cloud state.

This post is about what changes when the AI sees both.

The blind spot in plain AI cloud assistants

A read-only AWS MCP server can answer “is bucket X public.” A plain K8s MCP server can answer “is pod Y running.” A plain Terraform MCP server can answer “what does the plan change.” All three are useful. None of them know whether the answer matters.

| Question | Cloud-only AI answer | Policy-aware AI answer |
| --- | --- | --- |
| Is bucket X public? | Yes, ACL grants AllUsers:READ. | Yes, and it violates policy 47 (data residency, EU-only for PII tags). The team that owns it: data-platform-team. They have an open exception that expires in 6 days. |
| Did this Terraform plan add risk? | Yes, it adds an EC2 instance with 0.0.0.0/0 ingress on port 22. | Yes, and that violates the network policy for production accounts. The proposing team has done this twice in 90 days. The right reviewer is the platform on-call for that account. |
| Why is RDS spend up? | Multi-AZ was enabled on five new instances last week. | Multi-AZ was enabled because policy 23 requires it for production tier and these instances were promoted to production. The cost is expected. The budget owner has been notified. |

The cloud-only column is descriptive. The policy-aware column is interpretive. Governance work is interpretive. The question is never “what does the cloud look like.” It is always “does this match what we said it should look like, and if not, why, and what do we do.”

The blind spot is not a model limitation. It is a context limitation. The AI cannot reason about policy that is not in its prompt. The fix is to make policy callable, not implicit. That is what an MCP server does for cloud state. That is what ZopNight’s MCP server does for policy state.

What ZopNight’s MCP server exposes

ZopNight’s central data structure is the policy graph. Every active policy. Every cloud resource that policy applies to. Every team that owns that resource. Every drift event. Every exception, with expiry. Every audit-relevant event.

The MCP server exposes this graph as a small surface of typed tools.

[Architecture diagram]

Six tools cover most governance Q&A.

zopnight:list_policies returns the active policy set, scoped to an account, environment, or team. The agent can ask “what policies apply to production-finance” and get the list.

zopnight:check_resource takes a resource identifier (an ARN, a K8s UID, a Terraform address) and returns the policy compliance state for that resource. Pass-fail per policy, with the rule that triggered the failure.

zopnight:resource_ownership returns the team, individual, and contact channel that owns a resource. The graph keeps this fresh from tag conventions, OIDC group memberships, and explicit ownership records.

zopnight:drift_events returns drift events scoped to a time range, a resource, or a policy. This is what makes “did this just start happening” answerable.

zopnight:exception_status returns the exception ledger entries for a resource or policy. Exceptions have expiry dates. The agent can answer “is this exception still valid” without a human checking a spreadsheet.

zopnight:violation_history returns the violation history for a resource, team, or policy. Used for “is this a one-off or a pattern.”

The graph itself updates on a feed from OPA bundles, tag schemas in your IaC repo, AWS Config rules, IAM boundary changes, and ZopNight’s own policy editor. The MCP server caches reads for 60 seconds to keep the AI’s tool latency under a second; policies change slowly enough that this is safe.
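The 60-second read cache is simple enough to sketch with the standard library alone. Everything here is hypothetical scaffolding (a real server would sit behind the MCP SDK and call the actual graph client), but it shows the freshness contract:

```python
import time


class CachedPolicyReader:
    """Wraps a slow policy-graph lookup with a short TTL cache.

    Policies change slowly, so serving reads up to `ttl` seconds stale
    keeps the AI's tool latency low without risking wrong answers.
    """

    def __init__(self, fetch_fn, ttl: float = 60.0, clock=time.monotonic):
        self._fetch = fetch_fn      # e.g. a call into the policy graph
        self._ttl = ttl
        self._clock = clock
        self._cache = {}            # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]           # fresh enough: serve the cached value
        value = self._fetch(key)    # stale or missing: refetch from the graph
        self._cache[key] = (now + self._ttl, value)
        return value
```

The injected `clock` makes the freshness contract testable without sleeping, which is worth the extra constructor argument.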

The policy-aware Q&A loop

What this looks like in production is a two-server tool call sequence.

[Architecture diagram]
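The sequence can be sketched as a plain orchestration function over two server handles. The client interface and the Terraform-side tool name are illustrative assumptions; only the `zopnight:` tool name matches the surface described above:

```python
def review_change(terraform_mcp, zopnight_mcp, plan_file: str) -> dict:
    """Two-server tool call sequence for a change-review question.

    1. Ask the cloud-side server what the plan changes (state).
    2. Ask the governance server whether those changes comply (intent).
    3. Return both halves so the model can synthesize an answer.
    """
    # Step 1: state. Hypothetical Terraform-side tool name.
    plan = terraform_mcp.call("terraform:summarize_plan", {"plan": plan_file})

    # Step 2: intent. One compliance check per planned resource.
    findings = [
        zopnight_mcp.call("zopnight:check_resource", {"resource": r})
        for r in plan["resources"]
    ]

    # Step 3: hand state and intent to the reasoning layer together.
    return {"plan": plan, "compliance": findings}
```

In a real session the model issues these calls itself; the fixed sequence here just makes the shape of the loop explicit.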

Three scenarios where this loop pays back the integration cost.

Compliance Q&A in change review. The pull request adds a new RDS instance. The reviewer asks Claude “does this comply with our prod data policies?” Claude calls the Terraform MCP to summarize the plan, then calls the ZopNight MCP to check whether the planned configuration violates any active policy. The answer comes back in under 10 seconds. Today this question takes a 20-minute cross-reference between the change description, AWS Config, and the team’s policy doc. It rarely happens at all.

Incident triage during an outage. The on-call gets paged. They ask Claude “what’s broken in production-payments and which policy does it touch.” Claude calls K8s MCP for failing pods, ZopNight MCP for the policies that apply. The answer separates the noise (policy violations that are advisory) from the signal (policy violations that are also production incidents). The triage time drops from 20 to 30 minutes per incident to 2 to 5 minutes because the on-call is no longer cross-referencing dashboards.

Audit prep. A SOC 2 auditor asks “show me every exception against your encryption-at-rest policy in the last year, with rationale and expiry.” Without the policy graph, this is two days of work for a security engineer pulling history from various tools. With ZopNight + Claude, the auditor’s question gets converted into zopnight:violation_history and zopnight:exception_status calls, and the answer is a structured table in seconds. The audit prep work shifts from “reconstructing history” to “reviewing the AI-assembled report.”
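The audit-prep conversion can be sketched as a join over two tool calls. The client interface and response fields are assumptions; the tool names are the ones the ZopNight MCP server exposes:

```python
def exceptions_report(zopnight_mcp, policy_id: str, since: str) -> list[dict]:
    """Assemble a SOC 2-style exception report from two tool calls.

    One row per violation, joined to its exception entry (if any)
    so rationale and expiry appear alongside each event.
    """
    history = zopnight_mcp.call(
        "zopnight:violation_history", {"policy": policy_id, "since": since}
    )
    exceptions = zopnight_mcp.call(
        "zopnight:exception_status", {"policy": policy_id}
    )
    by_resource = {e["resource"]: e for e in exceptions}

    return [
        {
            "resource": v["resource"],
            "violated_at": v["at"],
            "rationale": by_resource.get(v["resource"], {}).get("rationale"),
            "expires": by_resource.get(v["resource"], {}).get("expires"),
        }
        for v in history
    ]
```

This is the "structured table in seconds" shape: the engineer reviews rows instead of reconstructing them.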

| Scenario | Cloud-only output | Policy-aware output | Time saved per question |
| --- | --- | --- | --- |
| Change review | Plan summary | Plan summary plus per-policy compliance | 15 to 20 min |
| Incident triage | Failing resources | Failing resources scoped to policy violations + ownership | 18 to 28 min |
| Audit prep | Resource snapshots | Exception ledger plus violation history plus rationale | 1 to 2 days |

The time savings compound because the questions get asked more often. When the answer is low-friction, the question is asked. That is the second-order effect of policy-aware AI.

What changes in incident triage, change review, and audit

Three workflows flip when the AI has policy-graph context.

| Workflow | Before policy-aware AI | After |
| --- | --- | --- |
| Incident triage | On-call manually cross-references resource state, dashboards, runbooks, policies. Triage is sequential and slow. | On-call asks one question. AI returns failing resources, applicable policies, owners, and exception status. Triage is parallel and fast. |
| Change review | Reviewer reads the diff, guesses at which policies apply, asks the proposing team to confirm. Most reviews skip policy check. | Reviewer asks AI to evaluate the diff against active policies. AI returns per-policy pass-fail with rationale. Policy check happens on every change. |
| Audit prep | Security engineer compiles evidence from multiple tools, reconstructs timelines, writes the response. Takes days per audit cycle. | Auditor question converts to MCP calls. AI returns structured answer with citations. Engineer reviews and submits. Hours instead of days. |

The coined term for what makes this work is policy-graph context. It is the AI’s effective context window when policy state is exposed alongside cloud state. Without policy-graph context, the AI is a smart dashboard. With it, the AI is a junior governance engineer that does not sleep, does not need onboarding, and does not lose institutional memory when the senior engineer leaves.

This works because, in cloud governance, the policies are well-defined and stored as code, and the cloud state is well-defined and queryable through APIs. The bridge is what was missing. MCP is the bridge protocol. ZopNight is the policy authority on one side, read-only cloud MCP servers are the state authority on the other, and Claude is the reasoning layer that makes the bridge useful in conversation.

Composing read-only cloud MCP with policy-aware governance MCP

The architecture is two MCP servers, not one. This matters.

The cloud MCP server’s job is to be authoritative on state. Live data, low cache, narrow IAM scope, big surface (every read API the cloud supports). The policy MCP server’s job is to be authoritative on intent. Slowly-changing data, longer cache, narrow surface (policies, ownership, exceptions, history). The two have different freshness contracts and different security boundaries.

Composing them in the AI session is what gives the loop its power. The cloud MCP cannot lie about state because it reads from the API. The policy MCP cannot lie about intent because it reads from the governance graph. The AI synthesizes. The audit log captures every call from both servers, which means every AI answer is reproducible from the logs.

A single giant MCP server combining both jobs would have to balance the two freshness contracts in one process and would conflate two security boundaries. Two servers, one AI session, separate audit logs. That is the shape that scales. It also matches the existing closed-loop cloud remediation pattern where detection (state), decision (intent), action, and verification each live in their own component.
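Separate audit logs fall out naturally when each server handle wraps its own calls. A stdlib-only sketch with a hypothetical client interface; a real deployment would write to a per-server sink such as an S3 prefix:

```python
import json
import time


class AuditedServer:
    """Wraps an MCP-style server handle so every tool call is logged
    to that server's own sink.

    Keeping one log per server preserves the two security boundaries:
    the cloud log records which state was read, the policy log records
    which intent was consulted, and together they make every AI answer
    reproducible.
    """

    def __init__(self, name: str, server, sink):
        self.name = name
        self._server = server
        self._sink = sink

    def call(self, tool: str, args: dict):
        result = self._server.call(tool, args)
        # Append-only record: enough to replay how the answer was built.
        self._sink.write(json.dumps({
            "server": self.name,
            "tool": tool,
            "args": args,
            "ts": time.time(),
        }) + "\n")
        return result
```

Wrapping at the handle rather than inside the servers keeps the logging policy in one place per session.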

Shipping ZopNight + Claude this quarter

The implementation is small enough to run as a single quarter project for a platform team that already runs a governance system.

| Phase | Week | Deliverable | Owner |
| --- | --- | --- | --- |
| 1. Stand up MCP servers | 1 to 2 | Read-only AWS MCP and ZopNight policy MCP both reachable, audit logs writing to S3 | Platform |
| 2. Connect Claude | 2 to 3 | Slack bot or CLI calling both MCP servers, OAuth from existing IdP | Platform + Security |
| 3. First questions | 3 to 4 | Three queries shipped: “is this PR compliant,” “which prod resources are out of policy,” “what’s the exception ledger for policy X” | Platform + champion teams |
| 4. Audit story | 5 to 6 | SOC 2 evidence pipeline using policy MCP audit log and Claude tool-call traces | Security |

The first three questions are what build internal demand. Once the change-review use case lands, every reviewer wants the bot. Once the incident-triage use case lands, every on-call wants the bot. The audit story closes the loop with the security and compliance teams, who would otherwise be the last skeptics.

The argument for shipping policy-aware AI now is the same as the argument for shipping read-only AI cloud tools: the production-shaped work is interpretive, the trust ceiling is bounded by read access, and the value is in answering questions that nobody had time to ask before. Adding the policy graph to the AI context is what changes “smart cloud Q&A” into “junior governance engineer.” The substrate is MCP. The composition is two servers, one session. The result is governance work that gets done because asking is now low-friction.

Written by Amanpreet Kaur, Engineer at Zop.Dev
