LangGraph has earned its place as the most architecturally serious framework for building stateful, multi-step agent workflows. Its graph-based execution model, checkpointing system, and support for human-in-the-loop patterns make it a defensible production choice. According to the Stanford HAI 2025 AI Index, the gap between organizations deploying AI agents internally and organizations governing those agents is widening, not narrowing. Teams that built with LangGraph are now discovering that "running in production" and "running in governed production" are different engineering problems.
This article is not about replacing LangGraph or criticizing its design. LangGraph solves the orchestration problem well. The problem it does not solve -- because it was never designed to solve it -- is the governance problem: who authorized this action, under what policy, with what evidence, and can the decision be replayed for an auditor? These are not features you add to a framework. They are infrastructure you build alongside it.
What follows is a staged migration guide. It identifies the five specific points in a LangGraph execution where uncontrolled action execution occurs, shows where to inject a deterministic decision gate at each one, and walks through a four-phase migration from assessment to full enforcement. If you are familiar with the propose-then-decide architecture, this article applies that architecture specifically to LangGraph.
What LangGraph Provides Natively vs. What It Delegates
Before identifying governance injection points, it is worth being precise about what LangGraph does and does not give you. Misattributing governance capabilities to the orchestration framework is a common root cause of governance gaps.
| Capability | LangGraph Provides | Delegated to Infrastructure |
|---|---|---|
| State management | Graph state, checkpointing, replay from checkpoint | Policy-aware state transitions |
| Tool execution | Tool node routing, argument passing | Pre-execution authorization, parameter constraints |
| Human-in-the-loop | Interrupt mechanism, approval resume | Policy-driven escalation criteria, approval audit |
| Subgraph composition | Nested graph invocation, state bridging | Cross-boundary authority delegation, trace propagation |
| Output handling | Final state reduction, output formatting | Output validation against policy, PII redaction |
| Logging | Callback-based event emission | Audit-grade decision traces with rule references |
The left column is what LangGraph handles. The right column is what your governance infrastructure must handle. The migration guide below is about building the right column and connecting it to the left column at the correct boundaries. This distinction maps directly to the decision plane vs. orchestration layer separation that prevents governance from becoming entangled with execution logic.
The Five Uncontrolled Execution Points
A default LangGraph deployment has five specific locations where consequential actions execute without governance evaluation. These are not design flaws; they are integration boundaries where the framework intentionally defers to external systems. The problem arises when no external system is present to fill the gap.
1. Tool Nodes
Tool nodes are the primary execution boundary. When the LLM proposes a function call, the graph routes to the tool node, which executes the function directly. In the ReAct pattern that underlies most LangGraph agents, this is where reasoning becomes action. Without a governance gate, any tool call the model proposes will execute -- including tool calls that result from prompt injection, hallucinated parameters, or misinterpreted context.
The governance requirement: every tool call must be evaluated against an authorization policy before execution. The policy must specify which agent identities can invoke which tools, under what parameter constraints, and with what rate limits.
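A minimal sketch of such an authorization policy, assuming an in-memory rule table. The names `ToolRule`, `POLICY`, and `authorize_tool_call`, along with the agent and tool names, are hypothetical and only illustrate the three dimensions named above (identity, parameter constraints, rate limits). A production deployment would more likely delegate this evaluation to a policy engine.

```python
import time
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Any, Callable, Deque, Dict, Optional, Tuple


@dataclass(frozen=True)
class ToolRule:
    """Hypothetical rule: who may call a tool, with what arguments, how often."""
    allowed_agents: frozenset
    check_args: Callable[[Dict[str, Any]], bool]
    max_calls_per_minute: int


# Illustrative policy table; agent and tool names are made up for this sketch.
POLICY: Dict[str, ToolRule] = {
    "issue_refund": ToolRule(
        allowed_agents=frozenset({"support-agent"}),
        check_args=lambda args: 0 < args.get("amount", 0) <= 500,
        max_calls_per_minute=2,
    ),
}

_recent_calls: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)


def authorize_tool_call(agent_id: str, tool_name: str, args: Dict[str, Any],
                        now: Optional[float] = None) -> Tuple[bool, str]:
    """Return (permitted, reason). Tools without a rule are denied by default."""
    now = time.monotonic() if now is None else now
    rule = POLICY.get(tool_name)
    if rule is None:
        return False, "no rule for tool"
    if agent_id not in rule.allowed_agents:
        return False, "agent not authorized for tool"
    if not rule.check_args(args):
        return False, "parameter constraint violated"
    window = _recent_calls[(agent_id, tool_name)]
    while window and now - window[0] >= 60:
        window.popleft()  # drop calls older than the one-minute window
    if len(window) >= rule.max_calls_per_minute:
        return False, "rate limit exceeded"
    window.append(now)
    return True, "permitted"
```

Denying tools that have no rule is a design choice in this sketch: a tool added to the agent without a matching policy entry is blocked rather than silently permitted.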
2. State Transitions
LangGraph models agent behavior as a graph of state transitions. Each transition represents a decision about what to do next: which node to route to, whether to continue or terminate, whether to loop back for additional reasoning. By default, these transitions are governed by the graph definition itself -- conditional edges, routing functions, and model-driven branching. None of these mechanisms evaluate whether the transition is permitted by an external policy.
The governance requirement: high-consequence state transitions -- such as moving from "draft" to "send," from "quote" to "commit," or from "plan" to "execute" -- must pass through a policy checkpoint. Not every transition needs governance; only irreversible or high-impact ones.
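One way to sketch a policy checkpoint on selected transitions is to wrap the routing function. Everything here is an assumption for illustration: the `HIGH_CONSEQUENCE` set, the `governed_router` helper, and the `is_permitted` callback stand in for whatever policy engine call a real deployment would make.

```python
from typing import Any, Callable, Dict, Set, Tuple

State = Dict[str, Any]
Router = Callable[[State], str]

# Hypothetical set of (source, destination) transitions that need a policy check.
HIGH_CONSEQUENCE: Set[Tuple[str, str]] = {("draft", "send"), ("plan", "execute")}


def governed_router(source: str, router: Router,
                    is_permitted: Callable[[str, str, State], bool],
                    fallback: str) -> Router:
    """Wrap a routing function so high-consequence transitions pass a policy check."""
    def route(state: State) -> str:
        destination = router(state)
        if (source, destination) not in HIGH_CONSEQUENCE:
            return destination  # low-consequence transitions are not gated
        if is_permitted(source, destination, state):
            return destination
        return fallback  # e.g. a review node instead of the irreversible step
    return route
```

In a LangGraph `StateGraph`, a function like `route` would typically be the path function passed to `add_conditional_edges`, so the graph structure itself stays unchanged.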
3. Interrupt Handlers
LangGraph's interrupt mechanism allows human-in-the-loop approval. When an interrupt fires, execution pauses until a human provides input. The governance gap is in the resume path: when execution resumes after an interrupt, the approval context (who approved, when, under what authority, with what scope) is typically not captured in a structured, auditable format. The approval happened, but the evidence of approval is not part of the decision trace.
The governance requirement: every interrupt-resume cycle must generate a trace record that includes the approver identity, the approval scope, the timestamp, and the specific action authorized. This trace must be linked to the downstream execution it enabled.
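The trace record described above might look like the following sketch. The `ApprovalTrace` fields mirror the four items listed (approver identity, scope, timestamp, authorized action); the class name, the `record_execution` helper, and the list-based trace store are hypothetical.

```python
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass(frozen=True)
class ApprovalTrace:
    """Hypothetical trace record for one interrupt-resume cycle."""
    approver_id: str
    approval_scope: str            # what the approver was authorized to approve
    authorized_action: Dict[str, Any]
    approved_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())
    trace_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def record_execution(trace_store: List[Dict[str, Any]], approval: ApprovalTrace,
                     tool_name: str, outcome: str) -> Dict[str, Any]:
    """Store the approval (if new) and an execution record that links back to it."""
    if not any(r.get("trace_id") == approval.trace_id for r in trace_store):
        trace_store.append({"type": "approval", **asdict(approval)})
    execution = {"type": "execution", "tool_name": tool_name,
                 "outcome": outcome, "approval_trace_id": approval.trace_id}
    trace_store.append(execution)
    return execution
```

The `approval_trace_id` field is what links the downstream execution to the approval that enabled it.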
4. Subgraph Boundaries
LangGraph supports composing agents from subgraphs -- nested graph invocations that encapsulate specialized behavior. When a parent graph invokes a subgraph, it delegates execution authority. In most deployments, this delegation is implicit: the subgraph inherits whatever authority the parent has, with no explicit check on whether the delegation is permitted or scoped.
The governance requirement: authority delegation across subgraph boundaries must be explicit. A parent graph invoking a "payment processing" subgraph should only do so if the parent's authority scope includes payment operations. The delegation should be traceable -- the subgraph's decision traces must reference the parent's delegation token. See multi-agent governance delegation for the full pattern.
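As a rough illustration of explicit delegation, the sketch below issues a token only when the parent holds every scope the subgraph requires. `DelegationToken`, `DelegationDenied`, and `delegate` are hypothetical names, and the scope strings are invented for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class DelegationToken:
    """Hypothetical token a parent graph hands to a subgraph."""
    parent_id: str
    subgraph: str
    scopes: FrozenSet[str]


class DelegationDenied(Exception):
    pass


def delegate(parent_id: str, parent_scopes: FrozenSet[str], subgraph: str,
             required_scopes: FrozenSet[str]) -> DelegationToken:
    """Issue a token only if the parent holds every scope the subgraph needs."""
    missing = required_scopes - parent_scopes
    if missing:
        raise DelegationDenied(
            f"{parent_id} lacks scopes for {subgraph}: {sorted(missing)}")
    # The subgraph receives only the scopes it needs, not everything the parent has.
    return DelegationToken(parent_id, subgraph, required_scopes)
```

Passing the resulting token into the subgraph's invocation context would let its decision traces reference `parent_id`, which is the traceability the requirement asks for.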
5. Output Validators
The final execution point is the output boundary -- where the agent's result is returned to the caller or written to a downstream system. LangGraph provides output state reduction but does not validate outputs against governance policies. If the agent's output contains PII that should have been redacted, or includes a recommendation that exceeds the agent's authorized scope, the output passes through unchecked.
The governance requirement: outputs must be evaluated against content policies, PII rules, and scope constraints before delivery. This is not output formatting; it is output authorization.
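The PII part of that requirement can be sketched with a small redaction gate. The two regular expressions are deliberately simplistic assumptions, not a complete PII detector, and `authorize_output` is a hypothetical name. Scope constraints and content policies would need their own checks.

```python
import re
from typing import List, Tuple

# Deliberately simple illustrative patterns; real PII detection needs far more.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def authorize_output(text: str) -> Tuple[str, List[str]]:
    """Redact matches and report which PII categories were found."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED {label}]", text)
        if count:
            found.append(label)
    return text, found
```

Returning the list of detected categories alongside the redacted text lets the caller write a decision trace for the output, not just a cleaned string.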
Where to Insert a Deterministic Decision Gate
A deterministic decision gate is a synchronous evaluation that sits between a proposed action and its execution. It receives a structured request (what action, by what identity, in what context), evaluates it against a versioned policy, and returns a verdict: permit, deny, or modify. Every evaluation generates a decision trace regardless of the outcome.
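To make the three verdicts concrete, here is a toy gate under stated assumptions: the `Verdict` shape, the `POLICY_VERSION` label, and the single hard-coded rule about `send_email` recipients are all invented for illustration. A real gate would evaluate versioned rules loaded from a policy engine.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

POLICY_VERSION = "2025-01-example"  # hypothetical version label


@dataclass(frozen=True)
class Verdict:
    outcome: str                      # "permit", "deny", or "modify"
    reason: str
    policy_version: str
    modified_arguments: Optional[Dict[str, Any]] = None


def evaluate(request: Dict[str, Any], traces: List[Dict[str, Any]]) -> Verdict:
    """Deterministic gate: same request and policy version give the same verdict."""
    args = request["arguments"]
    if request["tool_name"] != "send_email":
        verdict = Verdict("deny", "tool not covered by policy", POLICY_VERSION)
    elif len(args["recipients"]) > 3:
        # Modify rather than deny: cap the recipient list at three addresses.
        capped = {**args, "recipients": args["recipients"][:3]}
        verdict = Verdict("modify", "recipient list capped", POLICY_VERSION, capped)
    else:
        verdict = Verdict("permit", "within limits", POLICY_VERSION)
    # A trace is written for every evaluation, whatever the outcome.
    traces.append({"request": request, "outcome": verdict.outcome,
                   "reason": verdict.reason, "policy_version": verdict.policy_version})
    return verdict
```

Because the function has no randomness and no model call, replaying a stored request against the same policy version reproduces the original verdict, which is what makes the trace useful to an auditor.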
For each of the five execution points, the injection pattern is the same: intercept before execution, evaluate, record, then proceed or block. The difference is the mechanism used to intercept.
```python
# Decision gate injection pattern for LangGraph tool nodes
from typing import Any, Dict

from langchain_core.messages import ToolMessage
from langgraph.prebuilt import ToolNode


class GovernedToolNode:
    """Wraps a LangGraph ToolNode with a deterministic decision gate."""

    def __init__(self, tools, policy_engine, trace_store):
        self.tool_node = ToolNode(tools)
        # policy_engine and trace_store are your own governance components,
        # not LangGraph objects
        self.policy_engine = policy_engine
        self.trace_store = trace_store

    def __call__(self, state: Dict[str, Any]) -> Dict[str, Any]:
        # Extract the proposed tool calls from state
        messages = state.get("messages", [])
        tool_calls = getattr(messages[-1], "tool_calls", None) if messages else None
        if not tool_calls:
            # Nothing was proposed, so there is nothing to gate or execute
            return {"messages": []}

        denials = {}
        for tool_call in tool_calls:
            # Build governance request
            decision_request = {
                "agent_id": state.get("agent_identity"),
                "tool_name": tool_call["name"],
                "arguments": tool_call["args"],
                "session_id": state.get("session_id"),
                "context": {
                    "graph_node": "tool_execution",
                    "message_count": len(messages),
                },
            }
            # Evaluate against policy
            verdict = self.policy_engine.evaluate(decision_request)
            # Record trace (always, regardless of verdict)
            self.trace_store.append({
                "request": decision_request,
                "policy_version": verdict.policy_version,
                "rules_evaluated": verdict.rules,
                "outcome": verdict.outcome,
                "reason": verdict.reason,
            })
            if verdict.outcome == "deny":
                denials[tool_call["id"]] = verdict.reason

        # Enforce: one denial blocks the whole batch. Every proposed call still
        # gets a ToolMessage so the model sees a reply for each tool_call_id.
        if denials:
            blocked = "Not executed: another tool call in this batch was denied."
            return {
                "messages": [
                    ToolMessage(
                        content=f"Denied by policy: {denials[tc['id']]}"
                        if tc["id"] in denials else blocked,
                        tool_call_id=tc["id"],
                    )
                    for tc in tool_calls
                ]
            }

        # All tool calls permitted -- execute normally
        return self.tool_node.invoke(state)
```

For state transitions, the injection point is the conditional edge function. For interrupt handlers, it is the resume callback. For subgraph boundaries, it is the subgraph invocation wrapper. For output validators, it is the final state reducer. In each case, the pattern is identical: intercept, evaluate, trace, enforce.
The Four-Phase Migration
Migrating a production LangGraph deployment to governed execution is not an overnight switch. It is a staged process designed to minimize disruption while maximizing governance coverage. The NIST AI Risk Management Framework recommends an incremental approach: identify risks, implement controls, validate effectiveness, then expand. The following four phases implement that recommendation for LangGraph specifically.
Phase 1: Assessment (Week 1-2)
Before writing a single governance rule, instrument your existing LangGraph deployment to understand what it actually does. Attach a lightweight trace exporter using LangGraph's callback system to capture every tool call, every state transition, and every interrupt event. Do not enforce anything; just observe and record.
The output of this phase is a catalog of:
- Every tool invoked by the agent, with frequency and argument patterns
- Every state transition, including which transitions lead to irreversible actions
- Every interrupt event, with approval patterns and latency
- Every subgraph invocation, with the authority scope implied by each delegation
- Every output delivered, with content classification (contains PII, financial data, recommendations)
This catalog is the foundation for Phase 2. You cannot write effective governance rules for actions you have not observed. Teams that skip assessment and jump directly to policy writing consistently over-restrict low-risk actions and under-restrict high-risk ones.
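As a sketch of how the recorded events could be rolled up into that catalog, the function below counts tool calls, collects the argument names seen per tool, and tallies transitions. The event dictionaries (`type`, `tool_name`, `arguments`, `source`, `destination`) are an assumed shape for whatever your trace exporter emits, not a LangGraph format.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, Iterable


def build_catalog(events: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Summarize observed events; the event shape here is an assumption."""
    tool_counts: Counter = Counter()
    arg_keys: Dict[str, set] = defaultdict(set)
    transitions: Counter = Counter()
    for event in events:
        if event["type"] == "tool_call":
            tool_counts[event["tool_name"]] += 1
            arg_keys[event["tool_name"]].update(event.get("arguments", {}))
        elif event["type"] == "transition":
            transitions[(event["source"], event["destination"])] += 1
    return {
        "tools": {name: {"calls": n, "argument_keys": sorted(arg_keys[name])}
                  for name, n in tool_counts.most_common()},
        "transitions": dict(transitions),
    }
```

Interrupts, subgraph invocations, and output classifications would follow the same pattern with additional event types.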
Phase 2: Injection Points (Week 3-4)
Using the assessment catalog, identify the three to five highest-risk execution points -- the tool calls, transitions, or delegations where ungoverned execution creates the most exposure. Common examples: tools that modify customer data, transitions from "draft" to "published," subgraph delegations that cross trust boundaries.
For each identified point, implement the governed wrapper pattern shown above. Wire each wrapper to a policy engine (OPA, Cedar, or a purpose-built decision service -- see choosing the right policy engine). Write initial policies that are permissive by default: log the evaluation but permit the action. This lets you validate that the injection point is correctly positioned without risking production disruption.
Phase 3: Shadow-Mode Validation (Week 5-8)
Shadow mode is the critical phase that separates safe migration from risky migration. In shadow mode, the governance gate evaluates every action against the full policy set and records what it would have done -- permit, deny, or modify -- but does not enforce the verdict. The agent continues to execute normally. The governance system generates shadow traces that show what would have changed under enforcement.
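```python
# Shadow-mode evaluation: evaluate and record, but do not enforce
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verdict:
    # Assumed minimal shape; align it with your policy engine's verdict type
    outcome: str
    shadow_verdict: Optional[str] = None


class ShadowGovernanceGate:
    def __init__(self, policy_engine, trace_store):
        self.policy_engine = policy_engine
        self.trace_store = trace_store

    def evaluate(self, decision_request):
        verdict = self.policy_engine.evaluate(decision_request)
        # Record what would have happened
        self.trace_store.append({
            "mode": "shadow",
            "request": decision_request,
            "verdict": verdict.outcome,
            "would_have_blocked": verdict.outcome == "deny",
            "reason": verdict.reason,
            "policy_version": verdict.policy_version,
        })
        # Always permit in shadow mode
        return Verdict(outcome="permit", shadow_verdict=verdict.outcome)
```

Review shadow traces weekly. Look for three signals: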
- False denials: Actions the policy would have blocked that are actually legitimate. These indicate policy rules that are too restrictive and need refinement.
- True denials: Actions the policy would have blocked that should indeed be blocked. These validate the policy is working correctly and build confidence in enforcement.
- Uncovered actions: Consequential actions that the governance gate never evaluates because they flow through an unmonitored path. These indicate missing injection points from Phase 2.
Shadow mode should run for a minimum of two weeks for high-frequency workflows and four weeks for workflows with weekly or monthly patterns. The goal is to observe a complete cycle of normal operations before enforcing.
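The weekly review can be reduced to a number. The sketch below computes a false-denial rate from shadow traces, assuming each trace carries the `would_have_blocked` field plus a reviewer-supplied `legitimate` flag. That flag name is hypothetical, and the choice to divide by all evaluated actions (rather than by denials only) is also an assumption you should fix before comparing against any target.

```python
from typing import Any, Dict, Iterable


def false_denial_rate(shadow_traces: Iterable[Dict[str, Any]]) -> float:
    """Share of all evaluated actions that were would-be denials judged legitimate.

    Assumes each trace carries `would_have_blocked` plus a reviewer-supplied
    `legitimate` flag; the flag name is hypothetical.
    """
    total = 0
    false_denials = 0
    for trace in shadow_traces:
        total += 1
        if trace["would_have_blocked"] and trace.get("legitimate", False):
            false_denials += 1
    return false_denials / total if total else 0.0
```

Tracking this rate per tool category rather than globally makes it easier to see which rules need refinement.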
Phase 4: Full Enforcement (Week 9+)
After shadow-mode validation has produced acceptable false-denial rates (target: under 1% for high-frequency tool calls), switch the governance gate from shadow mode to enforcement mode for the highest-risk execution points first. Keep lower-risk points in shadow mode for an additional cycle.
Full enforcement means the governance gate can deny actions. This requires a defined failure mode: when the policy engine is unreachable, does the system fail open (permit with degraded logging) or fail closed (deny until policy is available)? For consequential actions -- financial transactions, data deletions, access grants -- fail closed is almost always correct. For read-only informational queries, fail open with alerting may be acceptable. Define this per tool category.
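A per-category failure mode could be sketched as follows. The category names, the `FAILURE_MODE` table, and the `PolicyEngineUnavailable` exception are assumptions; the `evaluate` callback stands in for the real policy engine client.

```python
from typing import Any, Callable, Dict

# Hypothetical mapping from tool category to failure mode.
FAILURE_MODE = {"financial": "closed", "destructive": "closed", "read_only": "open"}


class PolicyEngineUnavailable(Exception):
    pass


def evaluate_with_failure_mode(category: str,
                               evaluate: Callable[[Dict[str, Any]], str],
                               request: Dict[str, Any],
                               alert: Callable[[str], None]) -> str:
    """Return "permit" or "deny", applying the category's failure mode on outage."""
    try:
        return evaluate(request)
    except PolicyEngineUnavailable:
        # Unknown categories fail closed, the safer default.
        mode = FAILURE_MODE.get(category, "closed")
        alert(f"policy engine unreachable; failing {mode} for {category}")
        return "permit" if mode == "open" else "deny"
```

The alert fires in both modes, so a fail-open decision is never silent.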
Enforcement is not a finish line. It is the start of ongoing governance operations: rule updates, version management, trace review, and periodic reachability analysis to ensure rules do not become dead code.
A Realistic Migration Timeline
| Phase | Duration | Deliverable | Risk if Skipped |
|---|---|---|---|
| Assessment | 1-2 weeks | Action catalog, risk classification | Writing rules for actions you have not observed |
| Injection Points | 1-2 weeks | Governed wrappers at 3-5 high-risk points | Governance covers low-risk actions but misses high-risk ones |
| Shadow Validation | 2-4 weeks | Shadow traces, false-denial analysis, policy tuning | Enforcement blocks legitimate actions in production |
| Full Enforcement | Ongoing | Synchronous enforcement for high-risk actions | Governance is observation-only; no prevention capability |
Total time from zero governance to initial enforcement: four to eight weeks of preparation, with enforcement beginning by week nine at the latest. Teams that attempt this in one week -- by deploying enforcement rules without assessment or shadow validation -- consistently experience production disruptions that erode trust in the governance system itself.
What Changes and What Stays the Same
A governed LangGraph deployment does not look fundamentally different from an ungoverned one. The graph definition stays the same. The tool implementations stay the same. The model configuration stays the same. What changes is the boundary layer between "the model proposes" and "the system executes."
Specifically:
- Tool nodes are wrapped with governed executors that evaluate before executing. The wrapping is transparent to the graph definition -- the node signature does not change.
- State transitions gain policy checkpoints at high-risk edges. The graph structure does not change; the conditional edge functions gain governance awareness.
- Interrupt handlers generate structured approval traces. The interrupt mechanism does not change; the resume path gains evidence generation.
- Subgraph invocations carry delegation tokens. The invocation API does not change; the invocation context gains authority metadata.
- Outputs pass through a validation gate before delivery. The output format does not change; the delivery pipeline gains a policy checkpoint.
The engineering principle is additive governance: every governance capability is layered on top of existing LangGraph behavior without modifying the core graph logic. This means governance can be rolled back independently of graph changes, and graph updates do not require governance revalidation (unless new tools or transitions are added, which require extending the governance rule set).
Common Pitfalls in LangGraph Governance Migration
Teams that have gone through this migration report a consistent set of mistakes. Knowing them in advance saves weeks of rework.
- Governing at the prompt level instead of the action level. Adding "you must not delete customer data without approval" to the system prompt is not governance. The model may follow this instruction. It may not. Prompt-level constraints are non-deterministic, untestable, and invisible to auditors. Governance must operate at the tool call boundary where deterministic enforcement is possible.
- Applying uniform policy to all tool calls. Not every tool call is equally consequential. A tool that reads a customer's public profile does not need the same governance rigor as a tool that initiates a bank transfer. Categorize tools by consequence level and apply governance proportionally. Over-governing low-risk actions creates latency without safety benefit.
- Skipping shadow mode for "obvious" rules. Even rules that seem obviously correct -- "deny any tool call that transfers more than $10,000" -- can produce unexpected false denials in production. Shadow mode reveals edge cases: split transactions, currency conversions, pre-authorized batch operations. Run every rule through shadow validation before enforcement.
- Embedding governance state in the graph state. If your governance verdicts, policy versions, or approval records are stored in the LangGraph state object, you have coupled governance to orchestration. Governance state belongs in the decision trace store, not in the graph. The graph state should remain a representation of the agent's working memory, not its compliance record.
- Treating the migration as a project instead of an operation. Governance is not something you implement and then finish. It is an ongoing operation: rules change, new tools are added, regulatory requirements evolve. The migration establishes the infrastructure. The operation maintains it.
The Governance Infrastructure Stack for LangGraph
A complete governance infrastructure for a LangGraph deployment includes four components, each deployable independently:
- Policy engine: Evaluates structured decision requests against versioned rules. OPA, Cedar, or a purpose-built service. Must support sub-10ms evaluation latency for synchronous gates. Must support rule versioning for trace replay.
- Trace store: Append-only, immutable storage for decision traces. Must support the minimal trace schema: identity, inputs, rules evaluated, verdict, enforcement record. Must support retention policies aligned with regulatory requirements.
- Governed wrappers: The interception layer that sits between LangGraph execution points and the policy engine. Framework-specific (LangGraph tool node wrappers, conditional edge interceptors), but the governance logic they invoke is framework-agnostic.
- Shadow/enforcement mode controller: A configuration layer that determines, per tool category, whether the governance gate operates in shadow mode (evaluate and record) or enforcement mode (evaluate, record, and enforce). Must support per-category control, not just a global toggle.
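The last component is small enough to sketch in full. This is one possible shape, assuming string mode names and an in-memory mapping; `ModeController` and its methods are hypothetical, and a real controller would likely read its configuration from a managed source.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ModeController:
    """Per-category switch between shadow and enforcement; names are illustrative."""
    default_mode: str = "shadow"
    modes: Dict[str, str] = field(default_factory=dict)

    def set_mode(self, category: str, mode: str) -> None:
        if mode not in ("shadow", "enforce"):
            raise ValueError(f"unknown mode: {mode}")
        self.modes[category] = mode

    def effective_outcome(self, category: str, policy_outcome: str) -> str:
        """In shadow mode the action proceeds whatever the policy said."""
        mode = self.modes.get(category, self.default_mode)
        return policy_outcome if mode == "enforce" else "permit"
```

Defaulting unlisted categories to shadow mode keeps newly added tools observable without risking an unexpected block; teams that prefer the stricter default can set `default_mode="enforce"`.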
Teams building this stack from scratch should expect four to six weeks of engineering effort for the initial implementation, plus ongoing operational maintenance. Teams using purpose-built governance integration infrastructure can reduce the initial build to one to two weeks by leveraging pre-built policy evaluation, trace generation, and shadow-mode capabilities.
The Principle: Orchestration Is Not Governance
The central insight of this migration guide is a separation of concerns. LangGraph is an orchestration framework. It manages state, routes execution, sequences tool calls, and handles the mechanics of multi-step agent workflows. It does these things well. Governance -- authorization, policy enforcement, audit evidence, safe rule change -- is a different concern with a different lifecycle, different stakeholders, and different change cadence.
The LangGraph documentation itself does not claim to solve governance. It provides the hooks -- callbacks, custom executors, interrupt mechanisms -- that governance infrastructure can integrate with. The migration guide above uses those hooks to connect LangGraph's execution model to a governance decision layer that operates independently.
For teams that have already implemented the basic governance patterns for LangGraph agents, this guide extends that foundation with a structured migration path and shadow-mode validation. For teams starting from scratch, it provides the architectural map from ungoverned execution to deterministic, auditable agent workflows.
