SOC 2 Type II for AI Systems: Mapping Trust Services Criteria to Agent Decision Logs

The AICPA Trust Services Criteria were last substantively updated in 2017, with supplementary points of focus added in 2022. The criteria were designed to evaluate information systems -- databases, web applications, cloud infrastructure -- where processing logic is deterministic, change-controlled, and directly inspectable. The concept of an autonomous agent that selects its own actions, reasons over unstructured context, and executes decisions without per-instance human approval did not inform the criteria's design.

Yet SOC 2 Type II remains the compliance framework that enterprise customers most frequently require from their SaaS vendors. When those vendors deploy AI systems that make or influence consequential decisions, the Trust Services Criteria must be applied to AI behavior. The criteria do not mention "agents," "models," or "prompts," but the control objectives they define -- authorization, integrity, change management, monitoring -- map directly to the governance challenges that AI systems create.

This article provides that mapping. It takes each of the five Trust Services Categories, identifies the specific criteria most relevant to AI decision systems, and describes the evidence artifacts that engineering teams must produce to demonstrate compliance. The emphasis is on Processing Integrity, which is the category most directly concerned with whether AI outputs are complete, accurate, and authorized -- and the category most frequently excluded from SOC 2 scope by teams that have not yet reckoned with what their AI systems actually do.

Security (Common Criteria): The Foundation Layer

The Common Criteria are mandatory in every SOC 2 engagement. They cover the baseline security controls that protect the system and its data. For AI systems, three Common Criteria areas require AI-specific interpretation.

CC6: Logical and Physical Access Controls

CC6 criteria require that access to system resources is restricted to authorized individuals and systems. For traditional applications, this means user authentication, role-based access control, and infrastructure access management. For AI systems, CC6 extends to a question the criteria's authors did not anticipate: what is the AI agent itself authorized to access and do?

An AI agent operating in production typically holds service credentials that grant it access to databases, APIs, email systems, payment processors, and other operational resources. The CC6 question for auditors is whether those credentials are scoped to the minimum necessary permissions and whether there is a documented, enforceable boundary defining what the agent can and cannot do with those credentials. The answer "the agent only does what its code tells it to do" is not a control. A control is an explicit authority policy that restricts the agent's operational scope and produces evidence when the agent operates within -- or attempts to operate outside -- that scope.

Evidence required: Documented agent authority policies. Access control configurations showing credential scoping. Logs demonstrating that authority boundaries are enforced at runtime, including records of any denied or escalated actions.
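
The distinction between a documented boundary and an enforced one can be made concrete with a small sketch. The example below is illustrative only: the `AuthorityPolicy` and `Enforcer` names, the action names, and the spend limit are assumptions introduced here, and the in-memory list stands in for whatever append-only store a real system would use. What it shows is the shape of the control: every action is checked against an explicit policy, and every check, whether permitted, denied, or escalated, leaves a record.

```python
import json
import time
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AuthorityPolicy:
    """Hypothetical authority policy for one agent."""
    agent_id: str
    allowed_actions: frozenset
    max_amount: float  # allowed actions above this amount are escalated


@dataclass
class Enforcer:
    policy: AuthorityPolicy
    log: list = field(default_factory=list)  # stand-in for an append-only store

    def check(self, action: str, amount: float = 0.0) -> str:
        """Return 'permit', 'deny', or 'escalate', and record the check."""
        if action not in self.policy.allowed_actions:
            outcome = "deny"
        elif amount > self.policy.max_amount:
            outcome = "escalate"
        else:
            outcome = "permit"
        # Every check is logged, including the permitted ones.
        self.log.append(json.dumps({
            "ts": time.time(),
            "agent_id": self.policy.agent_id,
            "action": action,
            "amount": amount,
            "outcome": outcome,
        }))
        return outcome


policy = AuthorityPolicy(
    agent_id="expense-agent",
    allowed_actions=frozenset({"approve_expense", "send_email"}),
    max_amount=500.0,
)
enforcer = Enforcer(policy)
results = [
    enforcer.check("approve_expense", 120.0),  # within scope
    enforcer.check("approve_expense", 900.0),  # allowed action, over the limit
    enforcer.check("issue_refund", 50.0),      # action not in the policy
]
```

The three calls produce one permitted, one escalated, and one denied record, which is the kind of runtime evidence the paragraph above describes.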

CC7: System Operations and Monitoring

CC7 requires that system operations are monitored and anomalies are detected and responded to. For AI systems, operational monitoring must extend beyond infrastructure health to decision behavior. The auditor's question is not "is the system running?" but "is the system deciding correctly?"

Decision behavior monitoring means tracking the distribution of decision outcomes over time and detecting meaningful shifts. If an AI agent that historically approves 85% of expense reports begins approving 98% -- or 40% -- something has changed in the model, the data, or the policy configuration. Without decision behavior monitoring, that change is invisible until its consequences accumulate.

Evidence required: Decision outcome distribution dashboards or reports. Alert configurations for anomalous decision patterns. Incident records showing response to detected anomalies during the observation period.
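
One simple way to detect a shift like the one described above is to compare each window's approval rate against a baseline rate. The sketch below assumes a binomial model and a z-score threshold of 3; the function name, the threshold, and the window size are illustrative choices, not values taken from any standard. Production monitoring would likely use more robust methods and more than one outcome class.

```python
import math


def approval_rate_alert(baseline_rate, approved, total, z_threshold=3.0):
    """Flag a window whose approval rate is far from the baseline.

    Returns (alert, z), where z is the number of standard errors between
    the observed rate and the baseline under a binomial assumption.
    """
    if not 0.0 < baseline_rate < 1.0:
        raise ValueError("baseline_rate must be strictly between 0 and 1")
    if total == 0:
        return False, 0.0
    observed = approved / total
    std_err = math.sqrt(baseline_rate * (1.0 - baseline_rate) / total)
    z = (observed - baseline_rate) / std_err
    return abs(z) > z_threshold, z


# The expense-report example: an 85% baseline and windows of 100 decisions.
high_alert, high_z = approval_rate_alert(0.85, 98, 100)
low_alert, low_z = approval_rate_alert(0.85, 40, 100)
normal_alert, normal_z = approval_rate_alert(0.85, 86, 100)
```

With these inputs, the 98% and 40% windows both exceed the threshold and the 86% window does not. The alert configuration and the record of what was done in response are the artifacts an auditor would ask for.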

CC8: Change Management

CC8 is where the most AI-specific audit findings originate. Traditional change management covers code deployments: pull requests, code reviews, CI/CD pipelines, deployment approvals. AI systems introduce at least three additional change vectors that alter decision behavior without any code change: model updates (retraining, fine-tuning, version swaps), policy rule modifications (threshold changes, rule additions, constraint updates), and prompt or instruction changes (system prompt edits, few-shot example modifications).

Each of these change vectors can alter the agent's decision behavior as significantly as a code deployment. If any of them can be modified without documented approval and testing, CC8 has a gap. NIST SP 800-53 Rev 5 control family CM (Configuration Management) provides detailed guidance on managing configuration changes to automated systems, and the principles apply directly to AI decision system components that exist outside the traditional code deployment pipeline.

Evidence required: Change records for every modification to model versions, policy rules, and prompt configurations during the observation period. Each record must include the change description, justification, reviewer, approval, and deployment timestamp.
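
A change record with these fields can be checked mechanically before a change is allowed to deploy. The sketch below is one possible shape, not a prescribed schema: the `ChangeRecord` field names, the component type labels, and the example values are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# The three non-code change vectors named above (labels are assumed).
COMPONENT_TYPES = {"model", "policy_rule", "prompt"}


@dataclass(frozen=True)
class ChangeRecord:
    component_type: str
    component_id: str
    description: str
    justification: str
    reviewer: str
    approver: str
    deployed_at: Optional[datetime]


def missing_evidence(record):
    """Return the names of fields that are empty or invalid."""
    problems = []
    if record.component_type not in COMPONENT_TYPES:
        problems.append("component_type")
    for name in ("component_id", "description", "justification",
                 "reviewer", "approver"):
        if not getattr(record, name).strip():
            problems.append(name)
    if record.deployed_at is None:
        problems.append("deployed_at")
    return problems


complete = ChangeRecord(
    "prompt", "system-prompt-v12", "Tightened refund wording",
    "Reduce ambiguous approvals", "reviewer-a", "approver-b",
    datetime(2025, 4, 15, 9, 30, tzinfo=timezone.utc),
)
incomplete = ChangeRecord(
    "policy_rule", "expense-threshold", "Raised limit", "",
    "reviewer-a", "approver-b", None,
)
```

Here `missing_evidence(complete)` returns an empty list, while the second record is flagged for its missing justification and deployment timestamp.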

Processing Integrity: The Category Auditors Now Insist On

Processing Integrity (PI) is the Trust Services Category that most directly addresses whether a system's outputs are what they should be. The AICPA defines Processing Integrity through criteria that require system processing to be complete, valid, accurate, timely, and authorized. For traditional transaction-processing systems, these attributes are relatively straightforward to demonstrate. For AI decision systems, each attribute raises questions that the criteria's original language did not contemplate.

PI Attribute	TSC Language	AI System Translation	Evidence Artifact
Complete	Processing is complete	Every decision-triggering event produces a decision record; no silent decisions exist in the system	Completeness metrics comparing action counts to decision trace counts
Valid	Inputs are validated	Decision inputs are typed, schema-validated, and within expected value ranges before the decision layer evaluates them	Input validation schemas; rejection logs for malformed or out-of-range inputs
Accurate	Processing produces correct results	The decision outcome follows deterministically from the governing rules applied to the validated inputs	Decision traces showing rule-to-outcome mapping; replay test results confirming deterministic evaluation
Timely	Processing occurs within expected timeframes	Decisions are evaluated and committed within the temporal window where they are operationally valid	Decision latency metrics; timeout handling documentation
Authorized	Processing is authorized	The agent has explicit authorization for the action it takes; the action falls within defined authority boundaries	Authority policy documents; enforcement logs showing boundary checks at decision time
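
The completeness row in the table above comes down to a reconciliation: every action the system took should have a matching decision trace. A minimal sketch follows, assuming both the action log and the trace store can be reduced to lists of action identifiers. The function name and the identifiers are hypothetical.

```python
def reconcile(action_ids, trace_action_ids):
    """Compare executed actions against recorded decision traces."""
    actions = set(action_ids)
    traced = set(trace_action_ids)
    silent = sorted(actions - traced)    # actions with no decision trace
    orphaned = sorted(traced - actions)  # traces with no matching action
    if actions:
        coverage = len(actions & traced) / len(actions)
    else:
        coverage = 1.0
    return {"coverage": coverage, "silent": silent, "orphaned": orphaned}


report = reconcile(
    action_ids=["a1", "a2", "a3", "a4"],
    trace_action_ids=["a1", "a2", "a4", "a9"],
)
```

In this example one action has no trace, so coverage is 0.75 and `a3` is reported as a silent decision. The orphaned trace `a9` is worth investigating as well, since it suggests a decision was recorded for an action that never appears in the action log.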

The "Accurate" and "Authorized" attributes are where most AI systems encounter audit difficulty. Accuracy for a traditional system means "the calculation is correct." Accuracy for an AI decision system means "the decision outcome is the one that the governing rules specify for these inputs." This requires that governing rules exist, that they are applied to the decision, and that the application can be demonstrated after the fact. Without decision traces that record which rules were evaluated and what outcome they produced, the accuracy of any individual AI decision is unverifiable.
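
A replay test is one way to demonstrate accuracy after the fact: re-run the recorded inputs against the recorded rule version and confirm that the same rule produces the same outcome. The sketch below assumes a first-match-wins rule list and a trace that stores the ruleset version, inputs, matched rule, and outcome. The rule names, the thresholds, and the `DecisionTrace` fields are all hypothetical.

```python
from dataclasses import dataclass, replace

# Hypothetical versioned rule sets: (rule_id, predicate, outcome).
RULESETS = {
    "expense-v3": [
        ("R1_missing_receipt", lambda d: not d["has_receipt"], "reject"),
        ("R2_over_limit", lambda d: d["amount"] > 500, "escalate"),
        ("R3_default", lambda d: True, "approve"),
    ],
}


@dataclass(frozen=True)
class DecisionTrace:
    decision_id: str
    ruleset_version: str
    inputs: dict
    matched_rule: str
    outcome: str


def evaluate(decision_id, ruleset_version, inputs):
    """First matching rule wins; the trace records which rule fired."""
    for rule_id, predicate, outcome in RULESETS[ruleset_version]:
        if predicate(inputs):
            return DecisionTrace(decision_id, ruleset_version,
                                 dict(inputs), rule_id, outcome)
    raise ValueError("no rule matched")


def replay_matches(trace):
    """Re-evaluate the recorded inputs and compare with the recorded trace."""
    replayed = evaluate(trace.decision_id, trace.ruleset_version, trace.inputs)
    return replayed == trace


trace = evaluate("d-1", "expense-v3", {"amount": 750, "has_receipt": True})
tampered = replace(trace, outcome="approve")  # recorded outcome no longer follows from the rules
```

`replay_matches(trace)` holds for the original trace and fails for the altered one. Note that this only works if the rule evaluation is deterministic and the rule version that was active at decision time is still retrievable.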

Authorization is similarly challenging. For a traditional system, authorization means "the user had permission to perform this action." For an AI agent, authorization means "the agent had permission to take this action, and that permission was verified before the action was executed." This requires an authority model that defines the agent's operational scope and a runtime enforcement mechanism that checks every action against that model.

The Three Questions Auditors Now Ask That TSC Did Not Anticipate

Beyond the formal criteria, SOC 2 auditors examining AI systems have converged on three questions that emerge from the gap between TSC language and AI system behavior:

"Are decisions logged?" Not "are API calls logged" or "are model outputs logged." Are the decisions -- the committed outcomes produced by rule evaluation -- logged in a structured, immutable format that can be retrieved and inspected? The distinction matters because many teams log extensively at the infrastructure and model layers while capturing nothing at the decision layer. MIT Technology Review's coverage of enterprise AI governance consistently identifies this logging gap as the most common source of audit findings in AI-enabled systems.

"Are policy changes version-controlled?" Can the team show what policy rules were active on any specific date during the observation period? If a rule was changed on April 15th, can the team produce the rule as it existed on April 14th and demonstrate that a decision made on April 14th was evaluated against the April 14th version? Without policy versioning, the auditor cannot verify that the controls described in the SOC 2 report were actually the controls operating at any sampled decision point.

"Can the system demonstrate an AI action was within authorized scope?" For a sampled AI action -- an email sent, an account modified, a transaction processed -- can the team demonstrate that the action was within the agent's defined authority, that the authority boundary was checked before the action was executed, and that the check is recorded? This is the question that separates systems with documented authority policies from systems with enforced authority policies. The auditor is not satisfied by a policy document. They want evidence that the policy was enforced for the specific sampled action.

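The policy versioning question can be answered with an "as of" lookup over a history of published versions. The following is a minimal sketch, assuming versions are published in chronological order and held in memory; the `PolicyHistory` class, the version identifiers, the thresholds, and the year are illustrative assumptions.

```python
from bisect import bisect_right
from datetime import datetime, timezone


class PolicyHistory:
    """Ordered history of policy versions with point-in-time lookup."""

    def __init__(self):
        self._effective = []  # effective-from timestamps, ascending
        self._versions = []   # parallel list of (version_id, rules)

    def publish(self, effective_from, version_id, rules):
        if self._effective and effective_from <= self._effective[-1]:
            raise ValueError("versions must be published in increasing order")
        self._effective.append(effective_from)
        self._versions.append((version_id, rules))

    def as_of(self, when):
        """Return the (version_id, rules) in effect at the given time."""
        idx = bisect_right(self._effective, when) - 1
        if idx < 0:
            raise LookupError("no policy version was in effect at that time")
        return self._versions[idx]


history = PolicyHistory()
history.publish(datetime(2025, 1, 1, tzinfo=timezone.utc),
                "v1", {"approval_limit": 500})
history.publish(datetime(2025, 4, 15, tzinfo=timezone.utc),
                "v2", {"approval_limit": 1000})

april_14 = history.as_of(datetime(2025, 4, 14, 16, 0, tzinfo=timezone.utc))
april_15 = history.as_of(datetime(2025, 4, 15, 8, 0, tzinfo=timezone.utc))
```

A decision made on April 14th resolves to `v1`, and one made after the change resolves to `v2`. If each decision trace also stores the version identifier it was evaluated against, the two can be cross-checked for any sampled decision.
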
Availability: When Decision Latency Becomes a Control

Availability criteria (A1) require that the system meets its availability commitments. For AI decision systems, availability is not only about uptime. It is about decision availability: the system's ability to evaluate and commit decisions within the timeframe where those decisions are operationally valid.

An AI agent that processes fraud detection decisions must evaluate within milliseconds. A pricing engine must respond within the customer-facing page load budget. A risk assessment system must complete before the downstream workflow times out. If the decision layer is available but too slow to produce a timely decision, the system is functionally unavailable for its purpose.

Evidence required: Decision latency metrics (P50, P95, P99) over the observation period. Documentation of timeout handling behavior: what happens when the decision layer cannot respond in time? Does the system fail open (take action without a decision), fail closed (block the action), or escalate? The answer to this question is itself a control that the auditor evaluates.
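
Both halves of this evidence can be sketched briefly. The example below computes latency percentiles with the nearest-rank method and wraps a decision call in a timeout that fails closed or escalates. The function names, the `"block"` and `"escalate"` return values, and the use of a thread pool are assumptions made for illustration. Fail-open is deliberately left out of the sketch.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=4)


def percentile(samples, p):
    """Nearest-rank percentile of a non-empty collection of samples."""
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("samples must not be empty")
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def decide_with_timeout(decide, request, timeout_s, on_timeout="fail_closed"):
    """Run decide(request); on timeout, block or escalate instead of acting."""
    if on_timeout not in ("fail_closed", "escalate"):
        raise ValueError("on_timeout must be 'fail_closed' or 'escalate'")
    future = _pool.submit(decide, request)
    try:
        return future.result(timeout=timeout_s), "decided"
    except FutureTimeout:
        future.cancel()  # best effort; a call that is already running continues
        if on_timeout == "fail_closed":
            return "block", "timeout_fail_closed"
        return "escalate", "timeout_escalated"


def slow_decision(request):
    time.sleep(0.3)  # simulates a decision layer that misses its budget
    return "approve"


latencies_ms = list(range(1, 101))  # synthetic samples: 1 ms to 100 ms
p50, p95, p99 = (percentile(latencies_ms, p) for p in (50, 95, 99))
fast = decide_with_timeout(lambda r: "approve", {}, timeout_s=1.0)
slow = decide_with_timeout(slow_decision, {}, timeout_s=0.05)
```

The second element of each returned tuple is what would be written to the decision trace, so that timeouts and the failure mode applied are themselves evidence.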

Confidentiality and Privacy: AI-Specific Data Flows

Confidentiality criteria (C1) require that confidential information is protected throughout its lifecycle. Privacy criteria (P1 through P8) require that personal information is collected, used, retained, and disposed of in conformity with the entity's privacy commitments. AI systems create data flows that traditional confidentiality and privacy controls may not cover.

Decision traces, by their nature, contain data about decisions made regarding specific entities -- customers, accounts, transactions. If those traces include personal data (names, account numbers, behavioral attributes), the traces themselves become a personal data store subject to privacy controls. If the AI system's context window includes confidential business data retrieved from internal systems, that data transits through the model inference layer (which may be hosted by a third-party provider), creating a confidentiality consideration that traditional data flow diagrams may not capture.

Vanta's guidance on SOC 2 for AI systems recommends that teams map their AI data flows separately from their traditional application data flows, specifically because AI systems introduce data transit patterns (context window assembly, retrieval augmentation, model inference) that do not exist in traditional architectures and may not be covered by existing data flow documentation.

Evidence required: AI-specific data flow diagrams showing how data moves through the context pipeline, model inference, and decision trace storage. Encryption documentation for data at rest and in transit through AI-specific components. Retention policies for decision traces that account for both compliance retention requirements and privacy minimization principles. Access control evidence for decision trace stores showing who can read, export, and query historical decision records.
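
The tension between retention and minimization can be eased by pseudonymizing personal fields before a trace is stored and by applying an explicit retention window. The sketch below is one hedged approach: the field list, the truncated keyed hash, and the retention period are assumptions, and whether keyed hashing is adequate pseudonymization depends on the entity's own privacy commitments.

```python
import hashlib
import hmac
from datetime import datetime, timedelta, timezone

# Assumed set of trace fields that hold personal data.
PII_FIELDS = {"customer_name", "account_number"}


def minimize(trace, key):
    """Replace personal fields with a keyed hash; leave other fields intact."""
    out = {}
    for name, value in trace.items():
        if name in PII_FIELDS:
            digest = hmac.new(key, str(value).encode("utf-8"),
                              hashlib.sha256).hexdigest()
            out[name] = "hmac:" + digest[:16]  # truncated for readability
        else:
            out[name] = value
    return out


def is_expired(recorded_at, now, retention_days):
    """True when a trace is older than the retention window."""
    return now - recorded_at > timedelta(days=retention_days)


raw = {"decision_id": "d-1", "customer_name": "Example Person",
       "account_number": "12345678", "outcome": "approve"}
stored = minimize(raw, key=b"demo-key-not-for-production")

now = datetime(2025, 6, 1, tzinfo=timezone.utc)
old = is_expired(datetime(2024, 1, 1, tzinfo=timezone.utc), now, 365)
recent = is_expired(datetime(2025, 5, 1, tzinfo=timezone.utc), now, 365)
```

Because the hash is keyed and deterministic, the same account maps to the same token across traces, which keeps traces joinable for audit sampling without storing the raw value.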

The Complete TSC-to-AI-Control Mapping

The following table consolidates the mapping of Trust Services Criteria to AI system controls and the evidence artifacts auditors expect. It is not exhaustive, but it covers the criteria most commonly examined and most commonly found to have gaps in AI system engagements.

TSC Category	Key Criteria	AI System Control	Evidence Artifact
Security (CC)	CC6 - Access Controls	Agent authority boundaries; credential scoping; permission enforcement	Authority policy docs; enforcement logs; denied-action records
Security (CC)	CC7 - Monitoring	Decision behavior monitoring; outcome distribution tracking; anomaly detection	Monitoring dashboards; alert configs; incident response records
Security (CC)	CC8 - Change Mgmt	Version control for models, rules, prompts; approval workflows for all change vectors	Change records with approval chains for every AI component modification
Processing Integrity	PI1 - Completeness	Decision trace generation for every committed decision; no silent decisions	Completeness metrics; action-to-trace count reconciliation
Processing Integrity	PI1 - Accuracy	Deterministic rule evaluation; decision traces with rule-to-outcome mapping	Decision traces for sampled decisions; replay test results
Processing Integrity	PI1 - Authorization	Runtime authority boundary enforcement; pre-action permission checks	Authority check logs; boundary enforcement evidence for sampled actions
Availability	A1 - Availability	Decision latency SLAs; timeout handling; failure mode documentation	Latency metrics (P50/P95/P99); timeout behavior documentation
Confidentiality	C1 - Protection	AI data flow mapping; context pipeline encryption; model inference data handling	AI-specific data flow diagrams; encryption evidence; third-party DPAs
Privacy	P3, P4 - Collection/Use	Decision trace PII governance; retrieval pipeline data minimization	Trace retention policies; PII handling documentation; access control evidence

Practical Guidance for Engineering Teams

Engineering teams preparing for a SOC 2 Type II audit that includes AI systems should address four priorities, in order.

First, include Processing Integrity in scope. If your AI system makes or materially influences decisions that affect customers, data, or financial outcomes, PI should be in your SOC 2 scope. Excluding it leaves the most audit-relevant category of AI behavior uncovered and invites the question from enterprise customers: "Why is Processing Integrity not in your SOC 2 report?"

Second, implement decision traces before the observation period begins. Decision traces are the single most important evidence artifact for AI system audits. They satisfy PI completeness, PI accuracy, and CC7 monitoring requirements simultaneously. A team that enters the observation period without decision trace infrastructure accumulates an evidence gap for the entire period, and that gap cannot be backfilled: traces must be generated contemporaneously with the decisions they document.

Third, bring all AI change vectors under change management. Audit your system for every component that can alter decision behavior: model versions, policy rules, prompt templates, scoring thresholds, feature pipeline configurations. Each component should have a documented change process with approval, testing, and deployment records. The change process does not need to be identical to your code deployment process, but it needs to produce equivalent documentation.

Fourth, define and enforce agent authority boundaries. Document what your AI agent is authorized to do. Implement runtime enforcement that checks every action against that authority definition. Log every enforcement check, including actions that were permitted and actions that were denied or escalated. The authority boundary document and the enforcement logs together satisfy CC6 for AI agent access and PI1 for authorization.
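
Because traces must be written at decision time and cannot be reconstructed later, it helps if the trace log can also show that it has not been edited since. A hash chain is one common way to make a log tamper-evident. The sketch below is a minimal in-memory version, with hypothetical function names and trace fields. It shows tamper evidence only, not immutability, which would depend on the storage layer.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _entry_hash(prev_hash, trace):
    body = json.dumps(trace, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_trace(chain, trace):
    """Append a trace whose hash covers the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev_hash, "trace": trace,
                  "hash": _entry_hash(prev_hash, trace)})


def verify_chain(chain):
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["trace"]):
            return False
        prev_hash = entry["hash"]
    return True


chain = []
append_trace(chain, {"decision_id": "d-1", "outcome": "approve"})
append_trace(chain, {"decision_id": "d-2", "outcome": "escalate"})
intact = verify_chain(chain)

chain[0]["trace"]["outcome"] = "reject"  # simulate an after-the-fact edit
after_edit = verify_chain(chain)
```

Verification passes on the untouched chain and fails once an earlier entry is altered, which gives an auditor a way to confirm that sampled traces are the ones written at decision time.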

For the architectural patterns that produce these evidence artifacts by design, see The Agent Governance Stack. For the specific decision trace schema that satisfies auditor evidence requirements, see Decision Traces: The Audit Log Pattern That Makes AI Systems Defensible. For the authority model that defines agent operational scope, see The Authorization Model for AI Agents.

The Underlying Point: The Criteria Are Sufficient, If Applied Rigorously

The Trust Services Criteria do not need to be rewritten for AI systems. The control objectives they define -- access control, change management, monitoring, processing integrity, confidentiality, privacy -- are the right objectives. What changes is the interpretation of those objectives when applied to systems that reason, decide, and act with a degree of autonomy that the criteria's authors did not anticipate.

The mapping provided in this article is not speculative. It reflects the questions auditors are already asking, the evidence they are already requesting, and the gaps they are already finding. Engineering teams that apply the Trust Services Criteria rigorously to their AI systems -- rather than hoping that traditional application controls will cover AI behavior by extension -- will navigate the audit cleanly. Teams that treat AI components as outside the scope of their existing controls will discover, during the audit, that the most consequential part of their system is the least governed.