From EHR Alert to Clinical Decision: A Composite Case Study in Governing Deterministic AI at a Health System

A composite case study tracing the end-to-end governance of a deterministic AI sepsis alert system at a mid-sized academic medical center. It follows every governance decision from model score to nursing alert, including policy rule encoding, decision trace generation, and audit evidence structuring for FDA SaMD review.

At 2:47 AM on a Tuesday in March, a sepsis prediction model running inside the electronic health record at Midland Academic Medical Center scored a patient in bed 4-West-12 at 0.81 -- above the model's internal threshold for high-risk sepsis onset. The model had processed the patient's vitals, lab values, medication history, and nursing assessments from the preceding eight hours. It had produced a probability. What happened between that probability and the alert that appeared on the charge nurse's workstation eleven seconds later is the subject of this case study.

This narrative is a composite. It draws on publicly documented patterns from health systems that have deployed AI-assisted clinical decision support, regulatory guidance from the FDA's AI/ML Action Plan, and the NIST AI Risk Management Framework. No single institution is depicted. The governance decisions, architectural choices, and failure modes described are drawn from multiple real-world deployments and public literature. The patient, the institution, and the specific implementation are fictional. The problems and solutions are not.

The purpose is to trace a single clinical AI alert from raw model output to bedside action, showing every governance layer it passes through, why each layer exists, and what evidence each layer produces for regulatory review.

The Patient, the Model, and the Score

The patient -- a 68-year-old male admitted two days prior for elective hip replacement -- presented unremarkably for the first 36 hours of his stay. His vitals were stable. His surgical recovery was on track. At 1:15 AM on the third day, his temperature rose to 38.4 degrees Celsius. At 2:12 AM, his heart rate increased to 102 beats per minute. His white blood cell count, drawn at 1:00 AM as part of a routine post-surgical panel, returned at 14.2 thousand per microliter. Individually, none of these values would trigger a sepsis alert. Collectively, in the context of a post-surgical patient with no prior history of infection, they form a pattern.

Midland's sepsis prediction model -- a gradient-boosted ensemble trained on five years of de-identified EHR data from the health system's patient population -- processed these values along with 47 other features from the patient's record. The model produced a continuous risk score: 0.81 on a 0-to-1 scale. This score represented the model's estimated probability that the patient would meet Sepsis-3 clinical criteria within the next six hours.

At many health systems, this score would go directly to a clinical alert. At Midland, it did not. The score entered a deterministic decision layer -- and the governance decisions embedded in that layer are what make this case study worth tracing.

Why the Model Score Is Not the Alert

The decision to insert a deterministic layer between the model's risk score and the clinical alert was made eighteen months before this patient's admission. It was not an engineering preference. It was a governance requirement driven by three intersecting concerns.

First, the FDA's evolving guidance on AI/ML-based Software as a Medical Device treats the alert -- the information presented to the clinician -- as the regulated output, not the model's internal probability. An alert that reaches a nurse's workstation is a clinical recommendation. A probability score sitting in a database is not. The distinction matters because the FDA's predetermined change control plan framework requires that the logic producing the clinical recommendation be documented, versioned, and auditable. A machine learning model's internal weights are not auditable in this sense. A set of deterministic rules applied to the model's output is.

Second, the clinical leadership at Midland had experienced alert fatigue from a previous sepsis screening tool that fired on raw model scores. The false-positive rate at a threshold of 0.70 was 34 percent. Nurses learned to ignore the alerts. The tool was clinically useless despite having reasonable model performance. The problem was not the model. The problem was that there was no governance layer between model output and clinical action -- no place to encode institutional clinical judgment about when an alert was actionable and when it was noise.

Third, the HIPAA Security Rule requires audit controls for systems that create, receive, maintain, or transmit electronic protected health information. A system that surfaces patient-specific clinical alerts must produce audit records documenting what information was accessed, what logic was applied, and what action resulted. A model score alone does not create this record. A deterministic decision layer, by design, does.

Inside the Decision Layer: The Rules That Governed This Alert

The deterministic decision layer at Midland evaluated the model's 0.81 score against a set of versioned clinical protocol rules. These rules were not written by engineers. They were authored by a clinical informatics committee composed of two intensivists, a nurse informaticist, an infectious disease physician, and a clinical pharmacist. The rules were then encoded as formal policy statements, version-controlled, and deployed through the same safe rollout process that Midland used for medication dosing protocols.

At the time of this patient's evaluation, the active rule set was version 4.2.1, last updated thirteen weeks prior. The rules evaluated in sequence were:

# Midland Sepsis Alert Protocol v4.2.1
# Effective: 2025-12-15
# Approved by: Clinical Informatics Committee (CIC-2025-089)

rule sepsis_alert_threshold:
  condition: model_score >= 0.75
  action: proceed_to_clinical_filters
  rationale: "Threshold set to balance sensitivity (0.82) against
    alert fatigue; validated against 2024 Q3-Q4 retrospective cohort"

rule surgical_patient_modifier:
  condition: patient.surgical_status == "post_operative"
    AND patient.post_op_hours <= 72
  action: require_two_of_three_clinical_criteria
  criteria:
    - temperature > 38.3 OR temperature < 36.0
    - heart_rate > 100
    - wbc > 12.0 OR wbc < 4.0
  rationale: "Post-surgical patients have elevated baseline inflammatory
    markers; single-criterion alerts produce unacceptable false-positive
    rate in this population (validated CIC-2025-067)"

rule comfort_care_suppression:
  condition: patient.code_status == "comfort_measures_only"
  action: suppress_alert
  log_suppression: true
  rationale: "Alerts for patients on comfort care create clinical
    burden without actionable benefit; suppression logged for audit"

rule recent_alert_cooldown:
  condition: last_alert_hours_ago IS NOT NULL
    AND last_alert_hours_ago < 4
  action: suppress_alert
  log_suppression: true
  rationale: "Prevents repeat alerting for patients already under
    active sepsis evaluation; cooldown period per CIC-2025-071"

For the patient in 4-West-12, the evaluation proceeded as follows. The model score of 0.81 exceeded the 0.75 threshold in rule one. The patient was post-operative (48 hours since surgery), triggering rule two's requirement for two of three clinical criteria. The patient met all three: temperature 38.4, heart rate 102, white blood cell count 14.2. Rule three did not apply -- the patient was full code. Rule four did not apply -- no prior sepsis alert existed for this patient in the preceding four hours.

The decision layer produced a verdict: surface the alert.
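
The four-rule sequence can be sketched in Python. This is a minimal illustration under stated assumptions, not Midland's implementation: the `Patient` dataclass, the `criteria_met` and `evaluate` functions, and the `no_alert` and `alert_suppressed` verdict strings are hypothetical names. The thresholds and criteria are copied from rule set v4.2.1. The source does not say what happens when a post-operative patient meets fewer than two criteria; the sketch assumes no alert is raised.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    # Hypothetical container for the inputs the four rules read.
    model_score: float
    surgical_status: str                   # e.g. "post_operative" or "none"
    post_op_hours: Optional[float]
    temperature: float                     # degrees Celsius
    heart_rate: int                        # beats per minute
    wbc: float                             # thousand cells per microliter
    code_status: str                       # e.g. "full_code"
    last_alert_hours_ago: Optional[float]  # None if no prior alert


def criteria_met(p: Patient) -> int:
    """Count how many of the three clinical criteria in rule two are met."""
    checks = [
        p.temperature > 38.3 or p.temperature < 36.0,
        p.heart_rate > 100,
        p.wbc > 12.0 or p.wbc < 4.0,
    ]
    return sum(checks)


def evaluate(p: Patient) -> str:
    """Apply the v4.2.1 rules in order and return a verdict string."""
    # Rule 1: score threshold.
    if p.model_score < 0.75:
        return "no_alert"
    # Rule 2: post-operative patients need two of three criteria.
    post_op = (p.surgical_status == "post_operative"
               and p.post_op_hours is not None
               and p.post_op_hours <= 72)
    if post_op and criteria_met(p) < 2:
        return "no_alert"  # assumed outcome; not specified in the rule text
    # Rule 3: comfort-care suppression.
    if p.code_status == "comfort_measures_only":
        return "alert_suppressed"
    # Rule 4: cooldown after a recent alert.
    if p.last_alert_hours_ago is not None and p.last_alert_hours_ago < 4:
        return "alert_suppressed"
    return "alert_surfaced"


bed_4w12 = Patient(0.81, "post_operative", 48, 38.4, 102, 14.2, "full_code", None)
print(evaluate(bed_4w12))  # prints "alert_surfaced"
```

Because every branch is an explicit comparison against a versioned constant, the same inputs always produce the same verdict, which is the property that makes the layer auditable.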

The Decision Trace: What Was Recorded

The moment the decision layer reached its verdict, it generated a Decision Trace -- a structured, immutable record of every input evaluated, every rule applied, and the outcome. This trace was not a log entry. It was a first-class audit artifact, stored in an append-only data store separate from the EHR's operational logs.

{
  "trace_id": "dt-2026-03-17-0247-4w12-sepsis",
  "timestamp": "2026-03-17T02:47:11.034Z",
  "patient_encounter_id": "enc-redacted-hash-7f3a",
  "model_version": "sepsis-ensemble-v3.1.2",
  "model_score": 0.81,
  "rule_set_version": "4.2.1",
  "rules_evaluated": [
    {
      "rule": "sepsis_alert_threshold",
      "input": {"model_score": 0.81},
      "result": "passed",
      "action": "proceed_to_clinical_filters"
    },
    {
      "rule": "surgical_patient_modifier",
      "input": {
        "surgical_status": "post_operative",
        "post_op_hours": 48,
        "temperature": 38.4,
        "heart_rate": 102,
        "wbc": 14.2
      },
      "criteria_met": 3,
      "criteria_required": 2,
      "result": "passed",
      "action": "continue"
    },
    {
      "rule": "comfort_care_suppression",
      "input": {"code_status": "full_code"},
      "result": "not_applicable"
    },
    {
      "rule": "recent_alert_cooldown",
      "input": {"last_alert_hours_ago": null},
      "result": "not_applicable"
    }
  ],
  "verdict": "alert_surfaced",
  "alert_recipient": "charge_nurse_4west",
  "alert_delivery_confirmed": "2026-03-17T02:47:22.108Z"
}

Every field in this trace serves a specific governance purpose. The model_version field enables post-hoc analysis of model performance -- if the model is later found to have a systematic bias in a patient subpopulation, every alert it contributed to can be identified and reviewed. The rule_set_version field pins the alert to the exact rules that governed it, so that a subsequent rule change does not retroactively change the interpretation of past alerts. The rules_evaluated array provides a complete chain of reasoning from model score to clinical action, satisfying the explainability requirement that regulators and clinical leadership both demand.

Crucially, the trace also records alerts that were suppressed. When rule three (comfort care suppression) or rule four (cooldown) prevents an alert from reaching a clinician, the Decision Trace still captures the evaluation, the inputs, and the suppression reason. This is not a technicality. It is a patient safety requirement. If a suppressed alert later correlates with an adverse outcome, the trace provides the evidence needed to determine whether the suppression rule was appropriate for that patient's clinical context.
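
One way to make an append-only store tamper-evident is to chain each record to the hash of the one before it. The sketch below is an assumption about how such a store could work, not a description of Midland's: `TraceLog`, `append`, and `verify` are hypothetical names, and the two example traces are abbreviated placeholders. Surfaced and suppressed verdicts are written through the same path.

```python
import hashlib
import json


class TraceLog:
    """In-memory, hash-chained log (hypothetical; illustrates append-only storage)."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self._entries: list[dict] = []

    @staticmethod
    def _digest(trace: dict, prev_hash: str) -> str:
        # Canonical JSON so the same trace always hashes to the same value.
        payload = json.dumps(trace, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

    def append(self, trace: dict) -> str:
        prev_hash = self._entries[-1]["hash"] if self._entries else self.GENESIS
        entry_hash = self._digest(trace, prev_hash)
        # Store a deep copy so later mutation of the caller's dict cannot alter the log.
        stored = json.loads(json.dumps(trace))
        self._entries.append({"trace": stored, "prev_hash": prev_hash, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain; False means an entry was altered or reordered."""
        prev_hash = self.GENESIS
        for entry in self._entries:
            if entry["prev_hash"] != prev_hash:
                return False
            if self._digest(entry["trace"], prev_hash) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


log = TraceLog()
log.append({"trace_id": "dt-example-surfaced", "verdict": "alert_surfaced"})
log.append({"trace_id": "dt-example-suppressed", "verdict": "alert_suppressed",
            "suppression_rule": "comfort_care_suppression"})
print(log.verify())  # prints True
```

A production store would also need durable storage and access controls; the chain only shows how later edits to a recorded trace can be detected.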

The Nurse's Workstation: What the Alert Looked Like

The charge nurse on 4-West received the alert at 2:47 AM. It appeared on the EHR's alert panel as a structured notification, not a raw risk score. The notification read: "Sepsis screening alert for patient 4-West-12. Three of three clinical criteria met (elevated temperature, tachycardia, leukocytosis) in the context of an elevated sepsis risk score (post-operative patient, 48 hours post-surgery). Recommended action: bedside assessment and consideration of Sepsis-3 screening criteria per institutional protocol."

The alert text was generated by the decision layer, not the model. The model produced a number. The decision layer produced a clinical communication. This distinction was deliberate: the clinical informatics committee had determined that presenting raw probability scores to bedside nurses was neither useful nor appropriate. Nurses needed actionable clinical information -- which criteria were met, why this patient's context was relevant, and what the recommended next step was. The decision layer's rule structure contained the templates for generating this communication, and the templates were versioned alongside the rules themselves.
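
A versioned template of this kind might look like the following sketch. The template wording is taken from the alert quoted above; the names `ALERT_TEMPLATE_V4_2_1`, `describe_findings`, and `render_alert`, and the terms "low temperature" and "leukopenia" for the lower bounds, are assumptions made for illustration. The model score is deliberately not a parameter, so it cannot leak into the text shown at the bedside.

```python
ALERT_TEMPLATE_V4_2_1 = (
    "Sepsis screening alert for patient {bed}. "
    "{met_word} of three clinical criteria met ({findings}) "
    "in the context of an elevated sepsis risk score ({context}). "
    "Recommended action: bedside assessment and consideration of "
    "Sepsis-3 screening criteria per institutional protocol."
)

NUMBER_WORDS = {0: "None", 1: "One", 2: "Two", 3: "Three"}


def describe_findings(temperature: float, heart_rate: int, wbc: float) -> list[str]:
    """Map out-of-range values to the clinical terms shown to the nurse."""
    findings = []
    if temperature > 38.3:
        findings.append("elevated temperature")
    elif temperature < 36.0:
        findings.append("low temperature")
    if heart_rate > 100:
        findings.append("tachycardia")
    if wbc > 12.0:
        findings.append("leukocytosis")
    elif wbc < 4.0:
        findings.append("leukopenia")
    return findings


def render_alert(bed: str, temperature: float, heart_rate: int,
                 wbc: float, post_op_hours: int) -> str:
    """Fill the versioned template from the same inputs the rules evaluated."""
    findings = describe_findings(temperature, heart_rate, wbc)
    return ALERT_TEMPLATE_V4_2_1.format(
        bed=bed,
        met_word=NUMBER_WORDS[len(findings)],
        findings=", ".join(findings),
        context=f"post-operative patient, {post_op_hours} hours post-surgery",
    )


text = render_alert("4-West-12", 38.4, 102, 14.2, 48)
print(text)
```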

The nurse responded within nine minutes, documenting a bedside assessment that confirmed the clinical findings and initiating the institution's Sepsis-3 screening protocol. Blood cultures were drawn at 3:14 AM. Broad-spectrum antibiotics were administered at 3:42 AM, within the institution's target of 60 minutes from alert to antibiotic administration. The patient's sepsis was confirmed by culture results 36 hours later. He was discharged ten days after admission, five days after his originally planned discharge -- a delayed but complete recovery.

The Suppressed Alert That Tested the System

Three weeks after the event described above, the governance system was tested from the other direction. A 74-year-old patient on the same unit, admitted for palliative management of metastatic cancer, had a code status of comfort measures only. The sepsis model scored this patient at 0.79. The decision layer evaluated the score against the same rule set. Rule one passed (score above 0.75). Rule two did not apply (not a surgical patient). Rule three triggered: the patient's code status was comfort measures only, and the alert was suppressed.

The Decision Trace recorded the suppression with full detail: every input evaluated, every rule applied, and the suppression reason. The trace was flagged for review by the clinical informatics committee as part of their monthly audit of suppressed alerts.

During that review, the committee examined whether the suppression was clinically appropriate. The patient's family had requested comfort measures only. The attending physician confirmed that a sepsis alert would not have changed the care plan. The suppression was validated. But the committee also identified a question the rule did not address: what if a comfort-care patient's family had specifically requested that infection management continue as part of comfort measures? The committee drafted a modification to rule three that would allow individual patient-level overrides of the comfort-care suppression, with the override recorded as a separate auditable decision. That modification entered the safe rollout pipeline as version 4.3.0 and was deployed six weeks later after shadow evaluation against the preceding quarter's data.
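
Shadow evaluation of a rule change can be sketched as replaying historical inputs through both rule versions and listing every case where the verdicts differ. The code below models only rule three and assumes the earlier rules have already passed; `verdict_v4_2_1`, `verdict_v4_3_0`, `shadow_evaluate`, the `infection_management_override` field, and the three sample encounters are hypothetical.

```python
def verdict_v4_2_1(inputs: dict) -> str:
    """Current rule three: always suppress for comfort-measures-only patients."""
    if inputs["code_status"] == "comfort_measures_only":
        return "alert_suppressed"
    return "alert_surfaced"


def verdict_v4_3_0(inputs: dict) -> str:
    """Candidate rule three: a recorded patient-level override lifts suppression."""
    comfort_care = inputs["code_status"] == "comfort_measures_only"
    # Hypothetical field name for the auditable override decision.
    override = inputs.get("infection_management_override", False)
    if comfort_care and not override:
        return "alert_suppressed"
    return "alert_surfaced"


def shadow_evaluate(historical_inputs: list[dict]) -> list[dict]:
    """Replay past inputs through both versions and report every disagreement."""
    differences = []
    for inputs in historical_inputs:
        old = verdict_v4_2_1(inputs)
        new = verdict_v4_3_0(inputs)
        if old != new:
            differences.append(
                {"encounter": inputs["encounter"], "v4.2.1": old, "v4.3.0": new}
            )
    return differences


history = [
    {"encounter": "enc-a", "code_status": "full_code"},
    {"encounter": "enc-b", "code_status": "comfort_measures_only"},
    {"encounter": "enc-c", "code_status": "comfort_measures_only",
     "infection_management_override": True},
]
for diff in shadow_evaluate(history):
    print(diff)
```

Only `enc-c` changes verdict under the candidate rule, which is the kind of bounded, reviewable difference a committee would want to see before approving deployment.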

This episode illustrates why suppressed alert traces are as important as surfaced alert traces. Without the trace, the committee would not have had the evidence to review the suppression, would not have identified the edge case, and would not have improved the rule. The governance infrastructure was not just enforcing rules. It was generating the evidence that allowed the rules to be safely improved.

Structuring Audit Evidence for FDA SaMD Review

Midland's clinical AI system fell within the FDA's framework for AI/ML-based Software as a Medical Device. The sepsis alert system was classified as a Class II medical device under the De Novo pathway, and the institution had submitted a predetermined change control plan as part of its regulatory filing. That plan specified two categories of changes: changes to the model (retraining, architecture modifications) and changes to the decision rules (threshold adjustments, new clinical filters, suppression rule modifications). Each category had different documentation requirements and approval pathways.

The deterministic decision layer's audit infrastructure was designed to produce the evidence artifacts required by both categories. For model changes, the versioned model identifier and per-alert trace enabled retrospective comparison of model performance across versions. For rule changes, the versioned rule set, the shadow evaluation logs from safe rollout, and the clinical informatics committee's approval records provided the documentation chain the FDA expected.

The specific artifacts Midland maintained for regulatory review were:

| Artifact | Source | Retention | Purpose |
|---|---|---|---|
| Decision Traces (all alerts) | Decision layer, append-only store | 10 years | Per-alert audit evidence; retrospective outcome analysis |
| Rule set version history | Version-controlled rule repository | Life of device | Demonstrate rule provenance and change control |
| Shadow evaluation logs | Safe rollout pipeline | 5 years | Evidence that rule changes were validated before production |
| Clinical Informatics Committee minutes | Committee records | Life of device | Human oversight; approval provenance for rule changes |
| Suppressed alert review reports | Monthly clinical audit | 10 years | Evidence that suppression rules were clinically reviewed |
| Model performance validation reports | Quarterly revalidation process | Life of device | Ongoing performance monitoring per FDA GMLP expectations |
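
The retention schedule lends itself to being encoded as data so that disposal decisions are checked rather than remembered. The following is a sketch under assumptions: the dictionary keys, the `eligible_for_disposal` function, and the choice to treat "life of device" as "keep until the device is retired" are illustrative and not taken from Midland's policy.

```python
from datetime import date
from typing import Optional

# Retention periods in years, copied from the table; None means "life of device".
RETENTION_YEARS: dict[str, Optional[int]] = {
    "decision_traces": 10,
    "rule_set_version_history": None,
    "shadow_evaluation_logs": 5,
    "committee_minutes": None,
    "suppressed_alert_review_reports": 10,
    "model_performance_validation_reports": None,
}


def eligible_for_disposal(artifact: str, created: date, today: date,
                          device_retired: bool = False) -> bool:
    """Return True once the artifact's retention period has fully elapsed."""
    years = RETENTION_YEARS[artifact]
    if years is None:
        # Assumption: life-of-device artifacts are kept until the device is retired.
        return device_retired
    try:
        expiry = created.replace(year=created.year + years)
    except ValueError:
        # created was 29 February and the target year is not a leap year;
        # round forward to 1 March so nothing is disposed of early.
        expiry = date(created.year + years, 3, 1)
    return today >= expiry


print(eligible_for_disposal("decision_traces", date(2026, 3, 17), date(2036, 3, 16)))
print(eligible_for_disposal("shadow_evaluation_logs", date(2026, 3, 17), date(2031, 3, 17)))
```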

The Stanford HAI 2025 AI Index Report documented a growing pattern across health systems: organizations that built audit infrastructure proactively -- before regulatory examination -- reported significantly faster and less adversarial regulatory interactions than those that attempted to reconstruct audit evidence retroactively. Midland's chief medical informatics officer described the decision trace infrastructure as "the single investment that changed our regulatory posture from defensive to demonstrable."

HIPAA Compliance: How the Decision Layer Addressed Privacy Requirements

The HIPAA Security Rule's audit control standard (45 CFR 164.312(b)) requires covered entities to implement hardware, software, and procedural mechanisms that record and examine activity in information systems that contain or use electronic protected health information. The sepsis alert system accessed PHI extensively: vitals, lab results, medication lists, surgical history, and code status.

The Decision Trace served double duty as a HIPAA audit record. It documented which patient data elements were accessed by the system, when they were accessed, what logic was applied to them, and what output was produced. The traces were stored in an access-controlled, append-only store with role-based access limited to the clinical informatics team, the compliance office, and designated auditors. Access to the trace store itself was logged, creating a secondary audit trail of who reviewed what traces and when.

The design also addressed HIPAA's minimum necessary standard. The Decision Trace recorded the fact categories evaluated (temperature, heart rate, WBC count) and their values, but did not replicate the patient's full medical record. The trace contained the minimum data necessary to reconstruct the decision, not a comprehensive PHI extract. This was an explicit design decision by the clinical informatics committee, documented in the system's data governance policy.
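
The minimum necessary principle can be enforced structurally by copying into the trace only the fields each rule declares as inputs. The field lists below mirror the `input` objects in the trace shown earlier; the names `RULE_INPUT_FIELDS` and `minimum_necessary`, and the extra chart fields in the example record, are hypothetical.

```python
# Fields each rule reads, mirroring the "input" objects in the Decision Trace.
RULE_INPUT_FIELDS = {
    "sepsis_alert_threshold": ["model_score"],
    "surgical_patient_modifier": ["surgical_status", "post_op_hours",
                                  "temperature", "heart_rate", "wbc"],
    "comfort_care_suppression": ["code_status"],
    "recent_alert_cooldown": ["last_alert_hours_ago"],
}


def minimum_necessary(record: dict, rule: str) -> dict:
    """Copy only the fields a rule evaluates; everything else stays in the EHR."""
    return {field: record.get(field) for field in RULE_INPUT_FIELDS[rule]}


full_record = {
    "model_score": 0.81,
    "surgical_status": "post_operative",
    "post_op_hours": 48,
    "temperature": 38.4,
    "heart_rate": 102,
    "wbc": 14.2,
    "code_status": "full_code",
    "last_alert_hours_ago": None,
    # Present in the chart but never evaluated, so never copied into the trace.
    "medication_list": ["example-medication"],
    "home_address": "redacted",
}
print(minimum_necessary(full_record, "comfort_care_suppression"))
# prints {'code_status': 'full_code'}
```

An allow-list of this kind fails safe: a new chart field is excluded from traces until a rule explicitly declares it as an input.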

What This Case Study Demonstrates

The events traced in this narrative -- a single sepsis alert, a single suppressed alert, and the governance infrastructure surrounding both -- illustrate a set of principles that apply well beyond healthcare.

A model's output is not a decision. At Midland, the model produced a probability. The decision -- whether to surface an alert, what clinical context to include, or whether to suppress notification for a comfort-care patient -- was made by a deterministic layer whose rules were authored by clinicians, versioned, auditable, and subject to safe rollout. The model contributed analytical capability. The decision layer contributed institutional judgment, regulatory compliance, and auditability.

Suppressed actions require the same audit rigor as surfaced actions. The comfort-care suppression that prevented an alert from reaching the bedside was as consequential as the alert that did reach the bedside. Both required full Decision Traces. Both were reviewed by the clinical informatics committee. The suppressed alert's trace led to a rule improvement that would not have been possible without the evidence.

Audit infrastructure is not a post-deployment addition. Midland designed the Decision Trace schema, the append-only storage, the retention policies, and the committee review process before the system went live. Retrofitting audit capabilities onto a running clinical AI system -- while maintaining patient safety and regulatory compliance -- would have been harder and riskier.

McKinsey's analysis of AI in healthcare identifies governance infrastructure as the primary differentiator between health systems that have successfully scaled clinical AI and those that have stalled after pilot deployments. The governance burden is real, but the alternative -- deploying clinical AI without deterministic decision governance, structured audit trails, and safe change management -- is a burden of a different and less manageable kind.

The patient in 4-West-12 recovered. The governance system that ensured he received a timely, well-reasoned, auditable sepsis alert was invisible to him. That invisibility is the goal. When clinical AI governance works, patients receive better care and never know why. The evidence that it worked lives in the Decision Traces -- retrievable, immutable, and ready for any reviewer who needs to understand exactly what happened and why.

References & Citations

  1. Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) Action Plan (U.S. Food and Drug Administration)

    FDA action plan for regulating AI/ML-based software as a medical device, including the total product lifecycle approach, good machine learning practices, and requirements for predetermined change control plans.

  2. AI Risk Management Framework (AI RMF 1.0) (NIST)

    NIST framework for managing AI risk across the full AI lifecycle, covering governance, mapping, measuring, and managing risk functions applicable to healthcare AI deployments.

  3. Artificial Intelligence Index Report 2025 (Stanford HAI)

    Stanford Human-Centered AI Institute annual report covering the state of AI in healthcare, including clinical deployment patterns, safety outcomes, and regulatory developments across health systems.

  4. The State of AI in Healthcare: 2025 and Beyond (McKinsey & Company)

    McKinsey analysis of AI adoption across healthcare organizations, covering clinical decision support, operational automation, and the governance maturity required for safe deployment at scale.