The Case for AI Agents in Clinical Triage

Clinical triage is one of the oldest decision-support problems in medicine. Given a patient presenting to an emergency department, determine their acuity — how urgently they need to be seen — and route them accordingly. Do it in under three minutes. Do it correctly for dozens of patients simultaneously, across every shift, every day.

The two dominant systems, the Manchester Triage System (MTS) and the Emergency Severity Index (ESI), are elegant in one specific way: both are rule-based state machines.

MTS works like this: a presenting complaint (chest pain, headache, shortness of breath) maps to a flowchart. The flowchart has discriminator nodes, clinical findings that branch the path. Is the pain pleuritic? Is the respiratory rate above 30? Each discriminator drives you toward one of five priority categories. ESI is similar: five levels, each defined by explicit criteria about vital sign instability, expected resource use, and chief complaint severity.

Both systems are deterministic given complete inputs. That is their strength — they remove clinician-to-clinician variability and produce consistent, auditable decisions. They also have a known weakness: the inputs are never complete.
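
To make the state-machine framing concrete, here is a minimal sketch of a discriminator flowchart. The chart contents, the fall-through priority, and every name in it (Discriminator, SHORTNESS_OF_BREATH, walk_flowchart) are illustrative assumptions, not real MTS content. It shows both properties: the same findings always give the same priority, and a missing finding silently skips the node that needed it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Discriminator:
    name: str
    required: tuple               # finding names this node needs
    test: Callable[[dict], bool]  # fires when the finding is present and abnormal
    priority: int                 # 1 = most urgent ... 5 = least urgent


# Hypothetical, heavily simplified chart. NOT real MTS content.
SHORTNESS_OF_BREATH = [
    Discriminator("respiratory rate above 30", ("resp_rate",),
                  lambda f: f["resp_rate"] > 30, 2),
    Discriminator("pleuritic pain", ("pleuritic_pain",),
                  lambda f: bool(f["pleuritic_pain"]), 3),
]
DEFAULT_PRIORITY = 4  # assumed fall-through when no discriminator fires


def walk_flowchart(chart, findings):
    """Return (priority, missing_inputs) for a dict of collected findings."""
    missing = []
    for node in chart:  # nodes are ordered most urgent first
        absent = [key for key in node.required if key not in findings]
        if absent:
            missing.extend(absent)
            continue  # the node cannot fire without its inputs
        if node.test(findings):
            return node.priority, missing
    return DEFAULT_PRIORITY, missing


complete = walk_flowchart(SHORTNESS_OF_BREATH,
                          {"resp_rate": 34, "pleuritic_pain": False})
incomplete = walk_flowchart(SHORTNESS_OF_BREATH, {"pleuritic_pain": False})
```

With complete findings the walk returns priority 2. With the respiratory rate never recorded, the same patient falls through to priority 4, and the only trace of the gap is the missing-inputs list, which a naive implementation would not even keep.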

Where the state machine breaks

A patient presents with a headache. MTS routes you to the headache flowchart. The discriminator asks: is this the "worst headache of their life"? The answer requires the patient to accurately assess and report their own pain history, under stress, in a noisy triage bay, in a language that may not be their first.

A patient presents with vague abdominal discomfort. ESI asks about vital signs and resource consumption expectations. Neither captures the subtlety that elderly patients with serious pathology — ruptured aortic aneurysm, mesenteric ischemia — frequently present with diffuse, low-intensity pain. The discriminator says low acuity. The pathology says otherwise.

The failure mode is not that the protocol is wrong. The failure mode is that the protocol can only operate on its declared inputs. Anything outside those inputs — the patient's posture, their skin color, the quality of their distress, their affect — is invisible to the state machine.

This is where AI agents become interesting.

Why LLMs map onto triage naturally

Structure a clinical encounter as a conversation in which the agent asks about onset, quality, radiation, associated symptoms, and severity, then reasons over the answers, and you get something that looks like a structured triage interview.

An agent with tool access can conduct a structured history following a protocol's discriminator logic, retrieve current vital signs from the EHR, access prior visit history, and produce a priority recommendation with a stated rationale. This covers what the state machine covers, plus the unstructured signals that fall through the cracks. The agent can note that a prior visit for the "same" complaint produced a normal workup, or that chronic pain conditions make the severity self-report unreliable.

The failure modes a clinician sees

Here is what the enthusiasm for AI triage gets wrong, and what only shows up when you have actually practiced emergency medicine.

The confident wrong answer. LLMs can be wrong with high expressed confidence. In triage, a confidently wrong answer in the direction of undertriage — assigning a lower priority to a higher-acuity patient — can cause a preventable death. The model that says "this headache sounds tension-type, priority 3" when it's a subarachnoid hemorrhage is not a helpful tool. It is a liability generator.

The missing vital sign problem. Triage protocols are calibrated to vital signs. A patient in early septic shock will look relatively well before the vital signs declare the emergency. If the agent reasons from the history without the vital signs, or if the vital signs are stale or absent from the EHR, the agent is reasoning in the dark and may not know it.
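
One guard against reasoning in the dark is to check the vital signs for gaps before any acuity reasoning happens. The sketch below is a minimal illustration: the list of required vitals, the 15-minute freshness window, and the function name vitals_gaps are all assumptions chosen for the example, not clinical guidance or a real EHR interface.

```python
from datetime import datetime, timedelta, timezone

# Assumed values for illustration only; real thresholds are a clinical decision.
MAX_VITALS_AGE = timedelta(minutes=15)
REQUIRED_VITALS = ("heart_rate", "resp_rate", "systolic_bp", "temperature_c", "spo2")


def vitals_gaps(vitals, now):
    """List every required vital that is missing or older than MAX_VITALS_AGE.

    `vitals` maps a vital name to a (value, measured_at) tuple, where
    measured_at is a timezone-aware datetime.
    """
    gaps = []
    for name in REQUIRED_VITALS:
        reading = vitals.get(name)
        if reading is None:
            gaps.append(f"{name}: missing")
            continue
        _value, measured_at = reading
        if now - measured_at > MAX_VITALS_AGE:
            gaps.append(f"{name}: stale")
    return gaps


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
vitals = {
    "heart_rate": (88, now - timedelta(minutes=5)),
    "resp_rate": (22, now - timedelta(minutes=40)),
}
gaps = vitals_gaps(vitals, now)
```

A non-empty result is a reason to tell the nurse what the agent could not see, rather than to let the history alone carry the recommendation.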

Social engineering. A patient who knows how triage works can describe symptoms designed to get a higher-priority assignment. Human triage is partly resistant to this because experienced triage nurses develop gestalt, a non-verbal read of the patient. An agent has no gestalt.

The language and cognitive impairment gap. Triage interviews require the patient to be a reliable historian. Patients who are cognitively impaired, intoxicated, psychotic, or whose first language does not match the agent's are unreliable historians. The agent must detect this unreliability and escalate, or it will generate low-quality reasoning from low-quality inputs.

The right architecture

The appropriate design is not autonomous triage. It is augmented triage.

The agent conducts a structured pre-triage interview and produces a preliminary acuity recommendation with explicit confidence and a stated rationale. A triage nurse receives this summary alongside the patient, reviews it in thirty seconds, and makes the final call.

The confidence threshold matters. An agent recommending priority 3 with 0.6 confidence should route differently than one with 0.95 confidence. Low-confidence outputs trigger immediate nurse assessment rather than joining the queue.
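
The routing rule can be sketched in a few lines. The 0.8 floor, the Recommendation fields, and the route labels are hypothetical names and values picked for illustration; where the threshold actually sits would need to be set and validated clinically.

```python
from dataclasses import dataclass

CONFIDENCE_FLOOR = 0.8  # assumed threshold, for illustration only


@dataclass(frozen=True)
class Recommendation:
    priority: int      # 1 = most urgent ... 5 = least urgent
    confidence: float  # agent's stated confidence, 0.0 to 1.0
    rationale: str


def route(rec):
    """Send low-confidence recommendations straight to a nurse."""
    if not 0.0 <= rec.confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if rec.confidence < CONFIDENCE_FLOOR:
        return "immediate_nurse_assessment"
    return "queue_for_nurse_review"


unsure = route(Recommendation(3, 0.60, "headache, history inconsistent"))
sure = route(Recommendation(3, 0.95, "headache, consistent benign history"))
```

Both paths end with a nurse; the threshold only changes how quickly the patient reaches one.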

Triage is a safety-critical system. The design principle for safety-critical systems is defense in depth — multiple layers with different failure modes, so no single failure causes harm. The agent is one layer. The nurse is another. The protocol is a third.
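
As a sketch of how the three layers might be combined, assume that the more urgent of the agent and protocol outputs serves as a provisional priority, that the nurse's call is always final, and that a nurse decision less urgent than that provisional priority is flagged for later review. These rules and the function names are assumptions made for the example, not an established standard.

```python
def provisional_priority(agent, protocol):
    """Most urgent (lowest number) priority among the automated layers.

    Either argument may be None when that layer produced no output.
    """
    available = [p for p in (agent, protocol) if p is not None]
    return min(available) if available else None


def final_priority(agent, protocol, nurse):
    """Return (priority, flag_for_review). The nurse's call is always final."""
    floor = provisional_priority(agent, protocol)
    # Flag cases where the nurse chose a less urgent priority than an automated layer.
    downgraded = floor is not None and nurse > floor
    return nurse, downgraded


agreed = final_priority(agent=3, protocol=2, nurse=2)
overridden = final_priority(agent=2, protocol=4, nurse=3)
```

The flag does not overrule anyone. It creates an audit trail of disagreements between layers, which is the kind of outcome data the nurse-in-the-loop question eventually depends on.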

Build the layers. Ship the tool. But do not remove the nurse from the loop until you have outcome data to justify it. That data does not yet exist.
