When I started building FHIR R4 integrations, I expected the challenge to be the data volume, or the API surface area, or the vendor fragmentation. Those were real, but they were not the hard part.
The hard part was the specification itself.
FHIR defines a Patient resource with roughly 30 top-level elements. Many of them accept constrained value sets: CodeableConcept types that must reference SNOMED CT, ICD-10, or LOINC. A Condition.clinicalStatus field only accepts six values: active, recurrence, relapse, inactive, remission, resolved. POST those values in the wrong case, with an incorrect system URL, or with a non-canonical code, and you do not get a polite 400 with a helpful error. You get a 422 with an OperationOutcome resource that demands you read the spec to understand what went wrong.
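The strictness is easy to see in a sketch. The system URL and the six codes below come from the R4 spec; the function itself is a simplified illustration of the kind of check a server runs, not a real FHIR validator.

```typescript
// System URL and codes for Condition.clinicalStatus in FHIR R4.
const CLINICAL_STATUS_SYSTEM =
  "http://terminology.hl7.org/CodeSystem/condition-clinical";
const CLINICAL_STATUS_CODES = new Set([
  "active", "recurrence", "relapse", "inactive", "remission", "resolved",
]);

interface Coding {
  system?: string;
  code?: string;
}

// Returns true only for an exact match: correct system URL, canonical
// code, correct case. "Active" or a typo'd URL is rejected, not normalized.
function isValidClinicalStatus(coding: Coding): boolean {
  return (
    coding.system === CLINICAL_STATUS_SYSTEM &&
    coding.code !== undefined &&
    CLINICAL_STATUS_CODES.has(coding.code)
  );
}
```

Note what the function does not do: no trimming, no case folding, no fuzzy matching. Either the coding is exactly right or it is rejected.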
This is not a bug in FHIR. It is the design.
Why medical systems are strict by default
In clinical practice, ambiguity is a patient safety problem. A lab result entered with the wrong units is not just a data quality issue — it can cause a clinician to underdose or overdose. The strictness of FHIR's value sets mirrors the strictness required in clinical documentation. When a code is wrong, the system rejects it rather than inferring intent. That's the right call in healthcare.
Most software systems we build operate under weaker constraints. A REST API accepts "active" and "ACTIVE" and normalizes internally. We treat ambiguity as a UX problem to smooth over, not a correctness problem to surface. This is often the right call for consumer software. It is not the right call when the data is authoritative clinical state.
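The two philosophies fit in a few lines. This is a hypothetical status field, not any real API; it just makes the contrast concrete.

```typescript
// Lenient: normalize and accept. Ambiguity is smoothed over.
function lenientParseStatus(input: string): string {
  return input.trim().toLowerCase(); // " ACTIVE " becomes "active"
}

// Strict: anything non-canonical is an error. Ambiguity is surfaced.
function strictParseStatus(input: string): string {
  const allowed = new Set(["active", "inactive"]);
  if (!allowed.has(input)) {
    throw new Error(`Unknown status: ${JSON.stringify(input)}`);
  }
  return input;
}
```

The lenient version is friendlier; the strict version guarantees that every value downstream is canonical. When the data is authoritative clinical state, that guarantee is worth the friction.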
What this reveals about code review
Spending time in a system with extremely tight correctness invariants rewired how I think about pull request review.
In a standard engineering culture, a PR gets merged when it looks right: the title is descriptive, the diff is reasonable, the tests pass, a peer gave it a quick read and said LGTM. That process catches a lot of real bugs. But it also lets through a class of error that only shows up under load, under a specific race condition, or when someone calls the function with an edge-case input three months later.
FHIR validation does not give approvals for code that looks right. It demands that the data be correct according to a machine-readable specification. If your ImplementationGuide says Observation.code must be drawn from a specific LOINC panel, the validator will fail every Observation that isn't. It does not care if the code was close, or if the intent was obvious.
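A profile binding like that reduces to an allow-list check. The shape below mirrors the idea; the LOINC codes used in any real deployment would come from the ImplementationGuide, and the function name and result shape here are my own, not a FHIR API.

```typescript
const LOINC_SYSTEM = "http://loinc.org";

interface ObservationCode {
  system?: string;
  code?: string;
}

// Validate Observation.code against a bound value set: both the system
// and the code must match exactly, or the resource fails with an issue.
function validateObservationCode(
  obsCode: ObservationCode,
  allowedCodes: Set<string>,
): { ok: true } | { ok: false; issue: string } {
  if (obsCode.system !== LOINC_SYSTEM) {
    return { ok: false, issue: `code.system must be ${LOINC_SYSTEM}` };
  }
  if (obsCode.code === undefined || !allowedCodes.has(obsCode.code)) {
    return { ok: false, issue: `code not in bound value set` };
  }
  return { ok: true };
}
```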
The analogy to good code review is direct. The best reviewers I have worked with review against invariants, not impressions. They ask:
- What is the contract this function is supposed to honor?
- What happens if the caller passes null? What about an empty array?
- Is this safe to call concurrently, or is there a shared mutable state risk?
- What's the failure mode when the database is slow?
These are not questions about whether the code looks clean. They are questions about correctness under constraint — the same discipline FHIR validation demands.
Safe state as a shared concept
Medicine uses the concept of a "safe state" — the minimum intervention that leaves a patient in no worse condition than before treatment. Emergency protocols are designed to reach safe state first, then optimize from there. Airway before breathing before circulation. Hemorrhage control before fracture management.
Systems engineering has an equivalent: a safe state is any configuration from which a system can be correctly recovered after a failure. You want your deployment pipeline to reach safe state — a clean rollback point — before applying any mutation that cannot be easily undone. You want your database migrations to be reversible before they touch production data.
FHIR systems exposed this parallel to me because the validation layer acts as a safe-state gate. If a resource fails validation, it is not persisted. You cannot accidentally store an invalid Observation and later discover the downstream analytics are broken. The system refuses to leave a safe state for an invalid one.
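The gate pattern itself is small. Here is a minimal sketch, with hypothetical names: persistence is only reachable through validation, so an invalid resource can never move the store out of a safe state.

```typescript
type ValidationResult = { valid: true } | { valid: false; issues: string[] };

class ResourceStore {
  private resources: object[] = [];

  constructor(private validate: (r: object) => ValidationResult) {}

  // The only write path. An invalid resource returns its issues and
  // leaves the store exactly as it was; a valid one is persisted.
  save(resource: object): ValidationResult {
    const result = this.validate(resource);
    if (result.valid) {
      this.resources.push(resource);
    }
    return result;
  }

  count(): number {
    return this.resources.length;
  }
}
```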
This is harder to build into general software, but not impossible. Strict schema validation at API boundaries, typed domain models that cannot represent invalid state (Rust's exhaustive matching, TypeScript's discriminated unions), and test coverage that exercises constraint boundaries — these are all tools for enforcing safe-state discipline in non-clinical systems.
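A discriminated union makes the "cannot represent invalid state" point concrete. This is a hypothetical domain model: a lab result is either pending with no value, or final with both a value and a unit. There is no representable state with a value but no unit, so that bug class is gone at compile time.

```typescript
type LabResult =
  | { status: "pending"; orderedAt: string }
  | { status: "final"; value: number; unit: string };

// Exhaustive switch: the compiler forces both arms to be handled,
// and `result.value` is only accessible in the "final" arm.
function describe(result: LabResult): string {
  switch (result.status) {
    case "pending":
      return `Pending since ${result.orderedAt}`;
    case "final":
      return `${result.value} ${result.unit}`;
  }
}
```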
The engineering review practice that came out of this
After a year of FHIR work, my PR reviews changed. I started writing invariants in comments before reviewing the diff:
// Contract: this function must return null if the resource is missing,
// never throw. Downstream code branches on null, does not try/catch.
Then I'd read the diff against those invariants rather than against my general sense of what the code should do. It catches a different class of bugs. The kind that only show up at the boundary conditions, exactly where FHIR validation fails.
The constraint is the specification. Read the specification before you read the diff. That's what FHIR taught me about code review.