The Hype Says "Agent Everything." Production Data Says Otherwise.
A recent analysis of 14 production agent workflows by Tomasz Tunguz dropped a number that every healthcare data leader should tape to their monitor: 65% of nodes in production AI agent systems run as pure deterministic code. Not LLM calls. Not fancy inference. Just good old-fashioned conditional logic, API calls, and data transformations.
If you've been in a boardroom lately where someone pitched an "agentic AI platform" that would autonomously navigate clinical data workflows end-to-end, that number should be both vindicating and alarming. Vindicating because you probably felt in your gut that full autonomy was oversold. Alarming because most teams are architecting their agent systems backwards — leading with the LLM and bolting on guardrails as an afterthought.
In healthcare data engineering, getting this ratio wrong isn't a cost optimization problem. It's a patient safety problem.
Why 65/35 Is the Right Mental Model
Let's break down what that 65% deterministic layer actually does in a well-architected agent workflow:
- Schema validation and data contracts — When you're ingesting HL7v2 ADT feeds or FHIR bundles, you don't want an LLM "interpreting" whether a field is valid. You want strict schema enforcement.
- Routing and orchestration logic — Deciding which downstream system gets a transformed payload is a lookup table problem, not a language model problem.
- Retry logic, rate limiting, and error handling — These are engineering concerns with deterministic solutions. An LLM adding "creativity" to your retry strategy is the last thing you need at 2 AM when an EHR integration is failing.
- PHI handling and access control — HIPAA compliance is not a probability distribution. It's binary: either the data is de-identified correctly or it isn't.
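To make the deterministic layer concrete, here's a minimal sketch of the first two bullets, strict schema enforcement and lookup-table routing, for an ADT-style feed. The field names, event types, and destination topics are illustrative assumptions, not a real HL7 implementation:

```python
# Sketch of deterministic pipeline nodes: strict schema enforcement and
# lookup-table routing. Field names and destinations are illustrative.

REQUIRED_ADT_FIELDS = {"patient_id", "event_type", "event_timestamp"}

# Routing is a lookup table problem, not a language model problem.
DESTINATIONS = {
    "A01": "admissions_topic",   # admit
    "A03": "discharge_topic",    # discharge
    "A08": "update_topic",       # patient info update
}

def validate_adt(record: dict) -> dict:
    """Reject any record missing required fields -- no LLM 'interpretation'."""
    missing = REQUIRED_ADT_FIELDS - record.keys()
    if missing:
        raise ValueError(f"invalid ADT record, missing: {sorted(missing)}")
    return record

def route(record: dict) -> str:
    """Deterministic routing by event type; unknown types fail loudly."""
    event = record["event_type"]
    if event not in DESTINATIONS:
        raise KeyError(f"no route for event type {event!r}")
    return DESTINATIONS[event]
```

Note that both failure modes raise immediately. A bad record stops at the boundary instead of propagating a guess downstream.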
The remaining 35% — that's where LLM inference earns its keep. Parsing unstructured clinical notes. Classifying ambiguous diagnosis codes. Summarizing complex patient histories for care coordination. Extracting entities from faxed (yes, still faxed) referral documents. These are the tasks where language understanding genuinely outperforms rules-based approaches.
The Architecture That Actually Works
The teams shipping reliable healthcare AI agents aren't building monolithic LLM chains. They're building hybrid DAGs — directed acyclic graphs where each node is deliberately assigned to either deterministic code or LLM inference based on a simple decision framework:
- Is the output space bounded? If yes, use code. A function that maps ICD-10 codes to risk categories doesn't need a language model.
- Is correctness verifiable programmatically? If yes, use code for the verification layer even if the generation layer uses an LLM.
- Does the task require genuine language understanding? Only then reach for inference. And even then, constrain the output with structured generation (JSON mode, function calling, enum restrictions).
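The third point, constraining LLM output, pairs inference with a deterministic validation node. Here's one way that boundary can look, assuming the model was asked for JSON with an enum-restricted `category` field (the schema and category names are illustrative, not tied to any provider's API):

```python
# Sketch of the "constrain the output" step: whatever the model returns,
# a deterministic validator enforces the contract before anything
# downstream sees it. Schema and categories are illustrative.

import json

ALLOWED_CATEGORIES = {"low_risk", "moderate_risk", "high_risk"}

def parse_llm_classification(raw: str) -> dict:
    """Accept only well-formed JSON with an enum-restricted 'category'."""
    payload = json.loads(raw)  # raises on malformed JSON
    category = payload.get("category")
    if category not in ALLOWED_CATEGORIES:
        raise ValueError(f"category {category!r} outside allowed enum")
    return {"category": category, "rationale": str(payload.get("rationale", ""))}
```

Even when the provider supports native structured generation, keeping a validator like this on your side of the boundary means a silent model update can't quietly widen the output space.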
In practice, this looks like dbt models handling the deterministic transformation layer, an orchestrator like Dagster or Prefect managing the DAG, and LLM calls isolated into specific nodes with typed inputs and outputs. The LLM nodes become pure functions with contracts — not autonomous agents roaming free across your clinical data warehouse.
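A sketch of what "pure functions with contracts" can mean for an LLM node: typed input in, validated typed output out. The model call here is a stub (`call_model` is a placeholder, not a real inference API), because the point is the boundary, not the provider:

```python
# Sketch of an LLM node as a pure function with a typed contract.
# `call_model` is a placeholder, not a real API; the shape of the
# boundary is what matters.

from dataclasses import dataclass

@dataclass(frozen=True)
class NoteInput:
    note_id: str
    text: str

@dataclass(frozen=True)
class NoteSummary:
    note_id: str
    summary: str

def call_model(prompt: str) -> str:
    # Placeholder for a real inference call (hosted API, local model, ...).
    return prompt[:80]

def summarize_note(note: NoteInput) -> NoteSummary:
    """The only node in the DAG allowed to touch the model for this task."""
    raw = call_model(f"Summarize for care coordination: {note.text}")
    if not raw.strip():
        raise ValueError(f"empty summary for note {note.note_id}")
    return NoteSummary(note_id=note.note_id, summary=raw.strip())
```

Because the node is a plain function with frozen dataclass types on both sides, the orchestrator can retry it, cache it, and test it like any other node in the DAG.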
The Token Economics Nobody Talks About
Here's where the 65/35 split pays for itself in hard dollars. Every unnecessary LLM call in a healthcare data pipeline is burning tokens on work that a SQL query or Python function could handle in microseconds for free. At scale — processing millions of claims, lab results, or clinical documents monthly — the difference between a well-partitioned hybrid architecture and an "everything through the LLM" approach is six figures annually. Easily.
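A back-of-envelope version of that claim. Every number below is an illustrative assumption (document volume, tokens per call, blended price per token), not a vendor quote; plug in your own figures:

```python
# Back-of-envelope cost sketch for the hybrid-vs-everything-LLM gap.
# All inputs are assumed values, not vendor pricing.

DOCS_PER_MONTH = 2_000_000
TOKENS_PER_CALL = 1_500            # assumed average, prompt + completion
PRICE_PER_1K_TOKENS = 0.01         # assumed blended $/1K tokens

def annual_llm_cost(fraction_through_llm: float) -> float:
    calls_per_year = DOCS_PER_MONTH * fraction_through_llm * 12
    return calls_per_year * TOKENS_PER_CALL / 1000 * PRICE_PER_1K_TOKENS

everything_llm = annual_llm_cost(1.0)   # every document hits the model
hybrid = annual_llm_cost(0.35)          # only the 35% that needs language
print(f"${everything_llm - hybrid:,.0f} saved per year")
# → $234,000 saved per year
```

Under these assumptions the gap is roughly $234K a year, and that's before counting the latency and reliability costs of the extra calls.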
But the cost argument is table stakes. The real win is latency and reliability. Deterministic nodes execute in milliseconds with predictable failure modes. LLM nodes introduce variable latency, rate limit dependencies, and the ever-present risk of model drift when your provider ships a silent update. Minimizing your LLM surface area minimizes your blast radius.
Stop Building AI Agents. Start Building AI-Augmented Pipelines.
The language we use shapes the architecture we build. When we say "AI agent," teams default to autonomy-first designs where the LLM is the brain and everything else is peripheral. When we say "AI-augmented pipeline," teams default to engineering-first designs where deterministic logic is the backbone and LLM inference is a specialized capability injected precisely where it's needed.
For healthcare data teams, the second framing isn't just more accurate — it's more auditable, more testable, and more likely to survive a compliance review. You can unit test a deterministic node. You can write data quality checks around it with dbt tests or Great Expectations. Try writing a deterministic test for an autonomous agent's "reasoning" about whether a prior authorization should be approved.
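What "you can unit test a deterministic node" looks like in practice. `map_icd10_to_risk` is a hypothetical node with an illustrative mapping, not a real clinical ruleset; the shape of the test is the point:

```python
# Sketch of unit-testing a deterministic node. The mapping is
# illustrative, not clinical guidance.

def map_icd10_to_risk(code: str) -> str:
    # Hypothetical rule: E11.x (type 2 diabetes) flagged, all else routine.
    return "elevated" if code.startswith("E11") else "routine"

def test_map_icd10_to_risk():
    # Deterministic, exhaustive where it matters, runs in microseconds.
    assert map_icd10_to_risk("E11.9") == "elevated"
    assert map_icd10_to_risk("Z00.00") == "routine"

test_map_icd10_to_risk()
```

The same assertions run identically on every commit, which is exactly the property an autonomous agent's "reasoning" can't give you.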
The teams that will dominate healthcare data engineering over the next two years aren't the ones with the most sophisticated LLM integrations. They're the ones with the discipline to use LLMs only where they're irreplaceable — and rock-solid engineering everywhere else. That's not a 50/50 split. The data says it's 65/35, and in healthcare, I'd argue it should skew even harder toward deterministic code.
So before your next sprint planning session, audit your agent architecture. Count the nodes. If more than 40% of them are making LLM calls, you don't have an AI-augmented pipeline. You have a liability.
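That audit can be a ten-line script. Represent the DAG as node-to-kind, compute the LLM share, and flag anything over the 40% line (node names below are illustrative):

```python
# Sketch of the closing audit: count LLM nodes as a share of the DAG
# and flag pipelines over the 40% threshold. Node names are illustrative.

def llm_node_ratio(dag: dict[str, str]) -> float:
    llm_nodes = sum(1 for kind in dag.values() if kind == "llm")
    return llm_nodes / len(dag)

pipeline = {
    "validate_schema": "code",
    "route_payload": "code",
    "retry_wrapper": "code",
    "deidentify_phi": "code",
    "extract_referral_entities": "llm",
    "summarize_history": "llm",
}

ratio = llm_node_ratio(pipeline)
assert ratio <= 0.40, f"{ratio:.0%} of nodes call an LLM -- audit this pipeline"
```

This toy pipeline sits at one-third LLM nodes and passes; invert the ratio and the assertion becomes the liability warning the paragraph above describes.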