The Two-Stack Problem

Walk into any mid-size health system's data team and you'll find the same dysfunction: one group running dbt models to power dashboards, another group cobbling together feature pipelines in Python to feed ML models. Same source data. Different transformations. Zero shared lineage. Two sets of business logic that inevitably drift apart.

This was always a bad architecture. Now it's an indefensible one.

dbt Labs' latest push into AI-native data pipelines — including tight integration with Snowflake's feature store and support for incremental feature computation — isn't just a product announcement. It's the final nail in the coffin of the standalone feature store as a category.

Feature Engineering Was Always Just Data Transformation

The ML industry spent five years convincing itself that "feature engineering" was fundamentally different from "data transformation." It wasn't. Computing a 30-day rolling average of A1C values is the same operation whether it feeds a Looker dashboard or a readmission risk model. The SQL is identical. The data quality requirements are identical. The governance requirements are — you guessed it — identical.
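To make the point concrete, here is a sketch of that 30-day rolling A1C average as an ordinary dbt model. The source model and column names (stg_lab_results, a1c_value is not used; result_value, lab_code) are hypothetical, and the self-join approach is chosen to keep the window logic portable across warehouses:

```sql
-- models/features/feat_a1c_rolling_30d.sql
-- Sketch: 30-day rolling average of A1C results per patient.
-- Source model and column names here are hypothetical.
select
    l.patient_id,
    l.result_date,
    avg(prior.result_value) as a1c_rolling_30d
from {{ ref('stg_lab_results') }} as l
join {{ ref('stg_lab_results') }} as prior
    on prior.patient_id = l.patient_id
    and prior.result_date between dateadd(day, -30, l.result_date)
                              and l.result_date
where l.lab_code = 'A1C'
  and prior.lab_code = 'A1C'
group by l.patient_id, l.result_date
```

Whether this feeds a Looker dashboard or a readmission model, the model file, its tests, and its lineage are identical.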

What was different was the tooling. Analytics engineers used dbt. ML engineers used Feast, Tecton, or bespoke Python pipelines. The result: duplicated logic, inconsistent definitions, and a governance gap wide enough to drive a HIPAA violation through.

dbt's evolution into a feature engineering layer eliminates this artificial separation. With version-controlled SQL models, incremental materialization strategies, and native integration with Snowflake's feature store, teams can now define features once and serve them to both analytical and ML workloads.
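"Define once" in practice looks like a standard dbt incremental model that happens to be tagged as a feature. The config below uses dbt's built-in incremental materialization and is_incremental() macro; the model, tag, and column names are hypothetical:

```sql
-- models/features/feat_patient_vitals.sql
-- A feature model version-controlled like any other dbt model.
-- Model, tag, and column names are hypothetical.
{{ config(
    materialized='incremental',
    unique_key=['patient_id', 'measured_at'],
    tags=['ml_feature']
) }}

select
    patient_id,
    measured_at,
    systolic_bp,
    diastolic_bp
from {{ ref('int_vitals') }}

{% if is_incremental() %}
  -- on incremental runs, only process rows that arrived since the last run
  where measured_at > (select max(measured_at) from {{ this }})
{% endif %}
```

The same model then serves the BI layer and, via the feature store integration, the ML pipeline.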

Why This Matters More in Healthcare

In most industries, duplicated business logic is a cost problem. In healthcare, it's a patient safety problem.

Consider a clinical risk model that uses a "recent hospitalization" feature. If your analytics team defines "recent" as 30 days and your ML team defines it as 90 days, your model isn't just inaccurate — it's making clinical decisions on a foundation that can't be audited consistently. When CMS or a payor asks how your model works, you need one answer, not two.
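The fix is to give "recent" exactly one home. A minimal sketch, using the 30-day threshold from the example above; the source model and column names are hypothetical:

```sql
-- models/features/feat_recent_hospitalization.sql
-- One canonical definition of "recent hospitalization": the 30-day
-- window lives in exactly one place. Names are hypothetical.
select
    patient_id,
    max(discharge_date) as last_discharge_date,
    max(discharge_date) >= dateadd(day, -30, current_date)
        as recent_hospitalization
from {{ ref('stg_inpatient_encounters') }}
group by patient_id
```

Both the quality dashboard and the risk model reference this one model, so the audit answer is the model file itself.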

A unified dbt-based feature layer gives you:

- One version-controlled definition of every clinical concept, shared by dashboards and models
- End-to-end lineage from raw source data to both BI and ML consumers
- A single place to enforce tests, documentation, and access controls
- One answer, not two, when CMS or a payor asks how a feature is computed

The Architecture Shift

Here's what the new stack looks like for a healthcare AI team that gets this right:

Raw clinical data lands in Snowflake — EHR extracts, claims feeds, ADT messages, lab results. dbt transforms it through staging, intermediate, and mart layers, exactly as it does today. But now, a subset of those marts are explicitly designated as feature models — tagged, documented, and materialized as Snowflake dynamic tables that feed both the BI layer and the ML training and serving pipeline.
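A feature mart in this architecture is just a dbt model with a different materialization. This sketch uses dbt's dynamic table materialization for Snowflake; the warehouse name, target lag, and upstream model are hypothetical:

```sql
-- models/features/feat_readmission_inputs.sql
-- A feature mart materialized as a Snowflake dynamic table via dbt.
-- Warehouse name, target lag, and upstream model are hypothetical.
{{ config(
    materialized='dynamic_table',
    snowflake_warehouse='TRANSFORM_WH',
    target_lag='15 minutes',
    tags=['ml_feature']
) }}

select
    patient_id,
    encounter_count_90d,
    a1c_rolling_30d
from {{ ref('int_patient_clinical_summary') }}
```

The tag marks it as a feature model for documentation and selection; the dynamic table keeps it fresh for both BI and serving without a separate pipeline.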

No separate feature store infrastructure. No Python scripts duplicating SQL logic. No drift between what your dashboard says and what your model uses. The Snowflake feature store becomes a consumption layer on top of dbt-managed tables, not a parallel universe of transformations.

This is not theoretical. The latest dbt release includes an immutable_where configuration for Snowflake dynamic tables — a clear signal that dbt is optimizing for the append-heavy, time-series patterns that dominate clinical and claims data.

What Your Team Needs to Do Now

If you're running a healthcare data team with separate analytics and ML engineering stacks, here's the uncomfortable truth: you're maintaining two systems that do the same thing, and neither one has complete governance coverage.

Start by auditing your ML feature definitions against your dbt models. I guarantee you'll find drift — different join logic, different filter criteria, different definitions of the same clinical concepts. That drift is technical debt with regulatory exposure.
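One way to quantify that drift before migrating: compute the competing definitions side by side and count disagreements. A sketch as a dbt analysis, reusing the 30-day vs. 90-day "recent hospitalization" example; model and column names are hypothetical:

```sql
-- analyses/audit_recent_hospitalization_drift.sql
-- Counts patients for whom a 30-day and a 90-day definition of
-- "recent hospitalization" disagree. Names are hypothetical.
with flags as (
    select
        patient_id,
        max(discharge_date) >= dateadd(day, -30, current_date)
            as recent_30d,
        max(discharge_date) >= dateadd(day, -90, current_date)
            as recent_90d
    from {{ ref('stg_inpatient_encounters') }}
    group by patient_id
)
select count(*) as disagreements
from flags
where recent_30d != recent_90d
```

Every nonzero count is a patient whose risk score and quality metric are built on different facts.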

Then start migrating your feature definitions into dbt. Not all at once. Start with the features that overlap most with your existing analytics models. Tag them. Test them. Serve them through Snowflake's feature store integration.
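"Test them" can start with a dbt singular test: a SQL file in the tests/ directory that fails the build if it returns any rows. The model and column names below are hypothetical:

```sql
-- tests/assert_no_future_feature_timestamps.sql
-- Fails the build if any feature row carries a discharge date in the
-- future, a common sign of bad source data. Names are hypothetical.
select *
from {{ ref('feat_recent_hospitalization') }}
where last_discharge_date > current_date
```

Tests like this run in the same pipeline for analytics and ML consumers, which is the governance coverage the two-stack setup never had.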

The teams that unify their transformation layer now will ship clinical AI faster, with better governance, at lower infrastructure cost. The teams that don't will spend the next two years explaining to auditors why their model's definition of "diabetic patient" doesn't match their quality dashboard's definition.

The feature store category had a good run. dbt just absorbed it. The only question is whether your team adapts before your next model audit.