Healthcare & Life Sciences · Databricks
Top-10 pharma stands up a governed real-world-evidence platform
38% reduction in clinical-trial data prep · HIPAA-compliant lakehouse, day one
At a glance
Key metrics
Challenge
The situation
Real-world-evidence data lived in silos per therapeutic area. Each new study rebuilt its own cohort logic. Compliance review alone took weeks per study.
Approach
How we delivered
- 01 Built a single RWE lakehouse on Databricks with Unity Catalog as the PHI governance control point.
- 02 Codified cohort-builder patterns as reusable notebooks, ratified by regulatory and medical-affairs leaders.
- 03 Adopted OMOP common data model to enable cross-study analysis.
Architecture
Solution architecture
Claims, EHR, and registry data ingested via Auto Loader with column-level masking from the first landing. Conformed to OMOP via dbt-on-Databricks. Vector search over unstructured clinical notes. Cohort builder published as a catalog of parameterized notebooks with approval gates before external use.
Outcomes
Measured results
- 38% reduction in average study-level data prep time.
- 6 therapeutic areas onboarded within 12 months of platform GA.
- Full HIPAA control narrative approved by internal audit on first review.
Technology
Tech stack
Platform
- Databricks on Azure
- Delta Lake
- Unity Catalog
Data models
- OMOP CDM
- FHIR
Tools
- dbt
- Vector Search
- MLflow
“Every study used to reinvent cohort logic. Now they inherit it.”
Related
More outcomes
Chasing a similar outcome?
Tell us your target — cycle time, cost, risk, adoption. We'll walk you through what we've delivered closest to it.