Skip to main content
Data Engineering Healthcare Data Platform

Unified Clinical and Operational Data Lakehouse for a 2,400-Employee Health System

A 2,400-employee health system consolidated 15 fragmented clinical and operational data sources into a single Microsoft Fabric lakehouse — from kickoff to production in 8 weeks.

8 weeks
Kickoff to production
15
Data sources unified
$2.1M
Annual analytics value
The challenge: A 2,400-employee health system had clinical data in Epic, operational data in a legacy SQL Server warehouse, financial data in their ERP, and patient satisfaction data in spreadsheets — 15 sources total, none talking to each other. What we did: Built a Microsoft Fabric lakehouse with medallion architecture and HIPAA-compliant governance. The result: A unified data platform delivering executive dashboards, clinical quality metrics, and operational analytics — live in 8 weeks, generating $2.1M in annual analytics value.

About the Client

Industry
Company Size
2,400 employees
Geography
US-based, 6 locations
Existing Stack
Epic EHR, SQL Server, SSIS, Power BI (on-prem)
Engagement Type
Consulting + Specialist Deployment
Duration
8 weeks (pilot) + 12 weeks (expansion)

The Challenge

The health system's Chief Data Officer had a board mandate: build a unified data platform that could support clinical quality reporting, operational efficiency analysis, and the upcoming AI initiatives the board was demanding. The problem: 15 data sources across 6 locations, none sharing a common data model.

Epic held clinical data — patient records, encounters, lab results, and medication orders — but extracting analytics-ready datasets required complex Clarity/Caboodle queries that only two people in the organization understood. Operational data lived in a 12-year-old SQL Server warehouse with undocumented transformation logic. Financial data was in a separate ERP. Patient satisfaction scores arrived quarterly in Excel files. Staffing data came from a third-party workforce management system.

The CDO's previous attempt at consolidation — a traditional ETL-based data warehouse — had taken 14 months and still couldn't answer questions that crossed system boundaries. "How does staffing level correlate with patient satisfaction scores by unit?" required manual data pulls from 4 systems and 3 weeks of analyst time. The board was asking for AI-driven insights, and the data foundation wasn't there.

Compounding the challenge: HIPAA compliance required strict access controls, audit logging, and data classification. PHI couldn't flow into analytics layers without proper de-identification or role-based access. The existing approach — giving analysts direct database access — was a compliance risk the CISO had flagged repeatedly.

Our Approach

We proposed a phased approach: prove value in 8 weeks with a focused pilot, then expand. The CDO needed a quick win to maintain board confidence and budget authority.

1

Assessment & Architecture (Weeks 1-2)

Cataloged all 15 data sources, documented data volumes and refresh frequencies, mapped cross-system join keys (MRN as the golden patient identifier), and designed the Fabric lakehouse architecture using medallion pattern: Bronze (raw ingestion), Silver (cleansed, conformed), Gold (business-ready datasets). Defined HIPAA compliance framework: sensitivity labels in Microsoft Purview, row-level security on Gold layer, and automated audit logging.

2

Bronze Layer: Ingestion (Weeks 2-4)

Built 15 ingestion pipelines in Fabric Data Factory — daily full loads for dimension tables, incremental loads for transactional data. Epic data extracted via Clarity SQL views (not direct Epic database access — a critical architecture decision that kept the EHR team comfortable). SQL Server warehouse data migrated via Change Data Capture. Excel files automated through SharePointPower Automate → Fabric pipeline. Each source landed in its own Bronze lakehouse table with full load timestamps and source metadata.

3

Silver Layer: Conformance (Weeks 3-5)

Built Fabric notebooks (PySpark) to cleanse, validate, and conform data across sources. Created the unified patient master using MRN matching with fuzzy logic for edge cases (name/DOB matching when MRN was missing in legacy systems). Applied standard healthcare data models for clinical quality measures (eCQM-aligned). De-identified PHI at the Silver layer — separate Silver tables for identified (clinician access) and de-identified (analyst access) datasets.

4

Gold Layer: Analytics (Weeks 4-6)

Built Gold-layer semantic models for 4 domains: clinical quality (readmissions, mortality, infection rates), operations (bed utilization, ED throughput, OR scheduling), finance (cost per case, revenue cycle, denial rates), and patient experience (satisfaction scores by unit, provider, and service line). Each Gold dataset: certified, documented, RLS-applied, and connected to Power BI semantic models.

5

Dashboards & Governance (Weeks 5-8)

Deployed 12 Power BI dashboards across 4 domains. Configured Purview sensitivity labels, data classification, and access policies. Trained 3 internal data analysts on Fabric notebook development and Gold-layer dataset creation. Established a data governance committee with monthly review cadence. Deployed to 200+ users across 6 locations with role-based access.

Solution Architecture

The architecture followed a standard Fabric lakehouse pattern with healthcare-specific governance layers:

Source Systems: Epic (Clarity views) → SQL Server (CDC) → ERP (API) → SharePoint (Power Automate) → Workforce Management (flat file SFTP)

Ingestion: Fabric Data Factory pipelines with parameterized templates — 15 pipelines, each with error handling, retry logic, and Purview lineage tracking

Bronze → Silver → Gold: PySpark notebooks in Fabric for each transformation layer. Delta Lake format throughout for ACID transactions and time-travel. Medallion architecture with clear ownership: IT owns Bronze/Silver, business teams own Gold definitions

Governance: Microsoft Purview for sensitivity labels (PHI, PII, Confidential), automated data classification, column-level lineage, and access audit trail

Consumption: Power BI semantic models connected to Gold layer with RLS. DirectLake mode for real-time dashboard refresh without import scheduling

Results

8 weeks
Kickoff to production
vs. 14 months for the previous attempt
15 → 1
Data sources consolidated
Single source of truth for the first time
$2.1M
Annual analytics value
From operational efficiency and revenue cycle improvements
3 weeks → 10 min
Cross-system query time
"Staffing vs. satisfaction" analysis now self-service
200+
Users across 6 locations
Role-based access with HIPAA compliance
Zero
Compliance findings
Clean audit with Purview governance framework

Key Takeaways

If your organization is facing a similar challenge, here's what we learned:

Start with a focused pilot, not a boil-the-ocean strategy. We scoped the pilot to 4 clinical and operational domains — enough to prove value to the board, not so much that it became a multi-year initiative. The CDO maintained budget authority because results arrived in 8 weeks, not 18 months.

Governance isn't a phase — it's baked into every layer. We applied Purview sensitivity labels at ingestion, not after the platform was built. PHI classification, access policies, and audit logging were part of the Bronze layer design — not a post-launch compliance project.

Epic extraction strategy matters more than you think. Extracting from Clarity views (read-only reporting layer) instead of direct database access kept the EHR team comfortable. This is often the biggest political hurdle in healthcare data platforms — solve it early.

Train internal analysts on Gold layer creation. The platform's long-term value depends on business teams creating their own Gold datasets through self-service. We trained 3 analysts during the engagement — they've since created 8 additional Gold datasets without external support.

Facing a Similar Challenge?

Whether you're consolidating clinical data sources, building a Fabric lakehouse, or preparing your data platform for AI — let's talk about what it takes.