In This Article
- The Industry Shift: Why ELT Became the Default
- Clear Definitions: ETL vs ELT vs ETLT
- The Decision Framework: 8 Criteria
- When ETL Is Still the Right Choice
- When ELT Is the Clear Winner
- Platform-Specific Guidance: Fabric, Snowflake, Databricks
- Migrating from ETL to ELT: Patterns and Pitfalls
- The ETLT Pattern: When You Need Both
- Go Deeper
The Industry Shift: Why ELT Became the Default
For 25 years, ETL was the only pattern. Data warehouses (Teradata, Oracle, SQL Server) had limited compute — you couldn't afford to load raw data and transform it inside the warehouse because warehouse compute was expensive and finite. Transformation happened outside the warehouse, in dedicated ETL engines (Informatica PowerCenter, IBM DataStage, Microsoft SSIS), where compute was cheaper and transformations could run without competing with analytical queries.
Cloud changed the economics. Microsoft Fabric, Snowflake, Databricks, and BigQuery offer elastic compute — you pay for what you use, and you can scale up for transformation workloads without affecting analytical query performance. The economic argument for transforming outside the warehouse evaporated. If the warehouse can scale to handle both transformation and queries, why maintain a separate transformation engine? This is why ELT — load raw data first, transform inside the platform — became the default for new cloud implementations.
But "default" doesn't mean "always." ETL remains the right choice in specific scenarios — and making the wrong choice produces either unnecessary infrastructure cost (ETL when ELT would suffice) or governance risk (ELT when ETL's pre-load quality gates are needed). This framework helps you choose correctly.
Clear Definitions: ETL vs ELT vs ETLT
| Pattern | Data Flow | Where Transformation Happens | Raw Data Preserved? |
|---|---|---|---|
| ETL | Source → Extract → Transform (staging) → Load (warehouse) | Dedicated ETL engine (Informatica, SSIS, Talend) | No — only transformed data reaches the warehouse |
| ELT | Source → Extract → Load (raw zone) → Transform (warehouse/lakehouse) | Target platform (SQL, Spark, dbt in Fabric/Snowflake/Databricks) | Yes — raw data preserved in landing zone |
| ETLT | Source → Extract → Light Transform → Load (raw) → Heavy Transform (warehouse) | Both: light transforms before load, heavy transforms after | Partially — pre-processed data loaded, not fully raw |
ELT's key advantage: raw data preservation. When you load raw data before transforming, you can always re-transform from the original source. A bug in transformation logic doesn't require re-extracting from the source system — you re-run the transform on the raw data already in the data lake. ETL discards the raw data after transformation, so a transformation bug requires re-extraction from the source — which may not be possible for historical data.
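To make the replay idea concrete, here is a minimal PySpark sketch of re-running a corrected transformation against raw data already sitting in the lake. The storage path, column names, and table name are placeholders, not from any specific platform.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("replay-transform").getOrCreate()

# Raw data is still in the landing zone, so no re-extraction from the source system is needed.
raw_orders = spark.read.parquet("abfss://raw@datalake.dfs.core.windows.net/sales/orders/")  # placeholder path

# Corrected transformation logic (hypothetical bug: refunds were previously ignored).
curated = (
    raw_orders
    .withColumn("order_date", F.to_date("order_ts"))
    .withColumn("net_amount", F.col("gross_amount") - F.coalesce(F.col("refund_amount"), F.lit(0.0)))
    .filter(F.col("order_status") != "CANCELLED")
)

# Overwrite the curated table; downstream queries pick up the corrected history on the next refresh.
curated.write.mode("overwrite").saveAsTable("curated.fact_orders")
```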
The Decision Framework: 8 Criteria
| # | Criterion | Favors ETL | Favors ELT |
|---|---|---|---|
| 1 | Platform compute | On-premises warehouse with fixed compute | Cloud platform with elastic compute |
| 2 | Data governance | PII must be masked/removed before loading | Raw zone access controlled; transform applies masking |
| 3 | Transformation complexity | Complex procedural logic, external APIs mid-transform | SQL-expressible transformations, set-based operations |
| 4 | Data volume | Small-medium (GBs) — ETL engine handles efficiently | Large (TBs+) — platform's distributed compute scales |
| 5 | Raw data retention | Not required — transformed data is the record | Required — audit trail, re-processing, ML training |
| 6 | Team skills | Team skilled in ETL tools (Informatica, SSIS) | Team skilled in SQL, Spark, dbt, platform-native tools |
| 7 | Source system impact | Re-extraction acceptable for reprocessing | Re-extraction undesirable — raw data enables replay |
| 8 | Regulatory requirements | Data must not exist in raw form in the analytical platform | Regulatory audit requires raw data preservation |
Score each criterion for your environment. If 5+ criteria favor ELT, implement ELT. If 5+ favor ETL, implement ETL. If the result is a near-even split (or several criteria don't apply to your environment), consider the ETLT hybrid pattern.
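The scoring pass is simple enough to capture in a few lines. The sketch below is purely illustrative; the answers in the dictionary are hypothetical, and each key paraphrases one of the eight criteria above (True means the criterion favors ELT).

```python
# Hypothetical answers for one environment: True = criterion favors ELT, False = favors ETL.
criteria = {
    "elastic_cloud_compute": True,
    "pii_can_be_masked_in_platform": True,
    "transformations_sql_expressible": True,
    "large_data_volume": True,
    "raw_data_retention_required": True,
    "team_skilled_in_sql_spark_dbt": False,
    "re_extraction_undesirable": True,
    "regulation_requires_raw_preservation": False,
}

elt_score = sum(criteria.values())
etl_score = len(criteria) - elt_score

if elt_score >= 5:
    recommendation = "ELT"
elif etl_score >= 5:
    recommendation = "ETL"
else:
    recommendation = "ETLT (hybrid)"

print(f"ELT: {elt_score}, ETL: {etl_score} -> {recommendation}")
```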
When ETL Is Still the Right Choice
ETL isn't obsolete; its use cases are just narrower. Three scenarios where ETL remains the correct architecture:
Scenario 1: PII must never reach the analytical platform in raw form. Healthcare organizations under HIPAA, financial institutions under GDPR, and government agencies with classified data may require that PII be tokenized, masked, or removed before data enters the analytical environment. ETL's pre-load transformation is the enforcement point — data is clean before it touches the warehouse. ELT can achieve the same result with a raw zone that has restricted access, but the governance argument for "PII never enters" is stronger with ETL.
Scenario 2: Transformations require procedural logic or external API calls. Some transformations can't be expressed in SQL. Geocoding addresses requires an external API call. Natural language processing requires a Python library. Complex business rules with 50+ conditional branches are cleaner in procedural code than in SQL CASE statements. ETL engines (Informatica, custom Python) handle these naturally. ELT platforms can handle them via stored procedures or UDFs, but the developer experience is better in an ETL engine designed for procedural logic.
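For example, a geocoding step inside an ETL engine's Python task might look like the sketch below. The API endpoint and response shape are hypothetical; in an ELT platform the same call would typically have to live in a UDF or stored procedure.

```python
import requests

GEOCODE_URL = "https://geocoder.example.com/v1/search"  # hypothetical external API

def geocode_address(address: str) -> tuple[float, float] | None:
    """Call an external geocoding API mid-transform; this is not expressible in plain SQL."""
    resp = requests.get(GEOCODE_URL, params={"q": address}, timeout=10)
    resp.raise_for_status()
    results = resp.json().get("results", [])
    if not results:
        return None
    return results[0]["lat"], results[0]["lon"]

def transform_row(row: dict) -> dict:
    """Enrich a single record during the transform stage of an ETL job."""
    coords = geocode_address(row["shipping_address"])
    row["latitude"], row["longitude"] = coords if coords else (None, None)
    return row
```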
Scenario 3: On-premises data warehouse with fixed compute. If the target is a traditional on-premises warehouse (SQL Server, Oracle, Teradata) with fixed compute capacity, loading raw data and transforming inside the warehouse competes with analytical queries for the same resources. ETL offloads transformation to a separate engine, preserving warehouse compute for queries. This scenario is declining as organizations migrate to cloud — but for the 40% of enterprises still running on-premises warehouses, ETL remains practical.
When ELT Is the Clear Winner
Cloud-native data platform. Fabric, Snowflake, Databricks, and BigQuery all optimize for ELT. Elastic compute means transformation workloads scale independently. Raw data preserved in the landing zone enables re-processing without re-extraction. Platform-native SQL and Spark transformations use the engine's optimization — often 5-10x faster than external ETL for the same data volume.
Data science and ML workloads. ML models need raw or lightly processed data — not the aggregated, business-rule-applied output of ETL. ELT preserves the raw data that data scientists need for feature engineering, model training, and experimentation. An ETL pipeline that aggregates transactions to daily summaries destroys the transaction-level signal that ML models require.
Agile analytics. ELT with tools like dbt enables version-controlled, testable, modular transformations — similar to software engineering practices. Transformations are SQL files in a Git repository, tested automatically, deployed through CI/CD. This development velocity is difficult to achieve with traditional ETL tools that use proprietary visual interfaces.
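dbt expresses these checks as declarative SQL tests. As a rough Python analogue of the same idea (not dbt itself), a CI job might run a check like the sketch below against a hypothetical silver.customers table, assuming an available Spark session.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def test_customers_primary_key_is_unique_and_not_null():
    """Equivalent in spirit to dbt's built-in unique and not_null tests."""
    customers = spark.table("silver.customers")  # hypothetical transformed table
    total = customers.count()
    distinct_keys = customers.select("customer_id").distinct().count()
    null_keys = customers.filter(customers.customer_id.isNull()).count()
    assert null_keys == 0, "customer_id contains NULLs"
    assert distinct_keys == total, "customer_id is not unique"
```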
For any new cloud data platform implementation, ELT is the default. Choose ETL only when specific criteria (PII governance, procedural complexity, on-premises constraints) require it. An ETL consulting engagement should evaluate which pattern is right, not assume the answer.
Platform-Specific Guidance: Fabric, Snowflake, Databricks
Microsoft Fabric
Fabric is ELT-native. OneLake stores raw data in the landing zone (lakehouse). Dataflows Gen2 handles light extraction and cleansing. Notebooks (Spark/Python) and SQL analytics endpoints handle heavy transformation. dbt integration via the SQL endpoint brings software engineering practices to transformation. The Fabric architecture: land raw data in lakehouse → transform with Spark/SQL → serve through warehouse and Power BI semantic models. ETL patterns (Informatica, SSIS) still work with Fabric but add unnecessary complexity — Fabric's native tools handle the ELT workflow end-to-end.
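In a Fabric notebook, that flow reduces to a few Spark calls. This is a minimal sketch only: the lakehouse file path and table name are placeholders, and it assumes a default lakehouse is attached so the notebook-provided spark session can read the Files area.

```python
from pyspark.sql import functions as F

# `spark` is the session Fabric notebooks provide; the path is a placeholder in the attached lakehouse.
raw = spark.read.option("header", "true").csv("Files/landing/sales/2024/")

cleaned = (
    raw
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    .withColumn("sale_date", F.to_date("sale_date", "yyyy-MM-dd"))
    .dropDuplicates(["order_id"])
)

# Delta table in the lakehouse, queryable from the SQL analytics endpoint and Power BI.
cleaned.write.mode("overwrite").format("delta").saveAsTable("sales_cleaned")
```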
Snowflake
Snowflake's architecture (separate storage and compute) is designed for ELT. External stages (S3, ADLS, GCS) store raw data. Snowpipe handles continuous ingestion. Virtual warehouses scale compute for transformation independently. dbt is the standard transformation tool on Snowflake — SQL transformations in version-controlled projects. Snowflake's Time Travel feature (query historical data up to 90 days) reduces the need for raw data preservation because the platform retains historical states natively.
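The ELT loop on Snowflake is mostly SQL issued against a virtual warehouse. A minimal sketch using the Snowflake Python connector is shown below; the account, stage, table, and column names are placeholders, and it assumes the raw table and stage already exist.

```python
import snowflake.connector

# Placeholder credentials and object names; in practice these come from a secrets store.
conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="***",
    warehouse="TRANSFORM_WH", database="ANALYTICS", schema="RAW",
)
cur = conn.cursor()

# Load raw files from an external stage into a raw table (the "L" in ELT).
cur.execute("COPY INTO raw_orders FROM @landing_stage/orders/ FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)")

# Transform inside the platform (the "T"), writing a curated table with the warehouse's compute.
cur.execute("""
    CREATE OR REPLACE TABLE CURATED.FACT_ORDERS AS
    SELECT order_id,
           customer_id,
           order_ts::DATE AS order_date,
           gross_amount - COALESCE(refund_amount, 0) AS net_amount
    FROM raw_orders
    WHERE order_status <> 'CANCELLED'
""")

cur.close()
conn.close()
```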
Databricks
Databricks Delta Lake provides the lakehouse foundation for ELT. Raw data lands in the bronze layer (raw), transforms in the silver layer (cleansed), and serves from the gold layer (business-ready). The medallion architecture is ELT by design — each layer transforms from the previous layer using Spark notebooks or SQL. Delta Lake's ACID transactions, schema enforcement, and time travel provide the governance that makes ELT trustworthy at scale.
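A bronze-to-silver step in the medallion architecture is typically a short Spark job. The sketch below uses hypothetical table names and assumes the spark session a Databricks notebook provides.

```python
from pyspark.sql import functions as F

# Bronze: raw events appended as-is from the source (`spark` comes from the Databricks notebook).
bronze = spark.read.table("bronze.events")  # hypothetical table name

# Silver: cleanse, deduplicate, and enforce types; Delta Lake supplies ACID writes and schema enforcement.
silver = (
    bronze
    .filter(F.col("event_id").isNotNull())
    .dropDuplicates(["event_id"])
    .withColumn("event_ts", F.to_timestamp("event_ts"))
)

silver.write.format("delta").mode("overwrite").saveAsTable("silver.events")
```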
Migrating from ETL to ELT: Patterns and Pitfalls
For organizations migrating from legacy ETL (SSIS, Informatica, DataStage) to cloud ELT, the migration follows three phases:
Lift and Shift (Weeks 1-4)
Move existing ETL jobs to the cloud platform — Azure Data Factory can run SSIS packages in Azure, Informatica IDMC runs in the cloud. This provides a cloud-hosted ETL environment while planning the ELT migration. Not the end state, but a stepping stone that moves off on-premises hardware immediately.
Parallel Build (Months 2-4)
Rebuild transformations in the target ELT framework (SQL/Spark/dbt) — starting with the simplest, highest-volume pipelines. Run old ETL and new ELT in parallel, comparing outputs. The parallel run validates that the ELT produces identical results before decommissioning the ETL.
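Output comparison during the parallel run can be as simple as a row-level diff. A minimal PySpark sketch is shown below; the table names are hypothetical, and it assumes both outputs are registered somewhere Spark can read with matching schemas.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parallel-run-check").getOrCreate()

# Hypothetical table names: legacy ETL output vs. rebuilt ELT output.
legacy = spark.table("legacy_etl.dim_customer")
rebuilt = spark.table("elt.dim_customer")

# exceptAll keeps duplicates, so it catches row-count drift as well as value differences.
only_in_legacy = legacy.exceptAll(rebuilt)
only_in_rebuilt = rebuilt.exceptAll(legacy)

print(f"Rows only in legacy ETL output: {only_in_legacy.count()}")
print(f"Rows only in new ELT output:    {only_in_rebuilt.count()}")
# Cutover criterion: both counts stay at zero across an agreed number of consecutive runs.
```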
Cutover and Decommission (Months 5-6)
Once parallel runs confirm parity, cut over to ELT for each pipeline. Decommission the legacy ETL engine progressively — don't maintain both long-term, as dual maintenance doubles operational overhead.
The ETLT Pattern: When You Need Both
ETLT applies light transformation before loading (PII masking, schema standardization, data type casting) and heavy transformation after loading (business logic, aggregation, dimensional modeling). This hybrid pattern is increasingly common because it combines the governance benefits of pre-load processing with the power and flexibility of in-platform transformation.
Example ETLT pipeline: Extract from source → mask SSN and credit card numbers (light transform) → load masked data into data lake raw zone → transform in Spark/SQL for dimensional model → serve through Power BI. The PII never exists in raw form in the analytical platform. The dimensional modeling uses the platform's distributed compute. Both requirements met.
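The light pre-load masking step is often just a few lines of Python in the extraction job. The field names and hashing approach below are illustrative only, not a compliance recommendation; a production pipeline would typically use a salted hash or a tokenization service.

```python
import hashlib
import re

def mask_ssn(ssn: str) -> str:
    """Tokenize the SSN with a one-way hash so the raw value never reaches the lake."""
    return hashlib.sha256(ssn.encode("utf-8")).hexdigest()

def mask_card_number(card: str) -> str:
    """Keep only the last four digits for reconciliation; mask the rest."""
    digits = re.sub(r"\D", "", card)
    return "*" * 12 + digits[-4:]

def light_transform(record: dict) -> dict:
    """The 'light T' applied before load; heavy transforms run later inside the platform."""
    record["ssn"] = mask_ssn(record["ssn"])
    record["card_number"] = mask_card_number(record["card_number"])
    return record
```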
The Xylity Approach
We evaluate ETL vs ELT as part of every data integration engagement — using the 8-criteria framework, not platform vendor marketing. Our ETL developers and data engineers implement whichever pattern (or hybrid) your governance, platform, and use case requirements demand. For organizations migrating from legacy ETL to cloud ELT, we run the parallel-build-and-cutover approach that validates parity before decommissioning.
Go Deeper
Continue building your understanding with these related resources from our consulting practice.
ETL or ELT? The Framework Decides
Eight criteria, platform-specific guidance, migration pathway. The decision framework that matches architecture to your reality — not vendor marketing.
Start Your Integration Architecture Engagement →