Having spent some time working on data warehousing, I have created both ETL (extract transform load) and ELT (extract load transform) processes. It seems that ELT is a newer approach to populating data warehouses that can more easily take advantage of cluster computing resources. I would like to hear what other people think the advantages are of ETL and ELT over each other and when you should use one or the other.
4 Answers
So after having played thoroughly with both ETL and ELT, I have come to the conclusion that you should avoid ELT at all costs. ETL prepares the data for your warehouse before you actually load it in. ELT however loads the raw data into the warehouse and you transform it in place. That is problematic if you have a busy data warehouse. If there is a reporting query running on a table that you are attempt to update, your query will get blocked. Consequently, it is possible for reporting queries to hold up or block updates.
Now some of you might say reporting queries do not need to block an update and you can set your isolation level to allow for dirty reads. Reporting queries however are not generally executed by software engineers. They are executed by business users so you can't rely on them to set their isolation levels properly. As well, not all reports can tolerate dirty reads.
There are cases where ELT can work however by introducing it to your data warehouse is dangerous and consequently, I recommend for your sanity and for maintainability, avoid it.

- 9,164
- 7
- 40
- 39
-
4I disagree with this. I think there's a place for both approaches. I personally prefer ELT if situation allows. Most data warehouses are loaded nightly during their maintenance windows... and if you're referring to data warehouses in a follow-the-sun environment, you'll run into issues regardless of which approach you go with (may need to get a little creative). I find I have more control with ELT.. not to mention your ability to retain raw data for archival purposes. – sam yi Dec 31 '13 at 06:43
-
One more thing I wanted to add... I think ELT requires more experienced database/sql resources to maintain it since bulk of the work is taking place inside the database. This, in a roundabout way, is a similar question as the application vs database question... http://stackoverflow.com/questions/119540/business-logic-database-or-application-layer – sam yi Dec 31 '13 at 06:52
-
There's no right answer.... – sam yi Dec 31 '13 at 06:54
-
1....and that's why the question has been closed! – Monica Heddneck Mar 03 '16 at 22:08
-
Ideally, your reports should be out of Reporting Data Marts rather than Data Warehouse -- which solves your blocking/dirty read problem. – ranit.b Feb 18 '21 at 12:36
Which is better is hard to answer -- depends on the problem.
I prefer multi-step ETL -- ECCD (Extract, Clean, Conform, Deliver) whenever possible. I also keep intermediate csv files after each extract, clean, and conform step; takes some disk space, but is quite useful. Whenever DW has to be re-loaded due to bugs in etl, or DW schema changes, there is no need to query source systems again -- it is already in flat files. It is also quite convenient to be able to grep, sed and awk through flat files in the staging area when needed. In the case when there are several source systems which feed into the same DW, only extract steps have to be developed (and maintained) for each of the source systems -- clean, conform, and deliver steps are all common.

- 21,891
- 3
- 47
- 71
I use both. It's simply a matter of convenience and functionality. It all depends on the case. Sometimes I do TEL - i.e. the transform is done in the source database (in a stored procedure or view) and then extracted and loaded directly.

- 88,164
- 40
- 182
- 265
I prefer ELT. One can say it is against the Norm. It does require a change in mentality and design approach against traditional methods. But it utilizes Existing Hardware and skill sets, further reducing the cost and risk in the development process.
If we want to ensure referential integrity in ETL approach, then data must be downloaded from target to ETL server(Engine). But we don't need to do it in ETL approach.
To get the best from an ELT approach requires an open mind.

- 37
- 1
- 6