I have a typical fact table in the data warehouse - the table has some surrogate keys and measures. In data, the warehouse is also lookup tables - small dimensions inside which history is not maintained. Just surrogate key, business key, and one or two attributes. During fact table load, surrogate keys are taken from lookup tables (joins are based on business keys). So basically on some stage of fact loading, we have business keys inside fact which are used to grab surrogate keys from lookup tables, after that operation, business keys are gone, and later e.g. in data mart (for reporting purposes) we can join lookups with fact using surrogate keys only for some attributes. Until now the process was rather straightforward because we used only one business key to set up attribute value.
But now there are cases where we should use 3 or even more.
E.g. these are conditions:
COLUMN_1 = 'ABC'
AND COLUMN_2 <> 'Z'
AND COLUMN_3 IN ('1', '2')
COLUMN_1 = 'ABC'
AND COLUMN_2 <> 'Z'
AND COLUMN_3 IN ('3', '4', '5')
COLUMN_1 <> 'ABC' OR COLUMN_2 = 'Z'
COLUMN_1, COLUMN_2, COLUMN_3 are business keys that are presented in the fact table. For sure above logic will be applied in the look-up load so let's say that we will have 3 surrogate keys: 1, 2, and 3.
But the main questions is - which approach will be better for the fact table:
- To duplicate logic from look up load to fact load? Better performance but this is worse for maintenance (if this needs to be changed in the future, the change will need to be applied in two places), and also surrogate keys will need to be hardcoded.
- Put above conditions in join condition? Obviously will be better for maintenance, however much worse for performance (fact table has around 10 000 000 rows inserted daily).
- Or maybe there is another solution? Combining the above conditions in the source system also is not an option.
All suggestions are welcome.