There is a dimension table like this:
id | column_a | column_b |
---|---|---|
1 | val_a1 | val_b1 |
2 | val_a1 | val_b2 |
3 | val_a2 | val_b2 |
4 | val_a2 | val_b3 |
5 | val_a2 | val_b1 |
I am creating a mapping table to add a new column to the dimension like column_x, here:
column_a | column_b | column_x |
---|---|---|
val_a1 | 'Any-Value' | val_x1 |
val_a2 | val_b2 | val_x2 |
val_a2 | 'not val_b2' | val_x3 |
How to read the mapping table:
This means for all the rows with vol_a1 in column_a and any value in column_b: column_x will have val_x1.
This means for the rows with vol_a2 in column_a and specifically val_b2 in column_b: column_x will have val_x2.
This means for the rows with vol_a2 in column_a and specifically NOT val_b2 in column_b: column_x will have val_x3.
The output will look like:
id | column_a | column_b | column_x |
---|---|---|---|
1 | val_a1 | val_b1 | val_x1 |
2 | val_a1 | val_b2 | val_x1 |
3 | val_a2 | val_b2 | val_x2 |
4 | val_a2 | val_b3 | val_x3 |
4 | val_a2 | val_b1 | val_x3 |
Can I do it in a single join? I can obviously break it in multiple CTEs and can do it. What would the most optimal way to do it, if the dimension table is big?
I thought to change the mapping table into this and add priorities with the help of IF
conditions in the ON
clause with OR
Conditions:
column_a | column_b | column_x |
---|---|---|
val_a1 | 'Any-Value' | val_x1 |
val_a2 | val_b2 | val_x2 |
val_a2 | 'Any-Value' | val_x3 |
But it's giving me duplicates.
I can use SLQ or python to do this.