0

There is a dimension table like this:

id column_a column_b
1 val_a1 val_b1
2 val_a1 val_b2
3 val_a2 val_b2
4 val_a2 val_b3
5 val_a2 val_b1

I am creating a mapping table to add a new column to the dimension like column_x, here:

column_a column_b column_x
val_a1 'Any-Value' val_x1
val_a2 val_b2 val_x2
val_a2 'not val_b2' val_x3

How to read the mapping table:

This means for all the rows with vol_a1 in column_a and any value in column_b: column_x will have val_x1.

This means for the rows with vol_a2 in column_a and specifically val_b2 in column_b: column_x will have val_x2.

This means for the rows with vol_a2 in column_a and specifically NOT val_b2 in column_b: column_x will have val_x3.

The output will look like:

id column_a column_b column_x
1 val_a1 val_b1 val_x1
2 val_a1 val_b2 val_x1
3 val_a2 val_b2 val_x2
4 val_a2 val_b3 val_x3
4 val_a2 val_b1 val_x3

Can I do it in a single join? I can obviously break it in multiple CTEs and can do it. What would the most optimal way to do it, if the dimension table is big?

I thought to change the mapping table into this and add priorities with the help of IF conditions in the ON clause with OR Conditions:

column_a column_b column_x
val_a1 'Any-Value' val_x1
val_a2 val_b2 val_x2
val_a2 'Any-Value' val_x3

But it's giving me duplicates.

I can use SLQ or python to do this.

Deep Kalra
  • 1,418
  • 9
  • 27

2 Answers2

1

This might do. Pls. note that there might be more than one mapping record joined to certain dimension records. Having OR in join conditions is not very performant btw.

select t1.id, t1.column_a, t1.column_b, t2.column_x
from _dimension t1 left outer join _mapping t2
on
  t1.column_a = t2.column_a and t2.column_b = 'Any-Value' OR
  t1.column_a = t2.column_a and t1.column_b = t2.column_b OR
  t1.column_a = t2.column_a and t2.column_b ~* '^not ' 
   and t1.column_b <> regex_replace(t2.column_b, '^not (.+)$', '\1', 'i');

About priorities, I would suggest that you add an explicit priority_level integer column in your mapping table. Then the query will look like this:

select distinct on (t1.id)
    t1.id, t1.column_a, t1.column_b, t2.column_x
from _dimension t1 left outer join _mapping t2
on
  t1.column_a = t2.column_a and t2.column_b = 'Any-Value' OR
  t1.column_a = t2.column_a and t1.column_b = t2.column_b OR
  t1.column_a = t2.column_a and t2.column_b ~* '^not '
   and t1.column_b <> regex_replace(t2.column_b, '^not (.+)$', '\1', 'i');
order by t1.id, t2.priority_level;

It may be a good idea to use null instead of 'Any-Value' and '!val_b' instead of 'not val_b' in the mapping table for simplicity.

Stefanov.sm
  • 11,215
  • 2
  • 21
  • 21
  • Yes, I can change the mapping table. I tried almost similar, but I don't understand how this will take care of the overlap part for 'Val_x2' and 'Val_x3' – Deep Kalra Jun 07 '21 at 10:36
  • [DISTINCT ON](https://stackoverflow.com/questions/9795660/postgresql-distinct-on-with-different-order-by) (t1.id) will do that for you and ORDER BY second expression will determine which record will be picked (always the first one in the group). – Stefanov.sm Jun 07 '21 at 10:55
1

Can you use a case when statement instead of a join? So your query / view would be:

Select id,  
column_a,  
column_b,  
case when column_a = 'val_a1' then 'val_x1' 
when column_a = 'val_a2' and column_b = 'val_b2' then 'val_x2' 
when column_a = 'val_a2' and column_b <>'val_b2' then 'val_x3' 
else -- add your other conditions here 
end as column_x from table

In case you need to join, you can do so with a single join on SQL server using case when and an additional column on your mapping table for the values of column_b that you want to exclude..

Let's say this is your new mapping table:

| column_a | column_b | column_b_exclude | column_x |
-----------|----------|------------------|----------|
| val_a1   | NULL     | NULL             | val_x1   |
| val_a2   | val_b2   | NULL             | val_x2   |
| val_a2   | NULL     | val_b2           | val_x3   |

Your join would look like

select * from 
table t 
inner join mapping m 
on t.column_a = m.column_a and 
case when m.column_b is null then t.column_b else m.column_b end = t.column_b and case when m.column_b_exclude is not null then m.column_b_exclude else 'x' end <> t.column_b
Oulke
  • 26
  • 2