1

I would like to join two tables and if columns do not match then another condition should be applied for joining the tables. Tables should join based on two products and level columns. If products match but level doesn’t match, then matching order should be Low > Med > High. For example, in Scenario 1, table 2 should match with Level Med (row 2) of table 1. If there is no “Med” level, then will match with “High” level. For example, in Scenario 2, table 2 should match with Level High (row 1) of table 1. If products do not match or level is NA then Value should be 0.

Here is the different scenarios of my problem. Table 1 is the master table and Table 2 can have different scenarios.

Table 1

ID  Product_1    Product_2    Level    Value
1   C            D            High     10
2   A            B            Med      11
3   A            B            High     12
4   B            C            Med      13
5   B            C            High     9

Different scenarios of Table 2

Scenario 1, Table 2

Product_1    Product_2    Level
A            B            Low

Expected Output

Product_1    Product_2    Level    Value
A            B            Med      11

Scenario 2, Table 2

Product_1    Product_2    Level
C            D            Low

Expected Output

Product_1    Product_2    Level    Value
C            D            High      10

Scenario 3, Table 2

Product_1    Product_2    Level
A            B            Med

Expected Output

Product_1    Product_2    Level    Value
A            B            Med      11

Scenario 4, Table 2

Product_1    Product_2    Level
M            N            High

Expected Output

Product_1    Product_2    Level    Value
M            N            High      0

Scenario 5, Table 2

Product_1    Product_2    Level
A            B            NA

Expected Output

Product_1    Product_2    Level    Value
A            B            NA       0

The code I have tried, It basically matches, if not then return "0"

WITH `project.dataset.Table1` AS (
 SELECT 1 ID, 'C' Product_1, 'D' Product_2, 'High' Level, 10 Value UNION ALL
 SELECT 2, 'A', 'B', 'Med', 11 UNION ALL
 SELECT 3, 'A', 'B', 'High', 12 UNION ALL
 SELECT 4, 'B', 'C', 'Med', 13 UNION ALL
 SELECT 5, 'B', 'C', 'High', 9 
),

`project.dataset.Table2` AS (
 SELECT 2 ID, 'A' Product_1, 'B' Product_2, 'Low' Level 
 ),
 
-- The Above table idea is taken from Mikhail's solution

get_values as (
select 
     t1.ID
    ,t2.Product_1
    ,t2.Product_2
    ,t2.Level      
    ,t1.Value
    
FROM `project.dataset.Table1` AS t1
join `project.dataset.Table2` t2 using(Product_1,Product_2,Level)
)

select t2.ID
    ,t2.Product_1
    ,t2.Product_2
    ,t2.Level
    ,IFNULL(Value, 0) as Value

from `project.dataset.Table2` t2
left join get_values gv on t2.ID = gv.ID

Thanks

moeen
  • 33
  • 6
  • 1
    Please [use text, not images/links, for text--including tables & ERDs](https://meta.stackoverflow.com/q/285551/3404097). Use images only for what cannot be expressed as text or to augment text. Include a legend/key & explanation with an image. Please in code questions give a [mre]--cut & paste & runnable code, including smallest representative example input as code; desired & actual output (including verbatim error messages); tags & versions; clear specification & explanation. For SQL that includes DBMS & DDL (including constraints & indexes) & input as code formatted as a table. [ask] – philipxy Jul 09 '20 at 19:45
  • Please show partial/related queries you can do & relate to your goal. Ask re an error while putting that goal on hold. Please don't ask us to write your code. PS Here is how to characterize DB tables (so you can reason & communicate about them): **When giving a business relation(ship)/association or table (base or query result), say what a row in it states about the business situation in terms of its column values.** PS Relational querying does not use "conditional joins". A CROSS/INNER JOIN first gives all combinations of an input row from each table. ON/WHERE keep some. Which do you want? – philipxy Jul 09 '20 at 19:55
  • I am just a user who is trying to help you. I can't see how you are checking against my comments or its links. Eg we can't cut & paste from your post & run your attempts or ours. Eg you don't say what you think your queries do & how that relates to your goal; you just dump wrong code. (Also I put something in boldface & said it was fundamental in helping but you haven't acted on it. And I asked which rows of cross join you want & you didn't act. These aren't obligatory things, I'm just showing you how to get unstuck.) If you don't understand something & want to, comment about it. – philipxy Jul 09 '20 at 22:42
  • As I was trying to improve the question as first as possible based on your comment, I didn't check the dummy code on the dummy data, hence there was mistake in the code. But I mentioned what my query does and that should give the idea. I don't understand why I need to say the business relation with column value in the BOLD lines!! OR I just didn't understood what do you meant. ANYWAY..I got the solution. Thanks for your comments to improve my question. – moeen Jul 10 '20 at 00:15
  • I just told you that you don't need to follow the boldface ("aren't obligatory") but that it is fundamental & will help you. (Because it leads directly to corresponding SQL code. [Is there any rule of thumb to construct SQL query from a human-readable description?](https://stackoverflow.com/a/33952141/3404097)) "that should give the idea" is an unjustified belief. PS Please when you give any input tables make them initialization code in tabular form that can be cut & pasted, not just read by people. Good luck. – philipxy Jul 10 '20 at 00:25

2 Answers2

2

Below is for BigQuery Standard SQL

#standardSQL
SELECT Scenario, Product_1, Product_2, candidates.Level, candidates.Value 
FROM (
  SELECT Scenario, Product_1, Product_2, t2.Level, 
    ARRAY_AGG(
      STRUCT(IF(t2.Level = 'NA', 'NA', IFNULL(t1.Level, t2.Level)) AS Level, IF(Value IS NULL OR t2.Level = 'NA', 0, Value) AS Value)
      ORDER BY CASE t2.Level
        WHEN 'Low' THEN CASE t1.Level WHEN 'Low' THEN 1 WHEN 'Med' THEN 2 WHEN 'High' THEN 3 END
        WHEN 'Med' THEN CASE t1.Level WHEN 'Low' THEN 77 WHEN 'Med' THEN 1 WHEN 'High' THEN 2 END
        WHEN 'High' THEN CASE t1.Level WHEN 'Low' THEN 77 WHEN 'Med' THEN 66 WHEN 'High' THEN 1 END
        ELSE 0
      END      
    )[OFFSET(0)] candidates
  FROM `project.dataset.table2` t2
  LEFT JOIN `project.dataset.table1` t1
  USING(Product_1, Product_2)
  GROUP BY Scenario, Product_1, Product_2, Level  
)  

If to apply to sample data from your question as in below example

#standardSQL
WITH `project.dataset.table1` AS (
  SELECT 'C' Product_1, 'D' Product_2, 'High' Level, 10 Value UNION ALL
  SELECT 'A', 'B', 'Med', 11 UNION ALL
  SELECT 'A', 'B', 'High', 12 UNION ALL
  SELECT 'B', 'C', 'Med', 13 UNION ALL
  SELECT 'B', 'C', 'High', 9 
),`project.dataset.table2` AS (
  SELECT 1 Scenario, 'A' Product_1, 'B' Product_2, 'Low' Level UNION ALL
  SELECT 2, 'C', 'D', 'Low' UNION ALL
  SELECT 3, 'A', 'B', 'Med' UNION ALL
  SELECT 4, 'M', 'N', 'High' UNION ALL
  SELECT 5, 'A', 'B', 'NA' 
)
SELECT Scenario, Product_1, Product_2, candidates.Level, candidates.Value 
FROM (
  SELECT Scenario, Product_1, Product_2, t2.Level, 
    ARRAY_AGG(
      STRUCT(IF(t2.Level = 'NA', 'NA', IFNULL(t1.Level, t2.Level)) AS Level, IF(Value IS NULL OR t2.Level = 'NA', 0, Value) AS Value)
      ORDER BY CASE t2.Level
        WHEN 'Low' THEN CASE t1.Level WHEN 'Low' THEN 1 WHEN 'Med' THEN 2 WHEN 'High' THEN 3 END
        WHEN 'Med' THEN CASE t1.Level WHEN 'Low' THEN 77 WHEN 'Med' THEN 1 WHEN 'High' THEN 2 END
        WHEN 'High' THEN CASE t1.Level WHEN 'Low' THEN 77 WHEN 'Med' THEN 66 WHEN 'High' THEN 1 END
        ELSE 0
      END      
    )[OFFSET(0)] candidates
  FROM `project.dataset.table2` t2
  LEFT JOIN `project.dataset.table1` t1
  USING(Product_1, Product_2)
  GROUP BY Scenario, Product_1, Product_2, Level  
)

the output is

Row Scenario    Product_1   Product_2   Level   Value    
1   1           A           B           Med     11   
2   2           C           D           High    10   
3   3           A           B           Med     11   
4   4           M           N           High    0    
5   5           A           B           NA      0      

I think, above mostly gives you what you need, but it might require some tuning that I hope you will be able to do

Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
1

Basically I tried to solve the above problem by using temporary tables.

1.(src) We are assigning a score against each level that will help in fetching the minimum level available in table1 in case of absence of same level. ALso we are using full outer join which will help in findind the products missing from table1 2. (final) This table will give us all the products where product as well as their level match between the 2 tables. 3. (min_score) It will give minimum score available for every set of product_1 and product_2

with src as
(
 select t1.product1 as t1p1, t1.product_2 as t1p2, t2.product_1 as t2p1, t2.product_2 as t2p2, t1.level as t1_level, t2.level as t2_level,case when t1.level='Low' then 1 
  when t1.level='Med' then 2
  when t1.level='High' then 3
  end as level_score from table1 t1 full outer join table2 t2 on(t1.product_1=t2.product_1 and t1.product_2=t2.product_2)
),
final as
(select t1p1,t1p2,t2p1,t2p2,case when t1_level=t2_level then t1_level else
 '#NA#' as level from src join table1 on(product_1=t1p1 and product2=t1p2) where t1p1 is not null and t1p2 is not null
),
min_score as
(
select t1p1,t1p2,min(level_score) as level_score from src group by t1p1,t1p2 
)
SELECT t2p1,t2p2,t2_level,0 as value FROM src WHERE (t1p1 IS NULL AND t1p2 IS NULL) OR (t2_level='NA')
UNION ALL
SELECT t1p1,t1p2,t1_level,value from final f join table1 tab1 on ( t1p1=product_1 and t1p2=product2 and t1_level=level) where level !='#NA#'
UNION ALL
SELECT t1p1,t2p2, CASE WHEN level_score=1 then 'Low' WHEN level_score=2 THEN 'Med' ELSE 'High' END as final_level,value   from final f join table1 on(product_1=t1p1 and product2=t1p2) where f.level='#NA#' and table1.level= final_level
 ;

I had used UNION ALL since it will include every row( duplicates as well if any). The output of 3 select queries will be-

  1. The first select query will give all the products with value=0 i.e. either product missing from table1 or it's level=NA in table2.(Scenario4, Scenario5)
  2. The second query will give all the products along with their value where products as well as their level exists in both tables (Scenario3)
  3. It will give me the value where we have products in both the tables but their level don't match. For this, I found out the minimum level present in table1 for that combination of product_1 and product_2 and then got the value by comparing levels.
Prakhar Gupta
  • 471
  • 5
  • 11