0

Below is a Snowflake query that is trying to get the serial number that's associated with each account name (ACCT_NAME), account ID (ACCT_ID), SKU ID (SKU_ID_p2), and offer description (OFFER_DESC). However when I bring in the serial number column from the FACT_PRODUCT_OFFER_CUST_VALUE table, it goes from 3 rows to 500+ rows--any idea why? Here is my current query:

SELECT a.NAME AS ACCT_NAME,
       a.ID AS ACCT_ID,
       p2.NAME AS SKU_ID_p2,
       dpo.OFFER_DESC
FROM ASSET ast
    JOIN DB_A.SA.ACCOUNT a
        ON a.ID = ast.ACCOUNT_ID
    JOIN DB_A.SA.PRODUCT_2 p2
        ON p2.ID = ast.PRODUCT_2_ID
    JOIN DB_A.SA.OPPORTUNITY o
        ON o.ACCOUNT_ID = ast.ACCOUNT_ID
    JOIN DB_B.SB.DIM_PRODUCT_OFFER dpo
        ON dpo.PRODUCT_ID = p2.ID
    JOIN DB_B.SB.DIM_PRODUCT_OFFER_CUST_VALUE dpocv
        ON dpocv.OFFER_SKU_ID = dpo.OFFER_SKU_ID
    JOIN DB_B.SB.FACT_PRODUCT_OFFER_CUST_VALUE fpocv
        ON fpocv.SERIAL_NUM = dpocv.SERIAL_NUM
GROUP BY a.NAME,
         a.ID,
         p2.NAME,
         dpo.OFFER_DESC

And here's a sample result of this query:

ACCT_NAME ACCT_ID SKU_ID_P2 OFFER_DESC
Joe's Hardware Shop 9084785487488HUY JH-WEQ-3S-FER-KIR Plywood: 500M (Config B)
Joe's Hardware Shop 9084785487488HUY JH-WEQ-230j-FER-KIR Plywood: 1000M (Config D)
Joe's Hardware Shop 9084785487488HUY JH-ITK-OPKF-LFPL-KIR1 Plywood: 250M (Config A)

However when I bring in the Serial Number column from the FACT_PRODUCT_OFFER_CUST_VALUE table, the original query jumps from 3 rows to 500+ rows. The correct output should have a unique serial number for the three rows in the table above. Correct output below:

ACCT_NAME ACCT_ID SKU_ID_P2 OFFER_DESC SERIAL_NUM
Joe's Hardware Shop 9084785487488HUY JH-WEQ-3S-FER-KIR Washer: Model A (Config B) AUEJKLOR
Joe's Hardware Shop 9084785487488HUY JH-WEQ-230j-FER-KIR Washer: Model B (Config D) AGITJRD
Joe's Hardware Shop 9084785487488HUY JH-ITK-OPKF-LFPL-KIR1 Washer: Model E (Config A) AREJKEK

Updated query with SERIAL_NUM brought in as a column:

SELECT a.NAME AS ACCT_NAME,
       a.ID AS ACCT_ID,
       p2.NAME AS SKU_ID_p2,
       dpo.OFFER_DESC,
       fpocv.SERIAL_NUM
FROM ASSET ast
    JOIN DB_A.SA.ACCOUNT a
        ON a.ID = ast.ACCOUNT_ID
    JOIN DB_A.SA.PRODUCT_2 p2
        ON p2.ID = ast.PRODUCT_2_ID
    JOIN DB_A.SA.OPPORTUNITY o
        ON o.ACCOUNT_ID = ast.ACCOUNT_ID
    JOIN DB_B.SB.DIM_PRODUCT_OFFER dpo
        ON dpo.PRODUCT_ID = p2.ID
    JOIN DB_B.SB.DIM_PRODUCT_OFFER_CUST_VALUE dpocv
        ON dpocv.OFFER_SKU_ID = dpo.OFFER_SKU_ID
    JOIN DB_B.SB.FACT_PRODUCT_OFFER_CUST_VALUE fpocv
        ON fpocv.SERIAL_NUM = dpocv.SERIAL_NUM
GROUP BY a.NAME,
         a.ID,
         p2.NAME,
         dpo.OFFER_DESC,
         fpocv.SERIAL_NUM
  • 1
    Please in code questions give a [mre]--cut & paste & runnable code & example input; desired & actual output (including verbatim error messages); tags & versions; clear specification & explanation. For SQL include DDL & tabular initialization code. For debug that includes the least code you can give that is code that you show is OK extended by code that you show is not OK. [ask] [Help] When you get a result you don't expect, pause your overall goal, chop to the 1st subexpression with unexpected result & say what you expected & why, justified by documentation. (Debugging fundamental.) – philipxy May 14 '22 at 20:59
  • [Two SQL LEFT JOINS produce incorrect result](https://stackoverflow.com/q/12464037/3404097) – philipxy May 14 '22 at 21:00
  • 1
    Can you run select "SERIAL_NUM, count(*) FROM FACT_PRODUCT_OFFER_CUST_VALUE WHERE SERIAL_NUM in ('AUEJKLOR', 'AGITJRD' , 'AREJKEK') GROUP BY SERIAL_NUM" and let us know the results. My guess is that your fact table has these serial number more than once. Please confirm. – P Needleman May 14 '22 at 21:29

1 Answers1

1

The simple answer is: You have many rows in the new table with the same value as you are join on. This is the ONLY reason you ever get more rows.

As to the next question why do you have many value that are the same, that is harder to know. But given you have used the word FACT it's making me think of a SCD2 like table, and thus while there is only one Plywood: 500M (Config B) over time it has had different prices. And when you join to a FACT/DIMENSION table like price, it is also had to be with respect to time.

Which implies you ether want the latest price, OR you want the price as the time of the event to a looking at.

Normally a DIMENSION table will have start_time, end_time columns and often I have seen then with end_time being null if it is the current "price", so you will need more clauses on your JOIN ON to bind to the row you want.

Simeon Pilgrim
  • 22,906
  • 3
  • 32
  • 45
  • Thanks, Simeon. I didn't know about the FACT/DIMENSION table and the dates. I do see that there is an 'end_time' column which means that the most recent 'end_time' is the most recent price--correct? So I should be using the price with the latest 'end_time' as the most recent record? – doubledribble May 15 '22 at 00:10
  • It's not a hard and fast thing, sometimes tables have two+ sets of FK to join on depending if you are want SCD1 (latest) or SCD2 (value at time).. but it's definitely something to look for, to help explain why you are getting more rows that you expect. – Simeon Pilgrim May 15 '22 at 00:49