2

I need to build a type 2 dimension table to store price changes for various products. In the source data, there are two tables I can pull from. One has the current price of each product, and one has the price change history for each product. Prices for some products change more than others, and if a product's price has never changed, it won't have a record in the price change table at all.

Given the following current price table:

PRODUCT_ID  CURRENT_PRICE
----------  -------------
  ABC123        250
  DEF456        200
  GHI789        325

And product price history table:

PRODUCT_ID  OLD_PRICE   NEW_PRICE   CHANGE_DATE
----------  ---------   ---------   -----------
  ABC123        275       250        1/1/2016
  DEF456        250       225        6/1/2015
  DEF456        225       200        1/1/2016

What SQL could I run to populate a type 2 dimension as follows:

PRODUCT_ID  PRODUCT_PRICE   VALID_FROM  VALID_TO    CURRENT_PRICE_INDICATOR
----------  -------------   ----------  --------    ----------------------
  ABC123        275          1/1/1900   12/31/2015      N       
  ABC123        250          1/1/2016   12/31/9999      Y
  DEF456        250          1/1/1900   5/31/2015       N
  DEF456        225          6/1/2015   12/31/2015      N
  DEF456        200          1/1/2016   12/31/9999      Y
  GHI789        325          1/1/1900   12/31/9999      Y
Shnugo
  • 66,100
  • 9
  • 53
  • 114
EvanMPW
  • 351
  • 1
  • 6
  • 15
  • Here's a start: `SELECT PRODUCT_ID, CURRENT_PRICE, CONVERT(DATE,'99991231',120) AS VALID_TO FROM PRICE_CURRENT UNION ALL SELECT PRODUCT_ID, OLD_PRICE, VALID_TO FROM PRICE_HISTORY` if you're interested we can develop this further, – Nick.Mc Feb 02 '16 at 22:45
  • 2
    Thx for sample data and expected output, worth an upvote, but: Please avoid culture specific date-formats. Your `6/1/2015` was taken as 6th of January in my system, which produced some strange results :-) – Shnugo Feb 03 '16 at 12:54
  • 1
    I just saw that you're around... I did not ask you, which version of SQL Server you are using? I my answer you'll find `LAG()` which won't work befor SQL Server 2012... But that's quite easy to replace... Let me know if you need help. – Shnugo Feb 03 '16 at 14:34
  • This is actually going against SQL Server 2008, so LAG will not work – EvanMPW Feb 04 '16 at 15:45
  • @EvanMPW, please add a `@Shnugo` to your replies, otherwise there will be no alert... Found this per incidence... I'll edit my answer soon. – Shnugo Feb 05 '16 at 11:02
  • @EvanMPW did my answer help you out? please mark as answer if it did. –  Feb 08 '16 at 21:29
  • @EvanMPW, Of course it's on you to decide, whether any given answer is good or bad and if your problem is solved or not. The reason for my pushing is the down-vote on my answer, which seems to me quite unfair. If you don't mind I'd ask you to have a look onto the given answers and place your comment/vote/acceptance. Thx in advance and Happy Coding! – Shnugo Feb 11 '16 at 12:51

2 Answers2

1

Your end-state is a typical type 2 Slowly Changing Dimension. The Price history table is a little bit of a red herring here in that the NEW_PRICE should be ignored. Simply write that data in an initial load into your dimension table with something like:

CREATE TABLE Dim_Price (
  Price_Key INT IDENTITY,
  Product_ID NVARCHAR(10) NOT NULL,
  Price INT NOT NULL,
  Row_Effective_Date DATETIME NOT NULL,
  Row_Expiry_Date DATETIME NOT NULL,
  Row_Current_Flag INT NOT NULL)

 INSERT INTO Dim_Price VALUES
 ('ABC123',275,'1 Jan 1900','31 Dec 9999',1),
 ('DEF456',250,'1 Jan 1900','31 Dec 9999',1)

From then on you can use something akin to the merge statement below (Can't validate syntax at this moment) to merge from the source table into the destination table.

More information on slowly changing dimensions and how to handle them can be found on the Kimball Group website. After all, Ralph Kimball did invent them :)

INSERT INTO Dim_Price
SELECT
  Product_ID
 ,Price
 ,Row_Effective_Date
 ,Row_Expiry_Date
 ,Row_Current_Flag
FROM (
MERGE Dim_Price TGT
USING STG_Price SRC ON SRC.Product_ID = TGT.Product_ID
WHEN NOT MATCHED THEN 
INSERT VALUES(
  SRC.Product_ID
 ,SRC.Price
 ,'1 Jan 1900'
 ,'31 Dec 9999'
 ,1)
 WHEN MATCHED AND TGT.Row_Current_Flag = 1 AND EXITS(
   SELECT SRC.Price
   EXCEPT
   SELECT TGT.Price)
 THEN UPDATE SET TGT.Row_Current_Flag = 0
                ,TGT.Row_Expiry_Date = DATEADD(SECOND,86399,DATEADD(DAY,-1,SECOND,CAST(GETDATE() AS DATE)))
 OUTPUT $action AS Action
        ,SRC.Product_ID
        ,SRC.Price
        ,GETDATE()
        ,'31 Dec 9999'
        ,1
) AS MERGE_OUT
WHERE MERGE_OUT.Action_Out = 'UPDATE';
  • I know it's old answer, but still this is better answer than the one accepted above. – Avi Jan 31 '19 at 13:21
1

I think it is something like this:

DECLARE @price TABLE(PRODUCT_ID VARCHAR(100),CURRENT_PRICE DECIMAL(8,4));
INSERT INTO @price VALUES
 ('ABC123',250)
,('DEF456',200)
,('GHI789',325);

DECLARE @priceHist TABLE(PRODUCT_ID VARCHAR(100),OLD_PRICE DECIMAL(8,4),NEW_PRICE DECIMAL(8,4),CHANGE_DATE DATE);
INSERT INTO @priceHist VALUES
 ('ABC123',275,250,{d'2016-01-01'})
,('DEF456',250,225,{d'2015-06-01'})
,('DEF456',225,200,{d'2016-01-01'});

WITH AllData AS
(
    SELECT ROW_NUMBER() OVER(PARTITION BY Combined.PRODUCT_ID ORDER BY ISNULL(Combined.CHANGE_DATE,{d'9999-12-31'}) ASC) AS Inx
          ,*
          ,CASE WHEN CHANGE_DATE IS NULL THEN 'Y' ELSE 'N' END AS CURRENT_PRICE_INDICATOR
    FROM
    (
        SELECT p.PRODUCT_ID AS PRODUCT_ID
              ,p.CURRENT_PRICE AS PRODUCT_PRICE
              ,NULL AS CHANGE_DATE
        FROM @price AS p
        UNION ALL
        SELECT ph.PRODUCT_ID
              ,ph.OLD_PRICE
              ,ph.CHANGE_DATE
        FROM @priceHist AS ph
    ) AS Combined
)
SELECT ad.PRODUCT_ID
      ,ad.PRODUCT_PRICE
      --Version with LAG (SQL Server 2012 and higher)
      --,CASE WHEN ad.Inx=1 THEN {d'1900-01-01'} ELSE LAG(ad.CHANGE_DATE,1) OVER(PARTITION BY ad.PRODUCT_ID ORDER BY ISNULL(ad.CHANGE_DATE,{d'9999-12-31'}) ASC) END AS VALID_FROM
      ,CASE WHEN ad.Inx=1 THEN {d'1900-01-01'} ELSE LAG_Replace_For_SQLServer2008.CHANGE_DATE END AS VALID_FROM
      ,CASE WHEN ad.CURRENT_PRICE_INDICATOR='Y' THEN {d'9999-12-31'} ELSE DATEADD(DAY,-1,ad.CHANGE_DATE) END AS VALID_TO
      ,ad.CURRENT_PRICE_INDICATOR
FROM AllData AS ad
OUTER APPLY
(
    SELECT x.CHANGE_DATE 
    FROM AllData AS x
    WHERE x.PRODUCT_ID=ad.PRODUCT_ID
      AND x.Inx=ad.Inx-1
) LAG_Replace_For_SQLServer2008

The result:

ABC123  275.0000    1900-01-01  2015-12-31  N
ABC123  250.0000    2016-01-01  9999-12-31  Y
DEF456  250.0000    1900-01-01  2015-05-31  N
DEF456  225.0000    2015-06-01  2015-12-31  N
DEF456  200.0000    2016-01-01  9999-12-31  Y
GHI789  325.0000    1900-01-01  9999-12-31  Y
Shnugo
  • 66,100
  • 9
  • 53
  • 114
  • 1
    I'm sorry but this is a very bad answer. On a table of decent quantity this is horribly inefficient. Please see my answer for how to do a Type 2 SCD merge correctly. –  Feb 08 '16 at 21:29
  • @MichaelBetterton What you are doing is really rude! The OP **has a given table structure** and needs a solution for this. It is OK to suggest a better design but the OP is not obliged to rebuild his application just because of you telling him, that his structure is bad. My solution will deliver the needed result. Your solution will not even work for him! You call that a **very bad answer**? I will **not** down vote yours, as it is a valuable input, but your down vote on mine is - uhm - together with your *please accept comment* - uhm - weak... You might think about this again... – Shnugo Feb 08 '16 at 21:44
  • @MichaelBetterton One more point: [Quite the same happened to me](http://stackoverflow.com/q/35079603/5089204) I'm one of the up-voters on the accepted answer there. My suggestion with a better desing did not get one single up vote. Sometimes people have to / want to stick to a given structure... – Shnugo Feb 08 '16 at 21:46
  • I apologise for any offence I have cause, I don't want to start a argument but your solution does not scale. My solution is removing columns from the destination table and providing best practice answer (i.e, read about Kimball SCDs). Please attempt to use your solution on a 10 million row table and you will understand why providing albeit the exact answer the questioner asked is not appropriate for beginners. –  Feb 12 '16 at 04:31
  • @MichaelBetterton Believe me: I know all that! What is your real world experience in software development? The problem are your implications... We don't know how big is the project, will it grow? how fast?, which role does the OP have (is he allowed to change structure?), how old is the project (SQL Server 2008!!!, maybe tons of side code (reports, interfaces), application code)? How many installations? in my daily life I see much poorly designed things... The solutions are - in most cases! - ugly work arounds rather than structural changes (regression tests !!!) – Shnugo Feb 12 '16 at 07:31
  • @MichaelBetterton This platform is not a place for academic discussion. The question was not: I have a solution but it is slow! Your "answer" tells the OP that his design is bad and you offer an alternative - but this **is not even an answer**. If **this** structure is running slow I'd rather think about a trigger to write the old price and change date into extra columns. This solutions would be quite as fast as yours but with almost no impact. And think about the down vote... – Shnugo Feb 12 '16 at 07:34