2

I am trying to load a standard Kimball SCD2 Dimension, using a merge statement which I got from the following website: http://www.kimballgroup.com/2008/11/design-tip-107-using-the-sql-merge-statement-for-slowly-changing-dimension-processing/

This merge statement is the same except to handle new entities. This will be handled as a direct insert in the dataflow. This problem concerns only multiple versions of the same business key.

When I execute the merge statement SQL returns the error:

Msg 8672, Level 16, State 1, Line 3
The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row.
A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.

I am using SQL Server 2012:

SOURCE DATASET

enter image description here

TARGET DATASET

enter image description here

This is what I expected:

enter image description here

Below you can find the script to reproduce the problem:

CREATE TABLE SANDBOX.EHN.SOURCE_SCD2 (
  BUSINESS_KEY  BIGINT
 ,DESCRIPTION_A VARCHAR(2)
 ,M_CRC         BIGINT
 ,StartDATE     DATE
 ,EndDATE       DATE )



CREATE TABLE SANDBOX.EHN.TARGET_SCD2 (
  BUSINESS_KEY  BIGINT
 ,DESCRIPTION_A VARCHAR(2)
 ,M_CRC         BIGINT
 ,StartDATE     DATE
 ,EndDATE       DATE )



 select *
 from SANDBOX.EHN.TARGET_SCD2

 truncate table SANDBOX.EHN.TARGET_SCD2

 INSERT INTO SANDBOX.EHN.SOURCE_SCD2 VALUES (1, 'B',  1,   '2015-05-16', '2015-06-01')
 INSERT INTO SANDBOX.EHN.SOURCE_SCD2 VALUES (1, 'C',  2,   '2015-06-01', '2015-06-11')
 INSERT INTO SANDBOX.EHN.SOURCE_SCD2 VALUES (1, 'D',  3,   '2015-06-11', '9999-12-31')

 INSERT INTO SANDBOX.EHN.TARGET_SCD2 VALUES (1,  'A', 0,    '2015-01-16', '9999-12-31')



INSERT INTO SANDBOX.EHN.TARGET_SCD2 
     SELECT BUSINESS_KEY
          ,DESCRIPTION_A
         ,M_CRC
         ,StartDATE
         ,EndDATE
FROM (
MERGE SANDBOX.EHN.TARGET_SCD2 D
USING SANDBOX.EHN.SOURCE_SCD2 UPD
ON(D.BUSINESS_KEY = UPD.BUSINESS_KEY )
    WHEN MATCHED AND D.EndDATE = '9999-12-31' 
    THEN UPDATE SET  D.EndDATE =  UPD.EndDATE
OUTPUT $Action Action_Out, UPD.BUSINESS_KEY
                         , UPD.DESCRIPTION_A
                    , UPD.M_CRC
                    , UPD.StartDATE
                    , UPD.EndDATE
)AS MERGE_OUT
WHERE MERGE_OUT.Action_Out = 'UPDATE'

Can you help me to fix this problem?

TT.
  • 15,774
  • 6
  • 47
  • 88
Erik hoeven
  • 1,442
  • 6
  • 26
  • 41
  • Well, the error message is rather clear: you have a target row that the merge statement tried to update or delete more than once. That is not allowed for a merge statement. Rewrite your query to make sure that a row can only be the target of one update or delete statement. – TT. Nov 21 '16 at 09:25
  • Thank you for the reply but that is exactly the problem, we know 'what' we need to do but don't know the 'how'? – Erik hoeven Nov 21 '16 at 09:54
  • I'm a bit puzzled... AFAICT the MERGE statement is a top-level statement and cannot be used inside a derived table like you are doing. At least I'm not seeing any such possible use in the [documentation](https://msdn.microsoft.com/en-us/library/bb510625.aspx). I executed your script on a test database (SS 2012) and it gives an error message that I think is due to that fact (*Incorrect syntax near 'D'* at the start of the derived table containing the merge). Am I missing something? – TT. Nov 21 '16 at 10:16
  • `MERGE` is overrated for SCD's. I always use seperate INSERT/UPDATE in a transaction. There's more code and it's repeated but it's easier to debug. The curious thing about your SCD update is that the _source_ is also date delimited. Not only that you have three arriving at once (rather than one which is the usual case). So your case is not common but is achievable. It's not covered in that any of the vanilla SCD suggestions online. Anyway I can see the SQL that would be required and can post if you like or I can explain in more detail – Nick.Mc Nov 21 '16 at 10:25
  • Nick you are right that it is not common that you multiple updates of the same businesskey. Whe load once in a day. Because the hardware where the dwh is rely on. It happens that we mis a day. Lucky we have a data lake where there is a persistent staging area. Which has every of the business key. I am still curious about your solution for this problem. – Erik hoeven Nov 21 '16 at 10:41
  • 1
    Sorry I don't have time to pose a more detailed response for your curiosity. But basically you can replace merge with seperate update/insert statements inside a transaction. Here's an article listing some problems with merge, although I don't know how many are still valid: https://www.mssqltips.com/sqlservertip/3074/use-caution-with-sql-servers-merge-statement/ – Nick.Mc Nov 23 '16 at 03:54
  • I am also with @Nick.McDermaid here, had performance problems in the past (SQL server 2008R2 / 2012) using Merge and switched back to using the trusted update / insert technique that is much better tune-able performance wise IMO. I found the same article as Nick, the status of almost all bugs there is still the same and if they are changed, it now says: Won't Fix... – Rogier Werschkull Nov 23 '16 at 07:52
  • Thanks for all your comments. I fully agree with your comments. – Erik hoeven Nov 23 '16 at 08:03

1 Answers1

2

For the last update only use;

INSERT INTO SANDBOX.EHN.TARGET_SCD2 
     SELECT BUSINESS_KEY
          ,DESCRIPTION_A
         ,M_CRC
         ,StartDATE
         ,EndDATE
FROM (
MERGE SANDBOX.EHN.TARGET_SCD2 D
USING SANDBOX.EHN.SOURCE_SCD2 UPD
ON(D.BUSINESS_KEY = UPD.BUSINESS_KEY AND UPD.EndDATE = '9999-12-31')
    WHEN MATCHED AND D.EndDATE = '9999-12-31' 
    THEN UPDATE SET  D.EndDATE =  UPD.StartDATE
OUTPUT $Action Action_Out, UPD.BUSINESS_KEY
                         , UPD.DESCRIPTION_A
                         , UPD.M_CRC
                         , UPD.StartDATE
                         , UPD.EndDATE
)AS MERGE_OUT
WHERE MERGE_OUT.Action_Out = 'UPDATE'

If you want ALL SRC rows in your target table then I agree with Nick.McDermaid

For ALL rows use;

UPDATE TRG
  SET TRG.EndDate = SRC.StartDATE
FROM SANDBOX.EHN.TARGET_SCD2 TRG
JOIN  ( select SRC.BUSINESS_KEY, min(src.StartDATE)StartDATE
        from SANDBOX.EHN.SOURCE_SCD2 SRC
          group by SRC.BUSINESS_KEY
       )SRC
on  ( TRG.BUSINESS_KEY = SRC.BUSINESS_KEY
       AND SRC.StartDate > TRG.StartDate ) 
where 1 = 1


INSERT SANDBOX.EHN.TARGET_SCD2
SELECT * FROM SANDBOX.EHN.SOURCE_SCD2