0

A follow up question to SQL Server Merge: update only changed data, tracking changes?

we have been struggling to get an effective merge statement working, and are now thinking about only using updates, we have a very simple problem: Update Target from Source where values are different and record the changes, both tables are the same layout.

So, the two questions we have are: is it possible to combine this very simple update into a single statement?

UPDATE        tbladsgroups
SET           tbladsgroups.Description = s.Description,
              tbladsgroups.action='Updated'
FROM          tbladsgroups t
INNER JOIN    tbladsgroups_staging s
ON            t.SID = s.SID
Where   s.Description   <> t.Description 

UPDATE        tbladsgroups
SET           tbladsgroups.DisplayName = s.DisplayName,
              tbladsgroups.action='Updated'
FROM          tbladsgroups t
INNER JOIN    tbladsgroups_staging s
ON            t.SID = s.SID
Where   s.DisplayName   <> t.DisplayName

....for each column.

Second question.

Can we record into a separate table/variable which record has been updated?

Merge would be perfect, however we cannot see which record is updated as the data returned from OUTPUT shows all rows, as the target is always updated.

edit complete merge:

M

ERGE tblADSGroups AS TARGET
    USING tblADSGroups_STAGING AS SOURCE
        ON (TARGET.[SID] = SOURCE.[SID])
    WHEN MATCHED 
    THEN UPDATE SET
        
        TARGET.[Description]=CASE
                            WHEN source.[Description] != target.[Description]  THEN(source.[Description] 
                            )
                            ELSE target.[Description] END,
        TARGET.[displayname] = CASE 
                            WHEN source.[displayname] != target.[displayname]  THEN source.[displayname]
                            ELSE target.[displayname] END
        
...other columns cut for brevity


    WHEN NOT MATCHED BY TARGET
        THEN
            INSERT (
                [SID],[SamAccountName],[DisplayName],[Description],[DistinguishedName],[GroupCategory],[GroupScope],[Created],[Members],[MemberOf],[SYNCtimestamp],[Action]
                )
            VALUES (
                source.[SID],[SamAccountName],[DisplayName],[Description],[DistinguishedName],[GroupCategory],[GroupScope],[Created],[Members],[MemberOf],[SYNCtimestamp],[Action]
                )
 
  
  WHEN NOT MATCHED BY SOURCE
        THEN
            UPDATE SET ACTION='Deleted'
  • Why can't you just do a preselection of what to update in the source of the `MERGE`, then the output won't include rows that weren't updated. – Mark Rotteveel Oct 02 '22 at 11:03
  • Please show your full `MERGE` statement – Charlieface Oct 02 '22 at 12:12
  • "When simply updating one table based on the rows of another table, improve the performance and scalability with basic INSERT, UPDATE, and DELETE statements." see: [MERGE (Transact-SQL)](https://learn.microsoft.com/en-us/sql/t-sql/statements/merge-transact-sql?view=sql-server-ver16#:~:text=When%20simply%20updating%20one%20table%20based%20on%20the%20rows%20of%20another%20table%2C%20improve%20the%20performance%20and%20scalability%20with%20basic%20INSERT%2C%20UPDATE%2C%20and%20DELETE%20statements.) – Luuk Oct 02 '22 at 12:33
  • @Luuk thanks, KISS is often the best approach – user18126559 Oct 02 '22 at 12:55

3 Answers3

0

We have similar needs when dealing with values in our Data Warehouse dimensions. Merge works fine, but can be inefficient for large tables. Your method would work, but also seems fairly inefficient in that you would have individual updates for every column. One way to shorten things would be to compare multiple columns in one statement (which obviously makes things more complex). You also do not seem to take NULL values into consideration.

What we ended up using is essentially the technique described on this page: https://sqlsunday.com/2016/07/14/comparing-nullable-columns/

Using INTERSECT allows you to easily (and quickly) compare differences between our staging and our dimension table, without having to explicitly write a comparison for each individual column.

To answer your second question, the technique above would not enable you to catch which column changed. However, you can compare the old row vs the new row (we "close" the earlier version of the row by setting a "ValidTo" date, and then add the new row with a "ValidFrom" date equal to today's date.

Our code ends up looking like the following:

  • INSERT all rows from the stage table that do not have a matching key value in the new table (new rows)
  • Compare stage vs dimension using the INTERSECT and store all matches in a table variable
  • Using the table variable, "close" all matching rows in the Dimension
  • Using the table variable, INSERT the new rows
  • If there's a full load taking place, we can also check for Keys that only exist in the dimension but not in the stage table. This would indicate those rows were deleted in the source system, and we mark them as "IsDeleted" in the dimension.
SchmitzIT
  • 9,227
  • 9
  • 65
  • 92
  • thanks, we also have a solution using intersect but as our SQL guru is away for a few weeks we are sort of guessing at the best solution, what we dont have is the "close" and insert of new rows, but I'm guessing this should be straight foward. – user18126559 Oct 02 '22 at 12:20
0

You can use a single UPDATE with an OUTPUT clause, and use an INTERSECT or EXCEPT subquery in the join clause to check whether any columns have changed.

For example

UPDATE t
SET Description = s.Description,
    DisplayName = s.DisplayName,
    action = 'Updated'
OUTPUT inserted.ID, inserted.Description, inserted.DisplayName
INTO @tbl (ID, Description, DisplayName)
FROM tbladsgroups t
INNER JOIN tbladsgroups_staging s
  ON t.SID = s.SID
  AND NOT EXISTS (
    SELECT s.Description, s.DisplayName
    INTERSECT
    SELECT t.Description, t.DisplayName
  );

You can do a similar thing with MERGE, if you also want to INSERT

MERGE tbladsgroups t
USING tbladsgroups_staging s
  ON t.SID = s.SID
WHEN MATCHED AND NOT EXISTS (    -- do NOT place this condition in the ON
    SELECT s.Description, s.DisplayName
    INTERSECT
    SELECT t.Description, t.DisplayName
  )
  THEN UPDATE SET
    Description = s.Description,
    DisplayName = s.DisplayName,
    action = 'Updated'
WHEN NOT MATCHED
  THEN INSERT (ID, Description, DisplayName)
       VALUES (s.ID, s.Description, s.DisplayName)
OUTPUT inserted.ID, inserted.Description, inserted.DisplayName
INTO @tbl (ID, Description, DisplayName)
;
Charlieface
  • 52,284
  • 6
  • 19
  • 43
  • thanks, yes we have tried this, but we only want to update changed values, this will update all values even if not changed and we therefore cannot track the changes. – user18126559 Oct 02 '22 at 12:53
  • No it will not, only rows with changes will be updated, due to the extra join condition. Please give a [mcve] showing how rows that do not have changes are updated. – Charlieface Oct 02 '22 at 13:27
  • ah thanks, yes I understand, so it works and updates the tables. Is there a simple way we can record what has changed? Something like, ID,OLDVal, NEWval – user18126559 Oct 02 '22 at 15:12
  • Well the `OUTPUT` clause can have `inserted.SomeColumn` and `deleted.SomeColumn` which gives the new and old values respectively. You may want to look at Change Tracking and/or Change Data Capture – Charlieface Oct 02 '22 at 16:44
  • ah, got it, many thanks for your help. seems like a pretty simple solution (which we want) and can be implemented on our hundreds of tables. – user18126559 Oct 02 '22 at 19:23
0

I think you may be overthinking the complexity, but yes. Your underlying update is a compare between the ads group and staging tables based on the matching ID in each query. Since you are already checking the join on ID and comparing for different description OR display name, just update both fields. Why?

groups description   groups display   staging description  staging display
SomeValue            Show Me          SOME other Value     Show Me
Try This             Attempt          Try This             Working on it
Both Diff            Changes          Both Are Diff        Change Me

So the ultimate value you want is to pull both description and display FROM the staging back to the ads groups table.

In the above sample, I have three samples which if based on matching ID present entries that would need to be changed. If the value is the same in one column, but not the other and you update both columns, the net effect is the one bad column that get updated. The first would ultimately remain the same. If both are different, both get updated anyhow.

UPDATE        tbladsgroups
SET           tbladsgroups.Description = s.Description,
              tbladsgroups.DisplayName = s.DisplayName,
              tbladsgroups.action='Updated'
FROM          tbladsgroups t
INNER JOIN    tbladsgroups_staging s
ON            t.SID = s.SID
Where   s.Description   <> t.Description 
   OR   s.DisplayName   <> t.DisplayName

Now, all this resolution being said, you have redundant data and that is the whole point of a lookup table. The staging appears to always have the correct display name and description. Your tblAdsGroups should probably remove those two columns and always get them from staging to begin with... Something like..

select
      t.*,
      s.Description,
      s.DisplayName
   from
      tblAdsGroups t
         JOIN tblAdsGroups_Staging s
            on t.sid = s.sid 

Then you always have the correct description and display name and dont have to keep synching updates between them.

DRapp
  • 47,638
  • 12
  • 72
  • 142
  • yes this is true, but our ultimate goal is to be able to track changes over time, so we want to know what has changed and when and store those values, SID, Attribute, oldval, newval. If we can find a solution for this then we are happy to update all rows, but this is the most important piece for us. – user18126559 Oct 02 '22 at 15:26