1

We are trying to create a solution where we track changes in an external data source (active directory) over time. Every hour we take a snapshot of the data and then compare this to our baseline so we can see the changes, and also update the existing baseline to a new one.

Our current thoughts just use a sort of UPSERT using Merge, but we don't know how to only update what has changed

Our data has around 60 columns and 1000s of rows.

Snapshot

ID Name SN Other
1 Name1 N1 O1
2 Name2 XX O2
3 Nxx3 N3 OX
4 Name4 N4 O4

Baseline

ID Name SN Other
1 Name1 N1 O1
2 Name2 N2 O2
3 Name3 N3 O3

Results - New Baseline

ID Name SN Other
1 Name1 N1 O1
2 Name2 XX O2
3 Nxx3 N3 OX
4 Name4 N4 O4

Changed
2, SN
3, Name, Other
New
4,

We have thought about reading each line of the baseline in our code and comparing it to the snapshot and then writing in code a query to update each line, but this seems a little cumbersome and wanted to know if we could use a better method to do this.

We have experiment with this merge to update and insert the data but this will always update the rows, what we would like to add is this to the merge

If snapshot.Name <> Baseline.Name update baseline and record change else skip
If snapshot.SN <> Baseline.SN update baseline and record change else skip

etc, but we dont seem to be able to get that to work, then we dont know how to record what has changed.

MERGE BASELINE AS TARGET
USING SNAPSHOT AS SOURCE
    ON (TARGET.[id] = SOURCE.[id])

WHEN MATCHED 
    THEN
        UPDATE
        --IF columns are not the same then update, else skip this row, but how?
        --This will always update the row, and cannot use multple matches
        SET TARGET.[shortName] = SOURCE.[shortName]

WHEN NOT MATCHED BY TARGET
    THEN
        INSERT (
            [Id]
            ,[Name]
            ,[shortName]
            ,[other]
            )
        VALUES (
            SOURCE.[Id]
            ,SOURCE.[Name]
            ,SOURCE.[shortName]
            ,source.[other]
            );

any ideas or comments to even get the changed only data would be great.

  • BTW, Dont forget you also need to track when things get deleted from the source. e.g. if the entry disappears from active directory you need to remove it from the sql version. You can use "WHEN NOT MATCHED BY SOURCE" for that. – codeulike Sep 27 '22 at 10:39
  • Here's a sketch of an idea. You can use EXCEPT to find out which rows are different. See here https://stackoverflow.com/a/1473795/22194 But this wont tell you what the differences are. But you can then have a trigger on the table that detects what has changed and logs the changes to an audit trail table. There are examples of this around, e.g. see this answer https://stackoverflow.com/a/19737882/22194 which references this article by Pop Rivett https://www.red-gate.com/simple-talk/databases/sql-server/database-administration-sql-server/pop-rivetts-sql-server-faq-no-5-pop-on-the-audit-trail/ – codeulike Sep 27 '22 at 10:48
  • any other ideas? – user18126559 Oct 02 '22 at 09:00

0 Answers0