2

I have been tasked with building a history table in SQL. I have already built the base table which contains multiple left joins amongst other things. The base table will need to be compared to another table and only update specific columns that have changed, insert new rows where the key doesn't match.

Previously I have used other ETL tools which have GUI style built in SCD loaders, but I don't have such luxury in SQL Server. Here the merge statement can handle such operations. I have used the MERGE statement before, but I become a bit stuck when handling flags and date fields based on the operation performed.

Here is the BASE table

KEY CLIENT QUANTITY CONTRACT_NO FC_COUNT DELETE_FLAG RECORD_UPDATED_DATE
345 A 1000 5015 1 N 31/12/9999
346 B 2000 9352 1 N 31/12/9999
347 C 3000 6903 1 N 31/12/9999
348 D 1000 7085 1 N 31/12/9999
349 E 1000 8488 1 N 31/12/9999
350 F 500 6254 1 N 31/12/9999

Here is the table I plan to merge with

KEY CLIENT QUANTITY CONTRACT_NO FC_COUNT
345 A 1299 5015 1
346 B 2011 9352 1
351 Z 5987 5541 1

The results I'm looking for are

KEY CLIENT QUANTITY CONTRACT_NO FC_COUNT DELETE_FLAG RECORD_UPDATED_DATE
345 A 1000 5015 1 N 06/07/2022
345 A 1299 5015 1 N 31/12/9999
346 B 2000 9352 1 N 06/07/2022
346 B 2011 9352 1 N 31/12/9999
347 C 3000 6903 1 Y 06/07/2022
348 D 1000 7085 1 Y 06/07/2022
349 E 1000 8488 1 Y 06/07/2022
350 F 500 6254 1 Y 06/07/2022
351 Z 5987 5541 1 N 31/12/9999

As we can see I have shown the changes, closed off the old records, marked with a date and a delete flag if they are missing but was there previous, as well as new new row with the new key and data

Would this be a MERGE? Some direction on how to perform this sort of operation would be a great help. We have a lot of tables where we need to keep change history and this will help a lot going forward.

code shell attempt

SELECT      
            MAIN_KEY,
            CLIENT,
            QUANTITY,
            CONTRACT_NO,
            1 AS FC_COUNT,
            NULL as DELETE_FLG_DD,
            GETDATE() as RECORD_UPDATED_DATE
INTO  #G1_DELTA
FROM
            [dwh].STG_DTL
MERGE [dwh].[PRJ1_DELTA] TARGET
USING #G1_DELTA SOURCE
ON   TARGET.MAIN_KEY                    = SOURCE.MAIN_KEY
    
WHEN MATCHED THEN INSERT 
    (
        MAIN_KEY,
        CLIENT,
        QUANTITY,
        CONTRACT_NO,
        FC_COUNT,
        DELETE_FLG_DD,
        RECORD_UPDATED_DATE
    )
VALUES
    (
        SOURCE.MAIN_KEY,
        SOURCE.CLIENT,
        SOURCE.QUANTITY,
        SOURCE.CONTRACT_NO,
        SOURCE.FC_COUNT,
        SOURCE.DELETE_FLG_DD,
        SOURCE.RECORD_UPDATED_DATE
        )
teelove
  • 73
  • 5
  • Can you share your best coding attempt at this problem? – lemon Jul 05 '22 at 10:40
  • sure - i have had to provide you with dummy tables, but based on the dummy data here is the shell code i had, again, i'm struggling with how to apply the update rules and change the flags – teelove Jul 05 '22 at 11:17
  • code added above – teelove Jul 05 '22 at 11:17
  • where does row 2 inside the output table come from? No record like that is found in your input tables, how should that be crafted? Same happens with row 4. – lemon Jul 05 '22 at 11:23
  • Apologies, i will amend the tables - i was rushing to get a example together – teelove Jul 05 '22 at 12:04
  • 1
    I have amended the table, i had the wrong client number - the only change fields we are interested in is the QUANTITY – teelove Jul 05 '22 at 12:06
  • 1
    If I get it right, all the old date values from the first table get mapped to current day (or is it a specific day?) and the newly added rows will have date corresponding to `31/12/9999`, is that correct? @teelove – lemon Jul 05 '22 at 12:08
  • thats correct - if there is a change, the old rows get closed off with todays date, the changed rows (matched key) get inserted and have an open date as you said `31/12/9999`, and the deleted rows (mathced key but missing) stay in the table, delete flag as Y and a closed of date too. new rows like KEY-351 get inserted and have an open date `31/12/9999` – teelove Jul 05 '22 at 12:23
  • do you need a `SELECT` statement or are you looking for a trigger that works on your `INSERT` statements? – lemon Jul 05 '22 at 12:27
  • 1
    At this point, i'm out of ideas - i'm not sure what a trigger is – teelove Jul 05 '22 at 12:35

1 Answers1

1

If you need to build a history table containing the updated information from your two tables, you first need to select updated information from your two tables.

The changes that need to be applied to your tables are on:

  • "tab1.[DELETE_FLAG]", that should be updated to 'Y' whenever it has a match with tab2
  • "tab1.[RECORD_UPDATED_DATE]", that should be updated to the current date
  • "tab2.[DELETE_FLAG]", missing and that should be initialized to N
  • "tab2.[RECORD_UPDATED_DATE]", missing and that should be initialized to your random date 9999-12-31.

Once these changes are made, you can apply the UNION ALL to get the rows from your two tables together.

Then, in order to generate a table, you can use a cte to select the output result set and use the INTO <table> clause after a selection to generate your "history" table.

WITH cte AS (
    SELECT tab1.[KEY],
           tab1.[CLIENT],
           tab1.[QUANTITY],
           tab1.[CONTRACT_NO],
           tab1.[FC_COUNT],
           CASE WHEN tab2.[KEY] IS NOT NULL
                THEN 'N'
                ELSE 'Y'
           END       AS [DELETE_FLAG],
           CAST(GETDATE() AS DATE) AS [RECORD_UPDATED_DATE]
    FROM      tab1
    LEFT JOIN tab2
           ON tab1.[KEY] = tab2.[KEY] 

    UNION ALL 

    SELECT *, 
           'N'          AS [DELETE_FLAG],
           '9999-12-31' AS [RECORD_UPDATED_DATE]
    FROM tab2
)
SELECT *
INTO history
FROM cte
ORDER BY [KEY];

Check the demo here.

lemon
  • 14,875
  • 6
  • 18
  • 38
  • 1
    Thank you very much for your input - i will take some time to digest this - i can see from the demo the outputs look great, but i'm going to do my best to study this. thank you again! – teelove Jul 05 '22 at 12:56
  • i have just run into an issue where my key is made up of 3 fields, so i assume i can add those with AND to the CASE WHEN ad LEFT JOINS? – teelove Jul 05 '22 at 13:23
  • 1
    Inside the `CASE` statement it is sufficient for a single field key to be there, as they will be all three null. For the `LEFT JOIN`, you are forced to match all three keys, so an `AND` operator will be needed twice, yes. – lemon Jul 05 '22 at 13:25
  • is there anything i need to do settings wise, SQL is not like `cte` – teelove Jul 05 '22 at 13:26
  • If you can update your post to include the three keys, I can show directly. – lemon Jul 05 '22 at 13:27
  • I've managed to add the keys and copied the fiddle to a test db, and it looks like it will work great but our DBA says we cannot use cte, this will require some working around red tape *sad face* - looks like i will have to have a look into the merge method - i have marked as answered, but i will need to look further into using merge as i dont know how long enable (or whatever it entails) `cte` will take – teelove Jul 05 '22 at 13:57
  • If you can't use the ctes and you're forced to use the `MERGE` statement, then using the answered mark or not is up to you, as this answer doesn't satisfy the `MERGE` requirement. – lemon Jul 05 '22 at 14:05
  • 1
    thank you for you're help, much appreciated - doesnt look like i can use `cte` any time soon – teelove Jul 05 '22 at 15:23
  • great news, cte are enabled and the code works :) – teelove Jul 05 '22 at 17:13
  • 1
    Glad my answer was helpful and solved your problem! – lemon Jul 05 '22 at 17:14
  • though this fully solves my problem, there is one refinement i could ask you about, the `SELECT * INTO` in the `cte` - this only seems to work when creating a new table with the history, when i load more changes in to tab2 and re-run, it errors saying there is an object already named "history" – teelove Jul 07 '22 at 08:35
  • Use `DROP TABLE IF EXISTS history;` before you rerun this query. You can also add this command on top of the script, so that every time you run it won't prompt you. – lemon Jul 07 '22 at 09:14