I am having issues retrieving the latest value in a satellite table when some data is changed back to a former value.
The database is Snowflake.
As per Data Vault 2.0, I am currently using the hash diff function to assess whether to insert a new record in a satellite table, like this:
INSERT ALL
WHEN (SELECT COUNT(*) FROM SAT_ACCOUNT_DETAILS AD WHERE AD.MD5_HUB_ACCOUNT = MD5_Account AND AD.HASH_DIFF = AccHashDiff) = 0
THEN
INTO SAT_ACCOUNT_DETAILS (MD5_HUB_ACCOUNT
, HASH_DIFF
, ACCOUNT_CODE
, DESCRIPTION
, SOME_DETAIL
, LDTS)
VALUES (MD5_AE_Account
, AccHashDiff
, AccountCode
, Description
, SomeDetail
, LoadDTS)
SELECT DISTINCT
MD5(AccountId) As MD5_Account
, MD5(UPPER(COALESCE(TO_VARCHAR(AccountCode), '')
|| '^' || COALESCE(TO_VARCHAR(Description), '')
|| '^' || COALESCE(TO_VARCHAR(SomeDetail), '')
)) AS AccHashDiff
, AccountCode
, Description
, SomeDetail
, LoadDTS
FROM source_table;
The first time, a new record with AccountCode = '100000' and SomeDetail = 'ABC' is added:
MD5_HUB_ACCOUNT | HASH_DIFF | ACCOUNT_CODE | DESCRIPTION | SOME_DETAIL | LDTS |
---|---|---|---|---|---|
c81e72... | 8d9d43... | 100000 | An Account | ABC | 2021-04-08 10:00 |
An hour later, an update changes the value of SomeDetail to 'DEF', this is the resulting table:
MD5_HUB_ACCOUNT | HASH_DIFF | ACCOUNT_CODE | DESCRIPTION | SOME_DETAIL | LDTS |
---|---|---|---|---|---|
c81e72... | 8d9d43... | 100000 | An Account | ABC | 2021-04-08 10:00 |
c81e72... | a458b2... | 100000 | An Account | DEF | 2021-04-08 11:00 |
A third update sets the value of SomeDetail back to 'ABC', but the record is not inserted in the satellite table, because the value of the hash diff is the same as the first inserted record (i.e. 8d9d43...).
If I query which is the latest record in the satellite table, the LDTS column tells me it's the one with 'DEF' which is not the desired result. Instead, I should have a record with SomeDetail = 'ABC' and LDTS = '2021-04-08 12:00'.
What is the correct approach to this? If I add LoadDTS to the hash diff, a new record will be created each time an update is pushed, which is not the desired result either.