Recommended way to keep audit history information for a very large table

Question

For a new project we're greenfielding using Spring, Java 8, and SQL Server 2012 we will likely have a very large (in the sense of wide, about 150 columns) table, to store contract information. ONe of the goals of the project will be to keep some audit information regarding the contract information. This historical information should also be available in the application itself, so older versions of the contract can be viewed.

If this was a smaller table (and cutting the table up into smaller pieces is something that can definitely be considered) I'd simply make a new entry in the table for the contract, or perhaps have a separate table for the historical information.

This does not seem that optimal when it comes to disk usage, however. Despite the relatively small amount of contracts we'll have (<100k), the amount of history might definitely grow in the future, depending on how the people work.

I know an alternative could be to simply store things in a key/value approach to keep the different deltas, but it somehow seems like a pain to reconstruct the contract in some event sourcing kind of way, just to show historical information.

Am I missing any good alternatives?

score 0 · Answer 1 · answered Apr 06 '17 at 07:19

0

If you are using SQL Server 2016, you might consider using a Temporal Table. This will allow you to query the data as at any point in time.

answered Apr 06 '17 at 07:19

Jamio

11
1

score 0 · Answer 2 · edited May 23 '17 at 12:32

The SQL Server Temporal table is a primitive design that will hide a lot of system effort could present unsolvable performance issues. All that and it only presents a mono-temporal access (transaction time) to the data. An answer to a similar question is here which discusses Version Normal Form. In this particular answer, I also presented only a transaction time design, but there are links to further details of the design which present a slight modification to bring it up to full bi-temporal access (valid/effective time as well as transaction time).

Briefly, effective time is when the data became effective, not when the data was written to the database. For example, a price changed from $14 to $32 as of 1 Jan. However, the database wasn't updated until 7 Jan. If you access the data using transaction time for the price as of 3 Jan, you will get back $14, the old price. That is what the database would have reported if you had executed a query on that date. But if you access using effective time, you will get back $32 because that was the price that was in effect at that time, even though the database was not made aware of that until afterward. There are requirements for both types of access.

One big advantage to my method (other than full bi-temporal access) is that you determine which fields of the contract entity to "version" (track all changes to). Also, since you control all aspects of the design, you can better handle performance related issues should they appear. Not to mention you don't need to learn a new form of SQL.

Ok, there are more advantages:

Current and history data reside in the same tables. You don't need to determine beforehand if the data is current or historic to determine the data source.
In fact, the same query is used to retrieve current data, transaction-time data or effective-time data.
By avoiding Row Spanning Dependencies, performance is impressive.

Interesting. Isn't this pretty close to what Data Vault structure does with the satellite tables? — Juha K, Apr 07 '17 at 06:51
There is an outward similarity. But consider that Data Vault is targeted for data warehouses with multiple sources of its data. As such, there is no provision for data integrity and the structure makes it difficult to query -- this is stated in the DV literature, not my opinion. The structure I designed is a variant of 2nf, thus maintains all data integrity features, is a format that database developers know, and is targeted for OLTP databases. It could, I suppose, be used to implement a slowly changing dimension structure for a warehouse, but I have not studied that type of use. — TommCatt, Apr 08 '17 at 08:31
Cool, thanks for the insight! Seems sensible to me (I'm more of a data warehouse guy myself). — Juha K, Apr 08 '17 at 09:56

Recommended way to keep audit history information for a very large table

2 Answers2