55

How can I maintain a log of the data in my DB?

I have to maintain a log of every change made to each row. That means that I can't allow DELETE and UPDATE to be performed.

How can I keep such a log?

Esteban Küber
Greens

5 Answers

34

Use "Insert Only Databases"

The basic idea is that you never update or delete data.

Each table has two datetime columns, `from` and `to`.

Both start out with the value null (representing "beginning of time" to "end of time").

When you need to "change" a row, you instead add a new row; at the same time, you set the `to` in the previous row and the `from` in the new row to the current time.

You read current data out of the table via a view that filters on `WHERE to IS NULL` (note that a plain `to = null` comparison would never match in SQL).

This method also gives you a picture of the state of your database at any point in time.

EDIT

Just to clarify in response to the comment: The sequence would be given by the primary key of the table, which would be an autoincrement number.
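To make the mechanics concrete, here is a minimal sketch of the pattern in Python with SQLite. The table and column names are illustrative, not from the answer (and since `from`/`to` are SQL keywords, they appear here as `valid_from`/`valid_to`):

```python
# Minimal sketch of the insert-only "from/to" pattern using SQLite.
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,  -- autoincrement gives the sequence
        customer_no INTEGER NOT NULL,                   -- logical identity of the row
        name        TEXT NOT NULL,
        valid_from  TEXT,    -- NULL = beginning of time
        valid_to    TEXT     -- NULL = current version
    );
    -- Read current data through a view that filters on valid_to IS NULL.
    CREATE VIEW customer_current AS
        SELECT customer_no, name FROM customer WHERE valid_to IS NULL;
""")

def change_customer(customer_no, new_name):
    """'Update' by closing the old version and inserting a new one."""
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "UPDATE customer SET valid_to = ? WHERE customer_no = ? AND valid_to IS NULL",
        (now, customer_no))
    conn.execute(
        "INSERT INTO customer (customer_no, name, valid_from) VALUES (?, ?, ?)",
        (customer_no, new_name, now))

conn.execute("INSERT INTO customer (customer_no, name) VALUES (1, 'Alice')")
change_customer(1, 'Alicia')

print(conn.execute("SELECT name FROM customer_current").fetchall())  # [('Alicia',)]
print(conn.execute("SELECT COUNT(*) FROM customer").fetchone())      # (2,)
```

Both versions of the row remain in the base table, so the full history (and the state at any past time) is preserved; only the view changes what "current" means.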

Shiraz Bhaiji
    isn't this just a huge waste of space? – Kolten Jun 26 '09 at 22:13
  • 24
    If you need data, which you do if you need an audit trail, it's not wasted. – nos Jun 26 '09 at 22:17
  • 6
    I would not go with datetime columns for any ID/search data or any sequence data. The system's date could change for any reason, and multiple operations happening within the same time resolution of a single date/time value are both possible. If you need to capture sequence in time, use timestamp (guaranteed unique), long integers, or GUIDs, since your from/to builds a chain (but you lose time as an information item - also read the answer below...) – cmdematos.com Aug 09 '09 at 04:24
  • 6
    Kolten, if you worked in an industry where everything had to be tracked due to regulatory compliance you'd realize that this is a necessary evil. Also, the poster can audit only the subset of tables that most interests him/her so space wastage is relative to the level of logging he/she wants... – Jason Irwin Aug 09 '09 at 12:46
  • 2
    I've used this pattern before for particular tables. I would recommend having a unique index made up of the foreign key(s) and the 'to' value. This means that you cannot insert another row before you update the to field of the previous current record. – Mike737 Mar 05 '10 at 08:11
  • 1
    ... *INSERT* only database... when you need to change ... *UPDATE* ... Not exactly *insert only*, is it? – Jesse Feb 28 '14 at 10:51
  • 1
    @Jesse I think it's INSERT-only with respect to the actual data. The "to" and "from" columns are more like meta-columns in this case. Also remember that the "to" column is NULL until UPDATEd and then is never changed, so it too is in a sense INSERT-only. Once a real value is inserted into it, it isn't updated. – Chris Middleton Oct 15 '14 at 03:42
  • 4
    Do not store both `to` and `from`, it introduces an update anomaly - it is incorrect design. You only need to store one value. – Eric Apr 27 '16 at 20:47
  • @Eric, perhaps this anomaly doesn't really occur in practice, since you never actually update the timestamps upon setting them. Is this correct? – turdus-merula Sep 10 '17 at 19:24
  • 2
    How do you handle referential integrity with this design? As the primary key column will no longer be a static id for that instance of a row object, there will now be many rows (for each version or change) for a given object (if you see what I mean) – Dan Jul 21 '20 at 10:48
  • @Dan did you find a solution to your question? I'm also looking for an answer to that. – Vishal Patel May 21 '21 at 20:00
  • @VishalPatel this answer is from 2009. The way to solve it today would be to use temporal tables. SQL Server introduced this in 2016 https://learn.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver15 – Shiraz Bhaiji May 24 '21 at 14:33
  • @ShirazBhaiji yes I saw temporal table. However, I solved it in another way. Thanks – Vishal Patel May 24 '21 at 14:40
19

Use an "insert only" database, as described by Shiraz Bhaiji, but you can use a simpler technique. For each table that you need to maintain audit data for, just add an additional `UpdatedTime` column, defaulting to now. When you make a change to a record, instead of updating, just do an insert with all your data; the `UpdatedTime` column will get the current time.

Note that this method means you have to break or reconsider your UNIQUE constraints; you can keep a primary key, but the uniqueness becomes a composite of your primary key and your UpdatedTime.

This technique has the advantage of giving you a known range of historical data for each record on the table (each record is valid for a given time if it is the TOP 1 of records WHERE TimeOfInterest > UpdatedTime ORDER BY UpdatedTime DESC) with a low overhead (just a single column on the table). It's also quite amenable to conversion from tables not using this method: a simple ALTER TABLE adds the single column (which you can name consistently). Then you just need to alter your UNIQUE constraints to use a composite of their current constraints and the UpdatedTime column, and some queries will need to be altered.

Note as well that you can actually avoid converting all of your queries if you create a view of the table that simply returns the most recent entry for each of the records; you end up with a table which maintains historical data transparently, and a view which looks like a regular table without the changelogging.
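A minimal sketch of this variant in Python with SQLite, again assuming illustrative names (`product`, `sku`, `updated_at`); the view hides the changelog as described:

```python
# Sketch of the single-UpdatedTime-column variant: every "update" is an insert,
# and a view exposes only the latest version of each logical row.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (
        sku        TEXT NOT NULL,
        price      REAL NOT NULL,
        updated_at TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%f','now')),
        PRIMARY KEY (sku, updated_at)   -- uniqueness is now key + UpdatedTime
    );
    -- View returning only the most recent entry for each record, so callers
    -- see what looks like a regular table without the changelogging.
    CREATE VIEW product_current AS
        SELECT sku, price FROM product p
        WHERE updated_at = (SELECT MAX(updated_at) FROM product WHERE sku = p.sku);
""")

# Explicit timestamps here just to keep the example deterministic; normally
# the DEFAULT fills in updated_at.
conn.execute("INSERT INTO product (sku, price, updated_at) "
             "VALUES ('A1', 9.99, '2024-01-01T00:00:00')")
conn.execute("INSERT INTO product (sku, price, updated_at) "
             "VALUES ('A1', 12.50, '2024-02-01T00:00:00')")

print(conn.execute("SELECT price FROM product_current WHERE sku='A1'").fetchall())
# [(12.5,)]
```

Note the trade-off mentioned above: the primary key becomes composite, so any foreign keys pointing at this table need to reference the logical key (`sku`) rather than a single-row id.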

Paul Sonier
  • 1
    I really like this idea and was something I was trying out but how do you deal with referential integrity? – Dan Jul 21 '20 at 10:50
  • How do you handle referential integrity.? Suppose I want user table insert-only and I also have user_account(with user_id column) table. Which id of user table should I put in user_id column? – Vishal Patel May 21 '21 at 20:05
  • This is also the approach favored by Tomm Carr, who has written a nice summary PDF which covers many subtleties; see [this answer](https://dba.stackexchange.com/a/114738/215082). The most obvious improvement over Shiraz Bhaiji's answer is that the design avoids anomalies stemming from inconsistencies between the `from` and `to` fields. – bluenote10 Jun 20 '21 at 07:35
17

[Late post but it adds two techniques not already mentioned here]

Reading the transaction log – if your database is in full recovery mode, then the transaction log stores a lot of useful information that can be used to see the history of each row. The downside is that this is not supported by default. You can try the undocumented functions DBCC LOG or fn_dblog, or a third-party tool such as ApexSQL Log.

Using Change Data Capture – change data capture essentially does the same thing as described above, but it's more streamlined and a bit easier to use. Unfortunately, this is only available in Enterprise edition.

Both of these can solve the problem of allowing updates and deletes, because you can't really change what's written in the transaction log.

Herbert Lynch
6

A totally different approach is to have only an audit log. You then use it to build the most current version of your data, creating "checkpoints" periodically or using caching to speed this up.

There is a presentation about somebody using this technique: http://www.infoq.com/presentations/greg-young-unshackle-qcon08. The big advantage here is that since you only have the audit log, you can be quite confident that your audit trail is correct.

I've never tried this and it seems pretty complicated ... but something to think about.
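A toy sketch of the idea in Python; the event shapes and the checkpointing scheme here are illustrative, not from the linked talk:

```python
# "Audit log only" sketch: the append-only log of changes is the source of
# truth, and current state is rebuilt by replaying it. A checkpoint (snapshot
# plus log position) lets you replay only the events recorded after it.

events = []  # the append-only audit log

def record(event):
    events.append(event)

def apply_event(state, event):
    kind, key, value = event
    if kind == "set":
        state[key] = value
    elif kind == "delete":
        state.pop(key, None)
    return state

def replay(from_state=None, from_index=0):
    """Rebuild current state by replaying the log, optionally from a checkpoint."""
    state = dict(from_state or {})
    for event in events[from_index:]:
        state = apply_event(state, event)
    return state

record(("set", "name", "Alice"))
record(("set", "email", "a@example.com"))
checkpoint = (replay(), len(events))   # periodic snapshot to speed up rebuilds

record(("set", "name", "Alicia"))
record(("delete", "email", None))

# Replay only the events after the checkpoint instead of the whole log.
print(replay(*checkpoint))   # {'name': 'Alicia'}
```

Because nothing is ever overwritten in the log, the audit trail and the data cannot disagree, which is the advantage described above.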

Dave
2

See if my answer to another database logging question contains the information you need. Find it here...

History tables pros, cons and gotchas - using triggers, sproc or at application level

cmdematos.com