Database Design: how to track history?

Question

What are the general strategies in DB design to maintain a revision history? If it were just one table I was dealing with, I think it wouldn't be so hard. Just save each update as a new record in the table. The last record will always be the latest revision.

But when the data is stored across multiple tables, what's a good way to design that so that it can track revisions?

read [Audit-Trail-Tracing-Data-Changes-in-Database](http://www.codeproject.com/Articles/105768/Audit-Trail-Tracing-Data-Changes-in-Database) — code_10, Dec 02 '22 at 19:28

vearutop · Accepted Answer · 2012-07-02T15:09:45.613

3

I prefer to have additional historical table for each versioned table. Same structure as main table with time_from and time_to additional fields. Transparently filled with triggers. time_to of latest revision set to far far future.

State for specified moment can be retrieved with query like this:

SELECT * FROM user_history 
WHERE time_from >= '2012-02-01' AND time_to <= '2012-02-01'

As for me, storing history within main table is not generally a good idea, as it requires complicated conditions when retrieving or joining current data.

edited Jul 02 '12 at 15:09

answered Jul 02 '12 at 13:09

vearutop

3,924
24
41

So the main tables have the most recent data. Then the "historical" tables have copies of each revision? Does this break normalization? – StackOverflowNewbie Jul 02 '12 at 14:35
Yes, denormalization occures, it is price for simplicity (history states are auto generated from simple `INSERT`, `DELETE`, `UPDATE` on main table) and performance of latest revision (for example main table has indexes based on data while history has date indexes). If latest revision is not main revision to work with, this approach may suffer from broken normalization. – vearutop Jul 02 '12 at 14:53

score 2 · Answer 2 · edited May 23 '17 at 11:58

2

The hard part is not the versioning of the "base" tables - you just version them individually as you would a single table in isolation.

The hard part is tracking connections between them.

How exactly are you going to do that depends on the requirements of the particular project. Here is an example of how sales orders could be "historized", but there are many other variations possible.

edited May 23 '17 at 11:58

Community

1
1

answered Jul 02 '12 at 17:52

Branko Dimitrijevic

50,809
10
93
167

score 0 · Answer 3 · answered Jul 02 '12 at 13:15

0

Turn on MySQL's binary logging and just use that.

answered Jul 02 '12 at 13:15

Matt Gibson

14,616
7
47
79

I think this question is about queryable history. – MaxSem Jul 02 '12 at 13:41

score 0 · Answer 4 · answered Jul 02 '12 at 14:46

I am using approach, where each object that I'm dealing with has at least 1 so called instance table, where I keep the data that tends to change over time. Typically such tables follow the following concept:

they have _HISTORY suffix in the name;
they have 2 extra fields, start_dt and end_dt, indicating object instance's lifetime;
start_dt is NOT NULL, end_dt can be NULL, which indicates that instance is current and is not limited in it's time;
it is possible to insert future-dated changes, say you want a new company name to be activated from 1/Jan-2013, then you need to set end_dt of the current instance to 31/Dec-2012 23:59:59 and insert a new record with start_dt of 1/Jan-2013 00:00:00;
sometimes I also add revision field, if it is necessary to track revisions.

In order to have a proper RI constraints with such design, I always have 2 tables for versioned obejcts. Say, for Customer obejct I have the following set of tables:

customer (customer_id INTEGER, PRIMARY KEY (customer_id));
customer_history (customer_id INTEGER, start_dt TIMESTAMP, end_dt TIMESTAMP,
                  name VARCHAR(50), sex CHAR(1), ...,
                  PRIMARY KEY (customer_id, start_dt));
customer_bank_history (customer_id INTEGER, start_dt TIMESTAMP, end_dt TIMESTAMP,
                       bank_id INTEGER, iban VARCHAR(34));

In all other places I use customer(customer_id) to build foreign keys. Querying actual customer details is simple:

SELECT c.customer_id, ch.name, ch.sex
  FROM customer c
  JOIN customer_history ch ON c.customer_id = ch.customer_id
       AND now() BETWEEN ch.start_dt AND coalesce(end_dt, now());

Why I prefer such design:

I have versioned object instances on the database level by design;
I have to maintain less tables;
It is not possible to get history lost in case somebody drops/disables any triggers;
I can plan and maintain future-dated changes easily.

Hope this will help you.

score 0 · Answer 5 · answered Jun 29 '15 at 18:09

Datadiff. API powered DB revision tracking.

Full disclosure:

I built Datadiff. I needed a solution that provided a visual history of a data model in MongoDB for help supporting a SASS product. It will work with SQL databases too.

You can use do basic querying with key:val notation. i.e id:123

Database Design: how to track history?

5 Answers5