24

I have a system written in CodeIgniter with MySQL as the database. The system has users, user groups with different privileges, and so on, and lots of MySQL tables with many-to-many relationships.

Some of the tables I have:

  • items
  • contracts
  • customers
  • products
  • product_features
  • orders
  • order_features
  • order_products
  • etc...

Currently I log every change users make to these tables (users can change this data according to their privileges), but I only store the logs in a simple form like:

A user changed product features with id of A8767
B user added new customer with id 56
C user edited content of orderlist
A user added new product (id: A8767) to order (id: or67)
...

I want to keep every change in full detail, like the edit history of a question on Stack Overflow. I can think of designing a log table to hold the data changes from the various tables. Is there any tutorial, engine or plugin to do that? The only other thing I can think of is duplicating every table and storing the changes there, but I don't think that's a good approach.

safarov
  • 7,793
  • 2
  • 36
  • 52

2 Answers

25

I've been thinking about this for a while now and can only come up with two ways to do it. Both can work fully transparently when built into an abstract data layer / model.

By the way, there is an implementation for "versionable" table data in the ORM mapper Doctrine. See this example in their docs. Maybe that fits your needs, but it doesn't fit mine: it seems to delete all history data when the original record is deleted, which makes it not really revision safe.

Option A: have a copy of each table to hold revision data

Let's say you have a simple contact table:

CREATE TABLE contact (
    id INT NOT NULL auto_increment,
    name VARCHAR(255),
    firstname VARCHAR(255),
    lastname VARCHAR(255),
    PRIMARY KEY (id)
)

You would create a copy of that table and add revision data:

CREATE TABLE contact_revisions (
    id INT NOT NULL,
    name VARCHAR(255),
    firstname VARCHAR(255),
    lastname VARCHAR(255),
    revision_id INT auto_increment,
    type ENUM('INSERT', 'UPDATE', 'DELETE') NOT NULL,
    change_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY(revision_id)
)

Keep track of INSERT and UPDATE using AFTER INSERT and AFTER UPDATE triggers. On each change to the original row, insert a copy of the new data into the revision table and set the modification type accordingly.

To make a DELETE revision safe you must also insert a new row into the history table! For this, use a BEFORE DELETE trigger and store the last values before they are deleted; otherwise you would have to remove every NOT NULL constraint from the history table as well.
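
The three triggers described above could be sketched like this (trigger names are illustrative; the `DELIMITER` lines are for the mysql command-line client):

```sql
DELIMITER $$

CREATE TRIGGER contact_ai AFTER INSERT ON contact
FOR EACH ROW
BEGIN
    -- copy the freshly inserted values into the history table
    INSERT INTO contact_revisions (id, name, firstname, lastname, type)
    VALUES (NEW.id, NEW.name, NEW.firstname, NEW.lastname, 'INSERT');
END$$

CREATE TRIGGER contact_au AFTER UPDATE ON contact
FOR EACH ROW
BEGIN
    -- copy the new values after every update
    INSERT INTO contact_revisions (id, name, firstname, lastname, type)
    VALUES (NEW.id, NEW.name, NEW.firstname, NEW.lastname, 'UPDATE');
END$$

CREATE TRIGGER contact_bd BEFORE DELETE ON contact
FOR EACH ROW
BEGIN
    -- store the last known values before the row disappears
    INSERT INTO contact_revisions (id, name, firstname, lastname, type)
    VALUES (OLD.id, OLD.name, OLD.firstname, OLD.lastname, 'DELETE');
END$$

DELIMITER ;
```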

Some important notes regarding this implementation

  • For the history table you must drop every UNIQUE KEY (here: the PRIMARY KEY on id) that exists in the original table, because you will have the same key multiple times, once for each data revision.
  • When you ALTER the schema and data in the original table via an update (e.g. software update) you must ensure the same data or schema corrections are applied to the history table and its data, too. Otherwise you will run into trouble when reverting to an older revision of a record set.
  • In a real-world implementation you would also want to know which user modified the data. For that to be revision safe, a user record should never be deleted from the users table. Instead, disable the account with a flag.
  • Usually, a single user action involves more than one table. In a real-world implementation you would therefore also have to track which changes across multiple tables belong to a single user transaction, and in which order, so that all changes of one transaction can be reverted together, in reverse order. That requires an additional revision table which keeps track of users and transactions and holds a loose relationship to the individual revisions in the history tables.
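
The transaction tracking described in the last point could be sketched with two extra tables (all table and column names here are illustrative, not part of the original design):

```sql
-- one row per user action, however many tables it touched
CREATE TABLE revision_transactions (
    transaction_id INT NOT NULL auto_increment,
    user_id        INT NOT NULL,
    change_time    TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (transaction_id)
)

-- loose links from a transaction to the individual per-table revisions;
-- no real FOREIGN KEY is possible because table_name is dynamic
CREATE TABLE transaction_revisions (
    transaction_id INT NOT NULL,
    step_no        INT NOT NULL,          -- order of the change within the action
    table_name     VARCHAR(64) NOT NULL,  -- which *_revisions table to look in
    revision_id    INT NOT NULL,          -- revision_id inside that table
    PRIMARY KEY (transaction_id, step_no)
)
```

To revert a whole user action you would walk transaction_revisions in descending step_no order and restore each referenced revision.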

Benefits:

  • completely in the database, independent from application code (except when tracking user transactions is important; that requires some logic outside the scope of a single query)
  • all data is in their original format, no implicit type conversions.
  • good performance on search in the revisions
  • easy rollback: just run a simple INSERT .. ON DUPLICATE KEY UPDATE .. statement against the original table, using the data from the revision you want to roll back to.
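
Such a rollback could look like this, assuming the contact tables above (the ids 42 and 17 are hypothetical):

```sql
-- restore contact #42 to the state stored in revision #17;
-- re-inserts the row if it was deleted, updates it otherwise
INSERT INTO contact (id, name, firstname, lastname)
SELECT id, name, firstname, lastname
FROM contact_revisions
WHERE id = 42 AND revision_id = 17
ON DUPLICATE KEY UPDATE
    name = VALUES(name),
    firstname = VALUES(firstname),
    lastname = VALUES(lastname);
```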

Drawbacks:

  • Hard to implement manually.
  • Hard (but not impossible) to automate when it comes to database migrations / application updates.

As already stated above, Doctrine's Versionable does something similar.


Option B: have a central change log table

Preface: bad practice; shown only to illustrate the alternative.

This approach relies heavily on application logic, which should be hidden in a data layer / model.

You have a central history table that keeps track of:

  • who made the change
  • when
  • whether it was an insert, update or delete
  • which value
  • in which field
  • of which table

As in the other approach, you may also want to track which individual data changes belong to a single user action / transaction, and in which order.
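
A minimal sketch of such a central table, assuming one row per changed field (names are illustrative):

```sql
-- everything is stored as strings, which is exactly where the
-- implicit type conversions mentioned below come from
CREATE TABLE change_log (
    log_id      INT NOT NULL auto_increment,
    user_id     INT NOT NULL,
    change_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    action      ENUM('INSERT', 'UPDATE', 'DELETE') NOT NULL,
    table_name  VARCHAR(64) NOT NULL,
    field_name  VARCHAR(64) NOT NULL,
    record_id   VARCHAR(64) NOT NULL,  -- PK of the changed row, as a string
    old_value   TEXT,
    new_value   TEXT,
    PRIMARY KEY (log_id)
)
```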

Benefits:

  • no need to keep it in sync with the original tables when adding fields or creating new tables; it scales transparently.

Drawbacks:

  • bad practice: a simple key/value store in the database
  • bad search performance, because of implicit type conversions
  • may slow down the overall performance of the application/database when the central history table becomes a bottleneck because of write locks (this only applies to engines with table locks, e.g. MyISAM)
  • rollbacks are much harder to implement
  • possible data conversion errors / precision loss because of implicit type conversion
  • doesn't capture changes when some code accesses the database directly instead of going through your model / data layer and forgets that it must then write to the revision log manually; this can be a big issue when working in a team with other programmers.

Conclusion:

  • Option B can be very handy for small apps as a simple "drop-in" when it's just for logging changes.
  • If you want to go back in time, easily compare the differences between, say, historic revision 123 and revision 125, and/or revert to the old data, then Option A is the way to go, despite the extra effort.
Kaii
  • 20,122
  • 3
  • 38
  • 60
  • 1
    Merit 1: It is not key value pair. Merit 2: Possible, but with timestamps, object_ids, and user_id it would be feasible to track who acted upon which object. Merit 3: Rollbacks should be contained within transactions, if the change didn't commit, then no entry in log table. Merit 4: Refer to merit 2 response. Merit 5: What difference does it make it if goes through model or a straight call from the application? If a change is made he can call a TransactionLoggable class from anywhere, however it is best left in the model, but not restricted to it. – Mike Purcell Mar 24 '12 at 23:07
  • @MikePurcell 1. it is. key = object_id, value = whatever you store in the single data field 3. "rollback" refers to manually rolling your data back to an older revison. Just like wikipedia does. i'm not talking about database transaction commit/rollback. 5. didn't say it's not impossible to store history data when the programmer wants to. just wanted to say you can easily forget that you also have to write history data manually. Its more handy to have the database handle the revisions fully transparent. – Kaii Mar 24 '12 at 23:14
  • @MikePurcell generally *Option B* may be okay when its just for *logging* changes. if you want a system where you can easily compare one revision to another (not necessarily the current one) then *Option B* is not a good fit. – Kaii Mar 24 '12 at 23:18
  • I agree with your 2nd statement. I re-read the OP and the title and the usage example seem to conflict. Your option A is for versioning the contents of a table at a specific time so a user can roll back the data to a previous version, with which I agree. However I feel that my proposed solution would work well for his usage example, where it's just logging action upon an item. – Mike Purcell Mar 24 '12 at 23:21
  • @Kaii Thanks for the detailed response. I will keep analyzing all the options and make a decision – safarov Mar 24 '12 at 23:23
  • In your "option A", a lot of space goes to waste because of all the duplicate data. I like this design better: http://stackoverflow.com/a/2020371/1286942 – Jo Smo Nov 03 '15 at 15:06
  • @JoSmo actually this is just a de-normalized variant of "option A". In fact I'm doing the same thing in my own applications for audit trails, to keep the number of revisions / amount of duplicate data low. – Kaii Nov 03 '15 at 15:10
  • Did you mean that it's a normalized variant of "option A"? Another thing... You always have to update at least two rows in your db if you want to edit the "dynamic" content (for which you need the history tracking). Because of the duplicate entries. – Jo Smo Nov 03 '15 at 15:35
  • @JoSmo you are right, but that's the price you have to pay for a good audit trail. Additional features rarely come without costs ;) – Kaii Nov 05 '15 at 11:28
  • @Kaii I'm not saying that your option is worse in all cases. It depends on many factors. Just wanted to put as much information for the OP out there. :) – Jo Smo Nov 07 '15 at 18:41
2

What about using a general, unified update-log table? Its fields should hold these values:

user,event,date,table,field,new value

  • user - who made the change
  • event - a code for predefined events (update, save, insert)
  • date - when the change was made
  • table & field - can be determined automatically from the executed query
  • value - the inserted value

The values and the INSERT itself can be generated by a helper function from the executed query.
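
A possible shape for that table, matching the field list above (names are illustrative):

```sql
-- one row per logged change; table/field/value are stored as text
CREATE TABLE update_log (
    user_id    INT NOT NULL,
    event      ENUM('insert', 'update', 'save') NOT NULL,
    event_date TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    table_name VARCHAR(64) NOT NULL,
    field_name VARCHAR(64) NOT NULL,
    new_value  TEXT
)
```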

Jirka Kopřiva
  • 2,939
  • 25
  • 28