
Consider a database with several (3-4) tables with a lot of columns (from 15 to 40). In each table we have several thousand records generated per year, and about a dozen changes are made to each record.

Right now we need to add the following functionality to our system: every time a user changes a record in one of these tables, the system needs to keep track of it - we need a complete history of changes and also the ability to restore a row's data to a selected point in time.

For certain reasons we cannot keep the "final" and "historic" data in the same table (so we cannot add columns to our tables to hold some kind of versioning information, e.g. the way WordPress keeps the edit history of posts).

What would be the best approach to this problem? I was thinking about two solutions:


  1. For each tracked table we create a mirror table with the same columns, plus additional columns holding version information (e.g. timestamps, the id of the "original" row, etc.). A sketch of such a table follows after this option's pros and cons.

Pros:

  • the data is stored in exactly the same form as in the original tables

  • whenever we need to add a new column to the original table, we can do the same to the mirror table

Cons:

  • we need to create one additional mirror table for each tracked table.
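To make option 1 concrete, here is a minimal sketch of what such a mirror table could look like in MySQL. All names (customer_forms, customer_forms_history, customer_name, status) are made up for illustration, and the second statement is just one way a row could be restored to a selected point in time:

```sql
-- Hypothetical mirror table for a tracked table `customer_forms`.
-- It repeats every column of the original and adds versioning columns.
CREATE TABLE customer_forms_history (
    revision_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    original_id   INT UNSIGNED NOT NULL,           -- id of the row in customer_forms
    revised_at    TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    revised_by    INT UNSIGNED NULL,               -- user who made the change
    -- ...copies of all columns from customer_forms follow (illustrative)...
    customer_name VARCHAR(255) NOT NULL,
    status        VARCHAR(50)  NOT NULL,
    KEY idx_original (original_id, revised_at)
);

-- Restoring row 42 to its state at a chosen point in time:
REPLACE INTO customer_forms (id, customer_name, status)
SELECT original_id, customer_name, status
FROM customer_forms_history
WHERE original_id = 42
  AND revised_at <= '2016-12-01 00:00:00'
ORDER BY revised_at DESC
LIMIT 1;
```

REPLACE relies on the original table's primary key, so the restored version simply overwrites the current row.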

  2. We create a single "history" table for all revisions. We keep revision information (timestamps etc.) and record which table the data originates from, but the original row data itself is stored as JSON in a large text column. A sketch of such a table follows after this option's pros and cons.

Pros:

  • we have only one history table for all tracked tables

  • we don't need to create a new mirror table every time we add a new tracked table

Cons:

  • there can be backward compatibility issues when trying to restore data after the structure of the original table has changed (e.g. a new column was added)
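To make option 2 concrete, a minimal sketch of the shared history table might look like this (names are illustrative; on MySQL 5.7+ the data column could use the native JSON type instead of LONGTEXT):

```sql
-- One shared revisions table for all tracked tables.
CREATE TABLE revisions (
    id           INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    source_table VARCHAR(64)  NOT NULL,   -- name of the tracked table the row came from
    source_id    INT UNSIGNED NOT NULL,   -- primary key of the original row
    revised_at   TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    revised_by   INT UNSIGNED NULL,       -- user who made the change
    data         LONGTEXT     NOT NULL,   -- JSON snapshot of the whole row at save time
    KEY idx_source (source_table, source_id, revised_at)
);
```

Restoring from this table means decoding the JSON back into an UPDATE on the original table, which is where the backward-compatibility concern from the cons list comes in.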

  3. Maybe some other solution?

What would be the best way of keeping the history of versions in such a system?

Additional information:

  • each of the tracked tables can change in the future (e.g. new columns added),

  • the number of tracked tables can change in the future (e.g. new tables added).

FYI: we are using Laravel 5.3 and a MySQL database.

Gacek

1 Answer


How often do you need access to the auditing data? Is the cost of storage ever a concern? Do you need it in the same system as the normal data?

Basically, having a table called foo and a second table called foo_log isn't uncommon. It also lets you store foo_log somewhere different, possibly even in a secondary DB. If foo_log is on a spinning disk and foo is on flash, you still get fast reads, but somewhat cheaper storage for the history data.
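For illustration, one common way to populate such a foo_log table - not spelled out in the answer, just a frequent pattern - is an AFTER UPDATE trigger that copies the previous version of the row; the column names below are assumptions:

```sql
-- Sketch: copy the old version of a row into foo_log whenever foo is updated.
-- Columns (id, name, status) are illustrative only.
CREATE TRIGGER foo_log_on_update
AFTER UPDATE ON foo
FOR EACH ROW
INSERT INTO foo_log (original_id, name, status, changed_at)
VALUES (OLD.id, OLD.name, OLD.status, NOW());
```

Equivalent AFTER INSERT and AFTER DELETE triggers would cover the other operations, and the application code never has to know the logging exists.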

If you don't ever need to display this data and just need it for legal reasons, or to figure out how something went wrong, the single-table approach isn't a terrible plan.

But if the issue is backups, which it sounds like it might be, why not just back up the MySQL database on a regular basis and store the backups elsewhere?

Dean J
  • No, the issue is not backups, but edits. The data stored are customer forms, which can be edited by workers (sent to the financial department etc...). We need to keep track of every change any worker makes and be able to revert it. We almost never access the logged/historical data directly - only if one of the workers has screwed up really badly. – Gacek Dec 15 '16 at 10:21