166

I've been asked if I can keep track of the changes to the records in a MySQL database, so that when a field is changed, the old and new values are available along with the date the change took place. Is there a feature or common technique for doing this?

If so, I was thinking of doing something like this: create a table called changes. It would contain the same fields as the master table, but prefixed with old and new, populated only for the fields which were actually changed, plus a TIMESTAMP for the change. It would be indexed by an ID. This way, a SELECT report could be run to show the history of each record. Is this a good method? Thanks!
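
For example, if the master table had name and address fields, I imagine something like this (column names purely illustrative):

CREATE TABLE changes (
    id           INT AUTO_INCREMENT PRIMARY KEY,
    record_id    INT NOT NULL,                -- ID of the record in the master table
    old_name     VARCHAR(255) NULL,           -- populated only if `name` changed
    new_name     VARCHAR(255) NULL,
    old_address  VARCHAR(255) NULL,           -- populated only if `address` changed
    new_address  VARCHAR(255) NULL,
    changed_at   TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    INDEX (record_id)
);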

Stephen Kennedy
Edward

9 Answers

239

Here's a straightforward way to do this:

First, create a history table for each data table you want to track (example query below). This table will have an entry for each insert, update, and delete query performed on each row in the data table.

The structure of the history table will be the same as the data table it tracks, except for three additional columns: a column to store the operation that occurred (let's call it 'action'), a column for the date and time of the operation, and a column to store a sequence number ('revision'), which increments per operation, grouped by the primary key column of the data table.

To get this sequencing behavior, a two-column (composite) primary key is created on the primary key column and the revision column. Note that you can only do sequencing in this fashion if the engine used by the history table is MyISAM (see 'MyISAM Notes' on the MySQL AUTO_INCREMENT example page).

The history table is fairly easy to create. In the ALTER TABLE query below (and in the trigger queries below that), replace 'primary_key_column' with the actual name of that column in your data table.

CREATE TABLE MyDB.data_history LIKE MyDB.data;

ALTER TABLE MyDB.data_history MODIFY COLUMN primary_key_column int(11) NOT NULL, 
   DROP PRIMARY KEY, ENGINE = MyISAM, ADD action VARCHAR(8) DEFAULT 'insert' FIRST, 
   ADD revision INT(6) NOT NULL AUTO_INCREMENT AFTER action,
   ADD dt_datetime DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP AFTER revision,
   ADD PRIMARY KEY (primary_key_column, revision);

And then you create the triggers:

DROP TRIGGER IF EXISTS MyDB.data__ai;
DROP TRIGGER IF EXISTS MyDB.data__au;
DROP TRIGGER IF EXISTS MyDB.data__bd;

CREATE TRIGGER MyDB.data__ai AFTER INSERT ON MyDB.data FOR EACH ROW
    INSERT INTO MyDB.data_history SELECT 'insert', NULL, NOW(), d.* 
    FROM MyDB.data AS d WHERE d.primary_key_column = NEW.primary_key_column;

CREATE TRIGGER MyDB.data__au AFTER UPDATE ON MyDB.data FOR EACH ROW
    INSERT INTO MyDB.data_history SELECT 'update', NULL, NOW(), d.*
    FROM MyDB.data AS d WHERE d.primary_key_column = NEW.primary_key_column;

CREATE TRIGGER MyDB.data__bd BEFORE DELETE ON MyDB.data FOR EACH ROW
    INSERT INTO MyDB.data_history SELECT 'delete', NULL, NOW(), d.* 
    FROM MyDB.data AS d WHERE d.primary_key_column = OLD.primary_key_column;

And you're done. Now, all the inserts, updates and deletes on 'MyDB.data' will be recorded in 'MyDB.data_history', giving you a history table like this (minus the contrived 'data columns' column):

ID    revision   action    data columns..
1     1         'insert'   ....          initial entry for row where ID = 1
1     2         'update'   ....          changes made to row where ID = 1
2     1         'insert'   ....          initial entry, ID = 2
3     1         'insert'   ....          initial entry, ID = 3 
1     3         'update'   ....          more changes made to row where ID = 1
3     2         'update'   ....          changes made to row where ID = 3
2     2         'delete'   ....          deletion of row where ID = 2 

To display the changes for a given column or columns from update to update, you'll need to join the history table to itself on the primary key and sequence columns. You could create a view for this purpose, for example:

CREATE VIEW data_history_changes AS 
   SELECT t2.dt_datetime, t2.action, t1.primary_key_column as 'row id', 
   IF(t1.a_column = t2.a_column, t1.a_column, CONCAT(t1.a_column, " to ", t2.a_column)) as a_column
   FROM MyDB.data_history AS t1 INNER JOIN MyDB.data_history AS t2 ON t1.primary_key_column = t2.primary_key_column
   WHERE (t1.revision = 1 AND t2.revision = 1) OR t2.revision = t1.revision+1
   ORDER BY t1.primary_key_column ASC, t2.revision ASC;
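
A quick usage sketch against that view (note the 'row id' alias needs backticks when referenced):

-- Show the change history of the row whose primary key is 1
SELECT * FROM data_history_changes WHERE `row id` = 1;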
transient closure
  • I really like this solution. However, if your main table has no primary key, or you don't know what the primary key is, it's a bit tricky. – Benjamin Eckstein Jan 27 '14 at 21:17
  • I recently ran into a problem using this solution for a project, because of how all indexes from the original table are copied to the history table (due to how CREATE TABLE ... LIKE ... works). Having unique indexes on the history table can cause the INSERT query in the AFTER UPDATE trigger to barf, so they need to be removed. In the PHP script I have that does this stuff, I query for any unique indexes on newly created history tables (with "SHOW INDEX FROM data_table WHERE Key_name != 'PRIMARY' AND Non_unique = 0"), and then remove them. – transient closure May 13 '15 at 00:11
  • Here we get repeated data inserted into the backup table every time. Say we have 10 fields in a table and we update 2 of them; we are then adding repeated data for the remaining 8 fields. How can this be overcome? – itzmukeshy7 Jun 03 '15 at 11:54
  • Not sure what version of MySQL you are using for this, but it looks like in 5.1 you can't create a compound key with auto_increment at the same time: "Incorrect table definition; there can be only one auto column and it must be defined as a key". I had to create the AI first, then add revision to the PK after. – dbinott Jun 09 '15 at 13:29
  • Sorry, I was getting that in 5.6 not 5.1 – dbinott Jun 09 '15 at 13:48
  • n/m, I wasn't using MyISAM. Thought I was. :( – dbinott Jun 09 '15 at 19:32
  • You can avoid accidentally carrying over the various indices by changing the create table statement to `CREATE TABLE MyDB.data_history as select * from MyDB.data limit 0;` – Eric Hayes Aug 11 '16 at 01:14
  • This is awesome. Should have been accepted as the best solution. – Aaron Lozier Nov 15 '16 at 18:44
  • @transientclosure how would you propose getting other fields into the history that weren't part of the original query? E.g. I want to track who makes those changes. For insert it already has an `owner` field, and for update I could add an `updatedby` field, but for delete I'm not sure how I could do that via the triggers. Updating the `data_history` row with the user ID feels dirty :P – Horse Jun 19 '17 at 20:46
  • This approach needs the versions table to be MyISAM, which does table-level locking. Would this approach work well for production systems where multiple inserts/updates are happening every second? Does anyone have real production experience with this approach? – aadidasu Jul 20 '17 at 22:37
  • Can anyone explain why I would lose the revision count per PK? All of a sudden it's just incrementing revisions by 1 regardless of key so it is up to 182 now. Need to figure this out. – dbinott Aug 03 '17 at 12:33
  • Unnecessarily complicated, and difficult to maintain. Look at my answer, which is how it's done in large internet companies these days. – Ouroboros Aug 19 '17 at 15:22
  • I like this but I did run into one issue with it. A delete operation failed due to a foreign key constraint but despite the delete not being successful, the delete operation was still logged in the history because the trigger is before the delete operation. – Nick Gotch Jul 03 '18 at 18:10
  • How would you alter your code to enable this to work on a composite primary key, i.e. when you have a primary_key_column_1 and primary_key_column_2 ? – kateryna Feb 09 '19 at 15:57
  • It would be perfect if the `CREATE VIEW data_history_changes` would somehow use a loop to automatically show all columns of the history table, instead of having to define each field manually in the statement. – membersound Aug 24 '21 at 15:30
  • I used this solution for some time but recently had to migrate my database to an environment that does not support MyISAM tables. I was able to convert the history table to InnoDB by dropping the primary key, dropping the 'revision' column, and by modifying the triggers accordingly (take out the 'NULL' that was inserted into 'revision' column). This allowed me to set the table engine to InnoDB as I don't really need the revision column anyways - I can always calculate that pretty easily based on the 'dt_datetime' column. – zeke Nov 16 '21 at 01:17
  • Getting "Column count doesn't match value count at row 1. All changes were reverted (DDL statements can't be reverted)." Hard to read. – Yogi Arif Widodo Aug 24 '22 at 11:47
  • Ah, solved: it was a typo in my trigger from a replace-all, haha. Thanks btw. – Yogi Arif Widodo Aug 24 '22 at 11:53
  • How did you get to have the autoincrement in the revision column to work like that? Mine just increases 1, 2, 3 for each update and not based on revision. – Ed1123 Jan 26 '23 at 16:37
  • @Ed1123 you need to create a composite primary key, with one part being the primary key value from your data table, and the other being an INT column set to auto increment as part of the audit table. This is only possible in MyISAM (well, last time I checked; it's been a while). Check out the MySQL docs: https://dev.mysql.com/doc/refman/8.1/en/example-auto-increment.html (scroll down to 'MyISAM Notes') – transient closure Aug 03 '23 at 13:12
99

It's subtle.

If the business requirement is "I want to audit the changes to the data - who did what and when?", you can usually use audit tables (as per the trigger example Keethanjan posted). I'm not a huge fan of triggers, but it has the great benefit of being relatively painless to implement - your existing code doesn't need to know about the triggers and audit stuff.

If the business requirement is "show me what the state of the data was on a given date in the past", it means that the aspect of change over time has entered your solution. Whilst you can, just about, reconstruct the state of the database just by looking at audit tables, it's hard and error prone, and for any complicated database logic, it becomes unwieldy. For instance, if the business wants to know "find the addresses of the letters we should have sent to customers who had outstanding, unpaid invoices on the first day of the month", you likely have to trawl half a dozen audit tables.

Instead, you can bake the concept of change over time into your schema design (this is the second option Keethanjan suggests). This is a change to your application, definitely at the business logic and persistence level, so it's not trivial.

For example, if you have a table like this:

CUSTOMER
---------
CUSTOMER_ID PK
CUSTOMER_NAME
CUSTOMER_ADDRESS

and you wanted to keep track over time, you would amend it as follows:

CUSTOMER
------------
CUSTOMER_ID            PK
CUSTOMER_VALID_FROM    PK
CUSTOMER_VALID_UNTIL   PK
CUSTOMER_STATUS
CUSTOMER_USER
CUSTOMER_NAME
CUSTOMER_ADDRESS

Every time you want to change a customer record, instead of updating the record in place, you set the VALID_UNTIL on the current record to NOW() and insert a new record with a VALID_FROM of now and a NULL VALID_UNTIL. You set CUSTOMER_USER to the login ID of the current user (if you need to keep that). If the customer needs to be deleted, you use the CUSTOMER_STATUS flag to indicate this - you may never delete records from this table.
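
A sketch of that update in SQL, using the columns above (values are illustrative; note that a NULL can't live in a primary key column, so in practice VALID_UNTIL either leaves the key or uses a far-future sentinel date instead of NULL):

-- Close off the current version of customer 42 (run both statements in one transaction)
UPDATE CUSTOMER
SET CUSTOMER_VALID_UNTIL = NOW()
WHERE CUSTOMER_ID = 42
  AND CUSTOMER_VALID_UNTIL IS NULL;

-- Insert the new version, valid from now, open-ended
INSERT INTO CUSTOMER (CUSTOMER_ID, CUSTOMER_VALID_FROM, CUSTOMER_VALID_UNTIL,
                      CUSTOMER_STATUS, CUSTOMER_USER, CUSTOMER_NAME, CUSTOMER_ADDRESS)
VALUES (42, NOW(), NULL, 'active', 'current_login_id', 'New Name', 'New Address');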

That way, you can always find what the status of the customer table was for a given date - what was the address? Have they changed name? By joining to other tables with similar valid_from and valid_until dates, you can reconstruct the entire picture historically. To find the current status, you search for records with a null VALID_UNTIL date.
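
For example, to see a customer as they were on a given date:

-- State of customer 42 as of 2015-06-01
SELECT *
FROM CUSTOMER
WHERE CUSTOMER_ID = 42
  AND CUSTOMER_VALID_FROM <= '2015-06-01'
  AND (CUSTOMER_VALID_UNTIL IS NULL OR CUSTOMER_VALID_UNTIL > '2015-06-01');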

It's unwieldy (strictly speaking, you don't need the valid_from, but it makes the queries a little easier). It complicates your design and your database access. But it makes reconstructing the world a lot easier.

Octavian Catusanu
Neville Kuyt
  • But wouldn't it add duplicate data for those fields which are not updated? How do you manage that? – itzmukeshy7 Jun 03 '15 at 11:51
  • With the second approach, a problem arises for report generation: if a customer record is edited over time, it's difficult to recognise whether a particular entry belongs to the same customer or a different one. – Akshay Joshi Jun 11 '15 at 06:45
  • Best suggestion I've seen yet for this problem – Worthy7 Jul 15 '16 at 07:34
  • Oh and, in response to the comments, how about just storing NULL for everything else that didn't change? So the latest version will have all of the latest data, but if the name used to be "Bob" 5 days ago, then just have one row: name = Bob, valid until 5 days ago. – Worthy7 Jul 15 '16 at 07:36
  • I really like this. If you want to remove the complexity this design adds to your queries, a view that selects * from the table where valid_until IS NULL can give you a current-data view of each table, simplifying your query logic. – Riaan Schutte Mar 26 '17 at 22:50
  • How does this work? In your example customer_id is still a PK; how can it be a primary key if it will be duplicated? – pato.llaguno Oct 20 '17 at 17:01
  • The combination of customer_id and the dates are the primary key, so they will be guaranteed unique. – Neville Kuyt Oct 22 '17 at 10:40
  • GDPR compliance was not the goal of the example; it's a really complex area especially when combined with relational databases, so would really need to be a separate question... – Neville Kuyt Feb 11 '20 at 11:40
17

You could create triggers to solve this. Here is a tutorial to do so (archived link).

Setting constraints and rules in the database is better than writing special code to handle the same task, since it prevents another developer from writing a different query that bypasses the special code and leaves your database with poor data integrity.

For a long time I was copying info to another table using a script since MySQL didn’t support triggers at the time. I have now found this trigger to be more effective at keeping track of everything.

This trigger will copy the old value of a column to a history table when someone edits a row and that column changes. The editor ID and last-mod time are stored in the original table every time someone edits that row; the time corresponds to when the row was changed to its current form.

DELIMITER $$

DROP TRIGGER IF EXISTS history_trigger $$

CREATE TRIGGER history_trigger
BEFORE UPDATE ON clients
FOR EACH ROW
BEGIN
    -- Log the outgoing first_name, along with who set it (editor_id)
    -- and when it was set (last_mod), as recorded in the original table
    IF OLD.first_name != NEW.first_name THEN
        INSERT INTO history_clients (client_id, col, value, user_id, edit_time)
        VALUES (NEW.client_id, 'first_name', OLD.first_name, OLD.editor_id, OLD.last_mod);
    END IF;

    -- Same for last_name
    IF OLD.last_name != NEW.last_name THEN
        INSERT INTO history_clients (client_id, col, value, user_id, edit_time)
        VALUES (NEW.client_id, 'last_name', OLD.last_name, OLD.editor_id, OLD.last_mod);
    END IF;
END $$

DELIMITER ;

Another solution would be to keep a Revision field and update it on save. You could decide that the max is the newest revision, or that 0 is the most recent row. That's up to you.
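
A minimal sketch of that idea, assuming a hypothetical clients_versions table where every save inserts a new row rather than updating in place:

-- Save the edited data as a new row, bumping the revision
-- (COALESCE handles the very first version of a record)
INSERT INTO clients_versions (client_id, first_name, last_name, revision)
SELECT 42, 'John', 'Smith', COALESCE(MAX(revision) + 1, 1)
FROM clients_versions
WHERE client_id = 42;

-- The current state is then the row with the highest revision
SELECT * FROM clients_versions
WHERE client_id = 42
ORDER BY revision DESC LIMIT 1;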

user812786
Keethanjan
13

MariaDB has supported System Versioning since 10.3. It is the standard SQL feature that does exactly what you want: it stores the history of table records and provides access to it via SELECT queries. MariaDB is an open-development fork of MySQL. You can find more on its System Versioning via this link:

https://mariadb.com/kb/en/library/system-versioned-tables/
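
For example, versioning can be switched on for an existing table and the full history queried directly (table name is illustrative):

-- Enable system versioning on an existing table
ALTER TABLE my_table ADD SYSTEM VERSIONING;

-- See every version of every row
SELECT * FROM my_table FOR SYSTEM_TIME ALL;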

midenok
  • Please note the following from the link above: "mysqldump does not read historical rows from versioned tables, and so historical data will not be backed up. Also, a restore of the timestamps would not be possible as they cannot be defined by an insert/a user." – Daniel Jun 19 '20 at 13:16
  • @Daniel This is planned for 10.8 as of now: https://jira.mariadb.org/browse/MDEV-16029 – midenok Oct 26 '21 at 12:44
  • The mariabackup command is an alternative to mysqldump: https://mariadb.com/kb/en/mariabackup-overview/ – hyamanieu Sep 28 '22 at 15:48
11

Here is how we solved it.

A Users table looked like this:

Users
-------------------------------------------------
id | name | address | phone | email | created_on | updated_on

Then the business requirement changed, and we needed to check all previous addresses and phone numbers a user ever had. The new schema looks like this:

Users (the data that won't change over time)
-------------
id | name

UserData (the data that can change over time and needs to be tracked)
-------------------------------------------------
id | id_user | revision | city | address | phone | email | created_on
 1 |   1     |    0     | NY   | lake st | 9809  | @long | 2015-10-24 10:24:20
 2 |   1     |    2     | Tokyo| lake st | 9809  | @long | 2015-10-24 10:24:20
 3 |   1     |    3     | Sdny | lake st | 9809  | @long | 2015-10-24 10:24:20
 4 |   2     |    0     | Ankr | lake st | 9809  | @long | 2015-10-24 10:24:20
 5 |   2     |    1     | Lond | lake st | 9809  | @long | 2015-10-24 10:24:20

To find the current address of any user, we query UserData ordered by revision DESC with LIMIT 1.

To get the addresses of a user during a certain period of time, we can use created_on BETWEEN date1 AND date2.
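
Minimal query sketches, using the schema above:

-- Current address of user 1: the highest revision wins
SELECT city, address, phone
FROM UserData
WHERE id_user = 1
ORDER BY revision DESC LIMIT 1;

-- Addresses user 1 had during a given period
SELECT city, address, phone, created_on
FROM UserData
WHERE id_user = 1
  AND created_on BETWEEN '2015-01-01' AND '2015-12-31';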

Zenex
  • This is the solution I want to have, but how can you insert id_user into this table using a trigger? – thecassion Mar 26 '16 at 08:38
  • What happened to `revision=1` of `id_user=1`? At first I thought the counting was `0,2,3,...`, but then I saw that for `id_user=2` the revision counting is `0,1,...` – Pathros May 24 '16 at 15:02
  • You don't need `id` and `id_user` columns. Just use a group ID of `id` (user ID) and `revision`. – Gajus Feb 03 '17 at 14:28
  • This method will give you nightmares when you have to create queries with JOINs. You'll have to group by user ID and sort by timestamp in descending order every time you want to reference the table. You can create a view, but you'll still increase your complexity. The best way is to have a separate audit table and use a trigger to handle INSERT, UPDATE, DELETE. – TheRealChx101 Jun 03 '22 at 11:12
5

Why not simply use binlog files? If replication is set up on the MySQL server and the binlog file format is set to ROW, all the changes can be captured.

A good Python library called noplay can be used. More info here.
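
A minimal sketch from the SQL side (the log file name is illustrative; binary logging itself must be enabled at server startup via log_bin):

-- Check that binary logging is on, and make sure row-based logging is in effect
SHOW VARIABLES LIKE 'log_bin';
SET GLOBAL binlog_format = 'ROW';

-- List the binary logs and peek at the recorded events
SHOW BINARY LOGS;
SHOW BINLOG EVENTS IN 'mysql-bin.000001' LIMIT 10;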

Ouroboros
  • Binlog can be used even if you don't have/need replication. Binlog has many beneficial use cases. Replication is probably the most common use case, but it can also be leveraged for backups and audit history, as mentioned here. – webaholik Oct 19 '17 at 21:02
3

Just my 2 cents. I would create a solution which records exactly what changed, very similar to transient closure's solution.

My ChangesTable would simply be:

DateTime | WhoChanged | TableName | Action | ID |FieldName | OldValue

1) When an entire row is changed in the main table, lots of entries will go into this table. But that is very unlikely (people are usually only changing one thing), so it's not a big problem.

2) OldValue (and NewValue, if you want it) has to be some sort of universal "anytype", since it could be any data. There might be a way to do this with RAW types, or you can just use JSON strings to convert in and out.

Minimum data usage, stores everything you need and can be used for all tables at once. I'm researching this myself right now, but this might end up being the way I go.

For create and delete, just the row ID is needed, no fields. On delete, a flag on the main table (active?) would be good.
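
A minimal sketch of that table (column types are assumptions; NewValue included as the optional variant):

CREATE TABLE ChangesTable (
    `DateTime`  DATETIME    NOT NULL,
    WhoChanged  VARCHAR(64) NOT NULL,
    TableName   VARCHAR(64) NOT NULL,
    `Action`    VARCHAR(8)  NOT NULL,  -- 'insert', 'update' or 'delete'
    ID          INT         NOT NULL,  -- ID of the changed row
    FieldName   VARCHAR(64) NULL,      -- NULL for insert/delete
    OldValue    TEXT        NULL,      -- stringified old value (JSON is one option)
    NewValue    TEXT        NULL,      -- optional
    INDEX (TableName, ID, `DateTime`)
);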

Worthy7
  • I used something along these lines, but had the FieldName and OldValue fields substituted by one changedValues field. In that field, I would have a JSON-stringified object with fieldName and oldValue as two properties. This way, there is only one record for any given update, no matter how many fields were changed. Tradeoff: if the DBMS does not support a native JSON data type, then searching for which field was changed would not be efficient. – Sunny May 22 '23 at 00:45
2

In MariaDB 10.5+ this is as easy to set up as

CREATE TABLE t (x INT) WITH SYSTEM VERSIONING 
  PARTITION BY SYSTEM_TIME;

Past history can then be queried by doing

SELECT * FROM t FOR SYSTEM_TIME AS OF TIMESTAMP '2016-10-09 08:07:06';

There is currently no counterpart for this in MySQL.

See the documentation for more info. If you're on an older version of MariaDB, the documentation has an alternate syntax that has been available since MariaDB 10.3.4.

hostingutilities.com
0

The direct way of doing this is to create triggers on the tables. Set some conditions or mapping methods; when an update or delete occurs, a row is inserted into the 'change' table automatically.

But the biggest problem is what happens if we have lots of columns and lots of tables: we would have to type every column name of every table. Obviously, that's a waste of time.

To handle this more elegantly, we can create procedures or functions to retrieve the column names for us.
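
For instance, the column names can be read from information_schema when generating the trigger SQL (schema and table names here are illustrative):

-- List a table's columns, in order, for use when generating trigger bodies
SELECT COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'MyDB'
  AND TABLE_NAME = 'data'
ORDER BY ORDINAL_POSITION;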

We can also simply use a third-party tool to do this. Here, I wrote a Java program, Mysql Tracker.

goforu
  • How can I use your Mysql Tracker? – webchun Mar 24 '17 at 07:06
  • 1. Make sure you have an id column as the primary key in each table. 2. Copy the Java file to your local machine (or IDE). 3. Import the libs and edit the static variables on lines 9-15 according to your database configuration and structure. 4. Parse and run the Java file. 5. Copy the console log and execute it as MySQL commands. – goforu Apr 13 '17 at 08:40
  • I think `create table like table` replicates all columns easily – Jonathan Mar 16 '19 at 02:33