
I have an app that communicates with my API, which runs PHP and MySQL. What I want to do is record the changes that occur to entities in my table for each user: if a user makes a change to their data, I can see the change that occurred. That way, if they ever have questions or accidentally delete something, I can go back and tell them what their entities looked like at various stages in the year.

I don't need to be crazy specific about the differences; all I would like to do is record inserts and updates (as they're represented in the JSON body).

Basically, what I've done for now is this: any time a POST/PUT comes into my API on certain routes, I take the JSON in the request body and save it to a record in the database as a transaction that took place for that user.
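In essence the handler does something like this (simplified; the table and column names are just placeholders):

// Simplified sketch of the current approach: dump the raw request body
// straight into a transactions table for the authenticated user.
$body = file_get_contents('php://input'); // raw JSON from the POST/PUT

$stmt = $pdo->prepare(
    'INSERT INTO user_transactions (user_id, method, route, body, created_at)
     VALUES (?, ?, ?, ?, NOW())'
);
$stmt->execute([$userId, $_SERVER['REQUEST_METHOD'], $_SERVER['REQUEST_URI'], $body]);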

This was great early on, but after hundreds of thousands of records, the JSON bodies are large and take up a lot of room. My database table is 13GB, and queries take a while to run, too. I truncated it, but within 4 months it grew to another 10GB. This problem will likely only get worse.

Is there an approach someone can recommend for recording this? Could I maybe send the request body over to something on AWS, or some other offline storage, or another database somewhere else? Flat files, perhaps, or a non-relational database? I don't actually need the data in real time, but if I ever wanted to get someone's history I'd like to know I could.

I do take nightly backups of the DB, so an alternate approach I was considering is cutting out the transaction logs entirely and just letting the nightly backups continue. Sure, I wouldn't be able to show on which dates entities were updated/added, but at least I could always restore a few backups and see what a given user's records looked like on a certain date.

Any ideas or suggestions? Thanks!

NullHypothesis

1 Answer


Instead of logging the entire JSON, you can log only the values that have changed. You also don't have to log your insert data: your database will always have the current record, so logging the insert data is redundant.

You can implement a diff function to compare your existing JSON with the changed JSON.

To illustrate, see the code below, which borrows a JavaScript diff function from this answer.

// get the current value from your database
var oldvalues = {
  "id": 50,
  "name": "Old Name",
  "description": "Description",
  "tasks": [{
    'foo': 'bar'
  }]
};
var newvalues = {
  "id": 50,
  "name": "New name",
  "description": "Description",
  "tasks": [{
    'foo': 'bar'
  }]
};

var isEmptyObject = function(obj) {
  var name;
  for (name in obj) {
    return false;
  }
  return true;
};

var diff = function(obj1, obj2) {
  var result = {};
  var change;
  for (var key in obj1) {
    if (typeof obj2[key] == 'object' && typeof obj1[key] == 'object') {
      change = diff(obj1[key], obj2[key]);
      if (isEmptyObject(change) === false) {
        result[key] = change;
      }
    } else if (obj2[key] != obj1[key]) {
      result[key] = obj2[key];
    }
  }
  return result;
};

var update = diff(oldvalues, newvalues);
//save this to your database

console.log(JSON.stringify(update)); // {"name":"New name"}

As you can see, the only change that would be saved is {"name":"New name"}, which will cut down on your data usage.

You would of course need to either port this to PHP or look at some existing packages, such as node-rus-diff, that might serve your needs.
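As a rough sketch (not a drop-in implementation), a recursive diff in PHP could look like this, where $oldJson and $newJson stand in for the stored and incoming payloads and both are decoded with json_decode($json, true):

// Sketch of a recursive diff in PHP: returns only the keys whose values
// differ between the old and new payloads (both as associative arrays).
function jsonDiff(array $old, array $new): array
{
    $result = [];
    foreach ($new as $key => $value) {
        if (!array_key_exists($key, $old)) {
            $result[$key] = $value;                 // key added by the update
        } elseif (is_array($value) && is_array($old[$key])) {
            $nested = jsonDiff($old[$key], $value); // recurse into nested objects/arrays
            if ($nested !== []) {
                $result[$key] = $nested;
            }
        } elseif ($old[$key] !== $value) {
            $result[$key] = $value;                 // scalar value changed
        }
    }
    return $result;
}

$update = jsonDiff(json_decode($oldJson, true), json_decode($newJson, true));
// For the example above, $update would be ['name' => 'New name']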

As long as you are keeping a timestamp or a sequence number, you can chain multiple transactions together to roll back to any prior state. This is analogous to doing an incremental backup.
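For instance, assuming each diff is stored with a sequence number in a hypothetical entity_log table, rebuilding an entity's state up to a given point could look roughly like this ($baseSnapshot, $entityId, and $targetSeq are placeholders):

// Rough sketch: replay stored diffs, in order, on top of a base snapshot to
// reconstruct the entity as it looked after a given sequence number.
$stmt = $pdo->prepare(
    'SELECT diff_json FROM entity_log WHERE entity_id = ? AND seq <= ? ORDER BY seq'
);
$stmt->execute([$entityId, $targetSeq]);

$state = $baseSnapshot; // associative array of the entity's starting state
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    // apply each logged change on top of the current state
    $state = array_replace_recursive($state, json_decode($row['diff_json'], true));
}
// $state now reflects the entity as of $targetSeq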

You could also run a maintenance task at set intervals if you would like to create checkpoints and compare the current state to a previous state. Perhaps once a month, take a backup and record the differences between the objects that have changed. This would be analogous to a differential backup.
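Sketched out, using the same hypothetical tables as above plus an entity_checkpoints table, a monthly job could fold older diffs into a single snapshot row and then delete them:

// Hypothetical monthly checkpoint: collapse each entity's older diffs into one
// snapshot row, then remove the individual diff rows they were built from.
$cutoff = date('Y-m-01'); // start of the current month

$stmt = $pdo->prepare(
    'SELECT entity_id, diff_json FROM entity_log WHERE created_at < ? ORDER BY entity_id, seq'
);
$stmt->execute([$cutoff]);

$snapshots = [];
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    $id = $row['entity_id'];
    $snapshots[$id] = array_replace_recursive(
        $snapshots[$id] ?? [], // or start from the previous checkpoint, if one exists
        json_decode($row['diff_json'], true)
    );
}

$insert = $pdo->prepare(
    'REPLACE INTO entity_checkpoints (entity_id, snapshot_json, checkpoint_at) VALUES (?, ?, ?)'
);
foreach ($snapshots as $id => $snapshot) {
    $insert->execute([$id, json_encode($snapshot), $cutoff]);
}

$pdo->prepare('DELETE FROM entity_log WHERE created_at < ?')->execute([$cutoff]);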

Finally, you can take a full backup and clear out the previous transactions, analogous to a full backup.

It is common practice for administrators to perform a combination of incremental, differential, and full backups to balance storage costs against recovery needs. Using the approaches outlined above, you can implement the strategy that is right for you.

Alexander Higgins
  • thanks for taking the time to read my post. 90% of the requests are actually POSTs (new entities); sorry, I failed to mention that. In a case like this, would you have other thoughts about an approach? Thanks Alex – NullHypothesis Jul 20 '17 at 13:39
  • Yes, absolutely. You don't need to audit the insert; it is redundant. Your database already has the properties of the new record, so there is no need to keep a duplicate copy in your logs. – Alexander Higgins Jul 20 '17 at 16:54