Database revisions for data and relations for moderating content changes

Question

Short version: I'm looking for suggestions on how to implement a database versioning system that keeps track of relations as well as record data for moderating a social web app with user-editable content.

Long version: I'm developing a social app where all users can edit content. As an example, let's say one type of editable item is a Book. A Book can have a title and a few authors (a many-to-many relation to the authors table). Of course this example is rather simple; real items will have many more fields and relations, but the principle is the same.

Now, I need to implement a versioning system to moderate changes made by users. Let's say there are two groups of users: normal users and trusted users. Changes made by normal users are logged, but aren't committed until a moderator accepts that particular change. Changes made by trusted users are committed immediately, but are still logged so that they can be reverted at any time.

If I were to keep revisions of a Book from the example, I would need to keep track of its changing relations with authors. I need to be able to track both adding and deleting relations, and to revert either. So if the title gets modified, I need to be able to revert the change. If a relation to an author gets deleted, I need to be able to restore it, and if a relation gets added, I need to be able to remove it again.

I only need to keep track of an item and its relations, not anything that it relates to. If we had tables foos, bars, and foos_bars, I would be interested only in logging foos and foos_bars; bars would be pretty independent.

I'm familiar with this great question, as well as its kind-of-an-adversary solution, and a pretty comprehensive article on the second approach and its follow-up. However, none of those give any special consideration to keeping track of relations as well as normal table data, so none of them is an obvious answer to my problem.

I like the one-table-for-all-history approach, as it makes it easy to keep some changes and undo others. For example, if one user submitted fields A and B, and then a second user also submitted A and B, it would be easy to undo just the second user's change to B and keep the change to A. It's also nice to have one table for the whole functionality, as opposed to the many tables the other approach requires. It also makes it easy to see who did exactly what (e.g. modified only the foobar field) - that doesn't seem to be easy with the other approach. And it seems like it would be easier to automate the moderation process: we don't really even need to know table names, as everything needed is stored in the revision record.

If I were to use the one-revisions-table-for-each-revisioned-table approach, then, having limited experience in writing triggers, I don't know whether it would be possible or relatively easy to implement a system that automatically records an edit but doesn't commit it immediately unless some parameter is set (e.g. edit_from_trusted_user == true). It also makes me worry about triggers firing when I don't want them to (moderation wouldn't apply to, e.g., changes made by an admin, or to other "objects" that might modify the data).

No matter which solution I choose, it seems as if I'll have to add a rather artificial id to all many-to-many relation tables (instead of [book_id, author_id] I would have [id, book_id, author_id]).

I thought about implementing relations in the one-table approach like so: if we have a standard revision table structure

[ID] [int]
[TableName] [varchar]
[RecordID] [int]
[FieldName] [varchar]
[OldValue] [varchar]
[NewValue] [varchar]
[EventType] [enum]
[EventDate] [datetime]
[UserID] [int]

we could store relations by simply setting RecordID and FieldName to NULL, EventType to either ADD or DELETE, and OldValue and NewValue to the relation's foreign keys. The only problem is that some of my relations carry additional data (like a graph's edge weight), so I would have to store that somewhere too. Then again, adding a new relation could be split into a two-event sequence, ADD and then SET(weight), but then artificial relation IDs would be needed, and I'm not sure whether such a solution would have bad implications in the future.

There will be around 5 to 10 versioned tables, each with, on average, 3 many-to-many relations to keep track of. I'm using MySQL with InnoDB; the app is written in PHP 5.3 and connects to the db using PDO. Putting versioning in the app logic instead of db triggers is fine with me. I just need the whole thing to work and be reasonably efficient. I expect reverts to be rare compared to edits, and edits to be few compared to the number of content views. Only moderators will access revision data, to either accept or reject recent changes.

Do you have any experience implementing such a system? What are the suggested solutions to this problem? Any considerations that come to mind?

I searched SO and the net for quite some time, but didn't find anything to help me with the matter. However, if I missed something, I'll be grateful for any links / directions.

Thanks.
