Short version: I'm looking for suggestions on how to implement a database versioning system that keeps track of relations as well as record data for moderating a social web app with user-editable content.
Long version: I'm developing a social app where all users can edit content. For example, say one type of editable item is a `Book`. A `Book` can have a `title` and a few authors (many-to-many relation to the `authors` table). Of course this example is rather simple; real items will have many more fields as well as relations, but the rule should be the same.
Now, I need to implement a versioning system to moderate changes made by users. Let's say there are two groups of users: normal users and trusted users. Changes made by normal users are logged, but aren't committed until a moderator accepts that particular change. Changes made by trusted users are committed immediately, but still logged so that they can be reverted at any time.
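To make the workflow above concrete, here is a minimal sketch of app-level logging with deferred commits. It uses Python with an in-memory SQLite database purely so it is self-contained; the same statements should port to PHP/PDO on MySQL. The `status` column, the `ALLOWED` whitelist, and the function names `submit_edit`, `accept`, and `revert` are all hypothetical names introduced for illustration.

```python
import sqlite3

# Whitelist of versioned (table, field) pairs; identifiers cannot be bound
# as SQL parameters, so they are checked here before being interpolated.
ALLOWED = {("books", "title")}

def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE revisions (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            table_name TEXT, record_id INTEGER, field_name TEXT,
            old_value TEXT, new_value TEXT, user_id INTEGER,
            status TEXT  -- assumed column: 'pending', 'applied' or 'reverted'
        );
    """)
    return db

def _write(db, table, field, record_id, value):
    if (table, field) not in ALLOWED:
        raise ValueError("field is not versioned")
    db.execute(f"UPDATE {table} SET {field} = ? WHERE id = ?", (value, record_id))

def submit_edit(db, user_id, trusted, table, record_id, field, new_value):
    """Log the edit; write it to the live table only for trusted users."""
    if (table, field) not in ALLOWED:
        raise ValueError("field is not versioned")
    old = db.execute(f"SELECT {field} FROM {table} WHERE id = ?",
                     (record_id,)).fetchone()[0]
    cur = db.execute(
        "INSERT INTO revisions (table_name, record_id, field_name, old_value,"
        " new_value, user_id, status) VALUES (?, ?, ?, ?, ?, ?, ?)",
        (table, record_id, field, old, new_value, user_id,
         "applied" if trusted else "pending"))
    if trusted:
        _write(db, table, field, record_id, new_value)
    db.commit()
    return cur.lastrowid

def _transition(db, rev_id, from_status, to_status, value_column):
    row = db.execute(
        f"SELECT table_name, field_name, record_id, {value_column}"
        " FROM revisions WHERE id = ? AND status = ?",
        (rev_id, from_status)).fetchone()
    if row is None:
        raise LookupError("no such revision in state " + from_status)
    _write(db, row[0], row[1], row[2], row[3])
    db.execute("UPDATE revisions SET status = ? WHERE id = ?", (to_status, rev_id))
    db.commit()

def accept(db, rev_id):
    """Moderator accepts a pending edit: write new_value to the live table."""
    _transition(db, rev_id, "pending", "applied", "new_value")

def revert(db, rev_id):
    """Undo an applied edit: write old_value back to the live table."""
    _transition(db, rev_id, "applied", "reverted", "old_value")
```

Note that `old_value` is captured at submission time, so a pending edit that sits in the queue while another edit is applied would carry a stale `old_value`; a real implementation would probably refresh it in `accept`.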
If I were to keep revisions of a `Book` from the example, I would need to keep track of changing relations with `authors`. I need to be able to track adding and deleting relations, and be able to revert them.
So if the `title` gets modified, I need to be able to revert the change. If a relation to an `author` gets deleted, I need to be able to revert that, and if a relation gets added, I need to be able to revert that too.
I only need to keep track of an item and its relations, not anything that it relates to. If we had tables `foos`, `bars`, and `foos_bars`, I would be interested only in logging `foos` and `foos_bars`; `bars` would be pretty independent.
I'm familiar with this great question, as well as its kind-of-an-adversary solution, and a pretty comprehensive article on the second approach and its follow-up. However, none of those gives any special consideration to tracking relations alongside normal table data, so none is an obvious answer to my problem.
I like the one-table-for-all-history approach, as it makes it easy to keep some changes and undo others. For example, if one user submitted fields `A` and `B`, and then a second user submitted `A` and `B`, it would be easy to undo just the second user's `B` change and keep the `A` change. It's also nice to have one table for the whole functionality, as opposed to many tables with the other approach. It also makes it easy to see who did exactly what (e.g. modified only the `foobar` field), which doesn't seem to be easy with the other approach. And it seems like it would be easier to automate the moderation process: we don't really even need to know table names, as everything needed is stored in a revision record.
If I were to use the one-revisions-table-for-each-revisioned-table approach, having limited experience in writing triggers, I don't know whether it would be possible or relatively easy to implement a system that automatically records an edit but doesn't commit it immediately unless some parameter is set (e.g. `edit_from_trusted_user == true`). I also worry about triggers firing when I wouldn't want them to, since moderation wouldn't apply to, e.g., changes made by an admin or by some other "objects" that could try to modify the data.
No matter which solution I choose, it seems as if I'll have to add a rather artificial `id` to all many-to-many relation tables (instead of `[book_id, author_id]` I would have `[id, book_id, author_id]`).
I thought about implementing relations in the one-table approach like so: given the standard revision table structure
[ID] [int]
[TableName] [varchar]
[RecordID] [int]
[FieldName] [varchar]
[OldValue] [varchar]
[NewValue] [varchar]
[EventType] [enum]
[EventDate] [datetime]
[UserID] [int]
we could store relations by simply setting `RecordID` and `FieldName` to `NULL`, `EventType` to either `ADD` or `DELETE`, and `OldValue` and `NewValue` to the relation's foreign keys. The only problem is that some of my relations carry additional data (like a graph's edge weight), so I would have to store that somewhere too. Then again, adding a new relation could be split into a two-event sequence, ADD and SET(weight), but then artificial relation `ID`s would be needed, and I'm not sure whether such a solution would have bad implications in the future.
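As a rough sketch of the relation-logging idea above: the two foreign keys go into `OldValue` and `NewValue` as proposed, and the additional data is kept in one extra JSON column. That `extra` column is an assumption of this sketch, not part of the structure listed above; it is one possible way to avoid both the ADD + SET(weight) sequence and the artificial relation `ID`. The code uses Python with in-memory SQLite for self-containedness, and the function names are hypothetical.

```python
import json
import sqlite3

def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE foos_bars (
            foo_id INTEGER, bar_id INTEGER, weight REAL,
            PRIMARY KEY (foo_id, bar_id)   -- composite key, no surrogate id
        );
        CREATE TABLE revisions (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            table_name TEXT, record_id INTEGER, field_name TEXT,
            old_value TEXT, new_value TEXT,
            event_type TEXT, user_id INTEGER,
            extra TEXT   -- assumed column: JSON with the relation's other data
        );
    """)
    return db

def _log(db, event_type, foo_id, bar_id, weight, user_id):
    # record_id and field_name stay NULL for relation events.
    cur = db.execute(
        "INSERT INTO revisions (table_name, record_id, field_name, old_value,"
        " new_value, event_type, user_id, extra)"
        " VALUES ('foos_bars', NULL, NULL, ?, ?, ?, ?, ?)",
        (str(foo_id), str(bar_id), event_type, user_id,
         json.dumps({"weight": weight})))
    return cur.lastrowid

def add_relation(db, foo_id, bar_id, weight, user_id):
    db.execute("INSERT INTO foos_bars VALUES (?, ?, ?)", (foo_id, bar_id, weight))
    rev_id = _log(db, "ADD", foo_id, bar_id, weight, user_id)
    db.commit()
    return rev_id

def delete_relation(db, foo_id, bar_id, user_id):
    row = db.execute("SELECT weight FROM foos_bars WHERE foo_id = ? AND bar_id = ?",
                     (foo_id, bar_id)).fetchone()
    if row is None:
        raise LookupError("relation does not exist")
    db.execute("DELETE FROM foos_bars WHERE foo_id = ? AND bar_id = ?",
               (foo_id, bar_id))
    rev_id = _log(db, "DELETE", foo_id, bar_id, row[0], user_id)
    db.commit()
    return rev_id

def revert_relation(db, rev_id):
    """Undo an ADD by deleting the row, or a DELETE by re-inserting it."""
    old, new, event, extra = db.execute(
        "SELECT old_value, new_value, event_type, extra FROM revisions"
        " WHERE id = ?", (rev_id,)).fetchone()
    foo_id, bar_id = int(old), int(new)
    if event == "ADD":
        db.execute("DELETE FROM foos_bars WHERE foo_id = ? AND bar_id = ?",
                   (foo_id, bar_id))
    elif event == "DELETE":
        db.execute("INSERT INTO foos_bars VALUES (?, ?, ?)",
                   (foo_id, bar_id, json.loads(extra)["weight"]))
    db.commit()
```

Because the composite key identifies the relation row, the junction table needs no surrogate `id` in this variant. The sketch does not log the revert itself or guard against reverting the same event twice; both would likely be needed in practice.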
There will be around 5 to 10 versioned tables, each with, on average, 3 many-to-many relations to keep track of. I'm using MySQL on InnoDB, app is written in PHP 5.3 and connected to the db using PDO. Putting versioning in the app logic instead of db triggers is fine with me. I just need the whole thing to work, and be reasonably efficient. I expect reverts to occur rather seldom compared to edits, and edits will be few compared to number of views of content. Only moderators will access revision data, to either accept or reject recent changes.
Do you have any experience implementing such a system? What are the suggested solutions to this problem? Any considerations that come to mind?
I searched SO and the net for quite some time, but didn't find anything to help me with the matter. However, if I missed something, I'll be grateful for any links / directions.
Thanks.