
While working on a content management system, I've hit a bit of a wall. Coming back to my data model, I've noticed some issues that could become more serious with time.

Namely, I want to maintain an audit trail (change log) of record modifications by user (even modifications to user records would be logged). Because an arbitrary number of modules can be installed, I cannot use a per-table auto-increment field for my primary keys, as it will inevitably cause conflicts when those keys are tracked in a single table.

The audit trail would keep records of user_id, record_id, timestamp, action (INSERT/UPDATE/DELETE), and archive (a serialized copy of the old record).

I've considered a few possible solutions to the issue, such as generating a UUID primary key in application logic (to ensure cross database platform compatibility).
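A rough sketch of that first option, using SQLite and invented table and column names: the application generates the key, so the scheme is identical across back ends.

```python
import sqlite3
import uuid

# The application, not the database, generates the key, so the same
# logic works unchanged on MySQL and SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id BLOB PRIMARY KEY, title TEXT)")

def new_id() -> bytes:
    # 16 raw bytes rather than the 36-char hex form, to keep the index small.
    return uuid.uuid4().bytes

article_id = new_id()
conn.execute("INSERT INTO article (id, title) VALUES (?, ?)",
             (article_id, "Hello"))

row = conn.execute("SELECT title FROM article WHERE id = ?",
                   (article_id,)).fetchone()
print(row[0])  # the key round-trips regardless of back end
```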

Another option I've considered (and I'm sure the consensus will be negative for even suggesting it) is creating a RecordKey table to maintain a globally auto-incremented key. However, I'm sure there are far better methods to achieve this.

Ultimately, I'm curious what options I should consider in implementing this. For example, I intend to support (to start, at least) MySQL and SQLite3 storage, but I'm concerned about how each database would handle UUIDs.

Edit to make my question less vague: Would using global IDs be a recommended solution for my problem? If so, using a 128-bit UUID (application- or database-generated), what can I do in my table design to help maximize query efficiency?
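On the efficiency part of that question, one common trick is to store the 128-bit value as raw bytes rather than as its 36-character text form, which works on both MySQL (BINARY(16)) and SQLite (BLOB). A sketch of the conversion helpers (function names are mine):

```python
import uuid

def uuid_to_bin(u: uuid.UUID) -> bytes:
    # 16 bytes in the index instead of 36 characters.
    return u.bytes

def bin_to_uuid(b: bytes) -> uuid.UUID:
    return uuid.UUID(bytes=b)

u = uuid.uuid4()
assert bin_to_uuid(uuid_to_bin(u)) == u   # lossless round trip
print(len(str(u)), len(u.bytes))          # 36 vs 16: under half the key width
```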

Dan Lugg

3 Answers


Ok, you've hit a brick wall. And you realise that actually the db design has problems. And you are going to keep hitting this same brick wall many times in the future. And your future is not looking bright. And you want to change that. Good.

But what you have not yet done is figure out what the actual cause of this is. You cannot escape from the predictable future until you do that. And if you do that properly, there will not be a brick wall, at least not this particular brick wall.

First, you went and stuck `Id`iot columns on all the tables to force uniqueness, without really understanding the Identifiers and keys that were used naturally to find the data. Those are the bricks that the wall is made from. That was an unconsidered knee-jerk reaction to a problem that demanded consideration. That is what you will have to re-visit.

  1. Do not repeat the same mistake again. Whacking GUID or UUID or 32-byte `Id`iot columns on top of your `NUMERIC(10,0)` `Id`iot columns will not do anything, except make the db much fatter, and all accesses, especially joins, much slower. The wall will be made of concrete blocks, and it will hit you every hour.

  2. Go back and look at the tables, and design them with a view to being tables, in a database. That means your starting point is no surrogate keys, no `Id`iot columns. When you are done, you will have very few `Id` columns. Not zero, not on all tables, but very few. Therefore you have very few bricks in the wall. I have recently posted a detailed set of steps required, so please refer to:

    Link to Answer re Identifiers

  3. What is the justification for having one audit table containing the audit "records" of all tables? Do you enjoy meeting brick walls? Do you want the concurrency and the speed of the db to be bottlenecked on the insert hot-spot in one file?

    Audit requirements have been implemented in dbs for over 40 years, so the chances of your users having some other requirement that will not change are not very high. May as well do it properly. The only correct method (for an Rdb) for audit tables is to have one audit table per auditable real table. The PK will be the original table's PK plus DateTime (compound keys are normal in a modern database). Additional columns will be UserId and Action. The row itself will be the before image (the new image is the single current row in the main table). Use the exact same column names. Do not pack it into one gigantic string.

    If you do not need the data (before image), then stop recording it. It is very silly to record all that volume for no reason. Recovery can be obtained from the backups.

  4. Yes, a single RecordKey table is a monstrosity. And yet another guaranteed method of single-threading the database.
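The per-table audit layout described in point 3 might be sketched as follows, using SQLite for brevity; table, column, and user values here are illustrative, not prescriptive:

```python
import sqlite3
import datetime

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL
    );
    -- One audit table per auditable table: same column names,
    -- plus who/when/what, keyed on the original PK plus DateTime.
    CREATE TABLE customer_audit (
        id         INTEGER NOT NULL,
        changed_at TEXT    NOT NULL,
        user_id    INTEGER NOT NULL,
        action     TEXT    NOT NULL,   -- UPDATE or DELETE
        name       TEXT,               -- before image of the row
        PRIMARY KEY (id, changed_at)   -- compound key
    );
""")

conn.execute("INSERT INTO customer (id, name) VALUES (1, 'old name')")

# On update, copy the before image into the audit table first.
now = datetime.datetime.now(datetime.timezone.utc).isoformat()
conn.execute("""
    INSERT INTO customer_audit (id, changed_at, user_id, action, name)
    SELECT id, ?, 42, 'UPDATE', name FROM customer WHERE id = 1
""", (now,))
conn.execute("UPDATE customer SET name = 'new name' WHERE id = 1")

before = conn.execute(
    "SELECT name FROM customer_audit WHERE id = 1").fetchone()
print(before[0])  # 'old name' -- the prior row as plain columns, not a string
```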

Do not react to my post, I can already see from your comments that you have all the "right" reasons for doing the wrong thing, and keeping your brick walls intact. I am trying to help you destroy them. Consider it carefully for a few days before responding.

PerformanceDBA
  • @iDevelop. Thanks. That's exactly the right term. Without roots, the tree just falls over. If you model the data using the IDEF1X Relational modelling Standard, those roots, the Identifiers, are actually **visible** in the Data Model. – PerformanceDBA Nov 27 '10 at 14:47
  • Thanks **PerformanceDBA**; I appreciate your direct and forward perspective. I have redesigned my tables a few times and despite having gone with a BINARY(16) key where (I felt) necessary, I have still been exploring my options. Your feedback has provoked me to research and reconsider my design, and I will be reading up on the IDEF1X Relational Modeling Standard as mentioned in your last comment. Lollers at id-iot :P – Dan Lugg Nov 30 '10 at 04:00
  • @Tomcat. My pleasure. Ok, follow [this question/answer](http://stackoverflow.com/questions/4310769/bulletin-board-database-optimisation/4311900#4311900) for a few days, it will be a great example of how to follow IDEF1X methodology, and expose and distil those **Identifiers**. – PerformanceDBA Nov 30 '10 at 10:06

How about keeping all the record_id values local to each table, and adding another column, table_name, to the audit table to make a composite key?

This way you can also easily filter your audit log by table_name (which would be tricky with arbitrary UUIDs or sequence numbers). So even if you do not go with this solution, consider adding the table_name column anyway for the sake of querying the log later.

In order to fit the record_id from all tables into the same column, you would still need to enforce that all tables use the same data type for their ids (but it seems like you were planning to do that anyway).
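A minimal sketch of this composite-key audit log, assuming SQLite and invented table names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One shared log; (table_name, record_id) replaces any global key.
conn.execute("""
    CREATE TABLE audit_log (
        table_name TEXT    NOT NULL,
        record_id  INTEGER NOT NULL,
        user_id    INTEGER NOT NULL,
        ts         TEXT    NOT NULL,
        action     TEXT    NOT NULL,
        archive    TEXT
    )
""")
conn.execute("CREATE INDEX ix_audit ON audit_log (table_name, record_id)")

rows = [
    ("blog_post", 1, 7, "2010-11-26T05:00:00", "INSERT", None),
    ("page",      1, 7, "2010-11-26T05:01:00", "INSERT", None),
]
conn.executemany("INSERT INTO audit_log VALUES (?, ?, ?, ?, ?, ?)", rows)

# record_id 1 is no longer ambiguous once qualified by table_name,
# and filtering the log per table is a plain indexed lookup.
blog_rows = conn.execute(
    "SELECT COUNT(*) FROM audit_log WHERE table_name = 'blog_post'"
).fetchone()[0]
print(blog_rows)  # 1
```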

Thilo
  • Thanks **Thilo**; That is a consideration I've made, adding `table_name` and using it as you've described. Certainly, adding it as an additional piece of data for the sake of completeness in the audit trail could help. But consider this situation; A user uninstalls a Blog module in favor of a more comprehensive one, yet both Blog modules have used the same naming scheme for tables. Conflicts could arise, as I don't intend on cascading the deletion of audit records. This is why I'm leaning in favor of some sort of global-key implementation. – Dan Lugg Nov 26 '10 at 05:02
  • 1
    That conflict you describe would not be a problem since you still have timestamps to know which module used which table when. The only problem would be when someone installs both modules at the same time, but that seems not possible since the database also likes its table names distinct. In fact, your audit table with the above approach would work very similar to how the database's own metadata tables work. – Thilo Nov 26 '10 at 05:21
  • Thanks **Thilo**; You are correct about installations. Module installation checks for existing tables, paths, etc., and depending on severity, notifies the user, or fails altogether. – Dan Lugg Nov 26 '10 at 08:18

A more powerful scheme is to create an audit table that mirrors the structure of each table rather than put all the audit trail into one place. The "shadow" table model makes it easier to query the audit trail.
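One way to sketch the shadow-table model, assuming SQLite and invented names, is to let a trigger capture the before image so the audit rows stay queryable as ordinary columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (id INTEGER PRIMARY KEY, body TEXT);
    -- The shadow table mirrors the base table's structure.
    CREATE TABLE page_shadow (id INTEGER, body TEXT, action TEXT);

    -- The database itself records the before image on every update.
    CREATE TRIGGER page_audit_upd BEFORE UPDATE ON page
    BEGIN
        INSERT INTO page_shadow (id, body, action)
        VALUES (OLD.id, OLD.body, 'UPDATE');
    END;
""")

conn.execute("INSERT INTO page (id, body) VALUES (1, 'draft')")
conn.execute("UPDATE page SET body = 'published' WHERE id = 1")

shadow = conn.execute("SELECT body, action FROM page_shadow").fetchone()
print(shadow)  # ('draft', 'UPDATE') -- queryable with ordinary SQL
```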

nvogel
  • Thanks **dportas**; That's something I also considered, however managing tables of dead modules is far too much overhead, especially when situations as discussed above with Thilo occur. I won't need to query the audit data, typically just the 'metadata' of who/what/when. Storing of previous records would only be for recovery purposes, not advanced record comparisons/versioning. (Although, perhaps that is something to think about) – Dan Lugg Nov 26 '10 at 08:23
  • 1
    +1. This also solves the problem of having to serialize the row contents. You can just keep all the original columns and even use them in queries. – Thilo Nov 26 '10 at 09:06