How to design a database with Revision History?

Question

I am part of a team building a new Content Management System for our public site. I'm trying to find the easiest and best way to build-in a Revision Control mechanism. The object model is pretty basic. We have an abstract BaseArticle class that includes properties for version independent/meta data such as Heading & CreatedBy. A number of classes inherit from this such as DocumentArticle which has the property URL that will be a path to a file. WebArticle also inherits from BaseArticle and includes the FurtherInfo property and a collection of Tabs objects, which include Body that will hold the HTML to be displayed (Tab objects do not derive from anything). NewsArticle and JobArticle inherit from WebArticle. We have other derived classes, but these provide enough of an example.

We come up with two approaches to persistence for Revision Control. I call these Approach1 and Approach2. I've used SQL Server to do a basic diagram of each: Database Diagram of Approach 1 Database Diagram of Approach 2

With Approach1, the plan would be for fresh versions of Articles to be persisted via a database Update. A trigger would be set for updates and would insert the old data in to the xxx_Versions table. I think a trigger would need to be configured on every table. This approach does have the advantage that the only the head version of each article is held in the main tables, with old versions being hived off. This makes it easy to copy the head versions of articles from the development/staging database to the Live one.
With Approach2, the plan would be for fresh versions of Articles to be inserted into the database. The head version of articles would be identified through views. This seems to have the advantage of fewer tables and less code (e.g. not triggers).

Note that with both approaches, the plan would be to call an Upsert stored procedure for the table mapped to the relevant object (we must remember to handle the case of a new Article being added). This upsert stored procedure would call that for the class from which it derives e.g. upsert_NewsArticle would call upsert_WebArticle etc.

We are using SQL Server 2005, although I think this question is independent of database flavor. I've done some extensive trawling of the internet and have found references to both approaches. But I have not found anything which compares the two and shows one or the other to be better. I think with all the database books in the world, this choice of approaches must have arisen before.

My question is: which of these Approaches is best and why?

Have you considered buying a CMS and customizing it? They are deceptively hard and time-consuming to build well. It can end up being quite expensive. — Rex M, Sep 19 '11 at 00:42
It certainly becomes quite complicated when you move from the vision to the implementation. But I think we have the skills to build what we need...I just want to be sure we do the backend as well as we can. Moreover, if we picked an off-the-shelf-solution, I'd still be left with the theoretical question of which Approach to take :-( BTW, see the comments I made to Blender's post for links to interesting pages. — daniel, Sep 19 '11 at 09:56
Agreed with Rex M, this is not something you should be re-inventing. There are countless edge cases. — Chris Tucker, Jun 11 '18 at 18:43
@RexM I think adding extra NoSql databases would be good for instance key-value databases. it will keep the SQL database clean, takes less storage and if anything updates in the schema you can add an extra field in the integrated database, and if you want you can delete the history revision with just one click — Omid Zarinmahd, Jun 06 '23 at 21:25

Eduardo Cuomo · Answer 1 · 2022-01-05T17:48:55.393

My implementation is little complex maybe.

First, you have only one table to handle everything, in order to keep the model design and the data integrity at only one point.

This is the basic idea, you can extend the design with the created_by & updated_by columns if you need.

Implement on MySQL

The following implementation is for MySQL, but the idea can be implemented at other kind of SQL databases too.

Table

DROP TABLE IF EXISTS `myTable`;

CREATE TABLE `myTable` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'Primary Key',
  `version` int(11) NOT NULL DEFAULT 0 COMMENT 'Version',
  `title` varchar(32) NOT NULL COMMENT 'Title',
  `description` varchar(1024) DEFAULT NULL COMMENT 'Description',
  `deleted_at` datetime DEFAULT NULL COMMENT 'Record deleted at',
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Record created at'
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

ALTER TABLE `myTable`
  ADD PRIMARY KEY (`id`, `version`) USING BTREE,
  ADD KEY `i_title` (`title`);

The Record ID is defined by id & version.
With deleted_at, this model supports the soft delete feature.

Views

Get current version

Get current record version:

CREATE OR REPLACE VIEW vMyTableCurrentVersion AS
SELECT
    `id`
  , MAX(`version`) AS `version`
  , MIN(`created_at`) AS `created_at`
FROM `myTable`
GROUP BY `id`;

Get all records (include deleted records)

Get all records, including the soft deleted records:

CREATE OR REPLACE VIEW vMyTableAll AS
SELECT
    T.id
  , T.version

  , T.title
  , T.description

  , T.deleted_at
  , _T.created_at
  , T.created_at AS `updated_at`
FROM
  `myTable` AS T
  INNER JOIN vMyTableCurrentVersion AS _T ON
    T.id = _T.id
    AND T.version = _T.version;

Get records

Get records, removing the soft deleted records from result.

CREATE OR REPLACE VIEW vMyTable AS
SELECT *
FROM `vMyTableAll`
WHERE `deleted_at` IS NULL;

Triggers & Validations

For this example, I will implement a unique title validation:

DROP PROCEDURE IF EXISTS myTable_uk_title;
DROP TRIGGER IF EXISTS myTable_insert_uk_title;
DROP TRIGGER IF EXISTS myTable_update_uk_title;

DELIMITER //

CREATE PROCEDURE myTable_uk_title(id INT, title VARCHAR(32)) BEGIN
  IF (
    SELECT COUNT(*)
    FROM vMyTable AS T
    WHERE
      T.id <> id
      AND T.title = title
  ) > 0 THEN
    SIGNAL SQLSTATE '45000'
    SET MESSAGE_TEXT = 'Duplicated "title"', MYSQL_ERRNO = 1000;
  END IF;
END //

CREATE TRIGGER myTable_insert_uk_title BEFORE INSERT ON myTable
FOR EACH ROW
BEGIN
  CALL myTable_uk_title(NEW.id, NEW.title);
END //

CREATE TRIGGER myTable_update_uk_title BEFORE UPDATE ON myTable
FOR EACH ROW
BEGIN
  CALL myTable_uk_title(NEW.id, NEW.title);
END //

DELIMITER ;

Usage examples

Select

SELECT * FROM `vMyTable`;

Select with deleted records

SELECT * FROM `vMyTableAll`;

Insert / Add / New

INSERT INTO myTable (`title`) VALUES ('Test 1');

Update / Edit

The update action should be done with following code, instead of UPDATE ...:

INSERT INTO myTable (`id`, `version`, `title`, `description`)
SELECT
    `id`
  , `version` + 1 as `version` -- New version
  , `title`
  , 'New description' AS `description`
  FROM `vMyTable`
  WHERE id = 1;

Soft Delete

The soft delete action is other point in the history:

INSERT INTO myTable (`id`, `version`, `title`, `description`, `deleted_at`)
SELECT
    `id`
  , `version` + 1 as `version` -- New version
  , `title`
  , `description`
  , NOW() AS `deleted_at`
  FROM `vMyTable`
  WHERE id = 1;

Restore Soft Deleted record

INSERT INTO myTable (`id`, `version`, `title`, `description`, `deleted_at`)
SELECT
    `id`
  , `version` + 1 as `version` -- New version
  , `title`
  , `description`
  , null AS `deleted_at`
  FROM `vMyTableAll` -- Get with deleted
  WHERE id = 1;

Delete record & history

To delete related history records:

DELETE FROM `myTable` WHERE id = 1;

Record history

SELECT *
FROM `myTable`
WHERE id = 1
ORDER BY `version` DESC;

Cons

Unique key constrain is not possible, but you can create a Trigger to handle it.
Update many records at same time (UPDATE ...) is not possible if you want to save the history.
Delete many records at same time (DELETE ...) is not possible if you want to save the history.

Refs

https://gist.github.com/reduardo7/e5846b17eba288006579fc35269f3e96

score 1 · Answer 2 · answered May 24 '19 at 14:04

In general, the biggest advantage to history/audit side tables is performance:

any live/active data queried can be queried from much smaller main table
Any "live only" queries do not need to contain active/latest flag (or god forbid do a correllated subquery on timestamp to find out latest row), simplifying the code both for developers AND DB engine optimizer.

However, for small CMS with 100s or 1000s of rows (and not millions of rows), performance gains would be pretty small.

As such, for small CMS, Approach 3 would be better, as far as simpler design/less code/less moving pieces.

Approach 3 is ALMOST like Approach 2, except every table that needs a history/versioning has an explicit column containing a true/false "active" (a.k.a. live - a.k.a. latest) - flag column.

Your upserts are responsible for correctly managing that column when inserting new live version (or deleting current live version) of a row.

All of your "live" select queries outside UPSERT would then be trivial to amend, by adding "AND mytable.live = 1" to any query.

Also, hopefully obvious, but ANY index on any table should start with "active" column unless warranted otherwise.

This approach combines the simplicity of Approach 2 (no extra tables/triggers), with performance of approach 1 (no need to do correllated subquery on any table to find latest/current row - your upserts manage that via active flag)

score 1 · Answer 3 · answered Jan 10 '22 at 20:29

In order to keep all history, I implement something with two tables.

This is the basic idea! You can edit the title and description columns with your requirements.

Implement on MySQL

The following implementation is for MySQL, but the idea can be implemented at other kind of SQL databases too.

-- Tables

DROP TABLE IF EXISTS `users`;
CREATE TABLE IF NOT EXISTS `users` (
  `id` int(10) UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'ID',
  `name` varchar(32) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `name` (`name`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

DROP TABLE IF EXISTS `myTable`;
CREATE TABLE IF NOT EXISTS `myTable` (
  `id` int(10) UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'ID',
  `title` varchar(32) NOT NULL,
  `description` varchar(2048) DEFAULT NULL,
  `edited_by` int(10) UNSIGNED DEFAULT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `title` (`title`) USING BTREE,
  KEY `myTalbe_users_edited_by_fk` (`edited_by`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

DROP TABLE IF EXISTS `myTable_history`;
CREATE TABLE IF NOT EXISTS `myTable_history` (
  `id` int(10) UNSIGNED NOT NULL COMMENT 'ID',
  `version` int(10) UNSIGNED NOT NULL,
  `title` varchar(32) NOT NULL,
  `description` varchar(2048) DEFAULT NULL,
  `edited_by` int(10) UNSIGNED DEFAULT NULL,
  `created_at` timestamp NULL DEFAULT NULL,
  `updated_at` timestamp NULL DEFAULT NULL,
  `deleted_at` timestamp NULL DEFAULT NULL,
  `history_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`,`version`) USING BTREE,
  KEY `title` (`title`),
  KEY `history_users_edited_by_fk` (`edited_by`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

ALTER TABLE `myTable`
  ADD CONSTRAINT `myTalbe_users_edited_by_fk` FOREIGN KEY (`edited_by`) REFERENCES `users` (`id`) ON UPDATE CASCADE;
ALTER TABLE `myTable_history`
  ADD CONSTRAINT `history_users_edited_by_fk` FOREIGN KEY (`edited_by`) REFERENCES `users` (`id`);

-- Triggers

DROP TRIGGER IF EXISTS myTable_insert_history;
DROP TRIGGER IF EXISTS myTable_update_history;
DROP TRIGGER IF EXISTS myTable_delete_history;

DELIMITER //

CREATE TRIGGER myTable_insert_history AFTER INSERT ON myTable
FOR EACH ROW
BEGIN
  INSERT INTO myTable_history (
      `id`
    , `version`
    , `title`
    , `description`
    , `edited_by`
    , `created_at`
    , `updated_at`
  ) VALUES (
      NEW.id
    , 0
    , NEW.title
    , NEW.description
    , NEW.edited_by
    , NEW.created_at
    , NEW.updated_at
  );
END //

CREATE TRIGGER myTable_update_history AFTER UPDATE ON myTable
FOR EACH ROW
BEGIN
  INSERT INTO myTable_history (
      `id`
    , `version`
    , `title`
    , `description`
    , `edited_by`
    , `created_at`
    , `updated_at`
  )
  SELECT
      NEW.id
    , MAX(`version`) + 1
    , NEW.title
    , NEW.description
    , NEW.edited_by
    , NEW.created_at
    , NEW.updated_at
  FROM myTable_history
  WHERE id = OLD.id;
END //

CREATE TRIGGER myTable_delete_history AFTER DELETE ON myTable
FOR EACH ROW
BEGIN
  INSERT INTO myTable_history (
      `id`
    , `version`
    , `title`
    , `description`
    , `edited_by`
    , `created_at`
    , `updated_at`
    , `deleted_at`
  )
  SELECT
      OLD.id
    , MAX(`version`) + 1
    , OLD.title
    , OLD.description
    , OLD.edited_by
    , OLD.created_at
    , OLD.updated_at
    , NOW()
  FROM myTable_history
  WHERE id = OLD.id;
END //

DELIMITER ;

How to design a database with Revision History?

3 Answers3

Implement on MySQL

Table

Views

Get current version

Get all records (include deleted records)

Get records

Triggers & Validations

Usage examples

Select

Select with deleted records

Insert / Add / New

Update / Edit

Soft Delete

Restore Soft Deleted record

Delete record & history

Record history

Cons

Refs

Implement on MySQL

Linked