1

I am designing a MySQL db right now. One of the client requirements is maintaining a record history for certain tables. I referenced some articles on the internet which suggested maintaining separate history tables, but I did not like that idea. Then I got an excellent idea from the Stack Overflow question Is there a MySQL option/feature to track history of changes to records? and made changes to my db. I went for the solution of maintaining record history in the same table with "valid_from" and "valid_until" flags rather than maintaining a separate history table.

For example, I have two tables: s_tbl_bill, which has bill information, and s_def_department, which has the definition of a department. The two tables are related using the key billing_department in s_tbl_bill.

CREATE TABLE `s_tbl_bill` (
  `id` int NOT NULL AUTO_INCREMENT,
  `billing_department` int,
  `customer_id` mediumtext NOT NULL,
  `billed_date` datetime DEFAULT NULL,
  `is_active` enum('Y','N') DEFAULT 'Y',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

CREATE TABLE `s_def_department` (
  `id` int NOT NULL AUTO_INCREMENT,
  `name_eng` varchar(100) NOT NULL,
  `parent_id` int DEFAULT NULL,
  `phone` varchar(50) DEFAULT NULL,
  `is_active` varchar(50) DEFAULT 'Y',
  `created_timestamp` datetime DEFAULT CURRENT_TIMESTAMP,
  `valid_from` datetime DEFAULT CURRENT_TIMESTAMP,
  `valid_until` datetime DEFAULT NULL,
  `author_id` int DEFAULT NULL,
  `id_first` int DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

Now the problem I was having is with the primary key, which is auto-incremented. When I modify a record, the old record is set to inactive and a new record is added with a new primary key, and I was joining the records in queries using primary keys. As the new record is a new version of the same old tuple, the new primary key was giving me problems while joining. To fix this problem I added another field to the table, "id_first", which stores the primary key the record had when it was created for the first time.

For a scenario,

INSERT INTO `s_tbl_bill` (`id`, `billing_department`, `customer_id`, `billed_date`, `is_active`) 
VALUES ('10', '2', '5', '2018-06-19 13:00:00', 'Y');

INSERT INTO `s_def_department` (`id`, `name_eng`, `phone`, `is_active`, `created_timestamp`, `valid_from`, `valid_until`, `id_first`) 
VALUES ('2', 'DVD Store', '014231232', 'N', '2018-01-01', '2018-01-01', '2019-01-01', '2');

INSERT INTO `s_def_department` (`id`, `name_eng`, `phone`, `is_active`, `created_timestamp`, `valid_from`, `id_first`) 
VALUES ('14', 'Video Store', '012321223', 'Y', '2019-01-02', '2019-01-02', '2');

I have a bill with id 10 which was printed on 2018-06-19. Now an audit is going on today and wants to find out from which department bill 10 was printed. But the department that printed the bill has had its name changed from DVD Store to Video Store. To find out, I run the following query:

select d.name_eng as dept_name
from s_tbl_bill b
join s_def_department d on b.billing_department = d.id_first
where b.id = 10
  and d.valid_from <= b.billed_date
  and (d.valid_until is null or d.valid_until >= b.billed_date);

Are there any points to improve in my approach? Any suggestions will be highly valued.

  • 2
    I would recommend a history table. It leaves the production table untouched and avoids a raft of complications. The data can be stored in JSON avoiding having to recreate the table structure. Consider something like [paper-trail](https://github.com/paper-trail-gem/paper_trail). – Schwern Sep 19 '20 at 06:05
  • 'maintain separate history tables but I did not like that idea.' I go the opposite way, but really the decision is driven by your db, what it does, and crucially how frequently you need history records; i.e. you will get opinions here, but it is up to you which route to go down. – P.Salmon Sep 19 '20 at 06:20

2 Answers

1

In the linked question, a comment mentions:

The combination of customer_id and the dates are the primary key.

So your s_tbl_bill.id should not change.
Also, you don't need to save id_first, because you can calculate this easily.

CREATE TABLE `s_def_department` (

  -- ... same columns as before, without `id_first` ...

  PRIMARY KEY (`id`, `valid_from`)
)

INSERT INTO `s_def_department` (`id`, `name_eng`, `phone`, `is_active`, `created_timestamp`, `valid_from`) 
VALUES ('2', 'Video Store', '012321223', 'Y', '2019-01-02', '2019-01-02');

select (
  select name_eng 
  from s_def_department d
  where b.billing_department = d.id 
  order by valid_from desc 
  limit 1
) as dept_name
from s_tbl_bill b
# if you want only 1 record
where b.id = 10;
So what we do is select the fields from the other table with the corresponding ID. To take the newest version, we use order by valid_from desc limit 1.

So if you want the first entry, you can just use order by valid_from asc limit 1.
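For the audit scenario in the question, where the name that was valid when the bill was printed is needed rather than the newest one, the same correlated subquery can additionally filter on the bill date. A sketch, keeping the schema above:

```sql
-- Department name as it was on the bill's billed_date:
-- the newest version whose validity started on or before that date.
select (
  select name_eng
  from s_def_department d
  where b.billing_department = d.id
    and d.valid_from <= b.billed_date
  order by d.valid_from desc
  limit 1
) as dept_name
from s_tbl_bill b
where b.id = 10;
```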

Ron van der Heijden
1

Consider using table partitioning on is_active. Since most queries will need where is_active = 'Y', this avoids some indexing and performance issues by placing only active rows in one partition. If you also partition by valid_until, you can keep the inactive partition size under control and allow efficient truncation of old history simply by dropping partitions.

Because nearly all queries on and joins with this table will require is_active = 'Y', strongly consider using an ORM which can apply this scope consistently.
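A sketch of what that might look like in MySQL. Note the caveat: MySQL requires every unique key, including the primary key, to contain all partitioning columns, so the primary key has to be widened first; the partition names here are made up.

```sql
-- Widen the PK so it includes the partitioning column
-- (the auto_increment column stays first in the key).
ALTER TABLE s_def_department
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (id, is_active);

-- Active and historical rows land in separate partitions.
ALTER TABLE s_def_department
  PARTITION BY LIST COLUMNS (is_active) (
    PARTITION p_active  VALUES IN ('Y'),
    PARTITION p_history VALUES IN ('N')
  );
```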

One big performance issue and complication is that instead of a single update to write the change, multiple queries must happen. These need to be in a transaction to avoid race conditions. For example, let's say you want to update the row with id 42 (whose id_first is 23).

begin;

-- copy yourself (list the columns so the auto_increment id is not copied)
insert into s_def_department
  (name_eng, parent_id, phone, is_active, created_timestamp,
   valid_from, valid_until, author_id, id_first)
select name_eng, parent_id, phone, is_active, created_timestamp,
       valid_from, valid_until, author_id, id_first
from s_def_department where id = 42 and is_active = 'Y';

-- apply the changes to the new active row and set its tracking columns
update s_def_department
set
  name_eng = 'Something Else',
  valid_until = NULL,
  valid_from = CURRENT_TIMESTAMP
where id = last_insert_id();

-- deactivate yourself
update s_def_department
set is_active = 'N', valid_until = CURRENT_TIMESTAMP
where id = 42;

commit;

EDIT An alternative approach is to use two tables. One to store the ID of the item, and one to hold the data.

create table s_def_department_ptr (
  id bigint primary key auto_increment,
  data_id bigint not null references s_def_department_data(id)
);

CREATE TABLE `s_def_department_data` (
  `id` bigint not null primary key auto_increment,
  `ptr_id` bigint not null references s_def_department_ptr(id),
  ... and the rest of the data columns plus valid_from and valid_until ...
);

When the data is changed, a row is added to s_def_department_data and s_def_department_ptr.data_id is changed to refer to it.
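That change can be sketched as follows, assuming the data table carries name_eng and valid_from among its columns and that a pointer row with id 7 already exists (both are illustrative):

```sql
begin;

-- add the new version of the data
insert into s_def_department_data (ptr_id, name_eng, valid_from)
values (7, 'Video Store', CURRENT_TIMESTAMP);

-- repoint the stable ID at the new version
update s_def_department_ptr
set data_id = last_insert_id()
where id = 7;

commit;
```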

This eliminates the need for is_active: the active row is simply the one pointed at by data_id, which avoids goofs from leaving is_active off a query.

It also simplifies keys and improves referential integrity. Other tables reference s_def_department_ptr.id, which never changes.

The downside is it adds a join to every query. And what should be a simple update still requires several queries.


Both approaches add widespread performance penalties and production complications for a feature that will likely be used in only a handful of places. I would recommend a history table. It leaves the production table and code untouched. The data can be stored as JSON, avoiding having to recreate the table structure. Consider something like paper-trail.

Schwern