Suggested techniques for storing multiple versions of SQL row data

Question

I am developing an application that is required to store previous versions of database table rows to maintain a history of changes. I am recording the history in the same table but need the most current data to be accessible by a unique identifier that doesn't change with new versions. I have a few ideas on how this could be done and was just looking for some ideas on the best way of doing this or whether there is any reason not to use one of my ideas:

Create a new row for each row version, with a field to indicate which row was the current row. The drawback of this is that the new version has a different primary key and any references to the old version will not return the current version.
When data is updated, the old row version is duplicated to a new row, and the new version replaces the old row. The current row can be accessed by the same primary key.
Add a second table with only a primary key, add a column to the other table which is foreign key to new table's primary key. Use same method as described in option 1 for storing multiple versions and create a view which finds the current version by using the new table's primary key.

Of course that will depend on the usage of the whole history, but have you considered 'backup' table - the table that you will move the records to... and will have your main table only with current version of each row. — Daniel Mošmondor, Dec 16 '11 at 02:17
Any particular reason to record the history in the same table? — UnhandledExcepSean, Dec 16 '11 at 02:23
I want to keep the history in the same table as history can be viewed at any time, using the same code that display's the current data, and history can be edited and brought back to current at any time too. — johna, Dec 16 '11 at 02:49

score 2 · Answer 1 · answered Dec 16 '11 at 03:45

PeopleSoft uses (used?) "effective dated records". It took a little while to get the hang of it, but it served its purpose. The business key is always extended by an EFFDT column (effective date). So if you had a table EMPLOYEE[EMPLOYEE_ID, SALARY] it would become EMPLOYEE[EMPLOYEE_ID, EFFDT, SALARY].

To retrieve the employee's salary:

SELECT e.salary
  FROM employee e
  WHERE employee_id = :x
    AND effdt = (SELECT MAX(effdt)
                   FROM employee
                   WHERE employee_id = :x
                     AND effdt <= SYSDATE)

An interesting application was future dating records: you could give every employee a 10% increase effective Jan 1 next year, and pre-poulate the table a few months beforehand. When SYSDATE crosses Jan 1, the new salary would come into effect. Also, it was good for running historical reports. Instead of using SYSDATE, you plug in a date from the past in order to see the salaries (or exchange rates or whatever) as they would have been reported if run at that time in the past.

In this case, records are never updated or deleted, you just keep adding records with new effective dates. Makes for more verbose queries, but it works and starts becoming (dare I say) normal. There are lots of pages on this, for example: http://peoplesoft.wikidot.com/effective-dates-sequence-status

score 1 · Accepted Answer · edited Dec 16 '11 at 03:11

1

#3 is probably best, but if you wanted to keep the data in one table, I suppose you could add a datetime column that has a now() value populated for each new row and then you could at least sort by date desc limit 1.

Overall though - multiple versions needs more info on what you want to do effectively as much as programatically...ie need more info on what you want to do.

R

edited Dec 16 '11 at 03:11

MPelletier

16,256
15
86
137

answered Dec 16 '11 at 02:20

Ross

1,639
1
18
22

I didn't provide too much detail on my requirements as I was wondering if there was any methods I had overlooked, and just looking for general non-language or environment specific techniques. – johna Dec 16 '11 at 02:52
Ultimately I have gone with option #3. Thank you. – johna Jan 09 '12 at 01:15

score 0 · Answer 3 · answered Dec 16 '11 at 02:17

0

Have you considered using AutoAudit?

AutoAudit is a SQL Server (2005, 2008) Code-Gen utility that creates Audit Trail Triggers with:

Created, CreatedBy, Modified, ModifiedBy, and RowVersion (incrementing INT) columns to table

Insert event logged to Audit table

Updates old and new values logged to Audit table

Delete logs all final values to the Audit tbale

view to reconstruct deleted rows

UDF to reconstruct Row History

Schema Audit Trigger to track schema changes

Re-code-gens triggers when Alter Table changes the table

answered Dec 16 '11 at 02:17

Mitch Wheat

295,962
43
465
541

@Ross: it is but the poster didn't say what RDBMS they were using, and of couse I assumed SQL Server! ;) Generally, you use one type of DB, so asking for agnostic solutions is not particularly useful. – Mitch Wheat Dec 16 '11 at 02:22
I was wanting to manage the history in my own code as I need to compare and report on changes at times. DB is SQL Server 2008. – johna Dec 16 '11 at 02:50
@John: you can do that: open the relevant history view. – Mitch Wheat Dec 16 '11 at 03:39

score 0 · Answer 4 · answered Dec 16 '11 at 02:41

For me, history tables are always separate. So, definitely I would go with that, but why create some complex versioning thing where you need to look at the current production record. In reporting, this results in nasty unions that are really unnecessary.

Table has a primary key and who cares what else.
TableHist has these columns: incrementing int/bigint primary key, history written date/time, history written by, record type (I, U, D for insert, update, delete), the PK from Table as an FK on TableHist, the remaining columns all other columns with the same name are in the TableHist table.

If you create this history table structure and populate it via triggers on Table, you will have all versions of every row in the tables you care about and can easily determine the original record, every change, and the deletion records as well. AND if you are reporting, you only need to use your historical tables to get all of the information you'd like.

I agree that a separate table is preferable in many situations but for various reasons (some outlined in my comment to my question above) I prefer one table for my situation. — johna, Dec 16 '11 at 02:51
Bring the data back at any time would be a simple update against your table using the history table's value. I see what you are thinking, but I think that will be more difficult to deal with. — UnhandledExcepSean, Dec 16 '11 at 02:55

score 0 · Answer 5 · answered Dec 16 '11 at 03:02

create table table1 (
    Id int identity(1,1) primary key,
    [Key] varchar(max),
    Data varchar(max)
)
go

create view view1 as
with q as (
    select [Key], Data, row_number() over (partition by [Key] order by Id desc) as 'r'
    from table1
)
select [Key], Data from q where r=1
go

create trigger trigger1 on view1 instead of update, insert as begin
    insert into table1 
    select [Key], Data 
    from (select distinct [Key], Data from inserted) a
end
go

insert into view1 values
     ('key1', 'foo')
    ,('key1', 'bar')

select * from view1

update view1
set Data='updated'
where [Key]='key1'

select * from view1

select * from table1

drop trigger trigger1
drop table table1
drop view view1

Results:

Key Data
key1    foo

Key Data
key1    updated

Id  Key Data
1   key1    bar
2   key1    foo
3   key1    updated

I'm not sure if the disctinct is needed.

Suggested techniques for storing multiple versions of SQL row data

5 Answers5