6

I am working on a project where I have to add a form of data versioning or history feature to every table. Basically we have to keep track of every insert or change in the database, so that it is easy to roll back to or view previous versions of the data in each table.

The way my project manager envisions this is to be done is to add a few new coloumns to each table. The main feature is a coloumn named "version". Every time an update is made, nothing is really updated, the old row remains but a new row is added to the table with an incremented value for "version".

To show the current data, we just use a view that shows only rows with the highest version number of each type.

While this works great as far as moving back and forth between different versions, I have run across a problem with this approach. For any relationship to exist between tables, we need to define foreign keys and foreign keys can only reference unique fields in the other table. Now that we are keeping several versions of the same row (with the same 'Id' as it is basically the same piece of data as far as our application is concerned) we can no longer use another table's 'Id' as a foreign key for a table.

We are using a unique primary key field for each row, but that is useless as an identifier since several rows are basically different versions of the same thing. We could manually keep track of the latest version of each kind of entry and update corresponding foreign key relationships every time something is changed, but that looks like a a lot of work and I am not sure that will always even work (e.g. reverting to previous version of an entry might cause the foreign key to reference an old and unusable version of another entry in some other table.)

I know there are other ways of keeping a history of database updates (e.g. by using a separate history table for each table), but I am stuck with this approach in this project. Is there some more obvious way of handling relationships between tables like this that I am missing?

Note: I am using MS SQL Server 2008 R2.

Marco Eckstein
  • 4,448
  • 4
  • 37
  • 48
MAK
  • 26,140
  • 11
  • 55
  • 86
  • Does this answer your question? [How to version control a record in a database](https://stackoverflow.com/questions/323065/how-to-version-control-a-record-in-a-database) – Marco Eckstein Nov 01 '20 at 16:51

1 Answers1

1

You say that not want a "separate revision table", not voting to FractalizeR's solution because this. Ok, here is a "one table solution"... But, please, simplify/generalise your question, for better answers and better use of this page for all visitors: I think your problem is about "revision control" on SQL tables.

The solution for "ISO 2008 SQL", then I think it is also for Microsoft SQL-Server. I tested it on PostgreSQL 9.1.

In this kind of problem we can use a SQL View to "emulate" the original table, and the "versioned table" as a new one, with more attributes: * An new attribute moment for sort (ordering) the revisions and for time registering; * An new attribute cmd for "traceability" (not really necessary).

Suppose your original (and conventional) table is t. For revision control you must add new attributes, but other programmers not need to see this new attributes... The solution is to rename the table t to t_hist and offer to other programmers a SQL VIEW t (as a query over t_hist).

t is a VIEW for show a conventional table: only "current tuples". t_hist is the new table, with the "history tuples".

Suppose t with attributes a,b. PS: on t_hist I added isTop for better performance on t.

 -- ....
 CREATE TABLE  t_hist (
    -- the old attributes for t:
    id integer NOT NULL, -- a primary key of t
    a varchar(10),  -- any attribute
    b integer,      -- any attribute

    -- new attributes for revision control:
    isTop BOOLEAN NOT NULL DEFAULT true, -- "last version" or "top" indicator
    cmd varchar(60) DEFAULT 'INSERT',    -- for traceability
    moment timestamp NOT NULL DEFAULT now(), -- for sort revisions
    UNIQUE(id,moment)
);

CREATE VIEW t AS
  SELECT id,a,b FROM t_hist WHERE isTop;
   -- same, but better performance, as 
   -- SELECT id,a,b FROM t_hist GROUP BY id,a,b HAVING MAX(moment)=moment  

-- Verifies consistency in INSERT:
CREATE FUNCTION t_hist_uniq_trig() RETURNS TRIGGER AS $$
DECLARE
  aux BOOLEAN;
BEGIN
   SELECT true INTO aux FROM t_hist 
   WHERE id=NEW.id AND moment>=NEW.moment;
   IF found THEN -- want removes from top?
     RAISE EXCEPTION 'TRYING TO INCLUDE (ID=%) PREVIOUS TO %', NEW.id, NEW.moment;
    END IF;
    RETURN NEW;
END  $$ LANGUAGE plpgsql;
CREATE TRIGGER uniq_trigs BEFORE INSERT ON t_hist 
    FOR EACH ROW EXECUTE PROCEDURE t_hist_uniq_trig();   

CREATE FUNCTION t_reset_top(integer)  RETURNS BOOLEAN AS $BODY$
    UPDATE t_hist SET isTop=false WHERE isTop=true AND id=$1
    RETURNING true;  -- null se nao encontrado
$BODY$ LANGUAGE sql;

--------
-- Implements INSER/UPDATE/DELETE over VIEW t, 
-- and controls unique id of t:
CREATE OR REPLACE FUNCTION t_cmd_trig() RETURNS TRIGGER AS $$
DECLARE
  aux BOOLEAN;
BEGIN
  aux:=true;
  IF TG_OP = 'DELETE' OR TG_OP = 'UPDATE' THEN
    aux := t_reset_top(OLD.id); -- rets. true ou NULL
  ELSE
    SELECT true INTO aux FROM t_hist WHERE id=NEW.id AND isTop;
  END IF;
  IF (TG_OP='INSERT' AND aux IS NULL) OR (TG_OP='UPDATE' AND aux) THEN
    INSERT INTO t_hist (id,a,b,cmd) VALUES (NEW.id, NEW.a,NEW.b,TG_OP);
  ELSEIF TG_OP='DELETE' AND aux THEN -- if first delete
    UPDATE t_hist SET cmd=cmd||' AND DELETE AT '||now()
  ELSEIF TG_OP='INSERT' THEN -- fails by not-unique(id)
    RAISE EXCEPTION 'REGISTER ID=% EXIST', NEW.id;
  ELSEIF TG_OP='UPDATE' THEN -- .. redundance, a trigger not goes here
    RAISE EXCEPTION 'REGISTER ID=% NOT EXIST', NEW.id;
  END IF;
  RETURN NEW; -- discarded
END
$$ LANGUAGE plpgsql;
CREATE TRIGGER ins_trigs INSTEAD OF INSERT OR UPDATE OR DELETE ON t 
    FOR EACH ROW EXECUTE PROCEDURE t_cmd_trig();

--  Examples:
INSERT INTO t(id,a,b) VALUES (1,'aaaaaa',3); -- ok
INSERT INTO t(id,a,b) VALUES (1,'bbbbbb',3); -- error
UPDATE t_hist SET a='teste' WHERE id=1;      -- ok
     -- SELECT * from t;        SELECT * from t_hist;
INSERT INTO t(id,a,b) VALUES 
  (2,'bbbbbb',22), -- ok
  (3,'bbbbbb',22), -- ok
  (4,'aaaaaa',2);  -- ok
DELETE FROM t WHERE id=3;
     -- SELECT * from t;        SELECT * from t_hist;

PS: I suggest not to try adapt this solution for one table without a view, your trigger will be very complex; neither to try adapt for t_hist inheriting t, where all content inserted in t_hist will copy to t.

Peter Krauss
  • 13,174
  • 24
  • 167
  • 304
  • I do not see anything that can be called "personal details" in my question. I believe the information there is the minimum needed to state my problem. Perhaps it was too big - but certainly had no "personal information". I appreciate your taking the time to answer, but as I stated in my question, the problem with the implementation you propose is how to represent foreign key relationships in such a scheme. Views from tables with extra information, I already know how to do. Anyways, this is more than a year old, and I went with something like FractalizeR's method. But thanks all the same. – MAK Apr 13 '12 at 08:43
  • Hello, I agree about the word "personal" and I removed it. My initial comments was to improve the site and your question rates. About use of VIEW, it is not another table, it is only a good facade for triggers and for external users. The solution is a "one table solution" (!). For non-"pure SQL" solutions, you can comment here if is good for you (as MS-SQL-server user) another answers like: http://stackoverflow.com/questions/9481557 or http://stackoverflow.com/questions/503472 – Peter Krauss Apr 13 '12 at 18:53