
This question is related to the schema that can be found in one of my other questions here. Basically in my database I store users, locations, sensors amongst other things. All of these things are editable in the system by users, and deletable.

However - when an item is edited or deleted I need to store the old data; I need to be able to see what the data was before the change.

There are also non-editable items in the database, such as "readings". They are more of a log, really. Readings are logged against sensors, because it's the reading for a particular sensor.

If I generate a report of readings, I need to be able to see what the attributes of a location or sensor were at the time of the reading.

Basically I should be able to reconstruct the data for any point in time.

Now, I've done this before and got it working well by adding the following columns to each editable table:

valid_from
valid_to
edited_by

If valid_to = 9999-12-31 23:59:59 then that's the current record. If valid_to equals valid_from, then the record is deleted.
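
To illustrate (table and column names simplified; just a sketch of the idea):

CREATE TABLE sensor (
    sensor_id   integer   NOT NULL,
    name        text      NOT NULL,
    valid_from  timestamp NOT NULL,
    valid_to    timestamp NOT NULL DEFAULT '9999-12-31 23:59:59',
    edited_by   integer   NOT NULL,
    PRIMARY KEY (sensor_id, valid_from)
);

-- the "current" rows are the ones carrying the sentinel valid_to
SELECT * FROM sensor WHERE valid_to = '9999-12-31 23:59:59';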

However, I was never happy with the triggers I needed to use to enforce foreign key consistency.

I can possibly avoid triggers by using the temporal extension to PostgreSQL. This provides a column type called "period", which lets you store a period of time between two dates and then add constraints to prevent overlapping periods. That might be an answer.
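
Something along these lines, for example (only a sketch; shown with a range type and an exclusion constraint, the built-in equivalent in later PostgreSQL releases, rather than the period type itself):

-- requires the btree_gist extension for the "=" part of the constraint
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE sensor_version (
    sensor_id  integer NOT NULL,
    validity   tsrange NOT NULL,
    -- no two versions of the same sensor may have overlapping periods
    EXCLUDE USING gist (sensor_id WITH =, validity WITH &&)
);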

I am wondering though if there is another way.

I've seen people mention using special historical tables, but I don't really like the thought of maintaining two tables for almost every one table (though it still might be a possibility).

Maybe I could cut down my initial implementation so it doesn't bother checking the consistency of records that aren't "current" - i.e. only check constraints on records where valid_to is 9999-12-31 23:59:59. After all, the people who use historical tables do not seem to have constraint checks on those tables (for the same reason: you'd need triggers).

Does anyone have any thoughts about this?

PS - the title also mentions an auditable database. In the previous system I mentioned, there is always the edited_by field. This allowed all changes to be tracked so we could always see who changed a record. Not sure how much difference that might make.

Thanks.

Mark

3 Answers


Revised 01 Jan 11

Ok, so there is a gap between where I sit (delivering fully auditable databases; yours is a particular requirement of that) and where you sit, based on your questions and comments, which we will probably work out in the commentary. Here's a position to start from.

  • To provide this requirement, there is no need at all for: triggers; mass duplication; broken integrity; etc.

  • This is not a classic Temporal requirement either, so there is no need for the "period" capability, but you can use it if you wish.

  • ValidFrom and ValidTo is a Normalisation error: ValidTo is data that is easily derived; the ValidTo in any row is duplicated in the ValidFrom of the next row; you have an Update Anomaly (when you update one column in one row, you additionally have to update the other column in the next row); and you have to use a dummy value for "current".

    • All unnecessary: use ValidFrom only, and keep the db clean and pure 5NF (a subquery sketch follows this list).

    • The Caveat is, if PostgreSQL can't perform Subqueries without falling in a heap (à la Oracle), then fine, keep ValidTo.
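
For illustration only (table and column names assumed), with ValidFrom alone the ValidTo is a simple Subquery, NULL meaning "current":

SELECT  SensorId,
        ValidFrom,
        (SELECT MIN(s2.ValidFrom)
            FROM  SensorVersion s2
            WHERE s2.SensorId  = s1.SensorId
              AND s2.ValidFrom > s1.ValidFrom
        ) AS ValidTo    -- NULL: this is the current row
    FROM SensorVersion s1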

All of these things are editable in the system by users, and deletable.

Well, no. It is a database holding important information, with Referential Integrity; not a scratchpad, so the user cannot just walk up to it and "delete" something. That would contradict the same users' requirement for maintaining historical data (in the Reading; Alert; Ack; Action; Download).

  • Cascading deletes are not allowed. Those functions are check boxes for non-databases, MS Access types. For real databases, the RI constraints stop parents with children from being deleted.

  • Primary Keys cannot (should not) be changed. Eg. UserId; LocationId; NetworkSlaveCode never change; remember, they are carefully considered Identifiers. One characteristic of PKs is that they are stable.

  • You can add new Users; you can change a current User's name; but you cannot delete a User who has entries in Download, Acknowledgement, or Action.

Basically if it's editable then it has to be historical (so that excludes readings and alerts).

Also excludes: Downloads; Acknowledgements; Actions.

And the Reference tables: SensorType; AlertType; ActionType.

And the new History tables: they are inserted into, but they cannot be updated or deleted.

The problem I find with the IsObsolete flag is: say if you change the Location, the Sensor foreign key will now point to an obsolete record, meaning you will have to duplicate every sensor record. This problem gets exponentially worse as the hierarchy gets bigger.

  • Ok, so now do you understand the LocationId (FK) in Sensor will not change; there is no mass duplication, etc ? There is no problem in the first place (and there is in that stupid book!) that gets exponentially worse in the second place.

  • IsObsolete is inadequate for your requirement. (Refer below)

  • The UpdatedDtm in any real row (Reading, etc) identifies the Parent (FK to Sensor) History row (its AuditedDtm) that was in effect at the time (a query sketch follows below).

  • Full Relational capability; Declarative Referential Integrity, etc.

  • Maintain the IDEF1X, Relational concept of strong Identifiers ... There is only one Current parent row (eg. Location)

  • The rows in the History are Images of the current row, before it was changed, at the stated AuditedDtm. The Current row (non-history) shows the one last UpdatedDtm, when the row was changed.

  • The AuditedDtm shows the entire series of UpdatedDtms for any given key; and thus I have used it to "partition" the real key in a temporal sense.

All that is required is a History table for each changeable table. I have provided the History tables for four Identifying tables: Location; Sensor; NetworkSlave; and User.
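
As a sketch only (column names assumed; adjust to the model), a report picks up the Sensor attributes that were in effect at the time of each Reading like this: the applicable History row is the one with the earliest AuditedDtm after the Reading; if there is none, the current Sensor row applies.

SELECT  r.LocationId,
        r.SensorNo,
        r.ReadingDtm,
        COALESCE(sh.Name, s.Name) AS SensorNameAtReading
    FROM Reading r
    JOIN Sensor  s
        ON  s.LocationId = r.LocationId
        AND s.SensorNo   = r.SensorNo
    LEFT JOIN SensorHistory sh
        ON  sh.LocationId = r.LocationId
        AND sh.SensorNo   = r.SensorNo
        AND sh.AuditedDtm = (SELECT MIN(sh2.AuditedDtm)
                                FROM  SensorHistory sh2
                                WHERE sh2.LocationId = r.LocationId
                                  AND sh2.SensorNo   = r.SensorNo
                                  AND sh2.AuditedDtm > r.ReadingDtm)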

Please read this for understanding Auditable in the accounting sense.

Data Model

Link to Sensor Data Model with History (Page 2 contains the History tables and context).

Readers who are not familiar with the Relational Modelling Standard may find IDEF1X Notation useful.

Response to Comments

(1) My first issue is that of referential integrity with the historic data, in that I'm not sure there is any, and if there is I'm not sure how it works. For instance, in SensorHistory it would be possible to add a record that had an UpdatedDtm indicating a date time before the Location itself existed, if you see what I mean. Whether this is actually an issue I'm not sure - enforcing that might be over the top.

(You raised a similar issue in the other question.) It may be that the dbs you have experienced did not actually have the Referential Integrity in place; that the Relation lines were there just for documentation; that the RI was "implemented in app code" (which means there is no RI).

This is an ISO/IEC/ANSI Standard SQL database. That allows Declarative Referential Integrity. Every Relation line is implemented as a PK::FK Reference, an actual Constraint that is Declared. Eg:

CREATE TABLE Location
    ...
    CONSTRAINT UC_PK
        PRIMARY KEY (LocationId)
    ...
CREATE TABLE Sensor
    ...
    CONSTRAINT UC_PK
        PRIMARY KEY (LocationId, SensorNo),
    CONSTRAINT Location_Sensor_fk
        FOREIGN KEY (LocationId)
        REFERENCES Location(LocationId)
    ...
CREATE TABLE SensorHistory
    ...
    CONSTRAINT UC_PK
        PRIMARY KEY (LocationId, SensorNo, UpdatedDtm),
    CONSTRAINT Sensor_SensorHistory_fk
        FOREIGN KEY (LocationId, SensorNo)
        REFERENCES Sensor (LocationId, SensorNo)
    ...
Those Declared Constraints are enforced by the server; not via triggers; not in app code. That means:
  • A Sensor with a LocationId that does not exist in Location cannot be inserted
  • A LocationId in Location that has rows in Sensor cannot be deleted
  • A SensorHistory with a LocationId+SensorNo that does not exist in Sensor cannot be inserted
  • A LocationId+SensorNo in Sensor that has rows in SensorHistory cannot be deleted.

(1.1) All columns should have RULEs and CHECK Constraints to constrain their range of values. That is in addition to the fact that all INSERT/UPDATE/DELETEs are programmatic, within stored procs, therefore accidents do not happen, and people do not walk up to the database and run commands against it (except SELECTs).

Generally I stay away from triggers. If you are using stored procs, and the normal permissions, then this:

in SensorHistory it would be possible to add a record that had an UpdatedDtm indicating a date time before the Location itself existed, if you see what I mean

is prevented. So is inserting a SensorHistory row with an UpdatedDtm earlier than the Sensor itself. But procs are not Declarative Rules. However, if you want to be doubly sure (and I mean doubly, because the INSERTs are all via a proc, not direct commands by users), then sure, you have to use a trigger. For me, that is over the top.

(2) how do I indicate deletion? I could just add a flag to the non-historical version of the table I guess.

Not sure yet. Eg. Do you accept that when a Sensor is deleted, it is final ... (yes, history is maintained) ... and then when a new Sensor is added to the Location, it will have a new SensorNo ... there is no Sensor being logically replaced with the new one, with or without a gap in time ?

From an end-user's point of view, via the software they should be able to add, edit and delete sensors at will with no limitation. But yes, once deleted it is deleted and cannot be undeleted. There's nothing to stop them re-adding a sensor later though with the exact same parameters.

And "delete" Locations, NetworkSlaves, and Users as well.

Ok. Then the new Sensor with the same parameters, is truly new, it has a new SensorNo, and is independent of any previous logical Sensor. We can add an IsObsolete BOOLEAN to the four identifying tables; it is now identified as adequate. The Delete is now a Soft Delete.

(2.1) For NetworkSensor and LoggerSensor, which are actually dependent on two parents: they are obsolete if either of their parents are obsolete. So there is no point giving them an IsObsolete column, which has a dual meaning, which can be derived from the applicable parent.

(2.2) Just to be clear, users cannot delete any rows from any Transaction and History tables, right?

(3) When updating a table, what method would be best to insert the new row in the historical table and update the main table? Just normal SQL statements inside a transaction maybe?

Yes. That is the classic use of a Transaction, as per ACID Properties, it is Atomic; it either succeeds in toto or fails in toto (to be retried later when the problem is fixed).
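
For illustration (column names assumed; normally this lives inside a stored proc), the Atomic change to, say, Location is:

BEGIN;

-- 1. Copy the before-image of the current row into the History table,
--    stamping the audit time; the copied UpdatedDtm is left unchanged
INSERT INTO LocationHistory (LocationId, Name, UpdatedDtm, UpdatedUserId, AuditedDtm)
    SELECT LocationId, Name, UpdatedDtm, UpdatedUserId, CURRENT_TIMESTAMP
        FROM  Location
        WHERE LocationId = 10;

-- 2. Then change the current row, setting its UpdatedDtm to the same instant
UPDATE Location
    SET   Name          = 'New name',
          UpdatedDtm    = CURRENT_TIMESTAMP,
          UpdatedUserId = 42
    WHERE LocationId = 10;

COMMIT;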

(4) Referenced Book

The definitive and seminal text is Temporal Data and the Relational Model, by C J Date, H Darwen, and N A Lorentzos. As in, those of us who embrace the RM are familiar with the extensions, and what is required in the successor to the RM; rather than some other method.

The referenced book is horrible, and free. The PDF isn't a PDF (no search; no indexing). Opening my MS and Oracle is telling; a few good bits couched in lots of fluff. Many misrepresentations. Not worth responding to in detail (if you want a proper review, open a new question).

(4.1) ValidTo in addition to ValidFrom. Serious mistake (as identified at the top of my answer) which the book makes; then laboriously solves. Don't make the mistake in the first place, and you have nothing to solve in the second place. As I understand it, that will eliminate your triggers.

(4.2) Simple rules, taking both Normalisation and Temporal requirements into account. First and foremost, you need to deeply understand (a) the temporal requirement and (b) the DataTypes, correct usage and limitations. Always store:

  • Instant as DATETIME, eg. UpdatedDtm

  • Interval as INTEGER, clearly identifying the Unit in the column name, eg. IntervalSec

  • Period. Depends on conjunct or disjunct.

    • For conjunct, which this requirement is, (4.1) applies: use one DATETIME; the end of the period can be derived from the beginning of the period of the next row.
    • For disjunct periods, yes, you need 2 x DATETIMEs, eg, RentedFrom and a RentedTo with gaps in-between.

(4.3) They mess with the "Temporal Primary Key", which complicates code (in addition to requiring triggers to control the Update Anomaly). I have already delivered a clean (tried and tested) Temporal Primary Key.

(4.4) They mess with dummy values, non-real values, and Nulls for "Now". I do not allow such things in a database. Since I am not storing the duplicated ValidTo, I do not have the problem, there is nothing to solve.

(4.5) One has to wonder why a 528 page "textbook" is available free on the web, in poor PDF form.

(5) I [a User] could quite happily delete all the LocationHistory rows for instance (leaving only the current version in the Location table) - even though there may exist a SensorHistory row that conceptually "belongs" to a previous version of the Location, if that makes sense.

It does not make sense to me, there is still a gap in the communication we have to close. Please keep interacting until it is closed.

  • In a real (standard ISO/IEC/ANSI SQL) database, we do not GRANT INSERT/UPDATE/DELETE permission to users. We GRANT SELECT and REFERENCES only (to chosen users). All INSERT/UPDATE/DELETEs are coded in Transactions, which means stored procs. Then we GRANT EXEC on each stored proc to selected users (use ROLES to reduce administration). A GRANT sketch follows this list.

    • Therefore no one can delete from any table without executing a proc.

    • Do not write a proc to delete from any History table. These rows should not be deleted. In this case, the non-permission and the non-existence of code is the Constraint.

    • Technically, all History rows are valid, there is no Period to concern yourself with. The oldest LocationHistory row contains the before-image of the original Location row before it was changed. The youngest LocationHistory row is the before-image of the current Location row. Every LocationHistory row in-between is thus valid and applies to the Period in-between.

    • No need to "prune" or look for a few LocationHistory rows that can be deleted on the basis that they apply to a Period that is not used: they are all used. (Definitively, without the need for checking for any mapping of Location children to any LocationHistory row(s), to prove it.)

    • Bottom line: a User cannot delete from any History (or Transaction) table.

    • Or do you mean something different again ?

    • Note I have added (1.1) above.
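
A sketch of that permission pattern (proc and role names are illustrative; exact GRANT syntax varies by platform):

-- users may read, but never touch the tables directly
GRANT SELECT ON Location        TO reporting_role;
GRANT SELECT ON LocationHistory TO reporting_role;

-- every INSERT/UPDATE/DELETE is a Transaction coded in a stored proc
GRANT EXECUTE ON PROCEDURE Location_Update TO app_role;

-- no proc deletes from LocationHistory, and no DELETE permission is granted,
-- so History rows cannot be deleted by anyone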

(6) Corrected one mistake in the DM. An Alert is an expression of Reading, not Sensor.

(7) Corrected the Business Rules in the other question/answer to reflect that; and the new rules exposed in this question.

(8) Do you understand/appreciate, that since we have a fully IDEF1X compliant model, re Identifiers:

  • The Identifiers are carried through the entire database, retaining their power. Eg. when listing Acknowledgements, they can be joined directly with Location and Sensor; the tables in-between do not have to be read (and they must be if Id keys are used). This is why there are in fact fewer joins required in a Relational Database (and more joins required in an unnormalised one).

  • the Subtypes, etc need to be navigated only when that particular context is relevant.

PerformanceDBA
  • @Mark ("cancelled" was my descriptive term). It is your system; you need to tell me. Depending on your perspective, it could be "cancelled" by (a) Acknowledgement (b) any Action (Ack being the 1st Action) (c) a specific Action (d) a good reading. In some other context (eg. Sensor dead, no further Readings), the condition is permanent and cannot be "cancelled". Unless you identify a new requirement, there is no need for another table. The **database** is only the facts; the **treatment** is derived, temporal. The simplicity, and the completeness at the same time, may be unnerving you. – PerformanceDBA Jan 10 '11 at 21:52
  • @PerformanceDBA: From the user's perspective, the alert is either active or inactive. The alert becomes inactive if a good reading occurs. If another bad reading occurs, a new alert is then created. Whether an alert is active or not, it still needs the usual acknowledgement and action "applied" before it can be considered "dealt with". So the states are something like: active, active&acknowledged, active&actioned, inactive, inactive&acknowledged, inactive&actioned. (continued in next comment..) – Mark Jan 11 '11 at 09:32
  • @PerformanceDBA: So - if I add a table that states that an alert is no longer active, I can then easily query for alerts that are active, or inactive. Having said that, I can derive this from the readings table so you could argue that it is duplicate data? – Mark Jan 11 '11 at 09:34
  • @PerformanceDBA: I'm actually starting to wonder if there's any point relating the alert to the first reading that kicked off the alert. The most relevant pieces of information for an alert are 1) what time it started and 2) the most out-of-range reading. Maybe the alert should be related to the sensor instead. The advantage with relating it to the reading though, I suppose, is it ensures that a reading exists in the database "for" the alert. I need to give this some more thought! – Mark Jan 11 '11 at 11:20
  • @PerformanceDBA: Maybe it should be: alert table relates to sensor, and then there's an alert_reading table that relates each reading for the sensor while it's in *that* alert to the alert - if you see what I mean.. So yes this does create a 1:many, but it is true to say all of those readings are "in" that alert. – Mark Jan 11 '11 at 11:28
  • @PerformanceDBA: Actually the alert could still relate to the first reading - that ensures there is always at least 1 reading for an alert.. Sorry thinking out loud a lot here! – Mark Jan 11 '11 at 11:35
  • @PerformanceDBA: Here is what I mean: http://www.alienmuppet.com/~marks/Diagrams/AlertRelations1.png (ps meant to use SensorNo, not SensorId). – Mark Jan 11 '11 at 11:46
  • @Mark. 1) You are making sense, but it is getting more complex than it needs to be. An Alert is **not** a child of Sensor (if it were, that would allow Alerts when the Sensor is within range). An Alert **is** a child of Reading, specifically one that is outside the range. The Readings continue to be logged on one path; the Ack & Action get logged down another path; each are independent. – PerformanceDBA Jan 11 '11 at 13:09
  • It also happens that the Reading may continue to be (temporally) in "alert" after the Alert is logged, so the Alert can be said to **apply to** many Readings; but you do not need a separate table or another relation for that, it already exists via the common parent, Sensor, and is easily derived (AlertReading is a 100% duplicate, will have Update Anomalies). An Alert does not **have** one or more Readings; a Reading **may have** an Alert. 2) circular relations are a sure sign of an unresolved model, which will lead to coding problems, guaranteed. I resolve them. – PerformanceDBA Jan 11 '11 at 13:14
  • @PerformanceDBA: Are you saying then that if a reading indicates that the sensor is too warm for instance, it won't have an alert because it's already been generated by a previous reading? – Mark Jan 11 '11 at 13:27
  • @PerformanceDBA: To be more clear: say if I have a reading come in that indicates a particular sensor is too warm, and an alarm is generated.. and then there are 10 subsequent readings all with the same alarm state (too warm), that only the first of those readings would "have" an alarm? – Mark Jan 11 '11 at 15:52
  • @PerformanceDBA: Here's an example scenario: http://www.alienmuppet.com/~marks/PDF/AlertExample.pdf - hope it makes sense. I'm hoping I can write a query that will return each active (or inactive) alert along with the highest value reading that occurred whilst that alert is active - for example, during a "too warm" period I'll need to get the highest temperature that occurred. In the attached pdf for instance, the alert that occurred at 10:00 has a peak temperature of 11 (I'm using temperature for the sake of argument). The peak temperature is the important bit. – Mark Jan 11 '11 at 16:31
  • @PerformanceDBA: I think it's dawning on me why I should not have AlertReading :-) Sorry I'm a bit slow sometimes! – Mark Jan 11 '11 at 16:33
  • @Mark. Relax, you are thorough, not slow. Never Duplicate Anything. Yes, 1 Alert, for the first of any series of out-of-range readings. Now we are moving into app code, not database. It has to handle the hysteresis, and not generate duplicate alarms for the same condition. The example **data** is perfect. For the Alert **report**, add *derived* columns: PeakValue & NoOfOccs, or Duration, or whatever. – PerformanceDBA Jan 11 '11 at 21:34
  • @Mark. New question. Which is correct: a Sensor has a Value or a Switch value ...or... a Sensor has a Value and a Switch value ? – PerformanceDBA Jan 11 '11 at 21:38
  • @PerformanceDBA: Thanks, I now understand. To answer your question: A sensor has a value *or* a switch value - it's configured to be one or the other, it can never be both. – Mark Jan 11 '11 at 21:48
  • @Mark. 1) Since the History issues are now closed, I have finalised that page. 2) The discussion re Alerts clarified the Verb Phrase. 3) The Switch XOR Value requires a change to the DM, which no longer fits onto a page. 4) I have therefore arranged them in Subject Areas, and provided an ERD for a complete overview & for navigation (do try that out). – PerformanceDBA Jan 12 '11 at 03:04
  • @PerformanceDBA: Thanks! I'll give it a good look over. I'm still battling a little with fetching the alerts with their peak reading value - it'd be easy if it was only a report, but there is a status screen on the software that shows all current alerts and their peak temperatures (it updates every time a new live reading comes in with a greater peak). Keeping them up to date is fine, it's just the initial query as I need to find the maximum value of all the readings that came in during the alert, and thats for each alert for each sensor. I'll work on it and let you know how it goes! – Mark Jan 12 '11 at 11:12
  • @PerformanceDBA: Nope, definitely struggling with trying to find out facts about alerts from readings without physically loading all the readings and processing them in software. I will keep trying. To be clear, I'm trying to write a query that will fetch all alerts that are not "actioned" or still "active" - and one that will fetch all alerts since a certain date but with its peak value. In fact, when viewing an alert in any way on the software it's always with its peak value. I could determine all this really easily using two extra tables: AlertReading and AlertEnd, but that feels like cheating! – Mark Jan 12 '11 at 17:03
  • @Mark. 1) If you introduce any duplication of data, it is much worse than "cheating", you break the rules of Normalisation and there are consequences at every turn, Update Anomalies; slower processing; etc. 2) All you need is a simple Subquery, which is very fast, if you have difficulty, ask a new question. 3) `Reading.IsSwitch` should be `Sensor.IsSwitch` (I am working within the limits of SQL but sparing you the details). Data Model corrected. – PerformanceDBA Jan 13 '11 at 03:28
  • @PerformanceDBA: I just want to check something - When I update a record, for instance "Location", do I set the AuditedDTM in LocationHistory to the current time? And UpdatedDTM in LocationHistory stays as it is in the record being copied? EG: If I have a record in Location: [ID=10, UpdatedDTM=10:00:00], and now I update it: COPY-TO to LocationHistory: [ID=10, UpdatedDTM=10:00:00, AuditedDTM=10:05:00], And now UpdatedDTM in Location is [ID=10 UpdatedDTM=10:05:00]? Hope that makes sense! Hard to illustrate properly in comments! – Mark Feb 03 '11 at 10:08
  • @Mark. Yes. And UpdatedDtm in *Location* is SET to current datetime. LocationHistory is a pure **before-image** of Location before the Update, including its UpdatedDtm unchanged. That is consistent with the Temporal requirement. – PerformanceDBA Feb 03 '11 at 12:57
  • @PerformanceDBA: Great, that's how I've implemented it so far and all seems fine. The logger table may have some extra attributes, such as "MemoryFullPolicyCode" - this can be changed by the user so I'm thinking I need to add a history table for it too along with UpdatedDTM, UpdatedUserId, etc. even though it's a 1:1 with Location.. – Mark Feb 03 '11 at 13:14
  • @Mark. You have the hang of it. 1::1 makes no difference. – PerformanceDBA Feb 03 '11 at 13:59
  • @PerformanceDBA: For unique columns on tables that have the is_obsolete flag, I'm using partial indexes (conditional on is_obsolete='f') - does that sound right? I think that's the only way I can do it in PostgreSQL. And on the actual history tables, I'm just not worrying about unique columns... unless I should be..! – Mark Feb 03 '11 at 15:41
  • @PerformanceDBA: Another quicky.. the sub-types - is this how you'd go about implementing them http://consultingblogs.emc.com/davidportas/archive/2007/01/08/Distributed-Keys-and-Disjoint-Subtypes.aspx - and do you consider this method "sound"? It feels like a bit of a hack, but I'm not sure there's any way round it other than repeating tables over and over for each type. For example, without this or a similar method, I'd have to repeat all the alert tables for both the ReadingSwitch and ReadingValue tables. – Mark Feb 03 '11 at 17:05
  • @Mark. is_obsolete: No. One PRIMARY KEY Index for the columns above the line plus one UNIQUE Index for each Alternate Key "AK". No partial indices. If location_name "Buckingham Palace" has been soft-deleted, the name cannot be re-used. If you want re-use, (think about the effect in both children and History), reverse the is_obsolete. – PerformanceDBA Feb 03 '11 at 21:46
  • @PerformanceDBA: If I were to "enable" the ability to undo soft deletes, I would need to also have an "is_obsolete" column in the history table I think? So that I could show the fact it was obsolete in the history? That aside, the uniqueness I mentioned is for an additional field that is required for Locations called "Alias". It is meant to be a (user defined) shortened version of the name (max 17 chars) that is used when the name has to be displayed on hardware that cannot display the long version (such as on a handreader). Continued in next comment... – Mark Feb 03 '11 at 22:44
  • @PerformanceDBA: What worries me is if a user (soft)deletes a Location, and then later creates a new Location with a different name, but with an alias that exists on a soft-deleted Location, then there'll be a problem... Sorry I only found out about the alias requirement recently! – Mark Feb 03 '11 at 22:46
  • @PerformanceDBA: I did just think maybe if any of the "alternate keys" matched a deleted item, then the item could be undeleted.. but what if the LocationName (an alternate key) matches one soft-deleted record, and the LocationAlias (an alternate key) matches another soft-deleted record? Or have I gone mad :-) – Mark Feb 03 '11 at 23:35
  • @Mark. the link: it provides a method but does not explain either the problem or the solution properly, and the solution is massive duplication. Not sound; implementing RI for Subtypes is much easier than in the link, but a short explanation is required. Please open a new question "How to Implement Referential Integrity in Subtypes" and I can answer it fully. Include the link if you like. – PerformanceDBA Feb 04 '11 at 07:23
  • @Mark. 1) Alias (I would prefer ShortName) is a separate AK to Name. They would have to change the Name or Alias of any existing value before they can use it, is_obsolete or not. There is a way around that but it is a little tedious and cannot be identified in comments. Let me know if you need it. 2) Yes, you need is_obsolete in the History if you want to un-delete soft deletes – PerformanceDBA Feb 04 '11 at 07:23
  • @Mark. In the meantime, I do not understand why, without **a** method, you would "have to repeat all the alert tables for both the ReadingSwitch and ReadingValue tables"; please identify. Have I not explained the History & Subtypes adequately ? – PerformanceDBA Feb 04 '11 at 07:25
  • @PerformanceDBA: Sorry yet another couple of questions...! First one is regarding unique constraints again - should I carry them over to the history tables, adding the UpdatedDTM column as part of the constraint? Or would that be overkill? For example: CONSTRAINT location_unique_label (Label, UpdatedDtm). – Mark Feb 04 '11 at 23:57
  • @PerformanceDBA: Second "quick" question.. there is the possible need to handle multiple "sites" in one database. If this turns out to be the case, I was thinking I could add a "Site" table, with primary key "SiteName" - and then *every* table has "SiteName" become part of their primary key.. For example CONSTRAINT location_pkey (SiteName, LocationId) as opposed to just location_pkey (LocationId). Does that make sense? – Mark Feb 05 '11 at 00:02
  • @PerformanceDBA: The SiteName would be short - probably CHAR(8) - so I figured it'd be alright to use as a key rather than a surrogate. Actually it'd probably be SiteCode thinking about it.. – Mark Feb 05 '11 at 00:06
  • @Mark. You're most welcome. History-Unique: overkill, the history row is a before-image. Site: is fine, but really it should be modelled, not added on as a blanket assumption. Your suggestion retains full Relational Power, but it may not be necessary; it depends on whether you need to (eg) join to Site from the lower-level tables. If not, SiteCode could be an FK, but not PK, in Location only. Alias: are you complete with it ? – PerformanceDBA Feb 05 '11 at 06:36

I've run into this situation before as well. Depending on the amount of data you are trying to keep track of, it can be daunting. The historical table works nicely for ease of use at times because you can take a 'snapshot' of the record in the history table, then make the changes as needed in the production table. It's pretty straightforward to implement; however, depending on how much data you have and how often it changes, you can end up with very large historical tables.

Another option is logging all changes in a way that allows someone to 'replay' what happened and track it. Each change is logged into a table or a field (depending on your needs) that keeps track of who, when, and what was changed to what, e.g. on Dec 31, 2010 Bob changed the status from 'Open' to 'Closed'.
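
For example, a change log along these lines (names are just illustrative):

-- one row per change: who, when, which record, and what changed
CREATE TABLE change_log (
    change_id   serial PRIMARY KEY,
    table_name  text      NOT NULL,
    record_id   integer   NOT NULL,
    column_name text      NOT NULL,
    old_value   text,
    new_value   text,
    changed_by  text      NOT NULL,
    changed_at  timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- "On Dec 31, 2010 Bob changed the status from 'Open' to 'Closed'"
INSERT INTO change_log (table_name, record_id, column_name, old_value, new_value, changed_by)
VALUES ('work_order', 42, 'status', 'Open', 'Closed', 'Bob');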

Which system you want to use usually depends on how you'll need to keep/review/use the data later: automated reports, review by a person, some combination of the two, etc.

Thyamine
  • The size of the table in terms of rows isn't so much the problem. With my current method of just using the one table the same problem can occur. – Mark Dec 20 '10 at 16:38
  • That's where it can be a balancing act between how many columns you have in each table vs adding a log/audit entry each time a change is made. If the tables have a small number of columns, then duplicating the record isn't too bad. If you have 30 columns of large size, then you may conserve space by instead logging each change as a record by itself. – Thyamine Dec 20 '10 at 17:02

Depending on your budget and/or environment you might want to consider using Oracle's flashback archive feature.

You can turn on automatic "archiving" of rows in a table, and then run a statement against the base table using something like:

SELECT *
FROM important_data
AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '5' DAY)

Oracle takes care of maintaining the history in a separate (shadow) table. You can do this for any table so that you can also do a query with a join.
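
Setting it up is roughly as follows (names illustrative; requires the appropriate Oracle edition and privileges):

-- create an archive with a retention period, then point the table at it
CREATE FLASHBACK ARCHIVE audit_fda
    TABLESPACE users
    QUOTA 1G
    RETENTION 1 YEAR;

ALTER TABLE important_data FLASHBACK ARCHIVE audit_fda;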