7

When using a database normalized accoring to 6NF principles, how would you store historical attribute data?

Let say for example we take this example from @PerformanceDBA but with the following extra requirement:

We need to store historical data for all our products, we should be able to just enter a date and get a snapshot of the attributes of the product at that particular time.

A more practical example:
Suppose the disks and CPU's from the example above are virtual and a user can change the disk capacity at will. How can we alter the database so that we can retrieve the attributes of a given disk at any time in the past (of course after it's creation date) while keeping the 5NF view fast enough.

Things I'm considering

  • Add a timestamp column 'changedate' to each attribute table (this would result in a pretty complex query with a subquery and join for each attribute table)
  • Create a separate *history table for each attribute table (could result in a massive amount of table since we have around 70 attributes spread over 20 product types)
  • Additionally: add an indexed 'current' column to each attribute table to speed up the 5NF view

Any help is appreciated!


Edit: I know the concept of temporal databases, yet the problem is that for the database engine i'm working with (postgresql) the temporal extension isn't fully implemented yet. Any advice on how to achieve this without temporal databases?

ChrisR
  • 14,370
  • 16
  • 70
  • 107
  • 1
    Just to warn you-I went down the road of NOT having history tables, and using a "from" and "to" date on each row of my "entities". It was the biggest mistake I made and it turned the project into a nightmare. It took the guidance of the person you mention, PerformanceDBA, to make me truley understand what a database really is (i.e. not just a bucket for objects). I've since rewritten the project using a more traditional approach (history tables/views), and it is better in every way. Okay, not much of an argument but to go into details would take a huge amount of documentation. – Mark Mar 07 '12 at 16:04
  • 1
    This is the post that got me started on changing how I look at databases in general (from the software engineers point of view, to the DBA's point of view) :- http://stackoverflow.com/questions/4491173/historical-auditable-database - I'm not saying it is wrong to do what has been suggested (using "from" and "to" and no history tables), but for me it created a big mess, and I'll never go down that road again. – Mark Mar 07 '12 at 16:04

1 Answers1

9

The recently approved SQL:2011 standard incorporates features that allow you to deal better with this kind of problem than you could ever before.

Not that you'll be able to do everything you'd want to do in the temporal arena, but what did get introduced is indeed a fairly significant improvement.

A good presentation about it is at http://metadata-standards.org/Document-library/Documents-by-number/WG2-N1501-N1550/WG2_N1536_koa046-Temporal-features-in-SQL-standard.pdf .

Note that there's only a single vendor with reasonable support for these features in his SQL product, one other is perhaps working on it, and a third has opened the voting channel for their customers.

There's also a "Temporal Data" discussion group at www.linkedin.com dedicated to precisely your subject at hand.

EDIT trying to address "Any advice on how to achieve this without temporal databases?"

Do not add just a single date/time type column to your models. The first reason is as you gave, the second reason is that this solution is also the one promoted by the new standard, and that it will facilitate transition to engines that do support the new features once they are available.

So add BOTH a start- and an end- date/time column. DO NOT MAKE EITHER OF THEM NULLABLE. The new standard requires this for its temporal features. If the end-MIT (moment-in-time) is still unknown, use the highest value of the applicable time type, e.g. 9999-12-31.

You do not NEED to "create separate history tables for each attribute". It is equally possible to have a "single entity table" that keeps "the history of an entire entity occurrence". The downside is that it will be difficult to query for when an ACTUAL change occurred to some particular attribute (because you get new historical rows for any change to any attribute, possibly copying over the same attribute value for most of the attributes). The 'single table' is likely to be an eager consumer of space, the 'separate history for each attribute' may be an eager consumer of querying CPU time. It will be a balancing act, and where the balance is precisely, depends on your particular situation.

Do not "add an indexed 'current' column" to your tables. First, they will not help you transitioning to the new features when your engine has them, and second, Y/N columns are very bad discriminators, and therefore very poor candidates for indexing. I'd rather add your start- or end-mit to the index, they can be expected to give you the same wins for the 'current' rows, and a better win for the non-current rows, whenever you need to query those.

As for the enforcement of database constraints such as non-overlap in time periods in temporal keys and inclusion of time periods in temporal RI, well you're just entirely on your own. Write the code you need in triggers or SPROCs or application code, in decreasing order of preference.

Was this more helpful ?

Erwin Smout
  • 18,113
  • 4
  • 33
  • 52
  • Thanks, i've now found out about the temporal extension for postgresql (https://github.com/jeff-davis/PostgreSQL-Temporal/downloads) which seems to be what i'm looking for. Some more practical examples would help tho. – ChrisR Jan 24 '12 at 09:39