1

Based on the discussion I found here: Database: To delete or not to delete records, I want to focus on reference data in particular, add a few thoughts on that, and ask for your preferred approach in general, or based on which criteria you make the decision which of the approaches available you go for.

Let's assume the following data structure for a 'request database' for customers, whereas requests may be delivered via various channels (phone, mail, fax, ..; our 'reference data table I want to mainly focus on'):

Request (ID, Text, Channel_ID)
Channel(ID, Description)

Let's, for the beginning, assume the following data within those two tables:

Request:

ID    | Text                                         | Channel_ID
===============================================================  
1     | How much is product A currently?             | 1 
2     | What about my inquiry from 2011/02/13?       | 1
3     | Did you receive my payment from 2011/03/04?  | 2

Channel:

ID    | Description
===============================================================  
1     | Phone
2     | Mail
3     | Fax

So, how do you attack this assuming the following requirements:

  1. Channels may change over time. That means: Their descriptions may change. New ones may be added, only valid starting from some particular data. Channels may be invalidated (by some particular date)

  2. For reporting and monitoring purposes, it needs to be possibly to identify using which channel a request was originally filed.

  3. For new requests, only the currently 'valid' channels should be allowed, whereas for pre-existing ones, also the channels that were valid at that particular date should be allowed.

In my understanding, that clearly asks for a richer invalidation approach that goes beyond a deletion flag, probably something incorporating a 'ValidFrom / ValidTo' approach for the reference data table.

On the other hand, this involves several difficulties during data capture of requests, because for new requests, you only display they currently available channels, whereas for maintenance of pre-existing ones, all channels available as of the creation of this record need to be displayed. This might not only be complicated from a development point of view, but may also be non-intuitive to the users.

How do you commonly set up your data model for reference data that might chance over time? How do you create your user interface then? Which further parameters do you take into account for proper database design?

Community
  • 1
  • 1
bonifaz
  • 588
  • 3
  • 16

1 Answers1

1

In such cases I usually create another table, for example, channel_versions that duplicates all fields from channel and has extra create_date column(and it's own PK of course). For channel I define after insert/update triggers that copy new values into channel_versions. Now all requests from Request table refer to records from channel_versions. For new requests you need to get the most recent version of channel from channel_versions . For old requests you always know how channel looked when the request was fulfilled.

a1ex07
  • 36,826
  • 12
  • 90
  • 103
  • Thanks for your answer, sounds good! However, do I observe correctly that one drawback of that approach is that it requires 1 additional join if I want to determine the Description of the channel used for one particular requirement? – bonifaz Mar 20 '11 at 15:35
  • Not necessarily, in some cases you need to it, in others not. For example, if you want to get a channel description for an old request, you just join `channel_versions` instead of `channel` (still one join). If you want to get an old description for a record in `channel` table, then yes, you will have an extra join. If you want to get the most recent description for a channel, you don't have to look into `channel_versions` (the record will be the same as in `channel`) – a1ex07 Mar 20 '11 at 15:44
  • I understand, really interesting approach! Guess I'll give it a try next time. – bonifaz Mar 20 '11 at 21:00