2

I have a column with a uniqueidentifier that can potentially reference one of four different tables. I have seen this done in two ways, but both seem like bad practice.

First, I've seen a single ObjectID column without explicitly declaring it as a foreign key to a specific table. Then you can just shove any uniqueidentifier you want in it. This means you could potentially insert IDs from tables that are not part of the 4 tables I wanted.

Second, because the data can come from four different tables, I've also seen people make 4 different foreign keys. And in doing so, the system relies on ONE AND ONLY ONE column having a non-NULL value.
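For reference, the "one and only one non-NULL" rule in this second approach does not have to be left to convention; it can be written as a check constraint. Below is a minimal sketch, assuming SQLite driven through Python's sqlite3 module so it can be run anywhere, TEXT ids standing in for uniqueidentifier, and a hypothetical referencing table named Records. SQL Server would need CASE expressions in place of the boolean arithmetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE Hospitals    (ID TEXT PRIMARY KEY);
CREATE TABLE Clinics      (ID TEXT PRIMARY KEY);
CREATE TABLE Schools      (ID TEXT PRIMARY KEY);
CREATE TABLE Universities (ID TEXT PRIMARY KEY);

-- "Records" and its column names are hypothetical
CREATE TABLE Records (
    RecordID     INTEGER PRIMARY KEY,
    HospitalID   TEXT REFERENCES Hospitals(ID),
    ClinicID     TEXT REFERENCES Clinics(ID),
    SchoolID     TEXT REFERENCES Schools(ID),
    UniversityID TEXT REFERENCES Universities(ID),
    -- exactly one of the four references must be filled in
    CHECK ((HospitalID IS NOT NULL) + (ClinicID IS NOT NULL)
         + (SchoolID IS NOT NULL) + (UniversityID IS NOT NULL) = 1)
);

INSERT INTO Hospitals VALUES ('h-1');
INSERT INTO Schools   VALUES ('s-1');
""")


def try_insert(hospital=None, clinic=None, school=None, university=None):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO Records (HospitalID, ClinicID, SchoolID, UniversityID) "
                "VALUES (?, ?, ?, ?)",
                (hospital, clinic, school, university),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(hospital="h-1"))                  # one valid reference
print(try_insert(hospital="h-1", school="s-1"))    # two references at once
print(try_insert())                                # no reference at all
print(try_insert(clinic="no-such-id"))             # id missing from Clinics
```

Only the first insert should go through. It still leaves four mostly-NULL columns in every row, which is part of why I'm asking.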

What's a better approach to doing this? For example, records in my table could potentially reference Hospitals(ID), Clinics(ID), Schools(ID), or Universities(ID)... but ONLY those tables.

Thanks!

Jake
  • Have one key which references Buildings(ID) instead. – lc. Nov 09 '12 at 16:27
  • Well, what's in the Buildings table then? An ObjectID or four separate columns? Am I back to the same issue? The other four tables already exist. – Jake Nov 09 '12 at 16:43
  • Either just an ID; an ID and a TypeID; or best is an ID, a TypeID and all columns common to Hospitals, Clinics, Schools, and Universities. Then each of those tables' IDs is actually a BuildingID, unique across all four tables. – lc. Nov 09 '12 at 16:49
  • At least, if you have four separate FKs, you can set up and enforce proper referential integrity using foreign key constraints. You can't achieve this if you have one column referencing one of four different tables. – marc_s Nov 09 '12 at 17:06
  • Possibly related and helpful: http://stackoverflow.com/a/5001664/44853 – lc. Nov 09 '12 at 17:35

3 Answers

8

You might want to consider a Type/SubType data model. This is very much like class/subclasses in object oriented programming, but much more awkward to implement, and no RDBMS (that I am aware of) natively supports them. The general idea is:

  • You define a Type (Building), create a table for it, give it a primary key
  • You define two or more sub-types (here, Hospital, Clinic, School, University), create tables for each of them, make primary keys… but the primary keys are also foreign keys that reference the Building table
  • Your table with one “ObjectID” column can now be built with a foreign key onto the Building table. You’d have to join a few tables to determine what kind of building it is, but you’d have to do that anyway. That, or store redundant data.

You have noticed the problem with this model, right? What’s to keep a Building from having entries in two or more of the subtype tables? Glad you asked:

  1. Add a column, perhaps “BuildingType”, to Building, say char(1) with allowed values of {H, C, S, U} indicating (duh) type of building.
  2. Build a unique constraint on BuildingID + BuildingType
  3. Have the BuildingType column in the subtables. Put a check constraint on it so that it can only ever be set to that table's own value (H for the Hospitals table, etc.). In theory, this could be a computed column; in practice, this won't work because of the following step:
  4. Build the foreign key to relate the tables using both columns

Voila: given a BUILDING row set with type H, an entry in the SCHOOL table (with type S) cannot be set to reference that Building.
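The four steps can be sketched roughly as follows. This is an illustration under assumptions, not a drop-in script: it uses SQLite through Python's sqlite3 module so that it runs anywhere, integer keys instead of uniqueidentifier, and only two of the four subtables (Clinic and University would follow the same pattern). The exact table definitions are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE Building (
    BuildingID   INTEGER PRIMARY KEY,
    BuildingType TEXT NOT NULL
        CHECK (BuildingType IN ('H', 'C', 'S', 'U')),     -- step 1
    UNIQUE (BuildingID, BuildingType)                     -- step 2
);

CREATE TABLE Hospital (
    BuildingID   INTEGER PRIMARY KEY,
    BuildingType TEXT NOT NULL DEFAULT 'H'
        CHECK (BuildingType = 'H'),                       -- step 3
    FOREIGN KEY (BuildingID, BuildingType)
        REFERENCES Building (BuildingID, BuildingType)    -- step 4
);

CREATE TABLE School (
    BuildingID   INTEGER PRIMARY KEY,
    BuildingType TEXT NOT NULL DEFAULT 'S'
        CHECK (BuildingType = 'S'),                       -- step 3
    FOREIGN KEY (BuildingID, BuildingType)
        REFERENCES Building (BuildingID, BuildingType)    -- step 4
);
""")


def accepts(sql, params=()):
    """Run one statement; True if it is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


print(accepts("INSERT INTO Building VALUES (1, 'H')"))        # a hospital-type building
print(accepts("INSERT INTO Hospital (BuildingID) VALUES (1)"))  # matching subtype row
# Building 1 is type H, so there is no (1, 'S') parent row for a School entry:
print(accepts("INSERT INTO School (BuildingID) VALUES (1)"))
# Lying about the type does not help, because the check constraint pins it to 'S':
print(accepts("INSERT INTO School (BuildingID, BuildingType) VALUES (1, 'H')"))
```

The first two statements should succeed and the last two should be rejected, the third by the composite foreign key and the fourth by the check constraint.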

You will recall that I did say it was hard to implement.

In fact, the big question is: Is this worth doing? If it makes sense to implement the four (or more, as time passes) building types as type/subtype (further normalization advantages: one place for address and other attributes common to every building, with building-specific attributes stored in the subtables), it may well be worth the extra effort to build and maintain. If not, then you’re back to square one: a logical model that is hard to implement in the average modern-day RDBMS.

stakx - no longer contributing
Philip Kelley
  • +1 Good answer, except in addition to enforcing the _exclusivity_ of the child (as you explained), it is also possible to declaratively enforce the _presence_ of the child, but that would require [a lot of gymnastics with circular and deferred FKs](http://stackoverflow.com/a/12261722/533120) and is probably not worth the effort. – Branko Dimitrijevic Nov 09 '12 at 22:43
  • Using only conventional declarative referential integrity tools, I don't think it's possible to enforce the existence of the child entry. You could INSERT a BUILDING, and never make a child entry. In extremes, you could put a check constraint (or lookup table + foreign key) on BUILDING.BuildingType, and add a trigger that always creates a default row in the appropriate subtable, and as you say it seems like overkill. Difficult, awkward, and probably why the major vendors haven't gotten around to implementing it. – Philip Kelley Nov 12 '12 at 15:02
  • Assuming you include *deferred* constraints into the definition of "conventional" integrity constraints, it is possible to enforce the existence of child purely through declarative means (take a look at my link). I won't argue this _should_ be done in practice, but it is possible. – Branko Dimitrijevic Nov 12 '12 at 16:30
  • True. Another option (in SQL Server, at least) is INSTEAD OF triggers. Define one for each subtable, and perform your inserts on the subtables. Doesn't enforce presence of the subtype, but makes it easier to enter them. – Philip Kelley Nov 13 '12 at 14:54
5

Let's start at the conceptual level. If we think of Hospitals, Clinics, Schools, and Universities as classes of subject matter entities, is there a superclass that generalizes all of them? There probably is. I'm not going to try to tell you what it is, because I don't understand your subject matter as well as you do. But I'm going to proceed as if we can call all of them "Institutions", and treat each of the four as subclasses of Institutions.

As other responders have noted, class/subclass extension and inheritance are not built into most relational database systems. But there is plenty of assistance, if you know the right buzzwords. What follows is intended to teach you the buzzwords, in database lingo. Here is a summary of the buzzwords coming: "ER Generalization", "ER Specialization", "Single Table Inheritance", "Class Table Inheritance", "Shared Primary Key".

ER modeling is a good way of understanding the data at the conceptual level. In ER modeling, there is a concept, "ER Generalization", and a counterpart concept, "ER Specialization", that parallel the thought process I just presented above as "superclass/subclass". ER Specialization tells you how to diagram subclasses, but it doesn't tell you how to implement them.

Next we move down from the conceptual level to the logical level. We express the data in terms of relations or, if you will, SQL tables. There are a couple of techniques for implementing subclasses. One is called "Single Table Inheritance". The other is called "Class Table Inheritance". In connection with Class Table Inheritance, there is another technique that goes by the name "Shared Primary Key".

Going forward in your case with class table inheritance, we first design a table called "Institutions", with an Id field, a name field, and all of the fields that pertain to institutions, no matter which of the four kinds they are. Things like mailing address fields, for instance. Again, you understand your data better than I do, and you can find fields that are in all four of your existing tables. We populate the id field in the usual way.

Next we design four tables called "Hospitals", "Clinics", "Schools", and "Universities". These will contain an id field, plus all of the data fields that pertain only to that kind of institution. For instance, a hospital might have a "bed capacity". Again, you understand your data better than I do, and you can figure these out from the fields in your existing tables that didn't make it into the Institutions table.

This is where "shared primary key" comes in. When a new entry is made into "Institutions", we have to make a new parallel entry into one of four specialized subclass tables. But we don't use some sort of autonumber feature to populate the id field. Instead, we put a copy of the id field from the "Institutions" table into the id field of the subclass table.
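Here is a minimal sketch of that insert pattern, assuming SQLite through Python's sqlite3 module, integer ids for brevity, and only the Hospitals subclass. The column names and the add_hospital helper are hypothetical; your real tables will have more fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE Institutions (
    Id   INTEGER PRIMARY KEY,   -- the id is generated here, and only here
    Name TEXT NOT NULL
);

CREATE TABLE Hospitals (
    -- shared primary key: the same column is both PK and FK
    Id          INTEGER PRIMARY KEY REFERENCES Institutions(Id),
    BedCapacity INTEGER
);
""")


def add_hospital(name, bed_capacity):
    """Insert the superclass row, then copy its id into the subclass row."""
    with conn:  # both inserts commit together or not at all
        cur = conn.execute("INSERT INTO Institutions (Name) VALUES (?)", (name,))
        new_id = cur.lastrowid  # id just assigned by Institutions
        conn.execute(
            "INSERT INTO Hospitals (Id, BedCapacity) VALUES (?, ?)",
            (new_id, bed_capacity),
        )
    return new_id


hospital_id = add_hospital("General Hospital", 250)
row = conn.execute(
    "SELECT i.Name, h.BedCapacity "
    "FROM Institutions AS i JOIN Hospitals AS h ON h.Id = i.Id "
    "WHERE i.Id = ?",
    (hospital_id,),
).fetchone()
print(row)
```

Because Hospitals.Id is a primary key, an institution can have at most one hospital row, and because it is also a foreign key, a hospital row cannot exist without its institution.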

This is a little work, but the benefits are well worth the effort. Shared primary key enforces the one-to-one nature of the relationship between subclass entries and superclass entries. It makes joining superclass data and subclass data simple, easy, and fast. It eliminates the need for a special field to tell you which subclass a given institution belongs in.

And, in your case, it provides a handy answer to your original question. The foreign key you were originally asking about is now always a foreign key to the Institutions table. And, because of the magic of shared-primary-key, the foreign key also references the entry in the appropriate subclass table, with no extra work.

You can create four views that combine institution data with each of the four subclass tables, for convenience.
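As a rough sketch of how the referencing table and one such view might look, again assuming SQLite through Python's sqlite3 module and integer ids; Records, HospitalDetails, and the sample rows are all hypothetical names and data made up for the illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE Institutions (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Hospitals (
    Id          INTEGER PRIMARY KEY REFERENCES Institutions(Id),
    BedCapacity INTEGER
);

-- The table from the original question: one column, one foreign key
CREATE TABLE Records (
    RecordId      INTEGER PRIMARY KEY,
    InstitutionId INTEGER NOT NULL REFERENCES Institutions(Id)
);

-- Convenience view combining superclass and subclass columns
CREATE VIEW HospitalDetails AS
    SELECT i.Id, i.Name, h.BedCapacity
    FROM Institutions AS i
    JOIN Hospitals AS h ON h.Id = i.Id;

INSERT INTO Institutions VALUES (1, 'General Hospital'), (2, 'Elm Street School');
INSERT INTO Hospitals VALUES (1, 250);
INSERT INTO Records VALUES (10, 1), (11, 2);
""")

# Records that point at hospitals, with the hospital-only column attached
rows = conn.execute(
    "SELECT r.RecordId, d.Name, d.BedCapacity "
    "FROM Records AS r JOIN HospitalDetails AS d ON d.Id = r.InstitutionId "
    "ORDER BY r.RecordId"
).fetchall()
print(rows)
```

Record 11 points at an institution with no hospital row, so it drops out of the join against the hospital view while remaining a perfectly valid reference to Institutions.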

Look up "ER Specialization", "Class Table Inheritance", "Shared Primary Key", and maybe "Single Table Inheritance" on the web, and here in SO. There are tags for most of these concepts or techniques here in SO.

Walter Mitty
0

You could put a trigger on the table and enforce the referential integrity there. I don't think there's a really good out-of-the-box feature to implement this requirement.
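Something along these lines, as a rough sketch only. It assumes SQLite through Python's sqlite3 module rather than SQL Server (whose trigger syntax differs), TEXT ids standing in for uniqueidentifier, and a hypothetical Records table with the single ObjectID column from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Hospitals    (ID TEXT PRIMARY KEY);
CREATE TABLE Clinics      (ID TEXT PRIMARY KEY);
CREATE TABLE Schools      (ID TEXT PRIMARY KEY);
CREATE TABLE Universities (ID TEXT PRIMARY KEY);

CREATE TABLE Records (
    RecordID INTEGER PRIMARY KEY,
    ObjectID TEXT NOT NULL       -- no declared foreign key
);

-- Reject any ObjectID that is not present in one of the four tables
CREATE TRIGGER Records_ObjectID_insert
BEFORE INSERT ON Records
FOR EACH ROW
WHEN NOT EXISTS (SELECT 1 FROM Hospitals    WHERE ID = NEW.ObjectID)
 AND NOT EXISTS (SELECT 1 FROM Clinics      WHERE ID = NEW.ObjectID)
 AND NOT EXISTS (SELECT 1 FROM Schools      WHERE ID = NEW.ObjectID)
 AND NOT EXISTS (SELECT 1 FROM Universities WHERE ID = NEW.ObjectID)
BEGIN
    SELECT RAISE(ABORT, 'ObjectID must reference one of the four tables');
END;

INSERT INTO Hospitals VALUES ('h-1');
""")


def try_insert(object_id):
    """Return True if the row is accepted, False if the trigger rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Records (ObjectID) VALUES (?)", (object_id,))
        return True
    except sqlite3.DatabaseError:  # the trigger's RAISE surfaces as a database error
        return False


print(try_insert("h-1"))      # exists in Hospitals
print(try_insert("unknown"))  # exists in none of the four tables
```

This only covers inserts. To approach what a real foreign key gives you, you would also need a trigger for updates of ObjectID and triggers on the four parent tables to handle deletes.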

d89761