1

In our database design we have a couple of tables that describe different objects but which are of the same basic type. As describing the actual tables and what each column is doing would take a long time I'm going to try to simplify it by using a similar structured example based on a job database.

So say we have following tables:

enter image description here

These tables have no connections between each other but share identical columns. So the first step was to unify the identical columns and introduce a unique personId:

enter image description here

Now we have the "header" columns in person that are then linked to the more specific job tables using a 1 to 1 relation using the personId PK as the FK. In our use case a person can only ever have one job so the personId is also unique across the Taxi driver, Programmer and Construction worker tables.

While this structure works we now have the use case where in our application we get the personId and want to get the data of the respective job table. This gets us to the problem that we can't immediately know what kind of job the person with this personId is doing.

A few options we came up with to solve this issue:

Deal with it in the backend

This means just leaving the architecture as it is and look for the right table in the backend code. This could mean looking through every table present and/or construct a semi-complicated join select in which we have to sift through all columns to find the ones which are filled.

All in all: Possible but means a lot of unecessary selects. We also would like to keep such database oriented logic in the actual database.

Using a Type Field

This means adding a field column in the Person table filled for example with numbers to determine the correct child table like:

enter image description here

So you could add a 0 in Type if it's a taxi driver, a 1 if it's a programmer and so on...

While this greatly reduced the amount of backend logic we then have to make sure that the numbers we use in the Type field are known in the backend and don't ever change.

Use separate IDs for each table

That means every job gets its own ID (has to be nullable) in Person like:

enter image description here

Now it's easy to find out which job each person has due to the others having an empty ID.

So my question is: Which one of these designs is the best practice? Am i missing an obvious solution here?

dbugger
  • 15,868
  • 9
  • 31
  • 33
MariusR
  • 1,534
  • 2
  • 11
  • 20
  • I think I would go with the Type Field option. However, why are you going away from the simple 3 table model you have started with? – BobC Jan 30 '17 at 17:20
  • 1
    Given what you already have, add a PersonTypeId, and make it an FK to a PersonTypes table. The first solution requires a lot of overhead to remember something you already knew, and the last gets very complicated very quickly and also introduces the possibility that a person is linked to two or more job types (which might be OK), but is confusing. – dbugger Jan 30 '17 at 17:21
  • Possible duplicate of [How do you deal with polymorphism in a database?](http://stackoverflow.com/questions/45621/how-do-you-deal-with-polymorphism-in-a-database) – Matthew McPeak Jan 30 '17 at 17:37
  • @BobC As the name and age columns are identical in all tables it would just mean at lot of maintenance when for example the name column needs to be slightly longer for all job types. Also we would like to have a unique ID to identify a person. Setting this up over multiple spread out tables would be quite the hazzle. Also it wouldn't really help us with the problem because we still would have problems try to figure out what job the person with the id "3" or with the name "Dave" had. – MariusR Jan 30 '17 at 19:47
  • @dbugger We already planned such a PersonTypes table for our second design option too but i just left it out to not clutter the diagramms. Thank you for the reasoning against the other options. – MariusR Jan 30 '17 at 19:49
  • @MatthewMcPeak I looked at that post but it seems like it's mainly focused on the step we took from the completely separate tables to unifying identical columns with the Person table – MariusR Jan 30 '17 at 19:51
  • @MariusR The post mentions the approach of just putting all the columns into one big table - a very viable option of the number of job-specific columns is limited. Consider, from Oracle's data dictionary, `DBA_INDEXES` (based on `SYS.IND$`). They present an `INDEX_TYPE` column, but the type-specific columns (e.g., `DOMIDX_MANAGEMENT`, `INCLUDE_COLUMN`, and `DURATION`) are all in-line in the same table (ok, view). I don't know if I'd engineer anything more complex than that without good reason (e.g., I plan on having dozens of jobs and more added all the time). – Matthew McPeak Jan 30 '17 at 20:17
  • @MatthewMcPeak Yes right there was that option too but we discarded this option during the design phase pretty quickly as in our actual case we have more close to 10 tables with each having roughly 3 - 8 columns each. That would make our table way to big to manage even though most of the columns will be null for each row. – MariusR Jan 30 '17 at 20:29
  • @MatthewMcPeak dba_indexes is a view, so I would hesitate to use it to justify any data modelling decision for a table – Jeffrey Kemp Jan 31 '17 at 06:01
  • 1
    I usually go for #2, i.e. have a Type field, but use an enumeration rather than an ID field, e.g. 'TAXI-DRIVER', 'PROGRAMMER', 'CONSTRUCTION-WORKER' with a check constraint. – Jeffrey Kemp Jan 31 '17 at 06:03
  • @JeffreyKemp Of course it is a view, but `SYS.IND$` is modeled the same way. I just explained it in terms of `DBA_INDEXES` because it's easier to understand. – Matthew McPeak Jan 31 '17 at 13:27

3 Answers3

1

Bill Karwin made a good explanation on a problem similar to this one. https://stackoverflow.com/a/695860/7451039

Community
  • 1
  • 1
  • 1
    Instead of giving (almost) link-only answer, you should give some explanations in your answer, at least copy some relevant information from the link into your answer, or explain the solution in the linked answer using your own words. – justhalf Jan 31 '17 at 06:54
0

We've now decided to go with the second option because it seem to come with the least drawbacks as described by the other commenters and posters. As there was no actual answer portraying the second option as a solution i will try to summarize our reasoning:

Against Option 1:

There is no way to distinguish the type from looking at the parent table. As a result the backend would have to include all logic which includes scanning all tables for the that contains the id. While you can compress most of the logic into a single big Join select it would still be a lot more logic as opposed to the other options.

Against Option 3:

As @yuri-g said this one is technically not possible as the separate IDs could not setup as primary keys. They would have to be nullable and as a result can't be indexed, essentially rendering the parent table useless as one of the reasons for it was to have a unique personID across the tables.

Against a single table containing all columns:

For smaller use cases as the one i described in the question this might me viable but we are talking about a bunch of tables with each having roughly 2-6 columns. This would make this option turn into a column-mess really quickly.

Against a flat design with a key-value table:

Our properties have completly different data types, different constraints and foreign key relations. All of this would not be possible/difficult in this design.

Against custom database objects containt the child specific properties:

While this option that @Matthew McPeak suggested might be a viable option for a lot of people our database design never really used objects so introducing them to the mix would likely cause confusion more than it would help us.

In favor of the second option:

This option is easy to use in our table oriented database structure, makes it easy to distinguish the proper child table and does not need a lot of reworking to introduce. Especially since we already have something similar to a Type table that we can easily use for this purpose.

MariusR
  • 1,534
  • 2
  • 11
  • 20
-1

Third option, as you describe it, is impossible: no RDBMS (at least, of I personally know about) would allow you to use NULLs in PK (even composite).

Second is realistic.

And yes, first would take up to N queries to poll relatives in order to determine the actual type (where N is the number of types). Although you won't escape with one query in second case either: there would always be two of them, because you cant JOIN unless you know what exactly you should be joining.

So basically there are flaws in your design, and you should consider other options there. Like, denormalization: line non-shared attributes into the parent table anyway, then fields become nulls for non-correpondent types. Or flexible, flat list of attribute-value pairs related through primary key (yes, schema enforcement is a trade-off).

Or switch to column-oriented DB: that's a case for it.

Yuri G
  • 1,206
  • 1
  • 9
  • 13
  • We've now decided to go with the second option because it seems, as you said, like it's the most realistic one. The method you described using the single table with many many columns would be quite tedious as our real case has more close to columns 10-20 overall. The other flat design you mentioned would also be counterproductive as we will be using constraints and foreign key relationships on the table values. – MariusR Feb 02 '17 at 15:14
  • I did not downvote the comment. I don't know who did it. – MariusR Feb 03 '17 at 07:38