A topic to experienced database architects

Question

I face the following problem.

I'm creating a database for (say) human beings' info. All the human beings may be classified in one of the three categories: adult female, adult male, child. It is clear that the parameters like "height" and "weight" are applicable to all of the categories. The parameter "number of children" is applicable only to adults, while the parameter "number of pregnancies" is applicable to females only. Also, each parameter may be classified as mandatory or optional depending on the category (for example, for adults the parameter "number of ex-partners" is optional).

When I load (say) "height" and "weight", I check whether the info in these two fields is self-consistent. I.e., I mark as a mistake the record which has height=6'4'' and weight=10 lb (obviously, this is physically impossible). I have several similar verification rules.

When I insert a record about a human being, I need to reflect the following characteristics of the info:

the maximum possible info for the category of this particular human being (including all the optional parameters).
the required minimum of information for the category (i.e., mandatory fields only)
what has actually been inserted for this particular human being (i.e., it is possible to insert whatever I have for this person no matter whether it is smaller than the amount of required minimum of info or not). The non-trivial issue here is that a field "XXX" may have NULL value because I have never inserted anything there OR because I have intentionally inserted exactly NULL value. The same logic with the fields that have a default value. So somewhere should be reflected that I have processed this particular field.
what amount of inserted information has been verified (i.e., even if I load some 5 fields, I can check for self-consistency only 3 fields while ignoring the 2 left).

So my question is how to technically organize it. Currently, all these required features are either hardcoded with no unified logic or broken into completely independent blocks. I need to create a unified approach.

I have some naive ideas in my head in this regard. For example, for each category of human beings, I can create and store a list of possible fields (I call it a "template"). I can mark those fields that are mandatory.

When I insert a record about a human being, I copy the template and mark which fields from this template have actually been processed. At the next stage, I can mark in this copy of the template those fields that will be currently verified.

The module of verification is specially corrected in the following way: for each verification procedure I create a list of fields that are being used in this particular verification procedure. Then I call only those verification procedures that have those fields that are actually marked "to be verified" in the copy of the template for the particular human being that is to be verified (see the previous passage).
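
The template idea described above could be sketched roughly as follows. Everything here is an assumption made for illustration: the names `TEMPLATES`, `Rule`, `Record`, and `verify`, the choice of which fields are mandatory, and the body-mass thresholds in `plausible_body` are all made up, not taken from the actual project.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical templates: field name -> True if mandatory for the category.
TEMPLATES = {
    "child": {"height_in": True, "weight_lb": True},
    "adult_male": {
        "height_in": True, "weight_lb": True,
        "number_of_children": True, "number_of_ex_partners": False,
    },
}


@dataclass(frozen=True)
class Rule:
    name: str
    fields: frozenset          # fields this verification procedure reads
    check: Callable[[dict], bool]


def plausible_body(values: dict) -> bool:
    # Crude body-mass check; the 8..80 bounds are arbitrary placeholders.
    bmi = values["weight_lb"] * 703 / values["height_in"] ** 2
    return 8 <= bmi <= 80


RULES = [
    Rule("plausible_body", frozenset({"height_in", "weight_lb"}), plausible_body),
    Rule("non_negative_children", frozenset({"number_of_children"}),
         lambda v: v["number_of_children"] >= 0),
]


@dataclass
class Record:
    category: str
    values: dict = field(default_factory=dict)   # processed fields only
    verified: set = field(default_factory=set)   # fields that passed a rule

    def missing_mandatory(self) -> set:
        template = TEMPLATES[self.category]
        return {name for name, mandatory in template.items()
                if mandatory and name not in self.values}


def verify(record: Record, rules: list, to_verify: set) -> list:
    """Run only the rules whose fields are all loaded and marked for verification."""
    failed = []
    for rule in rules:
        usable = all(record.values.get(f) is not None for f in rule.fields)
        if rule.fields <= to_verify and usable:
            if rule.check(record.values):
                record.verified |= rule.fields
            else:
                failed.append(rule.name)
    return failed


rec = Record("adult_male", {"height_in": 76, "weight_lb": 10})
failures = verify(rec, RULES, {"height_in", "weight_lb"})
print(rec.missing_mandatory())  # {'number_of_children'}
print(failures)                 # ['plausible_body']
```

Because each rule declares the fields it reads, the set of processed fields never has to be copied from a template: it is simply the keys of `values`, which may ease the bookkeeping on updates.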

As you see, this is the most straightforward way to solve this problem. But my guess is that there are a lot of quite standardized approaches that I'm not aware of. I really doubt that I'm the first in the world to solve such a problem. I don't like my solution because it is really painful to write the code that correctly reflects in this copied template all the "updates" happening to a record.

So, I ask you to share your opinion: how would you solve this problem?

@Nathan Hughes - I hope all the 12-years old will be more responsible at the moment when I actually finish this project:) The database I am working on is not about humans - it is about financial instruments. And indeed there are a lot of such "transgendered" stocks and bonds:) But I prefer to explain the nature in such a simplified way with human beings. — iCoffee, Mar 26 '13 at 18:15
This is not really a good fit for SO. SO is for questions that can have a definitive "answer", not open-ended design discussions. — Barmar, Mar 26 '13 at 18:23
yes, i guessed it must be an oddly-chosen toy example. it does seem data validation is likely problematic for both cases. — Nathan Hughes, Mar 26 '13 at 18:46

score 2 · Accepted Answer · edited May 23 '17 at 12:11

I think there are two questions here:

how do I store polymorphic data in a database?
how do I validate complex business rules?

You should address them separately - trying to solve both at once is probably too hard.

There are a few approaches to polymorphic data in RDBMSes - ORMs use the term inheritance mapping, for instance. The three solutions here - table per class hierarchy, table per subclass and table per concrete class - are "pure" relational solutions. You can also use the "Entity-Attribute-Value" design, or use a document approach (storing data in structured formats such as XML or JSON) - these are not "pure" relational options, but have their place.
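
As an illustration of the Entity-Attribute-Value option, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names are hypothetical. One side effect worth noting: the presence of a row says the attribute was processed, while a NULL `value` in an existing row is an explicit NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        id       INTEGER PRIMARY KEY,
        category TEXT NOT NULL
                 CHECK (category IN ('adult_female', 'adult_male', 'child'))
    );
    -- One row per attribute that was actually processed for a person.
    CREATE TABLE person_attribute (
        person_id INTEGER NOT NULL REFERENCES person(id),
        name      TEXT NOT NULL,
        value     TEXT,            -- NULL here means an explicit NULL
        PRIMARY KEY (person_id, name)
    );
""")

conn.execute("INSERT INTO person (id, category) VALUES (1, 'adult_female')")
conn.executemany(
    "INSERT INTO person_attribute (person_id, name, value) VALUES (?, ?, ?)",
    [(1, "height_in", "66"), (1, "number_of_pregnancies", None)],
)

rows = conn.execute(
    "SELECT name, value FROM person_attribute WHERE person_id = 1 ORDER BY name"
).fetchall()
print(rows)  # [('height_in', '66'), ('number_of_pregnancies', None)]
```

The usual trade-off applies: every value is stored as text, so typing and per-category constraints have to be enforced outside the table definition.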

Validating complex business rules is often done using rule engines - these are super cool bits of technology, but you have to be sure that your problem really fits with their solution - deciding to invest in a rules engine means your project changes into a rules engine project, not a "humans" project. Alternatively, most mainstream solutions to this embody the business logic about the entities in the application's business logic layer. It sounds like you're outgrowing this.

your review of approaches to polymorphic data and business rules is perfect! Thanks a lot!) — iCoffee, Mar 26 '13 at 20:39

score 1 · Answer 2 · answered Mar 26 '13 at 20:16

This exact problem, both in health terms and in terms of a financial instrument, is used as a primary example in Martin Fowler's book Analysis Patterns. It is an extensive topic. As @NevilleK says, you are trying to deal with two questions, and it is best to deal with them separately. One ultra-simplified way of approaching these problems is:

1 Storage of polymorphic data - only put mandatory data that is common to the category in the category table. For optional data put these in a separate table in 1-1 relationship to the category table. Entries are made in these optional tables only if there is a value to be recorded. The record of the verification of the data can also be put in these additional tables.
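
A minimal sketch of point 1, using Python's built-in `sqlite3` module. The table names, the `verified` flag, and the sample data are assumptions for illustration only. The optional table gets a row only when there is something to record, so a LEFT JOIN shows whether the optional value was ever entered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Mandatory data common to the category.
    CREATE TABLE adult (
        id                 INTEGER PRIMARY KEY,
        number_of_children INTEGER NOT NULL
    );
    -- Optional data in a 1-1 table; a row exists only if a value was recorded.
    CREATE TABLE adult_ex_partners (
        adult_id              INTEGER PRIMARY KEY REFERENCES adult(id),
        number_of_ex_partners INTEGER,
        verified              INTEGER NOT NULL DEFAULT 0
    );
""")

conn.execute("INSERT INTO adult (id, number_of_children) VALUES (1, 2)")
conn.execute("INSERT INTO adult (id, number_of_children) VALUES (2, 0)")
conn.execute(
    "INSERT INTO adult_ex_partners (adult_id, number_of_ex_partners) VALUES (1, 3)"
)

rows = conn.execute("""
    SELECT a.id, a.number_of_children, x.number_of_ex_partners,
           x.adult_id IS NOT NULL AS recorded
    FROM adult a
    LEFT JOIN adult_ex_partners x ON x.adult_id = a.id
    ORDER BY a.id
""").fetchall()
print(rows)  # [(1, 2, 3, 1), (2, 0, None, 0)]
```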

2 Validate complex business rules - it is useful to consider the types of error that can arise. There are a number of ways of classifying the errors, but the one I have found most useful is: (a) type errors, where one can tell that the value is in error just by looking at the data, e.g. 1980-02-30; (b) context errors, where one can detect the error only by reference to previously captured data, e.g. DoB 1995-03-15, date of marriage 1996-08-26; and (c) lies to the system, where the data type is OK and the context is OK, but the information can only be detected as incorrect at a later date when more information comes to light, e.g. if I register my DoB as 1990-12-31 when it is something different. This latter class of error typically has to be dealt with by procedures outside the system being developed.
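
The first two error classes can be checked mechanically. The sketch below uses the dates from the examples above; the helper names and the minimum marriage age of 16 are assumptions chosen only for illustration.

```python
from datetime import date

MIN_MARRIAGE_AGE_YEARS = 16  # assumed threshold, purely illustrative


def is_valid_date(text: str) -> bool:
    """Type check (a): the value alone tells us whether it is a real date."""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


def age_on(dob: date, day: date) -> int:
    """Whole years between the date of birth and the given day."""
    before_birthday = (day.month, day.day) < (dob.month, dob.day)
    return day.year - dob.year - before_birthday


def marriage_is_plausible(dob: date, marriage: date) -> bool:
    """Context check (b): needs previously captured data (the DoB)."""
    return age_on(dob, marriage) >= MIN_MARRIAGE_AGE_YEARS


print(is_valid_date("1980-02-30"))  # False: there is no 30 February
print(marriage_is_plausible(date(1995, 3, 15), date(1996, 8, 26)))  # False: age 1
```

Class (c) cannot be detected this way, since the value passes both checks by construction.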

I see that you've really faced the same problems:) And I totally agree with you regarding your passage about verification. For me the problem is that in my project all these elements and features were either hardcoded or chaotically structured. I believe now is the time to rearrange it properly) — iCoffee, Mar 26 '13 at 20:36

score 0 · Answer 3 · answered Mar 26 '13 at 19:24

I would use the Party Role pattern (Silverston):

Party
  id
  name

Individual : Party
  current_weight
  current_height

PartyRole
  id
  party_id
  from_date
  to_date (nullable)

AdultRole : PartyRole
  number_of_children

FemaleAdultRole : AdultRole
  number_of_pregnancies

Postgres has a temporal extension such that you could enforce that a party could only play one role at a time (yet maintain their role histories).

Use table inheritance. For simplicity use Single Table Inheritance (has nulls), for no nulls use Class Table Inheritance.
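
A rough sketch of the Single Table Inheritance option for the roles above, using Python's built-in `sqlite3` module. The `role_type` discriminator, its values (including a `'child'` role that is not in the outline above), and the CHECK constraints are assumptions added for illustration. The constraints keep the subtype columns NULL for roles they do not apply to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE party_role (
        id                    INTEGER PRIMARY KEY,
        party_id              INTEGER NOT NULL,
        role_type             TEXT NOT NULL
                              CHECK (role_type IN ('child', 'adult', 'female_adult')),
        from_date             TEXT NOT NULL,
        to_date               TEXT,
        number_of_children    INTEGER,   -- adult roles only
        number_of_pregnancies INTEGER,   -- female adult role only
        CHECK (role_type <> 'child' OR number_of_children IS NULL),
        CHECK (role_type = 'female_adult' OR number_of_pregnancies IS NULL)
    );
""")

insert = """
    INSERT INTO party_role
        (party_id, role_type, from_date, number_of_children, number_of_pregnancies)
    VALUES (?, ?, ?, ?, ?)
"""
conn.execute(insert, (1, "female_adult", "2013-03-26", 2, 2))

try:
    # A child role must not carry a pregnancy count.
    conn.execute(insert, (2, "child", "2013-03-26", None, 1))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```

With Class Table Inheritance the two subtype columns would move into their own tables keyed by the role id, which removes the NULLs at the cost of extra joins.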
