How to emulate tagged union in a database?

Question

What is the best way to emulate Tagged union in databases? I'm talking about something like this:

create table t1 {
  vehicle_id INTEGER NOT NULL REFERENCES car(id) OR motor(id) -- not valid
  ...
}

where vehicle_id would be id in car table OR motor table, and it would know which.

(assume that motor and car tables have nothing in common0

score 11 · Accepted Answer · answered Nov 13 '09 at 17:34

Some people use a design called Polymorphic Associations to do this, allowing vehicle_id to contain a value that exists either in car or motor tables. Then add a vehicle_type that names the table which the given row in t1 references.

The trouble is that you can't declare a real SQL foreign key constraint if you do this. There's no support in SQL for a foreign key that has multiple reference targets. There are other problems, too, but the lack of referential integrity is already a deal-breaker.

A better design is to borrow a concept from OO design of a common supertype of both car and motor:

CREATE TABLE Identifiable (
 id SERIAL PRIMARY KEY
);

Then make t1 reference this super-type table:

CREATE TABLE t1 (
  vehicle_id INTEGER NOT NULL,
  FOREIGN KEY (vehicle_id) REFERENCES identifiable(id)
  ...
);

And also make the sub-types reference their parent supertype. Note that the primary key of the sub-types is not auto-incrementing. The parent supertype takes care of allocating a new id value, and the children only reference that value.

CREATE TABLE car (
  id INTEGER NOT NULL,
  FOREIGN KEY (id) REFERENCES identifiable(id)
  ...
);

CREATE TABLE motor (
  id INTEGER NOT NULL,
  FOREIGN KEY (id) REFERENCES identifiable(id)
  ...
);

Now you can have true referential integrity, but also support multiple subtype tables with their own attributes.

The answer by @Quassnoi also shows a method to enforce disjoint subtypes. That is, you want to prevent both car and motor from referencing the same row in their parent supertype table. When I do this, I use a single-column primary key for Identifiable.id but also declare a UNIQUE key over Identifiable.(id, type). The foreign keys in car and motor can reference the two-column unique key instead of the primary key.

The surrogate key for `identifiable` is only good when there are attributes in `identifiable` that the query needs to select. If `identifiable` only serves to enforce the constraints, using a composite key will allow to get rid of it in the queries competely. — Quassnoi, Nov 13 '09 at 17:54
I've come up with & used the 'common supertype' approach myself, and used it succesfully in major system migration/redevelopment projects. (NZ Government, SPOT25 for MoE) — Thomas W, Jun 26 '12 at 22:52

score 6 · Answer 2 · answered Nov 13 '09 at 17:33

6

CREATE TABLE vehicle (type INT NOT NULL, id INT NOT NULL,
             PRIMARY KEY (type, id)
)

CREATE TABLE car (type INT NOT NULL DEFAULT 1, id INT NOT NULL PRIMARY KEY,
             CHECK(type = 1),
             FOREIGN KEY (type, id) REFERENCES vehicle
)

CREATE TABLE motorcycle (type INT NOT NULL DEFAULT 2, id INT NOT NULL PRIMARY KEY,
             CHECK(type = 2),
             FOREIGN KEY (type, id) REFERENCES vehicle
)

CREATE TABLE t1 (
  ...
  vehicle_type INT NOT NULL,
  vehicle_id INT NOT NULL,
  FOREIGN KEY (vehicle_type, vehicle_id) REFERENCES vehicle
  ...
)

answered Nov 13 '09 at 17:33

Quassnoi

413,100
91
616
614

It'd make life easier if you defined `VEHICLE.VEHICLE_ID` as the primary key so you don't have to reference a composite key, and have type & id columns using a unique constraint. – OMG Ponies Nov 13 '09 at 17:36
`@OMG Ponies:` With this layout, you don't need to reference `vehicle` at all. You can just join with `cars` or `motorcycles`, depending on the `type`. `vehicle` here serves only to police the relationships. – Quassnoi Nov 13 '09 at 17:50
2

With this method, is there some way to guarantee that there won't be "orphan" vehicles that don't have a corresponding row in `car` or `motorcycle`? – Michael Hewson Sep 25 '17 at 21:31

Wong Jia Hau · Answer 3 · 2021-07-02T03:07:29.447

I think the least-boilerplate solution is to use constraint and check.

For example, consider this ADT in Haskell:

data Shape = Circle {radius::Float} | Rectangle {width::Float, height::Float}

The equivalent in MySQL/MariaDB would be (tested on 10.5.11-MariaDB):

CREATE TABLE shape (        
  type ENUM('circle', 'rectangle') NOT NULL,
  radius FLOAT,             
  width FLOAT,              
  height FLOAT,             
  CONSTRAINT constraint_circle CHECK 
    (type <> 'circle' OR radius IS NOT NULL),
  CONSTRAINT constraint_rectangle CHECK 
    (type <> 'rectangle' OR (width IS NOT NULL AND height IS NOT NULL))
);                                      
                                        
INSERT INTO shape(type, radius, width, height)
  VALUES ('circle', 1, NULL, NULL); -- ok
                                                                                                                    
INSERT INTO shape(type, radius, width, height)
  VALUES ('circle', NULL, 1, NULL); -- error, constraint_circle violated

Note that the above uses type <> x OR y instead of type = x AND y. This is because the latter essentially means that all rows must have type of x, which defeats the purpose of tagged union.

Also, note that the solution above only check for required columns, but does not check for extraneous columns.

For example, you can insert a rectangle which has defined radius.

This can be easily mitigated by adding another condition for constraint_rectangle, namely radius is null.

However, I will not recommend doing so as it makes adding new type tedious.

For example, to add a new type triangle with one new column base, not only we will need to add a new constraint, but we also need to modify the existing constraints to ensure that their base is null.

FWIW: Saying `type <> x OR y` is actually equivalent to implication: `type = x => y`. Another equivalent (arguably easier to digest) expression would be `NOT (type = x AND NOT y)`. I.e. it is **not** allowed to **be** a circle **and not** to have a radius. Your version is this transformed to a more compact form using De Morgan's laws. — Robin Koch, Jul 03 '21 at 00:58

score 3 · Answer 4 · answered Nov 13 '09 at 17:37

I think you could model such a reference by using table inheritance in PostgreSQL.

If you really need to know where a row comes from in a Query, you could use a simple UNION ALL statment like (this possibility has nothing to do with table inheritance):

SELECT car.*, 'car' table_name
UNION ALL
SELECT motor.*, 'motor' table_name

How to emulate tagged union in a database?

4 Answers4

Linked