1

In a food delivery app I have a USERS table that holds info about user data like firstname, lastname, email, password etc.

A small subset of users (~1% of all users) will have a delivery person role assigned. This means that there will be some delivery person-specific data like driver license id, average_rating and some more.

What I'm unsure of is what's better: have one USERS table that holds all the data (which means that for the majority of users the delivery-person specific columns will be null) or have a subtype table (DELIVERY_PERSON) that will hold those columns and a foreign key to USERS table?

Option #1

USERS:

  • id(PK)
  • email
  • password
  • name ...
  • driver_license_id (null for all regular users)
  • avg_rating (null for all regular users)
  • more delivery person specific columns

Option #2

USERS:

  • id(PK)
  • email
  • password
  • name

DELIVERY_PERSON:

  • id (PK, FK to USERS.id)
  • driver_license_id
  • avg_rating
  • more delivery person specific columns

I've seen several similar questions on SO, but in all of them there are multiple subtypes like Vehicle -> Car/Airplane/Boat etc.

In my scenario there is one base type (user) and only one possible extending subtype (delivery person). I'm wondering if having only one possible subtype somehow affects what option to choose.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Domin0
  • 197
  • 2
  • 11
  • 2
    This will probably be considered primarily opinion based, but in your case I'd probably use a separate table, as you will likely want to limit certain references to user only to delivery drivers, which the additional table will facilitate. e.g. on an order you want to store the Id of the delivery driver, which you would want as an FK to `DELIVERY_PERSON` rather than `USERS` since only a delivery person can deliver an order – GarethD Nov 12 '21 at 10:43
  • 1
    Another benefit of the sub type is that it makes enforcing required data a bit more straight forward. i.e. if a delivery driver must have a driving licence number, then you can make this column non-nullable in the sub table, if you only use one table you have to use a check constraint, which isn't the end of the world, but I think a non-null column is more transparent – GarethD Nov 12 '21 at 10:47
  • 1
    As an aside, there's no real issue with having columns that contain 99% null values though, just define them as [Sparse Columns](https://learn.microsoft.com/en-us/sql/relational-databases/tables/use-sparse-columns?view=sql-server-ver15), so this shouldn't be a massive factor in the decision, so if every other reason you come up with points towards a single table solution (as I say, unlikely in this case but possible in other cases) then having 99% null values shouldn't necessarily push you to a sub table approach – GarethD Nov 12 '21 at 10:54
  • Does this answer your question? [What is the best way to implement Polymorphic Association in SQL Server?](https://stackoverflow.com/questions/7000283/what-is-the-best-way-to-implement-polymorphic-association-in-sql-server) – Charlieface Nov 12 '21 at 12:42

1 Answers1

3

The "clean" implementation of subtypes is, in my perception, to create a separate table for each subtype and a common one for the supertype. This avoids complex integrity conditions and reduces the number of null values.

To illustrate how complex integrity conditions arise, just imagine you have a supertype and a subtype with 10 additional mandatory properties ("columns") and several optional properties. Now if a single optional property is non-null, the 10 additional mandatory properties must be non-null as well. This gets worse if you imagine you have 12 subtypes instead of just one.

On the other hand, if you store everything in a single table, you don't have to perform joins. This is a performance advantage that can add up if you often need the additional columns. Naturally this is only partially true. If you have many subtypes, the rows will be long. This reduces the effectiveness of your data cache.

If your application doesn't need the additional information very often, it is probably better to keep a separate table for the additional columns. If it needs all the information all the time, you'll probably be better of with a single table containing everything.

In short: there is no general answer to your question. The best approach will be to make a guess based on your application and my considerations. You then implement this and test if the performance of your implementation meets your requirements. If so, you have a valid implementation. If not, try the other strategy.

Ronald
  • 2,842
  • 16
  • 16
  • The paragraph about possible issues with mandatory/optional columns appeals to me the most. I'm also less worried about performance issues since I don't expect the database to grow too much. – Domin0 Nov 12 '21 at 11:31