1

I'm hoping to get a consensus about what db schema is best for my situation to store 'type' information for Widgets in a table. A Widget can have only one type but that type can be either a Preset-Type OR a Custom-Type. I obviously create the preset types and a user would create a custom type.

I'll be using MySQL and INNODB on a server. I'll also be using SQLite to store the same info on an App. But we'll just talk about the server here. I'm an app programmer, not a DB admin, but want to get the DBs right for this project the first time and normalized within reason.

In my searching on whether or not I should use nulls for foreign keys I've come across the following answers from people who have much more DB experience than I do.

  • "Of course Nulls are ok in foreign keys and elsewhere."
  • "NULLs in foreign keys are perfectly acceptable."
  • "One area where NULLs are indispensable is in foreign keys."
  • "Nulls should almost never be used, especially in foreign keys."
  • "Nulls should never be used anywhere."
  • "Having a column with lots of NULL values often indicates a need for (further) normalization."

I need to know if it's bad practice to use Nulls in the particular case of Model #2 and which model is preferable and why. Or possibly suggest a better model. Thanks for any input.

Model #1

Have one 'types' table for both Preset and Custom Types. I do this by pre-populating the 'types' table with preset types and leaving around 1500 reserved spaces for future Preset-types that I could later add.

Pros: Easy, no extra tables, no joins, probably the fastest option, and prob less db space in the long run (4 byte type_id). And widgets table type_id FK will never be NULL.

Cons: Probably not good normalization practice to mix preset and custom types together since presets don't need some fields like 'account_id', etc. If i ever wanted more than 1500 presets (highly unlikely) I'd need to figure something else out. This model also uses sentinel/placeholder values in the types table for presets and reserved preset spots.

CREATE TABLE accounts (
    account_id  INT UNSIGNED AUTO_INCREMENT NOT NULL, 
    # Other Columns...,
    PRIMARY KEY (account_id)
    );  
CREATE TABLE widgets (
    widget_id   INT UNSIGNED AUTO_INCREMENT NOT NULL,
    account_id  INT UNSIGNED NOT NULL, 
    type_id     INT UNSIGNED NOT NULL,
    PRIMARY KEY (widget_id),
    FOREIGN KEY (account_id) REFERENCES accounts(account_id) ON DELETE CASCADE,
    FOREIGN KEY (type_id) REFERENCES types(type_id)
    );  
CREATE TABLE types (
    type_id     INT UNSIGNED AUTO_INCREMENT NOT NULL,
    account_id  INT UNSIGNED NOT NULL, 
    name        VARCHAR(100) NOT NULL,
    PRIMARY KEY (type_id),
    FOREIGN KEY (account_id) REFERENCES accounts(account_id)
    );

Model #2

Separate widget-type table for preset and custom types. 'widgets' table has nullable FK field for preset type and custom type. A Check constraint makes sure one of them is null and the other is not.

Pros: only 1 extra table in the DB. No sentinal/placeholder values except maybe a nulled FK. Don't need to reserve preset value spaces and no limit to future preset type additons.

Cons: Uses one FK null per record in widgets table for either preset_type_id or custom_type_id.

CREATE TABLE accounts (
    account_id  INT UNSIGNED AUTO_INCREMENT NOT NULL, 
    # Other Columns...,
    PRIMARY KEY (account_id)
    );  
CREATE TABLE widgets (
    widget_id       INT UNSIGNED AUTO_INCREMENT NOT NULL,
    account_id      INT UNSIGNED NOT NULL, 
    preset_type_id  INT UNSIGNED DEFAULT NULL,
    custom_type_id  INT UNSIGNED DEFAULT NULL,
    PRIMARY KEY (widget_id),
    FOREIGN KEY (account_id) REFERENCES accounts(account_id) ON DELETE CASCADE,
    FOREIGN KEY (preset_type_id) REFERENCES preset_types(preset_type_id),
    FOREIGN KEY (custom_type_id) REFERENCES custom_types(custom_type_id),
    CHECK ((preset_type_id IS NOT NULL AND custom_type_id IS NULL) OR (preset_type_id IS NULL AND custom_type_id IS NOT NULL) )
    );  
CREATE TABLE preset_types (
    preset_type_id  INT UNSIGNED AUTO_INCREMENT NOT NULL, 
    name            VARCHAR(100) NOT NULL,
    PRIMARY KEY (preset_type_id)
    );  
CREATE TABLE custom_types (
    custom_type_id  INT UNSIGNED AUTO_INCREMENT NOT NULL,
    account_id      INT UNSIGNED NOT NULL, 
    name            VARCHAR(100) NOT NULL,
    PRIMARY KEY (custom_type_id),
    FOREIGN KEY (account_id) REFERENCES accounts(account_id)
    );

Model #3

Uses intermediary tables widget_preset_types and widget_custom_types. If a widget has a preset type it will be referenced in the widget_preset_types table, alternatively if the widget has a custom type it will be referenced in the widget_custom_types table.

Pros: Probably the most normalized model. Never uses Nulls or FK Nulls. No sentinal/placehodler values used.

Cons: Adds 3 extra tables in the DB just to determine widget-type. I have other things besides widgets in my DB with custom/preset types which means I could be adding at least 12 extra tables to my DB with this model. Is it over-normalized? I'll have to use some type of join to get all widget info and type info from 3 tables at the same time. I'll have to check whether or not a custom_type_id or preset_type_id comes back in the join probably using more code than I would have used checking for nulls in Model#2. Probably slower than Model 1 & 2. More tables means more indexes means more ram.

CREATE TABLE accounts (
    account_id  INT UNSIGNED AUTO_INCREMENT NOT NULL, 
    # Other Columns...,
    PRIMARY KEY (account_id)
    );  
CREATE TABLE widgets (
    widget_id       INT UNSIGNED AUTO_INCREMENT NOT NULL,
    account_id      INT UNSIGNED NOT NULL
    PRIMARY KEY (widget_id),
    FOREIGN KEY (account_id) REFERENCES accounts(account_id) ON DELETE CASCADE
    );
CREATE TABLE preset_types (
    preset_type_id  INT UNSIGNED AUTO_INCREMENT NOT NULL, 
    name            VARCHAR(100) NOT NULL,
    PRIMARY KEY (preset_type_id)
    );  
CREATE TABLE custom_types (
    custom_type_id  INT UNSIGNED AUTO_INCREMENT NOT NULL,
    account_id      INT UNSIGNED NOT NULL, 
    name            VARCHAR(100) NOT NULL,
    PRIMARY KEY (custom_type_id),
    FOREIGN KEY (account_id) REFERENCES accounts(account_id) ON DELETE CASCADE
    );
CREATE TABLE widget_preset_types (
    widget_id       INT UNSIGNED NOT NULL UNIQUE, 
    preset_type_id  INT UNSIGNED NOT NULL, 
    PRIMARY KEY (widget_id),
    FOREIGN KEY (widget_id) REFERENCES widgets(widget_id) ON DELETE CASCADE,
    FOREIGN KEY (preset_type_id) REFERENCES preset_types(preset_type_id)
    );  
CREATE TABLE widget_custom_types (
    widget_id       INT UNSIGNED NOT NULL UNIQUE, 
    custom_type_id  INT UNSIGNED NOT NULL,
    PRIMARY KEY (widget_id),
    FOREIGN KEY (widget_id) REFERENCES widgets(widget_id) ON DELETE CASCADE,
    FOREIGN KEY (custom_type_id) REFERENCES custom_types(custom_type_id)
    );
philipxy
  • 14,867
  • 6
  • 39
  • 83
user223304
  • 123
  • 7
  • What is your 1 question? Seems like it's likely going to be a faq re DB/SQL subtypes/inheritance. (Thanks for the thoughts/designs though.) Before considering posting please always google your error message or many clear, concise & precise phrasings of your question/problem/goal, with & without your particular strings/names & site:stackoverflow.com & tags, & read many answers. If you post a question, use one phrasing as title. See [ask] & the voting arrow mouseover texts. We cannot reason, communicate or search unless we make the effort to (re-re-re-)write clearly. PS Read edit help re code. – philipxy Sep 12 '19 at 11:29
  • 2
    There's no such thing as "better"/"best" in engineering unless *you* define it. Also unfortunately all reasonable practical definitions require a ridiculous amount of experience with a ridiculous number of factors that interact with chaotic sensitivity to details. Make straightforward designs. When you demonstrate via measurement that a design and all alternatives you can think of have problems (whatever that means at the time), then ask a very specific question. Which should also define "better"/"best". [Strategy for “Which is better” questions](https://meta.stackexchange.com/q/204461) – philipxy Sep 12 '19 at 11:29
  • Possible duplicate of [How can you represent inheritance in a database?](https://stackoverflow.com/questions/3579079/how-can-you-represent-inheritance-in-a-database) – philipxy Sep 12 '19 at 11:31
  • What do you think this has to do with normalization? The term doesn't mean "designing well" or "rearranging tables". We can have a table per supertype/subtype/default or left/right join those tables into one with nulls. In either case offering the other design as a view. For (such) idiomatic use it's hard to argue that nulls are confusing. Normalization to higher NFs replaces a table by projections of it that natural join back. – philipxy Sep 12 '19 at 11:36
  • 1
    If i have to boil it down to one question it would be "I need to know if it's bad practice to use Nulls in the particular case of Model #2" – user223304 Sep 12 '19 at 11:39
  • There are no error messages to post. I searched a lot on stackexchange. I'm not a normalization or DB pro as I posted. But others with much more experience have posted things like "Having a column with lots of NULL values often indicates a need for (further) normalization." - which lead me to believe that nulls might hinder normalization. – user223304 Sep 12 '19 at 11:42
  • 1
    Please clarify via edits, not comments. Radio button FKs are a subtyping anti-pattern, which you will find said researching DB/SQL subtyping/inheritance, see the link. PS Whether "it's bad practice to use Nulls" in any particular situation is essentially a matter of (informed) opinion, as you can see from the arguments you found. (But when one only has a hammer everything needs hammering.) PS There's no normalization here. If you're going to (mis)use the term here please define what you mean by it & explain how it applies when you use it. If you can't do that for a term, don't use it. – philipxy Sep 12 '19 at 11:59
  • My faq/duplicate/research comment said "error message or". PS That stuff you read about normalization & subtyping is just people not knowing what normalization is. They might be trying to say something that would be useful if they used terms correctly though. However, there is a lot of ignorace & misconception of the relational model. – philipxy Sep 12 '19 at 12:12
  • I have basic normalization knowledge. It basically is a list of rules to follow that leads to a DB which keeps data integrity better than non-normalized models. Data Integrity means that the data in the DB is as accurate and relevant as possible. Some normalization rules include avoiding repeating the same data, and keeping the data in a field singular (as in don't store an array of numbers in a string field). I have used normalization principles in my project already to guide me out of storing arrays like that and using the Intermediary tables instead. So there ya go.. not a pro bro. – user223304 Sep 12 '19 at 12:15
  • "Normalization to higher NFs replaces a table by projections of it that natural join back." The other meaning of "normalization", normalization to a "1NF", [has no single meaning](https://stackoverflow.com/a/40640962/3404097). "There's no normalization here." "[...] & explain how it applies when you use it" PS Another highly upvoted DB/SQL subtyping/inheritance question is [How do you effectively model inheritance in a database?](https://stackoverflow.com/q/190296/3404097). – philipxy Sep 12 '19 at 12:31
  • "Subtyping" is so alien to the antique design patterns available in RDBMS that I cringe when someone tries to apply it. – Rick James Sep 13 '19 at 23:33
  • @RickJames I can't make sense of your comment. Programmers are types/kinds/sorts of employees, so Programmer is a subtype of supertype Employee & there are idioms for when such a (type/kind/sort of) thing arises. Every FK is from a table where the projection on a key & the FK is a is-a-subtype-of relation(ship)/association. – philipxy Sep 14 '19 at 19:28

2 Answers2

3

Some very good designers use NULLs in foreign keys with no adverse consequences. I lean that way myself. A nullable FK represents an optional relationship. In instances where the entity has no relationship, the FK contains a NULL. The space overhead is minimal. When joins (equijoins, more precisely) are done across the two tables, instances containing NULL in the FK will drop out of the join, and that's appropriate.

Having said that, I'm going to recommend a fourth method to you. This involves a total of 4 tables, accounts, widgets, types, and custom_types. The custom_types table uses a technique called Shared-primary-key, outlined below.

CREATE TABLE accounts (
    account_id  INT UNSIGNED AUTO_INCREMENT NOT NULL, 
    # Other Columns...,
    PRIMARY KEY (account_id)
    );  
CREATE TABLE widgets (
    widget_id   INT UNSIGNED AUTO_INCREMENT NOT NULL,
    account_id  INT UNSIGNED NOT NULL, 
    type_id     INT UNSIGNED NOT NULL,
    PRIMARY KEY (widget_id),
    FOREIGN KEY (account_id) REFERENCES accounts(account_id) ON DELETE CASCADE,
    FOREIGN KEY (type_id) REFERENCES types(type_id)
    );  
CREATE TABLE types (
    type_id     INT UNSIGNED AUTO_INCREMENT NOT NULL,
    account_id  INT UNSIGNED NOT NULL, 
    name        VARCHAR(100) NOT NULL,
    PRIMARY KEY (type_id),
    FOREIGN KEY (account_id) REFERENCES accounts(account_id)
CREATE TABLE custom_types (
    type_id     INT NOT NULL,
    account_id  INT UNSIGNED NOT NULL, 
    PRIMARY KEY (type_id),
    FOREIGN KEY (type_id) REFERENCES types(type_id),
    FOREIGN KEY (account_id) REFERENCES accounts(account_id)

);

The type_id column in custom_types is a shared primary key. Notice that it is declared BOTH as a primary key and as a foreign key, and that it doesn't use autonumber. It's a copy of the primary key in types for the corresponding entry. The custom types table contains all the data that is present in custom types but absent in preset types.

For preset types, an entry is made in types, but no entry is made in custom_types. For custom_types, an entry is first made in types, and then the resulting value of type_id is copied over into custom_types, along with the account_id.

If you INNER JOIN types and custom_types, the preset types drop out of the join. If you want both custom and preset types in a single join, you have to use a LEFT JOIN or a RIGHT JOIN to get that effect. Note that the result of a LEFT or RIGHT JOIN will contain some NULLs, even though those NULLs are not stored in the database.

Clicking on this will give you a more detailed description of the shared primary key technique.

Walter Mitty
  • 18,205
  • 2
  • 28
  • 58
  • Interesting. Good to know technique. Thanks. – Madhur Bhaiya Sep 14 '19 at 17:00
  • This is a great looking technique and probably my solution. Thanks for taking the time to understand the full situation - I know it was a long post. The preset type id numbers not always being clustered together is a bit bothersome to me but it's probably not an actual issue. I'll think on this some more and come back. – user223304 Sep 15 '19 at 16:43
  • The idea of grouping things into a set by assigning them a consecutive span of ids is not the way relational modelers do it. Add an attrubut that will have one value (say 0) for presets and another value (say 1) for customs – Walter Mitty Sep 15 '19 at 18:03
  • Ya it's not an actual issue. Thanks. – user223304 Sep 16 '19 at 12:11
2

get the DBs right for this project the first time

That's almost impossible. It is best to plan on revamping the schema a few months into the project.

Nulls vs FKs

First, decide what columns need an out-of-band "null" value. Only then decide whether a FK is usable, or is more hassle than it is worth.

Most columns will always be present, hence NOT NULL. But some can validly be NULL:

  • A value that is not yet known. Example: end_date
  • An optional value.
  • (and lots more)

Normalization

No normalization is usually bad;
Over-normalization is also usually bad;
The middle ground is hard to find until you have some feel for how many widgets there will be, how much their names vary, etc.

There are perhaps two reasons for normalizing:

  • Data Integrity (sort of). To move a commonly occurring thing (typically the 'name' of something or someone) into a single place, thereby making it easy and efficient to change the name when needed. This helps when Upper Volta changes its name to Burkina Faso, but is useless for the split up of Yugoslavia.
  • Performance. If a long name shows up in dozens of places and in lots of indexes, it takes disk space, I/O time and, to a lesser degree, CPU time.

For a thousand "things", normalization does not matter much.
For a billion "things", normalization is vital.
Again, tuning the middle-ground may not be possible on day-one.

Preset vs custom types

You present them as nearly-identical. But then you move on to suggest that 'preset' types are missing some attribute. ("Missing" == "Null"??)

Think of a "type" as an Entity. And your other tables have a Relation to it. It is probably many-to-many.

Do not "reserve 1500". If you need to distinguish preset vs custom, then add a column that says which is which. Reservations will eventually get you in trouble.

The existence of type.account_id (in the first schema) implies that a "type" is not used by multiple accounts. Yet preset types can be? Ugh! Thumbs down to Model 1.

Separate columns for reset and custom. That smacks of "a widget can have one of each". Thumbs down for Model 2.

Model 3 had multiple tables that smell like "many-to-many" mappings between widgets and each type of type. Did you really want many-to-many?

4 byte type_id

Sure, INT is 4 bytes. But do you really expect a billion or more "types"? Use a smaller datatype. For example MEDIUMINT UNSIGNED is 3 bytes and overflows at a generous 16 million. (Etc)

Starting over...

You have 3(?) Entities: Widgets, Accounts, Types. You need one table for each. (Or at least one primary table -- Example: An order is composed of many order_items.)

Those Entities have 1:1 or 1:many or many:many relationships. Decide which applies where.

  • 1:1 is usually 'wrong', so try to avoid that.
  • 1:many has an id for the "1" is in the "many" table.
  • many:many needs an extra table with a pair of ids.

The 3 Entity tables, plus any many:many Relation tables, make up the core of your schema. (Meanwhile, I don't see where "normalization" figures into your small schema.)

Then add on FOREIGN KEYs where desired. (FKs are not mandatory.)

In other words, I think you have the "cart before the horse" when you ask 'one question it would be "I need to know if it's bad practice to use Nulls"'.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • *Meanwhile, I don't see where "normalization" figures into your small schema* - What do you mean by this point ? Pardon me (because I am still trying to grasp normalization conceptually).. Isn't Model 3 a higher normalized form ? – Madhur Bhaiya Sep 14 '19 at 16:58
  • @MadhurBhaiya & RickJames The design transformations in the "Normalization" section are not normalization. (In either sense--to 1NF or higher.) (But I agree with 'I don't see where "normalization" figures', for your transformation or normalization.) See my comments on the question. – philipxy Sep 14 '19 at 19:32
  • Thanks for the insights and tips. Yes, a custom type is made by a user who has an account. It is not shared with other users. A custom type has a few more attributes that a preset type does not including account_id. Presets can be used by all accounts. A widget has either a custom or preset type but not both. – user223304 Sep 15 '19 at 15:42
  • It sounds like you first suggest combining preset and custom types into one table and nulling out account_id in preset types, and not reserving any extra spots for future presets. That might be doable. But then you say it's prob not a good idea to put presets and customs in the same table because presets are used by all accounts while custom are only used by 1 specific account. I agree we should prob separate the types into different tables. – user223304 Sep 15 '19 at 15:42
  • So if we are putting custom-types and preset-types in 2 different tables then there are probably 4 different Entities in 4 different primary tables (Widgets, Accounts, custom-types, preset-types), not 3. Now it's all about mapping out an Either-Or relationship between a single widget having EITHER a custom OR preset type. – user223304 Sep 15 '19 at 15:42
  • Too much hand-waving. Let's have specifics on what is involved with so-called presets and so-called customs. What attributes are different? Which presets (name them) are sharable, etc. Sometimes a better idea comes when we can see the _real_ application, not a fabricated, simplified, generic one. – Rick James Sep 15 '19 at 18:06