-1

This is a question about storing nulls (or not), and two answers that do a very good job answering the use of nulls are:

Let's say I want to store a list of movies that are stored on iTunes. For simplicity, we'll just store a few fields so that the film Avatar has these values:

  • ID: 354112018
  • Name: Avatar
  • Year: 2009
  • Synopsis: "From Academy Award®-winning director James Cameron comes Avatar, the story..."

However, sometimes the Synopsis is missing, such as: https://itunes.apple.com/be/movie/id437975371 and sometimes the Year is missing, especially for future or tentative-releases, such as: https://itunes.apple.com/au/movie/id1598491343.

Without giving it a second thought, I would probably create one table to store those four fields, something like this:

  • ID (INT)
  • Name (VARCHAR)
  • Year (INT NULL)
  • Synopsis (VARCHAR NULL)

Is there any advantage in 'further normalizing' the database so that, for example, I don't store any null values, such as:

Title

  • TitleID
  • Name

TitleSynopsis

  • TitleID
  • Synopsis

TitleYear

  • TitleID
  • Year

To me it seems like doing this would potentially create hundreds of extra tables (on a large database) and make inserts a nightmare -- I suppose a View could be created to flatten out the results so it's queryable, but even though I feel like it would require so much overhead. So is there any reason in the above case to normalize to remove nulls, or in general, what would be the case to do so, if there ever is one?

David542
  • 104,438
  • 178
  • 489
  • 842
  • 2
    Splitting a single table into multilple tables like this is often called "vertical partitioning" so if you search on that you might find some use cases for and against. You would need to outer join to all of these tables which yields a NULL anyway. Personally this looks like it's overcomplicating things. If you had zillions of fields and lots were NULL, you could use various "sparse" implementations in various databases. – Nick.Mc Jan 12 '22 at 03:17
  • 2
    What exactly is the question? Such rearrangements between variations of designs involving subyping are not normalization & do not address the problems normalization addresses. You don't actually clearly say what you mean by 'further normalizing' (or "flatten out" or "queryable") so that is going to impede your researching it. (Putting words in scare quotes does not clarify the idiosyncratic meaning that you don't make clear by actually saying what you mean.) – philipxy Jan 12 '22 at 05:02
  • When clear & specific this will be a duplicate of tons of questions about using nulls vs not, where also the null-free straightforward relational design is addressed, and also tons of questions about representing SQL/DB designs involving subtypes. Please before considering posting read the manual/reference & google any error message & many clear, concise & precise phrasings of your question/problem/goal, with & without your particular names/strings/numbers, 'site:stackoverflow.com' & tags; read many answers. Reflect your research. See [ask], [Help] & the voting arrow mouseover texts. – philipxy Jan 12 '22 at 05:12
  • @philipxy what do you mean, where is the subtype? It's a single movie with no other types. – David542 Jan 12 '22 at 05:17
  • 2
    Possible duplicate of [Are nulls in a relational database okay?](https://stackoverflow.com/q/163434/3404097) – philipxy Jan 15 '22 at 07:37
  • There's no such thing as "best" in engineering unless *you* define it. And then, how are you stuck deciding? Also unfortunately all reasonable practical definitions require a ridiculous amount of experience with a ridiculous number of factors that interact with chaotic sensitivity to details. Make straightforward designs. When you demonstrate via measurement that a design & all alternatives you can think of have problems (whatever that means at the time), then ask 1 specific researched non-duplicate question. [Strategy for “Which is better” questions](https://meta.stackexchange.com/q/204461) – philipxy Jan 15 '22 at 07:37
  • Typically people think of subtyping when there are groups of nullable columns that are either all there/non-null or all absent/null. So a single absent/null column is a special case. (Hence the types of movies with & without a synopsis, etc.) There are subtypes (whether explicitly noticed or not) when sometimes we are interested in table unions (including full natural join) or partitions (including subsets & selections, including presence of a FK). Normalization to higher NFs replaces a base table by projections/components of it that (inner) natural join back to it. – philipxy Jan 15 '22 at 07:38
  • 1
    Dozens of published academic information modeling & DB design textbooks are online free in pdf & html. Read many. The most cogent information modelling method is Halpin's ORM2 (Object-Role Modelling), child of Nijssen's NIAM "fact based" method. ORM2's main book IM & relational DBs (2 editions) have chapters mapping it to other main methods (ER & pseudo- pre- ER. Presentations of the relational model per se are generally poor. Read Darwen's free intro to relational DB theory. http://stackoverflow.com/a/27272088/3404097 http://stackoverflow.com/a/34427859/3404097 – philipxy Jan 15 '22 at 07:59
  • 1
    Possibled duplicate of [When to use NULL in MySQL tables](https://stackoverflow.com/q/471367/3404097) Etc etc. – philipxy Jan 17 '22 at 11:17

2 Answers2

0

There is nothing wrong with having NULL values stored. In your case, you can think of it as an OR statement. You don't write several IFs. You can write one IF with some OR conditions.

You presented another solution, which is creating more tables so you would avoid NULL values. You won't consume less disk/memory space, because attached to the attribute you would have the TitleID attribute. Querying would be harder. You would have to create more joins, and your query would take more processing time (which is usually the bottleneck for big applications) and you wouldn't have a single advantage towards this approach. You waste more disk space, more memory, more processing time, and your query will be bigger, uglier and harder to understand.

So, all things considered, have all these attributes in one table, and do not fear NULL values. Actually, they will serve you quite well! Imagine if you want to query "Movies without synopsis added" or "Movies without Year added" so you could ask the viewers for their help inputing the information. NULL can be your friend in these situations. Hope this explanation helps :)

Aleksandrus
  • 1,589
  • 2
  • 19
  • 31
-1

Whether to normalize your data, or use null, can be considered separate questions, which do not affect each other.

  • Reasons for normalization can be to avoid redundancy (duplicate data), ensure only related information is in a particular table, simplify queries or avoid data modification issues.

  • Whether to store NULL values or not, is more a matter of whether it makes sense for a value to be NULL in your data, rather than being a default value. For example, if it would be more correct to specify that data does not exist, rather that provide some default value.

Is there any advantage in 'further normalizing' the database so that, for example, I don't store any null values, such as....:

There are different forms of normalization such as 1NF,2NF,3NF.. etc. The purpose of each level of normalization is to achieve some benefit, and avoid common problems to maintaining and querying data.

Depending on your use case, it may make sense to normalize further, or it may make sense to invest your development time elsewhere. There is no right or wrong answer - In software engineering in general there are always tradeoffs to how one invests their resources: Improve the quality of one feature, vs investing in several etc Each investment in time would have rewards best understood only by the persons building and maintaining the product.

A common way that teams move forward is to let their immediate and near term business needs define how they invest their development resources to get some tangible benefit immediately, but you should decide what is best for you, and your team, depending on your situation, and what you hope to achieve in the time frame that matters to you.

Rahul Iyer
  • 19,924
  • 21
  • 96
  • 190