1

I'm new in database design and as far as I could research, using null values as representation of no present data is not a good Idea, the proble that I'm having now is that I don't know how to represent no present data instead of null.

For example I have a user table and a FavoritColor table, the user has a column called FC which is a key for the ID of the FavoritColor table. Now I'm giving the user the possibility to not choose a color, how would I note that, what would the FC column be filled with if the user chose to not choose a color?

Elzo Valugi
  • 27,240
  • 15
  • 95
  • 114
Mercury
  • 594
  • 1
  • 12
  • 28

6 Answers6

2

NULL is "THE" representation of no data in SQL so I would suggest you to use it that way.

And you can always insert NULL to a column, there is no reason not to. If you really do not want to use NULL for a reason I can not understand, I think the (next) most logical representation of "no data" would be an empty string ('')

But again, I would stick to the standard... NULL.

Flip Vernooij
  • 889
  • 6
  • 15
  • Just to be pedantic, you cannot 'always insert NULL to a column', e.g you cannot insert `NULL` into a column with the `NOT NULL` attribute. – Arth Apr 26 '16 at 13:27
2

NULL is a perfectly fine value to use. If you're worried about defaults, make sure to use an OUTER JOIN and (for SQL Server, anyway) you can do something like:

SELECT user_table.name, COALESCE(preferences.color_preference, 'DEFAULT_VALUE') FROM user_table LEFT OUTER JOIN preferences ON user_table.id = preferences.id;

This type of query will allow you to set a default and store NULL as the preferred color, you will get the default in this case if the color_preference is NULL or if there is no row in the preferences table.

You say "what would the FC column be filled with if the user chose to not choose a color?". I ask, why do you care? They either have a favorite color specified or do not. Do you care if they had a choice of specifying a favorite color but did not tell you?

mikeb
  • 10,578
  • 7
  • 62
  • 120
2

NULL is usually meant to represent "unknown", which is why 'Ellen' <> NULL doesn't result in TRUE, but in NULL. It is not known whether the value still unknown to us is 'Ellen' or not. An example would be a middle name; as long as the field is empty we don't know whether Mary's middle name is Ellen or not.

Often, however, we use NULL also for "not applicable" like the recommended retail price in a product table - some products simply don't have any. So we do know there is no RRP, it's not "unknown", but we use NULL still. What else could we do? Use 0 instead - and then mistakenly show in our online shop that the recommended retail price is zero dollars? Or add a flag has_rrp? Two columns for one content? NULL is often the simpler solution.

And then, we can use NULL to mean "no value". Say an image in a user table. Some users just don't have a photo in our database, so the value remains empty. There is no other value then NULL for empty for binary data. We cannot put a zero in there or so, because the column is supposed to contain image data, say jpeg data or the like.

There are some other ways to represent "no data" for a single field. In the image example we could add a user_image table with a 1:1 relation and either the record is there or not. For a string we'd use '' and for numbers we can sometime - not always - use zero. For dates, like in a price table containing past and future prices with a from_date and to_date, people sometimes put in extreme dates (0001-01-01, 9999-12-31) to avoid complicate queries.

For IDs as in your case we have one more option: have an ID for "no value" and a corresponding entry in the other table. As long as we don't want any special treatment for "no value", this is a fine solution. In your example you could show a combobox with 'black', 'red', 'blue', ..., and 'no color' in your GUI and you can select 'no color' just as easily as you can select 'blue'. But if you ever want special treatment, then you will have queries with and color_id <> (select id from colors where value = 'no color') or the like, which can be annoying.

By the way, sometimes people use NULL even for "all values". Let's say you have a table with product prices per shop. Make shop_id NULL and you have a default price for all shops, fill in a shop_id then you have a shop-specific price.

NULLs often need special treatment, as with IS instead of =, outer joins, and constructs like COALESCE(color, 'no color') etc. This is neither good nor bad per se. If you want to count distinct favorite colors in your users table, it is good. NULL for "no favorite color" just won't be counted with COUNT(DISTINCT color), you will only count favorite colors.

After all it's a decision to make. Do you need the distinction between "not known yet" and "known that no value applies"? Do you want to treat "no color" other than "red"? NULL for "no value" is an option and it is often used such. Decide whether it is good in your case. There is no rule saying NULLmust never be used to represent "no value".

Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73
1

Suggestions:

1) Remove the FC column from the users table and add a relationship table to model the relationship between a user and their favorite color, say UsersFavoritColor. Now apply the closed world assumption by assuming a user that does not appear in the UsersFavoritColor table in fact has no favorite color.

2) Add another table UsersWhoHaveNoFavoritColor with just the person identifier, ensuring you can’t have the same person in both tables. Add further tables if you need to model further relationships e.g. UsersWhoAreKnownToHaveNoFavoritColor, UsersWhoAreKnownToHaveAFavoritColorButRefusedToTellUs, etc

3) Use a suitable default to mean ‘no color’, if one is available.

onedaywhen
  • 55,269
  • 12
  • 100
  • 138
  • 1
    Favorite Color appears in the User table, not the other way round. Your second suggestion seems way over-engineered and ensuring a person only appears in one of these tables requires application logic or triggers. – Arth Apr 26 '16 at 13:36
  • 1
    @Arth: I misunderstood the design, now fixed. The degree of 'engineering' required relates to the deficiencies in the DBMS of choice e.g. when `CREATE ASSERTION` is supported then no 'engineering' is required. – onedaywhen Apr 26 '16 at 13:55
  • 1
    I think your assertion #2 is way over-engineered. Adding a table for each case is crazy, IMO – mikeb Apr 26 '16 at 15:43
  • Well, if we were on Mars we'd all die of oxygen deprivation, but I don't carry around an oxygen tank. We could what if all day long and wind up with the perfect solution for everything, but at some point you have to get it to work, right? – mikeb Apr 27 '16 at 12:12
1

OP finds "... using null values as representation of no present data is not a good idea, ..."

(This may be opinion based.)

I totally disagree with that assertion. NULL values have some significant advantages. It is a very good idea to use the built-in native support that SQL provides for NULL values.

Typically, the NULL value is described as representing an "unknown" value.

Yes, there are some quirks in dealing with NULL value you need to be aware of. An equality or inequality comparison with a NULL value returns NULL, not TRUE or FALSE. We just need to be aware that in SQL, boolean logic is tri-valued, and NULL will never equal TRUE and never equal FALSE.

These quirks are actually what make NULL value so very powerful.


If you want to use some mechanism other than NULL to represent "no data", that would require either...

storing a non-NULL "special" value that represents a "no-data" condition, and additional logic to check and operate on the "special" value.

storing a non-NULL "no data" indicator/flag in a second column, and additional logic to handle populating, checking and operating on that second column, in conjunction with the first.

And then you'll need to also come up with dummy "no data" rows in tables referenced by foreign keys, so the non-NULL "no data" values in your foreign key columns have a row to reference. Which is what I am thinking you were asking in your question...

If you want to declare your foreign key columns NOT NULL, and have a default value when no value is supplied, that default value will need to reference a row in the referenced table.

As a demonstration of this type of approach... NOTE: I am NOT advocating this as an appropriate design... as just a demonstration of "working around" using a NULL value to represent "no data".

 create table favcolor 
 ( id            int not null primary key
 , favcolor_name varchar(30) not null
 );

 create table user
 ( id            int not null primary key
 , favcolor_id   int not null default '1' comment 'fk, ref favcolor.id'
 , constraint FK_user_favcolor foreign key (favcolor_id) references favcolor(id)
 );

 insert into favcolor (id, favcolor_name) values (1, 'not chosen');

Once you get all of that done, you are going to discover the same thing that SQL Designers of old did. It would be very beneficial to have this kind of handling of "unknown", "no data", "unchosen" as a native built-in function of SQL. Which is why the NULL value was introduced in the first place, as a solution to this very type of problem.

spencer7593
  • 106,611
  • 15
  • 112
  • 140
  • "An equality or inequality comparison with a NULL value returns NULL, not TRUE or FALSE" - actually, such an operation involving `NULL` evaluates to `UNKNOWN`. Sorry to point that out, just one of the many 'quirks' you need to be aware of :) – onedaywhen Apr 26 '16 at 14:07
  • @onedaywhen: In MySQL,**`SELECT 1 = NULL`** returns **`NULL`**. Try it. – spencer7593 Apr 26 '16 at 14:11
  • I thought you were talking about the "[SQL Designers of old](http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt)" who defined a predicate as a "condition that can be evaluated to give a truth value of true, false, or unknown." – onedaywhen Apr 26 '16 at 14:20
0

This is a conceptual problem and is detailed in a lot of places including on SO. It depends on how business perceives your data.

Lets say you have a question on your questionnaire or a user preference for a favorite color associated with a user. Your database table contains user_id and the hex for a certain color. Database designers use NULL as a value to show that the user has not expressed yet any thought on its favorite color. Let's say that the user chooses a color and then has a reset button. In this case you may want to have an empty string instead of a NULL value, by this marking the difference between not having at all a preference and having removed an existing preference.

Elzo Valugi
  • 27,240
  • 15
  • 95
  • 114