8

Pretext, I am familiar with the semantical differences between a NULL value and an empty string.

I have a MySQL table where I store a lot of hostnames with their IP addresses (as a string) and wonder what would appear to be more natural (or efficient storage-wise) in case a hostname cannot be resolved.

A NULL value or an empty string (in which case it probably should be a VARCHAR and not a CHAR)

I would tend towards a NULL value but I would like to have this confirmed or disconfirmed.

user2352129
  • 83
  • 1
  • 5
  • 1
    This is kind of subjective, but NULL carries the meaning of "not applicable" which seems the most natural here. Efficiency shouldn't be a factor. – Ja͢ck May 05 '13 at 15:05
  • @Jack Well, efficiency is always a factor ;) – user2352129 May 05 '13 at 15:11
  • Do hostnames need to be unique? Do you ever expect multiple IP addresses to resolve to the same hostname? – Branko Dimitrijevic May 05 '13 at 15:16
  • also, what is your engine? – Sebas May 05 '13 at 15:20
  • @BrankoDimitrijevic No, no uniqueness. Also it is always host->ip and never ip->host – user2352129 May 05 '13 at 15:21
  • 1
    @user2352129 I meant efficiency shouldn't be a factor in this particular case of course. – Ja͢ck May 05 '13 at 15:21
  • @Sebas Myisam, basically nothing transactional, rather a sort of history – user2352129 May 05 '13 at 15:21
  • I'm with Jack; efficiency is a good thing to aim for but not at the expense of other good practices because efficiency of developer time is also important. I've not got enough detail to know whether this is a micro-optimisation or whether it will actually make a noticable difference, but either way I would guess you could make other savings elsewhere that would make more difference. Finally, remember that efficiency of storage space will often have a different answer to efficiency of query speed; you've asked for storage efficiency, but storage is cheap; speed is usually more important. – Spudley May 05 '13 at 19:36

4 Answers4

6

In MyISAM MYSQL you save one bit per row not using NULL. As it is stated here:

Declaring columns NULL can reduce the maximum number of columns permitted. For MyISAM tables, NULL columns require additional space in the row to record whether their values are NULL. Each NULL column takes one bit extra, rounded up to the nearest byte.

Take a look here as well:

In addition, while a NULL itself does not require any storage space, NDBCLUSTER reserves 4 bytes per row if the table definition contains any columns defined as NULL, up to 32 NULL columns. (If a MySQL Cluster table is defined with more than 32 NULL columns up to 64 NULL columns, then 8 bytes per row is reserved.)

Moreover it also makes the database work faster at it stated here (taken from stackoverflow - @DavidWinterbottom link didn't work for me, I added a different sourse)

It's harder for MySQL to optimize queries that refer to nullable coumns, because they make indexes, index statistics, and value comparisons more complicated. A nullable column uses more storage space and requires special processing inside MySQL. When a nullable column is indexed, it requires an extra byte per entry and can even cause a fixed-size inded (such as an index on a single integer column) to be converted to a variable-sized one in MyISAM.

In most of the cases non-NULL values behave more predictable when combined with COUNT() and other aggregating function but you can also see a NULL behave according to your needs.

As it is stated here, not all group (aggregate) functions ignore NULL for instance, COUNT() would give you different result that COUNT(*) for a column containing NULL values.

On the other hand as other point out NULL better reflects the meaning of entry - it is an unknown value and if you wanted to count all the hosts you would probably COUNT() to behave exactly as it does.

Community
  • 1
  • 1
Legat
  • 1,379
  • 3
  • 11
  • 20
  • The null behaviour has never been a problem for me. Would you be so kind to post a link to your respectable argument about the bit size of the NULL value? On the other hand, the "it also makes the database work faster" could use some further demonstrations. (bigger does not always mean slower) – Sebas May 05 '13 at 15:05
  • @Legat Would I actually save this bit? I assume it would be part of some internal flag collection, hence it would be always there and take space, just not evaluated in case NULL is not allowed. – user2352129 May 05 '13 at 15:10
  • Thank you for mentionning your source. I see it clearly talks about MYISAM, which means the innodb engine is exempted. – Sebas May 05 '13 at 15:15
  • @Sebas Of course, even though database speed is related to row size, saving one per row won't do miracles. Still NULL being a special value might have noticeable influence on the processing time. I'll look for some sources examining how big this influence can be. – Legat May 05 '13 at 15:21
  • @user2352129 Yes you would, however it's still just one bit per row. Not too much but I usually go for it where I can. – Legat May 05 '13 at 15:22
  • @Sebas You are right, it's about MYISAM but are you sure innodb engine is not saving this one bit on not null columns? – Legat May 05 '13 at 15:26
  • @Legat Thanks, its not clear to me though which way you are suggesting, NULL or empty string. As to size, it would also take at least a length byte with an empty VARCHAR, wouldnt it? – user2352129 May 05 '13 at 15:27
  • 1
    @user2352129 In a long run I would go for VARCHAR, as it would save time and space. However, NULL better fits the meaning of your entries and in some cases it can make queries shorter, so I might use it out of laziness. If you don't need top performance, choose whichever you like more ;) Take a look at this question as well: [When to use NULL in MySQL tables](http://stackoverflow.com/questions/471367/when-to-use-null-in-mysql-tables) – Legat May 05 '13 at 15:48
  • @Legat Unfortunately I did not mention it initially but only later, but I wouldnt select based on the column. So it probably really only comes down to storage (I would assume NULL has a slight advantage here) and how natural it feels. So from what I understood you would go for non-NULL, correct? – user2352129 May 05 '13 at 15:53
  • @user2352129 If you are not planning to (excessively) use this column there is no real difference in terms of storage and performance. In such a case I would go for what's easier to implement :) – Legat May 05 '13 at 15:57
  • @Legat: the only time I've ever seen NULL affect index performance is with respect to OUTER JOINs (where NULL could mean "no matching row" or the value, NULL). Otherwise, it's very valuable -- zero often means something different (i.e. not a special value); i.e. when storing how many children you have. "No children" is different than "I didn't answer the question". – fenway May 05 '13 at 16:39
3

First: Consider closely the different semantics of NULL and Empty-String.

  • The first is best interpreted as something like:
    There is a valid value for this field, but that value is not yet known.
  • The second always means:
    The valid value for this field is known, and it is precisely "".

Second: Recognize that indexing and filtering works better and more efficiently on Empty-String than on NULL, so don't use the latter when you really mean the former.

Third: Recognize that all expressions that use NULL are susceptible to the non-intuitiveness of three-valued logic unless the NULL is religiously coalesced to Empty-String (or some other contextually valid value) first. In particular, the law of excluded middle no longer applies, so the expression A or ~A is no longer tautologically true whenever the evaluation of A requires evaluation of a NULL term. Forgetting this can lead to very subtle and hard-to-locate bugs.

The not-equals operator exposes this regularly:

When A has the value NULL:
   The expression A = 0 returns false; 
   The expression A <> 0 returns false; and
   The expression A OR NOT A returns false!

Update:
I guess the essence of my point is that they are NOT the same creature, but rather very different beasts. Each has its place. A second address field should always be non-null (unless you intend to allow entry of partial or incomplete addresses), and it's default should always be the valid and known value of Empty-String. NULL should be restricted to cases where a valid and known value will be supplied later, and in fact o signal some sort of validation failure that must be resolved.

From OP below:

A row will not be updated. On the insertion there is either an IP address or there is none (because it could not be resolved).

Response:

Then I recommend using Empty-String as the default, and make the field NON-NULL. Only use NULL when you must, as it has subtle disadvantages.

Pieter Geerkens
  • 11,775
  • 2
  • 32
  • 52
  • I got you up until the last paragraph. In my case one could argue both solutions (NULL, the host could not be resolved and is unknown - "", the host could not be resolved but an attempt was made so the result is empty). ........ From your response I did not quite get which solution you would opt for, could you elaborate on this? – user2352129 May 05 '13 at 15:50
  • @user2352129: I guess the essence of my point is that they are **NOT** the same creature, but rather very different beasts. Each has its place. A second address field should always be non-null (unless you _intend_ to allow entry of partial or incomplete addresses), and it's default should always be the _valid and known_ value of **Empty-String**. **NULL** should be restricted to cases where a _valid and known_ value will be supplied later, and in fact o signal some sort of validation failure that must be resolved. ... – Pieter Geerkens May 05 '13 at 15:54
  • @user2352129: In your example I cannot see where you have defined the semantics of this field sufficiently to make a determination. I suspect that you should make the field nullable, but define it's default value as **Empty-Sting**. That way the application can assign **NULL** explicitly to mean "It exists, but I need to go find the value"; but the default of **Empty-String** meaning "Doesn't have one" is placed on initial record creation. Is that the correct semantics of your table row? – Pieter Geerkens May 05 '13 at 15:55
  • 1
    As I said I am aware of the difference. However in this case there is no known or valid value. If a host does not resolve I dont have an IP, and the question is whether this should be represented with a NULL value or with an empty string. Not only in terms of storage but also in terms of how natural either way is. – user2352129 May 05 '13 at 16:00
  • 1
    A row will not be updated. On the inseration there is either an IP address or there is none (because it could not be resolved). – user2352129 May 05 '13 at 16:01
  • @user2352129: Then I recommend using **Empty-String** as the default, and make the field NON-NULL. Only use **NULL** when you must. – Pieter Geerkens May 05 '13 at 16:02
  • Thanks too, I also want to apologise for not choosing your answer as the best one, but all three given answers made very good points, and I could only choose one so I decided to go with the user with the lowest number of points. I hope you understand this, thanks again :) – user2352129 May 05 '13 at 16:19
  • @user2352129: Apology unnecessary; but thank you for the explanation. SO asks that you accept whichever answer was either most useful to **you**, or which you believe will be most useful to **a future visitor** to this page. – Pieter Geerkens May 05 '13 at 16:59
1

Oracle solved the problem and interpretes both the same.

Mysql does not, I'm not judging it but personnally I don't like it, therefore using NULL as much as I can to "standardise" my code.

Plus, from the point of view of the significance of the keywords, NULL is exactly what you want because it means "unknown" in db semantics. (correct me if I'm wrong)

Sebas
  • 21,192
  • 9
  • 55
  • 109
  • 2
    As to semantics I agree, thats why I'd also rather tend to NULL but I wonder whether an empty string could have a potential advantage in performance or storage size (IIRC NULL requires an additional flag). ...... As to Oracle, I wasnt aware of this and only found out in the process of my research but I dont really agree with them. An empty string does not always have necessarily the same meaning as NULL and hence shouldnt not be automagically converted – user2352129 May 05 '13 at 15:06
  • yes this is right about the difference of meaning. But after a while you realise this is much clearer to avoid the double notation. But this is a subjective point of view I suppose. – Sebas May 05 '13 at 15:07
  • From a performance or storage point of view you personally wouldnt prefer either over the other though? – user2352129 May 05 '13 at 15:12
  • From storage point of view no, definitely. I'm convinced this is irrelevant now that the cheapest cloud plateform offers dozens of GB. The performance point of view still has to be demonstrated. If it is clear that selecting a null value is n times slower than an empty string, I'll reconsider my position. – Sebas May 05 '13 at 15:14
  • to be fair, I should have noted that I will NOT select based on this field – user2352129 May 05 '13 at 15:17
  • so, pickup the sexiest for you then :-) (esthaetically) – Sebas May 05 '13 at 15:21
  • ;) Thats exactly where I was looking for advice, which one would have from our technically perspective the most sex-appeal .... translation, where does the gut-feeling draw us to? ;) – user2352129 May 05 '13 at 15:29
  • :) I love the immediate readability of IS (NOT) NULL, as well as the way phpmyadmin or mysql console shows the nulls in the recordsets. This is just my favorite display! Since there's no noiticable counterpart (at least for me) I guess it remains a subjective point of view... – Sebas May 05 '13 at 15:35
  • I agree with the IS NULL part. I assume I will nullable the column then ;) thanks – user2352129 May 05 '13 at 15:37
  • Thanks again, I would like to apologise for not having chosen your answer though, all three given answers were very good, but as I can only choose one I decided to give the points to the user with the lowest number of points, I hope you understand. Thanks again – user2352129 May 05 '13 at 16:18
  • that's ok! others elaborated their answer much more, it makes more sense. – Sebas May 05 '13 at 16:47
0

I recommend you to use NULL, while the type of NULL is different from string. It's easier for example to filter out rows with this value or detect type of value in that field and so on.

jjurm
  • 72
  • 5