59

Is it better to use default null or default "" for text fields in MySQL?

Why?

Update: I know what means each of them. I am interested what is better to use considering disk space and performance.

Update 2: Hey ppl! The question was "what is better to use" not "what each means" or "how to check them"...

Ion Br.
  • 2,598
  • 1
  • 19
  • 25
  • 2
    As with most "what's better for disk space and performance" questions: why don't you just insert a million rows with NULL, test some queries, and check for disk space? Repeat with ""s, and once more with a relatively even mix. And the answer is much more reliable than what some random guy on SO says ;) – ojrac Jul 10 '09 at 03:35
  • 2
    haha, I love the update comments – Xeoncross Jan 01 '10 at 20:16

12 Answers12

64

For MyISAM tables, NULL creates an extra bit for each NULLABLE column (the null bit) for each row. If the column is not NULLABLE, the extra bit of information is never needed. However, that is padded out to 8 bit bytes so you always gain 1 + mod 8 bytes for the count of NULLABLE columns. 1

Text columns are a little different from other datatypes. First, for "" the table entry holds the two byte length of the string followed by the bytes of the string and is a variant length structure. In the case of NULL, there's no need for the length information but it's included anyways as part of the column structure.

In InnoDB, NULLS take no space: They simply don't exist in the data set. The same is true for the empty string as the data offsets don't exist either. The only difference is that the NULLs will have the NULL bit set while the empty strings won't. 2

When the data is actually laid out on disk, NULL and '' take up EXACTLY THE SAME SPACE in both data types. However, when the value is searched, checking for NULL is slightly faster then checking for '' as you don't have to consider the data length in your calculations: you only check the null bit.

As a result of the NULL and '' space differences, NULL and '' have NO SIZE IMPACT unless the column is specified to be NULLable or not. If the column is NOT NULL, only in MyISAM tables will you see any peformance difference (and then, obviously, default NULL can't be used so it's a moot question).

The real question then boils down to the application interpretation of "no value set here" columns. If the "" is a valid value meaning "the user entered nothing here" or somesuch, then default NULL is preferable as you want to distinguish between NULL and "" when a record is entered that has no data in it.

Generally though, default is really only useful for refactoring a database, when new values need to come into effect on old data. In that case, again, the choice depends upon how the application data is interpreted. For some old data, NULL is perfectly appropriate and the best fit (the column didn't exist before so it has NULL value now!). For others, "" is more appropriate (often when the queries use SELECT * and NULL causes crash problems).

In ULTRA-GENERAL TERMS (and from a philosophical standpoint) default NULL for NULLABLE columns is preferred as it gives the best semantic interpretation of "No Value Specified".

1 [http://forge.mysql.com/wiki/MySQL_Internals_MyISAM]

2 [http://forge.mysql.com/wiki/MySQL_Internals_InnoDB]

James
  • 3,852
  • 2
  • 19
  • 14
48

Use default null. In SQL, null is very different from the empty string (""). The empty string specifically means that the value was set to be empty; null means that the value was not set, or was set to null. Different meanings, you see.

The different meanings and their different usages are why it's important to use each of them as appropriate; the amount of space potentially saved by using default null as opposed to default "" is so small that it approaches negligibility; however, the potential value of using the proper defaults as convention dictates is quite high.

Paul Sonier
  • 38,903
  • 3
  • 77
  • 117
  • 2
    Yes, that way you can treat NO VALUE differently than a possible valid "empty value" – AndyMcKenna Jul 09 '09 at 20:31
  • 1
    Refer to http://stackoverflow.com/questions/1034925/is-an-overuse-of-nullable-columns-in-a-database-a-code-smell , which has some great discussions on the matter. – hythlodayr Jul 09 '09 at 20:37
  • 16
    In MySQL, which the OP is talking about the manual recommends NOT NULL columns. "Declare columns to be NOT NULL if possible. It makes SQL operations faster, by enabling better use of indexes and eliminating overhead for testing whether each value is NULL. You also save some storage space, one bit per column." http://dev.mysql.com/doc/refman/5.5/en/data-size.html – Greg K Sep 05 '13 at 13:04
27

From High Performance MySQL, 3rd Edition

Avoid NULL if possible. A lot of tables include nullable columns even when the application does not need to store NULL (the absence of a value), merely because it’s the default. It’s usually best to specify columns as NOT NULL unless you intend to store NULL in them. It’s harder for MySQL to optimize queries that refer to nullable columns, because they make indexes, index statistics, and value comparisons more complicated. A nullable column uses more storage space and requires special processing inside MySQL. When a nullable column is indexed, it requires an extra byte per entry and can even cause a fixed-size index (such as an index on a single integer column) to be converted to a variable-sized one in MyISAM. The performance improvement from changing NULL columns to NOT NULL is usually small, so don’t make it a priority to find and change them on an existing schema unless you know they are causing problems. However, if you’re planning to index columns, avoid making them nullable if possible. There are exceptions, of course. For example, it’s worth mentioning that InnoDB stores NULL with a single bit, so it can be pretty space-efficient for sparsely populated data. This doesn’t apply to MyISAM, though.

Grygoriy Gonchar
  • 3,898
  • 1
  • 24
  • 16
  • 4
    Does converting all nulls in a string column to empty strings and making the column NOT NULL *actually* improve performance to any measurable extent? The idea that one should avoid nullable columns for performance reasons is one that I've never heard until now, and I'm immediately suspicious of it. – Mark Amery Jan 25 '13 at 13:56
  • > The performance improvement from changing NULL columns to NOT NULL is usually small – Grygoriy Gonchar Feb 06 '13 at 12:59
  • 2
    Exact numbers I think will very depend on your engine, column type, column/index size, row count etc. So you should not take care of this until you will get actual performance problem with querying of some column. – Grygoriy Gonchar Feb 06 '13 at 13:17
15

I found out that NULL vs "" is insignificant in terms of disk-space and performance.

The only true reason I can personally see in using NULL over '' is when you have a field marked as UNIQUE but need the ability to allow multiple "empty" columns.

For example, the email column in my user table is only filled in if someone actually has an email address. Anyone without an email address gets NULL. I can still make this field unique because NULL isn't counted as a value, whereas the empty string '' is.

degenerate
  • 1,224
  • 1
  • 14
  • 35
8

A lot of folks are answering the what is the difference between null and '', but the OP has requested what takes up less space/is faster, so here's my stab at it:

The answer is that it depends. If your field is a char(10), it will always take 10 bytes if not set to null, and therefore, null will take up less space. Minute on a row-by-row basis, but over millions and millions of rows, this could add up. I believe even a varchar(10) will store one byte (\0) as an empty string, so again this could add up over huge tables.

In terms of performance in queries, null is in theory quicker to test, but I haven't seen able to come up with any appreciable difference on a well indexed table. Keep in mind though, that you may have to convert null to '' on the application side if this is the desired return. Again, row-by-row, the difference is minute, but it could potentially add up.

All in all it's a micro-optimization, so it boils down to preference. My preference is to use null because I like to know that there's no value there, and not guess if it's a blank string ('') or a bunch of spaces (' '). null is explicit in its nature. '' is not. Therefore, I go with null because I'm an explicit kind of guy.

Eric
  • 92,005
  • 12
  • 114
  • 115
3

Use whatever makes sense. NULL means "no value available/specified", "" means "empty string."

If you don't allow empty strings, but the user does not have to enter a value, then NULL makes sense. If you require a value, but it can be empty, NOT NULL and a value of "" makes sense.

And, of course, if you don't require a value, but an empty value can be specified, then NULL makes sense.

Looking at an efficiency point of view, an extra bit is used to determine whether the field is NULL or not, but don't bother about such micro-optimization until you have millions of rows.

Blixt
  • 49,547
  • 13
  • 120
  • 153
2

"" is like an empty box... null is like no box at all.

It's a difficult concept to grasp initially, but as the answers here plainly state - there is a big difference.

Sampson
  • 265,109
  • 74
  • 539
  • 565
2

I prefer null when it is semantically correct. If there is an address field available and the user did not fill in, I give it a "". However if there in an address attribute to in the users table yet I did not offer the user a chance to fill it in, I give it a NULL.

I doubt (but I can't verify) that NULL and "" makes much of a difference.

KahWee Teng
  • 13,658
  • 3
  • 21
  • 21
1

'' = '' yields TRUE which satisfies WHERE condition

NULL = NULL yields NULL which doesn't satisfy WHERE condition

Which is better to use depends on what result you want to get.

If your values default to NULL, no query like this:

SELECT  *
FROM    mytable
WHERE   col1 = ?

will ever return these values, even if you pass the NULL for the bound parameter, while this query:

SELECT  *
FROM    mytable
WHERE   col1 = ''

will return you the rows that you set to an empty string.

This is true for MySQL, but not for Oracle, which does not distinguish between empty string and a NULL.

In Oracle, the latter query will never return anything.

Quassnoi
  • 413,100
  • 91
  • 616
  • 614
1

Use "". It requires less programming effort if you can assert that columns are non-null. Space difference between these is trivial.

clemahieu
  • 1,419
  • 9
  • 9
0

In general, NULL should indicate data that is not present or has not been supplied, and therefore is a better default value than the empty string.

Sometimes the empty string is what you need as a data value, but it should almost never be a default value.

davidcl
  • 1,187
  • 8
  • 23
0

NULL means 'there is no value' and is treated especially by RDBMSs regarding where clauses and joins.

"" means 'empty string' and is not treated especially.

It depends on what does the text represent and how will it actually be used in queries.

For example, you can have a questionnaire with some obligatory questions and some optional questions.

  • Declined optional questions should have a NULL in their corresponding column.
  • Obligatory questions should have an empty string as default, because they HAVE to be answered. (Of course in a real application you'd tell the user to enter something, but I hope you get the idea)
Vinko Vrsalovic
  • 330,807
  • 53
  • 334
  • 373