10

I'm trying to understand the best way to handle columns which are mostly empty in terms of disk-space and index-performance. Is there a difference between putting in all empty places NULL vs '' (for varchar / text) vs 0 (for int).

Thanks.

Johanna Larsson
  • 10,531
  • 6
  • 39
  • 50
Noam
  • 3,341
  • 4
  • 35
  • 64
  • [Possibly A Duplicated](http://stackoverflow.com/questions/5015990/mysql-integer-0-vs-null) – thar45 Nov 03 '12 at 09:31
  • @Thanga thanks for the ref it is similar, but I'm interested to hear about disk-space and index building implications. Will clarify – Noam Nov 03 '12 at 09:33
  • 2
    in addition to my answer below, my advice is that this sounds like a case of "premature optimization" This is not really worth your time to worry about. Just design schema that makes sense. I guarantee you that this will not be your biggest bottleneck. – Gavin Towey Nov 03 '12 at 10:13

3 Answers3

18

No, using NULL will not take up less space than an empty VARCHAR or INT field. In fact it may take up more space. Here's why:

A VARCHAR is stored as a size + value. The number of bytes used for the size depends on the max storage of the VARCHAR. VARCHAR(255) requires one byte, VARCHAR(65536) requires two bytes and so on.

So that VARCHAR(255) column takes up one byte even if you store an empty string. The following table would take a minimum of one byte per row (plus some other possible overhead depending on storage engine).

CREATE TABLE sample (
  a VARCHAR(255) NOT NULL
);

To save NULL values, MySQL uses a bitmask for each row. Up to 8 nullable columns can be stored per byte. So if you have a table like this:

CREATE TABLE sample (
  a VARCHAR(255) NULL
);

Would take a minimum of two bytes per row. Storing NULL just sets the bit, it's already reserved whether you use it or not. The byte for the VARCHAR's size is still used for each row even if the column is set to NULL.

James
  • 5,137
  • 5
  • 40
  • 80
Gavin Towey
  • 3,132
  • 15
  • 11
  • Good answer, does this apply to MySQL & ms SQL server? – SteB Nov 03 '12 at 10:13
  • I know this is true for MySQL, but I make no claims to know others -- this answer seems to indicate it's similar: http://stackoverflow.com/questions/3731172/how-much-size-null-value-takes-in-sql-server – Gavin Towey Nov 03 '12 at 10:17
  • @GavinTowey good answer, but I assume the bitmask you mentioned helps MySQL manage the indexes more efficiently? If so, wouldn't that result in less space in cases there's an index on that column? – Noam Nov 03 '12 at 15:19
  • It doesn't. I'm not exactly sure what it does internally, but this quick test shows that NULL in an index doesn't take up any less space. http://pastebin.ca/2247977 – Gavin Towey Nov 03 '12 at 18:33
  • I don't know mechanics in postgres, but result similar - if I replace '', 0 and so on with nulls, database become bigger. – degr Jan 03 '22 at 12:30
1

The simple answer is maybe (though it shouldn't matter), nulls may take up less disk-space, though the space saving will probably be tiny (though even tiny savings will add up).
Unless disk space is very tightly constrained I wouldn't worry about it (disk space is a lot cheaper than programmer time).
Also, null and 0 (or ''), are semantically different, so shouldn't be used interchangeably, certainly not for a theoretical (or very small) performance gain.

See this question for more detail.

I don't think indexing will be greatly affected, there may be a slight speed improvement.
See this question for more detail.

This question deals specifically with MySQL and null performance.

Community
  • 1
  • 1
SteB
  • 1,999
  • 4
  • 32
  • 57
0

It depends.

If you have a fixed-width table (no VARCHAR, VARBINARY, BLOB or TEXT), it will probably make no difference.

In a variable-width table, a NULL will probably consume as much space as an empty VARCHAR.

If you have nearly all values NULL and only very few contain data, you could create a separate table which you join to.

So let's assume you have a list of people where only a few of them you have the birthdate.

So instead of

CREATE TABLE people (id INT UNSIGNED NOT NULL PRIMARY KEY AUTOINCREMENT, name VARCHAR(60) NOT NULL, birthday DATE)

you could do

CREATE TABLE people (id INT UNSIGNED NOT NULL PRIMARY KEY AUTOINCREMENT, name VARCHAR(60) NOT NULL)
CREATE TABLE birthdates (id INT UNSIGNED, birthday DATE NOT NULL)

and query the data with a LEFT JOIN.

If there are applications which need to access the table in the old format, you could define a view.

glglgl
  • 89,107
  • 13
  • 149
  • 217