I'm trying to understand the best way to handle columns which are mostly empty in terms of disk-space and index-performance. Is there a difference between putting in all empty places NULL vs '' (for varchar / text) vs 0 (for int).
Thanks.
I'm trying to understand the best way to handle columns which are mostly empty in terms of disk-space and index-performance. Is there a difference between putting in all empty places NULL vs '' (for varchar / text) vs 0 (for int).
Thanks.
No, using NULL will not take up less space than an empty VARCHAR
or INT
field. In fact it may take up more space. Here's why:
A VARCHAR
is stored as a size + value. The number of bytes used for the size depends on the max storage of the VARCHAR
. VARCHAR(255)
requires one byte, VARCHAR(65536)
requires two bytes and so on.
So that VARCHAR(255)
column takes up one byte even if you store an empty string. The following table would take a minimum of one byte per row (plus some other possible overhead depending on storage engine).
CREATE TABLE sample (
a VARCHAR(255) NOT NULL
);
To save NULL
values, MySQL uses a bitmask for each row. Up to 8 nullable columns can be stored per byte. So if you have a table like this:
CREATE TABLE sample (
a VARCHAR(255) NULL
);
Would take a minimum of two bytes per row. Storing NULL
just sets the bit, it's already reserved whether you use it or not. The byte for the VARCHAR
's size is still used for each row even if the column is set to NULL
.
The simple answer is maybe (though it shouldn't matter), nulls may take up less disk-space, though the space saving will probably be tiny (though even tiny savings will add up).
Unless disk space is very tightly constrained I wouldn't worry about it (disk space is a lot cheaper than programmer time).
Also, null and 0 (or ''), are semantically different, so shouldn't be used interchangeably, certainly not for a theoretical (or very small) performance gain.
See this question for more detail.
I don't think indexing will be greatly affected, there may be a slight speed improvement.
See this question for more detail.
This question deals specifically with MySQL and null performance.
It depends.
If you have a fixed-width table (no VARCHAR
, VARBINARY
, BLOB
or TEXT
), it will probably make no difference.
In a variable-width table, a NULL
will probably consume as much space as an empty VARCHAR
.
If you have nearly all values NULL
and only very few contain data, you could create a separate table which you join to.
So let's assume you have a list of people where only a few of them you have the birthdate.
So instead of
CREATE TABLE people (id INT UNSIGNED NOT NULL PRIMARY KEY AUTOINCREMENT, name VARCHAR(60) NOT NULL, birthday DATE)
you could do
CREATE TABLE people (id INT UNSIGNED NOT NULL PRIMARY KEY AUTOINCREMENT, name VARCHAR(60) NOT NULL)
CREATE TABLE birthdates (id INT UNSIGNED, birthday DATE NOT NULL)
and query the data with a LEFT JOIN.
If there are applications which need to access the table in the old format, you could define a view.