32

I have a table with 7 columns and 5 of them will be null. I will have a null columns on int, text, date, boolean, and money data types. This table will contain millions of rows with many many nulls. I am afraid that the null values will occupy space.

Also, do you know if Postgres indexes null values? I would like to prevent it from indexing nulls.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Luke101
  • 63,072
  • 85
  • 231
  • 359

3 Answers3

58

Basically, NULL values occupy 1 bit in the NULL bitmap. But it's not that simple.

The null bitmap (per row) is only allocated if at least one column in that row holds a NULL value. This can lead to a seemingly paradoxic effect in tables with 9 or more columns: assigning the first NULL value to a column can take up more space on disk than writing a value to it. Conversely, removing the last NULL value from the row also removes the NULL bitmap.

Physically, the initial null bitmap occupies 1 byte between the HeapTupleHeader (23 bytes) and actual column data or the row OID (if you should still be using that) - which always start at a multiple of MAXALIGN (typically 8 bytes). This leaves 1 byte of padding that is utilized by the initial null bitmap.

In effect, NULL storage is absolutely free for tables of 8 columns or less (including dropped, but not yet purged columns).
After that, another MAXALIGN bytes (typically 8) are allocated for the next MAXALIGN * 8 columns (typically 64). Etc.

More details in the manual and under these related questions:

Once you understand alignment padding of data types, you can further optimize storage:

But the cases are rare where you can save substantial amounts of space. Normally it's not worth the effort.

@Daniel already covers effects on index size.

Note that dropped columns (though now invisible) are kept in the system catalogs until the table is recreated. Those zombis can force the allocation of an (enlarged) NULL bitmap. See:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • "NULL storage is absolutely free for tables of 8 columns or less." - what if only 1 column holds null? this means that the null bitmap with 1 byte will be created just for holding 1 bit? – Dejell Mar 04 '17 at 20:56
  • The null bitmap is there if there are any null values, with enough bytes to cover all columns, plus padding to the next multiple of `MAXALIGN`. – Erwin Brandstetter Mar 05 '17 at 04:42
  • Thanks Erwin. Still not sure - what if I don't have any null values? would it still be there? (many to many table) – Dejell Mar 05 '17 at 05:52
  • 1
    @Dejell: No null value in the row, then no null bitmap either. That's in the answer above. Follow the links for more details. – Erwin Brandstetter Mar 06 '17 at 14:58
  • 1
    Why does the documentation say that the header takes up 23 bytes (on most machines), but the sum of the lengths in the table there equals 27? – user1738984 May 28 '19 at 14:59
  • 1
    Yes, confusing. I suppose `t_cid ... (overlays with t_xvac)` [there](https://www.postgresql.org/docs/current/storage-page-layout.html#HEAPTUPLEHEADERDATA-TABLE) means that those two items share the same physical field. Comments in the source code also hint as much: https://doxygen.postgresql.org/htup__details_8h_source.html line 72. – Erwin Brandstetter May 28 '19 at 23:22
15

Whether NULL values get to the index or not depend at least on the type of the index. Basically, this would be YES for btree and gist index types, NO for hash, and it seems YES or NO for gin index types depending on PostgreSQL version.

There used to be a boolean column amindexnulls in the pg_catalog.pg_am table that carried that information, but it's gone in 9.1. Probably because indexes have got even more sophisticated among PG improvements.

In the specific case of your data, the best way to know would be to measure the size difference of indexes, using the pg_relation_size('index_name') function, between contents entirely NULL and entirely NOT NULL, with your exact PG version, exact datatype, exact index type and definition. And know that possibly, a future change in any of these parameters may change the outcome.

But in any case, if you "just" want to avoid indexing NULLs, it's always possible to create a partial index:

CREATE INDEX partial_idx(col) ON table WHERE (col is not null)

This is going to take less space, but whether this is going to help or not with the performance of queries depend on these queries.

Daniel Vérité
  • 58,074
  • 15
  • 129
  • 156
  • 8
    +1 insightful. With partial indexes it is important to note that they can only be utilized by the query planner if it can verify that the condition is covered in the query. The query planner is smart, but fails to resolve complex logic in this respect. Match the `WHERE` clause more or less verbatim to make sure the index can be used (possible `AND`-ed with more conditions). [Details in the manual](http://www.postgresql.org/docs/current/interactive/indexes-partial.html). – Erwin Brandstetter Aug 27 '12 at 18:22
  • Cool..Thanks for your insightful answer – Luke101 Aug 27 '12 at 19:45
2

I believe each would use a single bit in the bitmap for the row. See here: http://www.postgresql.org/docs/9.0/static/storage-page-layout.html#HEAPTUPLEHEADERDATA-TABLE

Cody
  • 2,467
  • 2
  • 21
  • 30