22

Apparently PostgreSQL stores a couple of values in the header of each database row.

If I don't use NULL values in that table - is the null bitmap still there?
Does defining the columns with NOT NULL make any difference?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Xeoncross
  • 55,620
  • 80
  • 262
  • 364

2 Answers2

38

It's actually more complex than that.

The null bitmap needs one bit per column in the row, rounded up to full bytes. It is only there if the actual row includes at least one NULL value and is fully allocated in that case. NOT NULL constraints do not directly affect that. (Of course, if all fields of your table are NOT NULL, there can never be a null bitmap.)

The "heap tuple header" (per row) is 23 bytes long. Actual data starts at a multiple of MAXALIGN (Maximum data alignment) after that, which is typically 8 bytes on 64-bit OS (4 bytes on 32-bit OS). Run the following command from your PostgreSQL binary dir as root to get a definitive answer:

./pg_controldata /path/to/my/dbcluster

On a typical Debian-based installation of Postgres 12 that would be:

sudo /usr/lib/postgresql/12/bin/pg_controldata /var/lib/postgresql/12/main

Either way, there is one free byte between the header and the aligned start of the data, which the null bitmap can utilize. As long as your table has 8 columns or less, NULL storage is effectively absolutely free (as far as disk space is concerned).

After that, another MAXALIGN (typically 8 bytes) is allocated for the null bitmap to cover another (typically) 64 fields. Etc.

This is valid for at least versions 8.4 - 12 and most likely won't change.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • "one bit per column in the row, rounded up to full bytes" seems to disagree with "After that, another MAXALIGN(4/8 Bytes) is allocated." Is the true sense of this "rounded up to the next MAXALIGN boundary"? – Chris Betti Dec 05 '14 at 20:15
  • @ChrisBetti: It's really like I describe it. The bitmap itself is rounded up to full bytes. That's why it can fit in the 1 byte between tuple header and start of row data for <= 8 columns. But due to alignment requirements (row data must start at multiple of `MAXALIGN`), space is allocated in multiples of `MAXALIGN` bytes after that. The difference is lost to *padding*. – Erwin Brandstetter Dec 05 '14 at 20:20
  • What if there are 8 columns, and then another column is added? Will Postgres have to completely rewrite the table to add the 8 byte null bitmap after the header? – davidtgq Feb 21 '17 at 22:57
  • 2
    @DavidTan: Depends. The null bitmap is only allocated *per row* if there is at least one NULL value in the row. In the worst case, this can mean a whole-table rewrite for the 9th column. – Erwin Brandstetter Feb 21 '17 at 23:54
  • "there is one free byte between the header and the aligned start of the data" that means per all the table - am I right? if have 3 columns, with 8000 rows, and one column is ALWAYS null - wouldn't it need 8000 bits? – Dejell Mar 04 '17 at 21:05
  • 1
    @Dejell: The "free" byte is per row. For just 3 columns, the null bit won't make physical storage any bigger. Ever. Aside: if the columns is "ALWAYS" null, you might as well remove it ... – Erwin Brandstetter Mar 06 '17 at 15:00
  • ... So, as NULLs are stored as a bitmap in the tuple header, we can say that `WHERE txt IS NULL` is **faster** than `WHERE txt=''` in a query (of a big table). See correctness (?) of https://stackoverflow.com/a/32081389/287948 – Peter Krauss Jan 23 '18 at 17:06
  • 1
    @PeterKrauss: Yes, but the difference is so tiny it hardly matters - and only for sequential scans. Typically, you'd have an index for columns that need to support fast queries, which is a game changer. – Erwin Brandstetter Jan 23 '18 at 22:16
3

The null bitmap is only present if the HEAP_HASNULL bit is set in t_infomask. If it is present it begins just after the fixed header and occupies enough bytes to have one bit per data column (that is, t_natts bits altogether). In this list of bits, a 1 bit indicates not-null, a 0 bit is a null. When the bitmap is not present, all columns are assumed not-null.

http://www.postgresql.org/docs/9.0/static/storage-page-layout.html#HEAPTUPLEHEADERDATA-TABLE

so for every 8 columns you use one byte of extra storage. Then for every about million rows that would take up one megabyte of storage. Does not really seem that important. I would define the tables how they needed to be defined and not worry about null headers.

nate c
  • 8,802
  • 2
  • 27
  • 28