3

This Postgres database has tables of all sizes, up to 10 million rows. Almost all of them have a BIGINT primary key counting from 1 and up.

Since BIGINT is 64 bits, while 10 million rows are well within the roughly 2 billion maximum of INT, would it be worthwhile to convert those BIGINT columns to INT (32 bits) or even SMALLINT (16 bits) to speed up some heavy-duty SQL? Storing indexes and tables more compactly should give us a higher cache hit ratio. How much of a speedup could I expect, if any? And are there any disadvantages to not using BIGINT? (Assume that reaching the maximum of INT/SMALLINT will never be an issue.)
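
For reference, this is roughly the conversion I have in mind (table and column names are made up; as far as I understand, the ALTER rewrites the table under an exclusive lock, and referencing FK columns would need the same change):

    ALTER TABLE orders
        ALTER COLUMN id TYPE integer;       -- full table rewrite, ACCESS EXCLUSIVE lock

    -- any foreign key column pointing at it needs the matching change
    ALTER TABLE order_items
        ALTER COLUMN order_id TYPE integer;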

Kjetil S.
  • "assuming reaching max of INT/SMALLINT will never be an issue" ... that's a big assumption. There is always someone that comes with the bright idea of adding "x millions/billions/whatever" to the PK to store temp records/v2 records/whatever, and since it worked so well the 1st time, let's do this routinely. – JGH Apr 07 '20 at 14:06

3 Answers

5

It very much depends on the actual table and index definitions. The switch saves 4 bytes for the column itself - but since tuples are padded to multiples of 8 bytes, those 4 bytes may be swallowed by alignment padding, or you may free up a full 8 bytes if you are lucky.
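
You can see the alignment effect directly with pg_column_size() on ad-hoc row values. A minimal sketch, assuming the usual 24-byte tuple header and 8-byte MAXALIGN (exact numbers can vary with version and platform):

    SELECT pg_column_size(ROW())                      AS header_only  -- 24
         , pg_column_size(ROW(1::int))                AS one_int      -- 28
         , pg_column_size(ROW(1::bigint))             AS one_bigint   -- 32
         , pg_column_size(ROW(1::int, 1::bigint))     AS int_bigint   -- 40 (4 bytes of padding)
         , pg_column_size(ROW(1::bigint, 1::bigint))  AS two_bigints; -- 40 (no padding)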

The standard btree index backing the PK won't change in size; the 4 bytes saved are lost to alignment padding again. But if you have use for an additional 4-byte column in a covering index, the switch saves 8 bytes there instead of just 4, which makes the index tuple 20 bytes total instead of 28.
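
To see what that means for your own indexes, compare on-disk sizes before and after the switch. A rough sketch with made-up names (INCLUDE needs Postgres 11 or later):

    -- a covering index like the one described above
    CREATE INDEX orders_covering_idx ON orders (id) INCLUDE (customer_id);

    -- after switching both columns to integer, the PK index should stay the same size,
    -- while the covering index should shrink
    SELECT pg_size_pretty(pg_relation_size('orders_pkey'))         AS pk_index
         , pg_size_pretty(pg_relation_size('orders_covering_idx')) AS covering_index;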

The bigint primary key is only the start. If there are foreign key references to it, the effect is multiplied. The same goes for multicolumn indexes involving several such FK columns. Then the switch can very well result in the kind of speedup you are looking for, especially if cache memory is limited. It all depends.
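
To illustrate how that multiplies, a purely hypothetical child table:

    CREATE TABLE order_items (
        order_id   bigint REFERENCES orders     -- FK to a bigint PK
      , product_id bigint REFERENCES products   -- another FK to a bigint PK
      , quantity   integer
      , PRIMARY KEY (order_id, product_id)
    );

    -- each tuple of the two-column PK index: 4 + 8 + 16 = 28 bytes with bigint,
    -- but 4 + 8 + 8 = 20 bytes after switching both FK columns to integer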

If you are certain you won't burn through more than 2^31 numbers (not 2^32: Postgres uses signed integers, so you would have to use the negative half, too) over the lifetime of the table, and you actually save space in tables and indexes, then by all means switch to plain integer. I have seen it make an actual difference many times. But you need some understanding of Postgres storage mechanisms before you tinker with this.
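
A quick sanity check before such a switch might look like this (assuming a sequence-backed id; names are made up):

    SELECT max(id)              AS current_max
         , 2147483647 - max(id) AS headroom_left   -- distance to the integer maximum
    FROM   orders;

    -- the sequence may be ahead of max(id) because of rollbacks and deletes
    SELECT last_value FROM orders_id_seq;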

Erwin Brandstetter
3

Only a load test will give a reliable answer to that, but I expect that you won't see a noticeable performance gain.

bigint is the correct data type here. Even if you are certain that you won't exhaust the limit of integer, consider this:

  • if you generate values using a sequence, you are likely not to use every possible number anyway, since transactions that get rolled back still consume sequence values (see the sketch after this list)

  • there once were a couple of smart people who thought that 2^32 would be way more IP addresses than anyone could ever possibly need
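
The first point is easy to demonstrate. A minimal sketch (table name is made up):

    CREATE TABLE t (id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY, payload text);

    BEGIN;
    INSERT INTO t (payload) VALUES ('rolled back');   -- consumes id 1
    ROLLBACK;

    INSERT INTO t (payload) VALUES ('kept');
    SELECT id FROM t;                                 -- 2: the burned value is never reused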

With a small table like yours, the savings in space and performance would not be worth the effort. With a big table, you don't want to suffer the pain of having to convert it from integer to bigint.

Premature optimization is the root of all evil.

Laurenz Albe
0

Data is generally stored on disk aligned to 8-byte boundaries. A single-column index is going to be the same size with bigint as with int. For a table or a multicolumn index, int might pack more tightly, depending on whether adjacent columns can be combined to fit into 8 bytes.

The performance change is likely to be small or non-existent, and hard to predict. To know the answer, you would have to do the test. But it probably is not worth the time it would take to do it.
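
If you do want to measure it, a rough sketch of such a test (table names, row count and sample query are all invented; your real workload is the better benchmark):

    -- two copies of the same data, one keyed by bigint and one by integer
    CREATE TABLE t_big AS
        SELECT g::bigint AS id, md5(g::text) AS filler
        FROM   generate_series(1, 10000000) g;
    CREATE TABLE t_int AS
        SELECT g::integer AS id, md5(g::text) AS filler
        FROM   generate_series(1, 10000000) g;
    ALTER TABLE t_big ADD PRIMARY KEY (id);
    ALTER TABLE t_int ADD PRIMARY KEY (id);

    -- compare on-disk size and the timing of a representative query
    SELECT pg_size_pretty(pg_total_relation_size('t_big')) AS big_size
         , pg_size_pretty(pg_total_relation_size('t_int')) AS int_size;

    EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM t_big WHERE id BETWEEN 1000000 AND 2000000;
    EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM t_int WHERE id BETWEEN 1000000 AND 2000000;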

jjanes