2

Background:

I have three columns with the following sizes in bytes (for a total of 8): 4 (int), 2 (small int), 2 (small int).

I'm creating a multicolumn (aka composite) index on these three columns (in the order specified above). I will be doing two types of select queries:

  1. Range look up based on the first 4-byte column. This first column will be monotonically increasing (timestamp).
  2. Key-ed look up where all three of these values are specified.

Question: Is there any benefit in Postgres to combining these three columns into one 8-byte bigint and dealing with the separation in the application layer?

I'm asking for a perspective on DB query and storage efficiency.

Amit
  • 30,756
  • 6
  • 57
  • 88
BrainCore
  • 5,214
  • 4
  • 33
  • 38
  • For case #2, you will notice an increase in performance in reads, however writes will take a bit longer as it needs to update for 3 columns instead of one. The index itself is really just a hashset which will take up around 3 times the size of the existing index for 1 column. Multicolumn indexes do not increase performance on single column searches. You would need a single column index for #1, and multicolumn index for #2. Keep in mind, the more indexes you have, the slower the writes will be. – Kraang Prime Jan 01 '17 at 08:53
  • @KraangPrime: Multi-column indexes can absolutely speed up single column conditions - especially if they condition works with the first column. But even with trailing columns in certain situations –  Jan 01 '17 at 09:36
  • @a_horse_with_no_name - Sorry, I should clarify. It has no impact if it isn't a primary index. If you index (A, B, C), and query on only one of (B) or (C), then the index is not used at all for the lookup. For lookups to use an index, all parts or at least the primary part must be used to have any impact. See [this explanation](http://stackoverflow.com/a/2349824/3504007) – Kraang Prime Jan 01 '17 at 09:45
  • @KraangPrime: Postgres is still able to use the index for a condition on e.g. column B or C - it's just not as efficient though. See e.g. here: http://thebuild.com/blog/2016/12/30/the-multi-column-index-of-the-mysteries/ and it has nothing to do whether it's a primary key or not. If you create an index on non-PK columns and have a condition that substantially reduces the number of rows, there is no need to include any of the PK columns in the condition. –  Jan 01 '17 at 09:58
  • @a_horse_with_no_name - by primary *index*, i meant the first column in the multi-column index -- not `primary key`. That is why I worded it that way :) – Kraang Prime Jan 01 '17 at 10:02

1 Answers1

1

I suspect that storage-wise any gains from combining will be tiny and will be offset by the limitations of so doing. Yes, you can combine but you cannot have any referential integrity checks against subparts of a field. I.e. tuple A can relate to tuple B, but A and B have to be subsets of the whole fields of the table. This is the basis of the atomicity requirement of 1NF.

Now, you can query inside the field using functions to extract the information you need, and you can even index the output of those functions if you know what you are doing. But this then uses more space than you might otherwise, and you still lose the referential integrity enforcement possibilities.

In general, space is a concern, but not at this level of optimization. Unless you have extremely specialized needs, combining the values poses more problems than it solves.

Chris Travers
  • 25,424
  • 6
  • 65
  • 182