9

Which index will perform better for foreign keys of type integer in postgresql 9.3?

I would assume a hash index, because foreign key comparisson are always made with =

Or does a btree compare as fast as a hash when used for JOINS on foreign keys?

Because in postgresql primary keys use btree's that would suggest they are also better for foreign keys.

Evan Carroll
  • 78,363
  • 46
  • 261
  • 468
frasen
  • 169
  • 2
  • 6

2 Answers2

9

From the manual On PostgreSQL 9.3:

Caution Hash index operations are not presently WAL-logged, so hash indexes might need to be rebuilt with REINDEX after a database crash if there were unwritten changes. Also, changes to hash indexes are not replicated over streaming or file-based replication after the initial base backup, so they give wrong answers to queries that subsequently use them. For these reasons, hash index use is presently discouraged.

There is also no proof that an hash index has any performance benefits over a btree.

Evan Carroll
  • 78,363
  • 46
  • 261
  • 468
Frank Heikens
  • 117,544
  • 24
  • 142
  • 135
  • 1
    Very strange. You'd think a hash index would be much faster, but numerous benchmarks have shown that it's only slightly faster at best. Hash is much faster than tree-based for typical in-memory data structures (e.g. Java's `HashMap` vs `TreeMap`). Any reason it's slow in Postgres, or is it just because Postgres's hash index implementation is not well-maintained? – sudo Jun 16 '16 at 00:32
  • 5
    This is no longer relevant in postgresql 10+ https://www.postgresql.org/docs/10/sql-createindex.html – Kevin Parker Nov 21 '19 at 16:49
  • @sudo big trees have much better data locality, while hash tables are randomly distributed. The cache performance makes a big difference especially when reading from disk. – OrangeDog Jun 10 '20 at 15:35
  • IIRC even in Postgres 10+ switching my indexes to hash didn't show any improvement. I forget what the situation was. Data locality... I don't understand why that'd help during equality checks unless one side is sorted. – sudo Jun 14 '20 at 02:19
  • 1
    @sudo if one side is a btree index scan, then yes it will be sorted – OrangeDog Apr 22 '22 at 14:30
3

It depends.

If you're not doing any queries using the foreign key, then you don't need any index. The referential integrity is enforced using the index of the referenced primary key.

The question of which index type to use (if any) is thus the same for any column(s). If your queries would benefit from an index for a = comparison, and you're using PostgreSQL 10 or newer, then a HASH index is a reasonable choice. If the same column is involved in any ordering operations (ORDER BY, <, >=, etc.) then you may as well use a BTREE.

If you are concerned about the relative performance, then you'll need to test them yourself, using your own data distribution and query load. A tree index may still perform better than a hash, due to data access locality (sequential vs. random).

OrangeDog
  • 36,653
  • 12
  • 122
  • 207