6

There are two basic methods for implementing a hash function that are cited in pretty much every text book and CS courses:

  1. Division method where we simply do k mod m essentially picking m as prime not too close to power of 2.
  2. Multiplication method where we multiply k with some well-chosen irrational number (Knuth suggests using number based on Golden ratio) between 0 to 1, take the fractional part of the product and use desired number of most significant bits from it.

Most text books and courses cite several disadvantages for method 1 including the fact that it's expensive and things depends on m. However I've never yet seen any textbook or courses mention single disadvantage for method 2.

This makes method 2 more desirable. Plus the method 2 can be made very efficient on modern computers eliminating floating point arithmetic all together. So it would look like that method 2 is hands down winner and no body should be talking about method 1. But that's obviously not the case. In fact, I've never seen method 2 getting used in any practical implementations. So it does have some disadvantages.

Question is what are they and why method 1 gets used more often despite of its disadvantages?

Shital Shah
  • 63,284
  • 17
  • 238
  • 185

1 Answers1

4

Division method is used in conjunction with hash table algorithms which require prime table size - for example, open addressing with double hashing or QHash, when you anyway need to divide the key, or it's hash, by table size to get the index.

Multiplication method is suitable when the table size is a power of two, then getting the index from the hash could be implemented as bitwise AND operation, therefore the whole path of computing the table index by the key, with multiplication hashing, is very fast. You can explore some actual implementations by searching for magic constant 2654435769 on Github.

There is a recent trend of using MurmurHash3 avalanche procedure instead of multiplication method:

int hash = key;
hash ^= (hash >> 16);
hash *= 0x85ebca6b;
hash ^= (hash >> 13);
hash *= 0xc2b2ae35;
hash ^= (hash >> 16);
// see this code and the version for 64 bits here:
// https://smhasher.googlecode.com/svn/trunk/MurmurHash3.cpp

Because it is just a bit slower, but considered more robust to bad key distribution. That is why you might get wrong (or right?) impression that multiplication method is used unfairly rarely.

leventov
  • 14,760
  • 11
  • 69
  • 98
  • Thanks. While this answer is insightful it really doesn't answer the core question: Multiplication method seems to be better in all respect to division method including the fact you no longer are restricted to choosing prime m. Yes, murmur hash should be more expensive but basic multiplication method isn't. Why anyone should be using division method at all? – Shital Shah Aug 10 '14 at 06:41
  • 1
    I think I answered this question in the first paragraph. Some algorithms *require* table size to be prime. Even if you using any other hash function, you *have to* perform an expensive integral division after that, to obtain the index in the table. Given this it is simplier to divide the (int) key directly, without pre-hashing, because this division results to acceptably good distribution itself. – leventov Aug 10 '14 at 10:56