What is the best implementation of hashCode of an integer?

Question

I created a few hashCode implementation of an integer for use in a hashTable but none of them seem to at least close uniform distribution. So What would be the best implementation of hashCode of an integer assuming the size of the hashTable is near hundred and the integers are large of the order of a few thousands? Thanks in advance.

If your source integers are well distributed, you could apply `% 100` as an hashcode — dotvav, Aug 19 '15 at 09:54
Since the set of 32-bit integers is uniformly distributed in itself, I assume you have a particular subset of integers to work with, which is not. So you need to consider the constraints on your particular data and its actual distribution in order to devise a hash code that uniformly distributes that subset. You have not provided that information in your question. — RealSkeptic, Aug 19 '15 at 09:54
@AaronD That would lead to a large size of the hashTable...and even if I take hashCode%hashTableSize, that would just not work fine for numbers close to hundred which is the hashTableSize — , Aug 19 '15 at 09:56
@RealSkeptic I am implementing the hashCode for a hashTable. If the hashCodes are concentrated the hashTable would not be efficient. — , Aug 19 '15 at 09:57
Yes, but their distribution depends on your set of inputs. Also, the usual advice is to make the size of the hash table **prime**. — RealSkeptic, Aug 19 '15 at 09:59
As an experiment I took 100 random integers in the range 1000-4000 and ran them through a prime number modulus (101). The histogram of its distribution is [here](https://plot.ly/~AKremlin/8/random-number-range-1000-4000-101-distribution/). As you can see, it fills it up pretty well but you do have gaps (and up to 3 of the values have the same hash value with this simple modulus-based hash). — Aaron D, Aug 19 '15 at 10:41

score 1 · Answer 1 · edited May 23 '17 at 12:23

Since your hash table is rather small, modulo function would be the simplest implementation, and if the input numbers are random, the distribution should be random too.

public int hashCode(int x){
   return x%tableSize;
}

better implementation would be to use multiplication as described here.

(x*someNumber) % table size;

other hashing functions are described here, check them out. Hope this helps.

ssindelar · Answer 2 · 2015-08-19T11:39:16.933

If the keys of your data is uniformely distributed than just use the integer itself as a key. If your keys not uniformely distributed you need to modify the integer in such a way that its spread out more evenly through the spectrum of all Integers. How to do that depends on how your Keys are distributed and the exact Map implementation.

Are you sure you aren't doing premature optimization? In a Map with just 100 entries it really doesn't matter much if you have a constant lookup time (perfectly distributed) or linear lookup time (every entry has a key collision). Iterating 100 items is just so fast, outside of benchmarking you will most likely not notice the difference. It would be interessting to benchmark if a list would not even be faster on average than a map with such a small dataset.

Little Santi · Answer 3 · 2015-08-19T12:22:40.820

So you have thousands of values in the X axis and you want to "transform" them into a much smaller range, of hundreds, in the Y axis. Obviously you could divide by 10 or get the modulus, but you want also to distribute them as uniform as possible along the target range.

I guess you need a compression function.

You could, for example, apply a sine function to the input and multiply by the size of the hashtable. What value should have the period? It depends: The more near you expect the input values, the higher period (so that two values quite near would produce two very different results). And vice versa: If the input values are not expected to be quite near, a small period would do.

private int hashCode(int input, int tableSize)
{
    return (int)(tableSize*Math.sin(PERIOD*input));
}

leventov · Answer 4 · 2015-08-29T02:59:18.957

1

Finalization avalanche function from MurmurHash3:

int h = key;
h ^= h >>> 16;
h *= 0x85ebca6b;
h ^= h >>> 13;
h *= 0xc2b2ae35;
h ^= h >>> 16;

edited Aug 29 '15 at 02:59

answered Aug 19 '15 at 12:59

leventov

14,760
11
69
98

score 0 · Answer 5 · answered Aug 19 '15 at 10:11

0

I suggest the 'best' implementation, whatever that means, is almost certainly

Integer.valueOf(value).hashCode()

answered Aug 19 '15 at 10:11

user207421

305,947
44
307
483

1

And, according to the [Java docs](http://docs.oracle.com/javase/7/docs/api/java/lang/Integer.html#hashCode()), the `hashCode()` method of `Integer` returns the primitive `int` value represented by the `Integer`. – Aaron D Aug 19 '15 at 10:49
Well... I don't think a _generic_ solution is necessarily the best. In fact, optimization consists in _particularizing_ instead of generalizing. – Little Santi Aug 19 '15 at 11:46

What is the best implementation of hashCode of an integer?

5 Answers5