4

I created a few hashCode implementation of an integer for use in a hashTable but none of them seem to at least close uniform distribution. So What would be the best implementation of hashCode of an integer assuming the size of the hashTable is near hundred and the integers are large of the order of a few thousands? Thanks in advance.

  • 7
    Why not just use the integer itself as its own hash? – Aaron D Aug 19 '15 at 09:51
  • If your source integers are well distributed, you could apply `% 100` as an hashcode – dotvav Aug 19 '15 at 09:54
  • 1
    Since the set of 32-bit integers is uniformly distributed in itself, I assume you have a particular subset of integers to work with, which is not. So you need to consider the constraints on your particular data and its actual distribution in order to devise a hash code that uniformly distributes that subset. You have not provided that information in your question. – RealSkeptic Aug 19 '15 at 09:54
  • 2
    @AaronD That would lead to a large size of the hashTable...and even if I take hashCode%hashTableSize, that would just not work fine for numbers close to hundred which is the hashTableSize –  Aug 19 '15 at 09:56
  • @RealSkeptic I am implementing the hashCode for a hashTable. If the hashCodes are concentrated the hashTable would not be efficient. –  Aug 19 '15 at 09:57
  • Yes, but their distribution depends on your set of inputs. Also, the usual advice is to make the size of the hash table **prime**. – RealSkeptic Aug 19 '15 at 09:59
  • As an experiment I took 100 random integers in the range 1000-4000 and ran them through a prime number modulus (101). The histogram of its distribution is [here](https://plot.ly/~AKremlin/8/random-number-range-1000-4000-101-distribution/). As you can see, it fills it up pretty well but you do have gaps (and up to 3 of the values have the same hash value with this simple modulus-based hash). – Aaron D Aug 19 '15 at 10:41

5 Answers5

1

Since your hash table is rather small, modulo function would be the simplest implementation, and if the input numbers are random, the distribution should be random too.

public int hashCode(int x){
   return x%tableSize;
}

better implementation would be to use multiplication as described here.

(x*someNumber) % table size;

other hashing functions are described here, check them out. Hope this helps.

Community
  • 1
  • 1
Tawcharowsky
  • 615
  • 4
  • 18
1

If the keys of your data is uniformely distributed than just use the integer itself as a key. If your keys not uniformely distributed you need to modify the integer in such a way that its spread out more evenly through the spectrum of all Integers. How to do that depends on how your Keys are distributed and the exact Map implementation.

Are you sure you aren't doing premature optimization? In a Map with just 100 entries it really doesn't matter much if you have a constant lookup time (perfectly distributed) or linear lookup time (every entry has a key collision). Iterating 100 items is just so fast, outside of benchmarking you will most likely not notice the difference. It would be interessting to benchmark if a list would not even be faster on average than a map with such a small dataset.

ssindelar
  • 2,833
  • 1
  • 17
  • 36
1

So you have thousands of values in the X axis and you want to "transform" them into a much smaller range, of hundreds, in the Y axis. Obviously you could divide by 10 or get the modulus, but you want also to distribute them as uniform as possible along the target range.

I guess you need a compression function.

You could, for example, apply a sine function to the input and multiply by the size of the hashtable. What value should have the period? It depends: The more near you expect the input values, the higher period (so that two values quite near would produce two very different results). And vice versa: If the input values are not expected to be quite near, a small period would do.

private int hashCode(int input, int tableSize)
{
    return (int)(tableSize*Math.sin(PERIOD*input));
}
Little Santi
  • 8,563
  • 2
  • 18
  • 46
1

Finalization avalanche function from MurmurHash3:

int h = key;
h ^= h >>> 16;
h *= 0x85ebca6b;
h ^= h >>> 13;
h *= 0xc2b2ae35;
h ^= h >>> 16;
leventov
  • 14,760
  • 11
  • 69
  • 98
0

I suggest the 'best' implementation, whatever that means, is almost certainly

Integer.valueOf(value).hashCode()
user207421
  • 305,947
  • 44
  • 307
  • 483
  • 1
    And, according to the [Java docs](http://docs.oracle.com/javase/7/docs/api/java/lang/Integer.html#hashCode()), the `hashCode()` method of `Integer` returns the primitive `int` value represented by the `Integer`. – Aaron D Aug 19 '15 at 10:49
  • Well... I don't think a _generic_ solution is necessarily the best. In fact, optimization consists in _particularizing_ instead of generalizing. – Little Santi Aug 19 '15 at 11:46