7

I need a quick hash function for integers:

 int hash(int n) { return ...; }

Is there something that exists already in Java?

The minimal properties that I need are:

  • hash(n) & 1 does not appear periodic when used with a bunch of consecutive values of n.
  • hash(n) & 1 is approximately equally likely to be 0 or 1.
Jason S
  • 184,598
  • 164
  • 608
  • 970
  • curious where you want to use this – Miserable Variable Mar 08 '12 at 21:31
  • @Miserable: I just need a bunch of fake values that are functions of table indices. (and the nonperiodicity is important, for reasons that aren't so quick to explain) – Jason S Mar 08 '12 at 21:33
  • perhaps the number of bits that are 1 in the number? – Miserable Variable Mar 08 '12 at 21:34
  • but that is not cheap to compute, I am reminded of an optimization attributed to Knuth.. – Miserable Variable Mar 08 '12 at 21:39
  • Using my comment above, [Best algorithm to count the number of set bits in a 32-bit integer][1] seems like a good solution. [1]: http://stackoverflow.com/questions/109023/best-algorithm-to-count-the-number-of-set-bits-in-a-32-bit-integer – Miserable Variable Mar 08 '12 at 21:47
  • 1
    Use your integer to seed an RNG and then take the first (or the same Nth every time) random number. – Hot Licks Mar 09 '12 at 02:26
  • @Hot Licks: that's probably what I'll end up doing; I was hoping there would be something simpler and less opaque. – Jason S Mar 09 '12 at 02:55
  • @HotLicks's solution is probably the only one that satisfies all of your constraints. – Louis Wasserman Mar 09 '12 at 07:05
  • @MiserableVariable -- Actually, computing the number of 1 bits is fairly cheap, using the method of US patent 6,516,330 – "Counting set bits in data words". – Hot Licks Mar 09 '12 at 12:04
  • @HotLicks Even if counting bits is inexpensive, the range will only be 0-32/64 and it will be highly repetitive for consecutive values. Not to mention the obvious problems with suggesting an algorithm by referencing its patent no. :/ – Tim Mar 09 '12 at 17:06
  • True -- counting bits is probably a lousy hash, in most circumstances. (In fact, the need to count bits is pretty rare in computerdom, generally speaking. I can't recall exactly why there was a need for the version in the above patent, but it was never used, IIRC.) – Hot Licks Mar 09 '12 at 17:47
  • 1
    Why not just use SHA? – CromTheDestroyer Jan 30 '15 at 18:48
  • Um, SHA isn't quick, at least relatively speaking. Plus it works on an array of bytes, not integers. I honestly don't remember what I was using this for (it was almost 3 years ago), but I found an answer that worked for me. – Jason S Jan 30 '15 at 18:54

5 Answers5

4

HashMap, as well as Guava's hash-based utilities, use the following method on hashCode() results to improve bit distributions and defend against weaker hash functions:

  /*
   * This method was written by Doug Lea with assistance from members of JCP
   * JSR-166 Expert Group and released to the public domain, as explained at
   * http://creativecommons.org/licenses/publicdomain
   * 
   * As of 2010/06/11, this method is identical to the (package private) hash
   * method in OpenJDK 7's java.util.HashMap class.
   */
  static int smear(int hashCode) {
    hashCode ^= (hashCode >>> 20) ^ (hashCode >>> 12);
    return hashCode ^ (hashCode >>> 7) ^ (hashCode >>> 4);
  }
Louis Wasserman
  • 191,574
  • 25
  • 345
  • 413
3

So, I read this question, thought hmm this is a pretty math-y question, it's probably out of my league. Then, I ended up spending so much time thinking about it that I actually believe I've got the answer: No function can satisfy the criteria that f(n) & 1 is non-periodic for consecutive values of n.

Hopefully someone will tell me how ridiculous my reasoning is, but until then I believe it's correct.

Here goes: Any binary integer n can be represented as either 1...0 or 1...1, and only the least significant bit of that bitmap will affect the result of n & 1. Further, the next consecutive integer n + 1 will always contain the opposite least significant bit. So, clearly any series of consecutive integers will exhibit a period of 2 when passed to the function n & 1. So then, is there any function f(n) that will sufficiently distribute the series of consecutive integers such that periodicity is eliminated?

Any function f(n) = n + c fails, as c must end in either 0 or 1, so the LSB will either flip or stay the same depending on the constant chosen.

The above also eliminates subtraction for all trivial cases, but I have not taken the time to analyze the carry behavior yet, so there may be a crack here.

Any function f(n) = c*n fails, as the LSB will always be 0 if c ends in 0 and always be equal to the LSB of n if c ends in 1.

Any function f(n) = n^c fails, by similar reasoning. A power function would always have the same LSB as n.

Any function f(n) = c^n fails, for the same reason.

Division and modulus were a bit less intuitive to me, but basically, the LSB of either option ends up being determined by a subtraction (already ruled out). The modulus will also obviously have a period equal to the divisor.

Unfortunately, I don't have the rigor necessary to prove this, but I believe any combination of the above operations will ultimately fail as well. This leads me to believe that we can rule out any transcendental function, because these are implemented with polynomials (Taylor series? not a terminology guy).

Finally, I held out hope on the train ride home that counting the bits would work; however, this is actually a periodic function as well. The way I thought about it was, imagine taking the sum of the digits of any decimal number. That sum obviously would run from 0 through 9, then drop to 1, run from 1 to 10, then drop to 2... It has a period, the range just keeps shifting higher the higher we count. We can actually do the same thing for the sum of the binary digits, in which case we get something like: 0,1,1,2,2,....5,5,6,6,7,7,8,8....

Did I leave anything out?

TL;DR I don't think your question has an answer.

Tim
  • 1,011
  • 6
  • 12
  • Galois fields work well (e.g. LFSRs) but are nontrivial to compute, except for shift registers where you essentially calculate f(n) from f(n-1). I'm not looking for *true* nonperiodicity, just something that has a relatively long period (e.g. 2^12 or even 2^8 would be fine in my case). – Jason S Mar 09 '12 at 02:13
  • @JasonS: If 2^8 would be fine, then you can create a `new bool[256]`, then use a random-number generate to randomly select half of its entries and set those to `true`. (Note: this is *slightly* trickier than it sounds, due to collisions, but still quite manageable.) Then you can use it as a look-up table on `i & 255`. – ruakh Mar 09 '12 at 14:23
  • @JasonS I thought that might be the case. I haven't heard of Galois fields before, but they do look interesting. My best guess for a way to mimic non-periodicity was to use very large multiples of several trigonometric functions, thus skewing consecutive results very far apart. Obviously, this would also be relatively slow. Good luck! Please post if you do find an answer. – Tim Mar 09 '12 at 17:15
1

[SO decided to convert my "trivial answer" to comment. Trying to add little text to it to see if it can be fooled]

Unless you need the ranger of hashing function to be wider..

The NumberOfSetBits function seems to vary quite a lot more then the hashCode, and as such seems more appropriate for your needs. Turns out there is already a fairly efficient algorithm on SO.

See Best algorithm to count the number of set bits in a 32-bit integer.

Community
  • 1
  • 1
Miserable Variable
  • 28,432
  • 15
  • 72
  • 133
  • StackExchange converts answers to comments? That is *not* cool. >:( – Jason S Mar 09 '12 at 13:27
  • 1
    My first answer was exactly what you see in the comment: "Using my comment above, seems like a good solution". I guess it is to prevent karma hogging. My bigger complaint is that it did not convert the link properly while converting to comment :) – Miserable Variable Mar 09 '12 at 15:44
1

I did some experimentation (see test program below); computation of 2^n in Galois fields, and floor(A*sin(n)) both did very well to produce a sequence of "random" bits. I tried multiplicative congruential random number generators and some algebra and CRC (which is analogous of k*n in Galois fields), none of which did well.

The floor(A*sin(n)) approach is the simplest and quickest; the 2^n calculation in GF32 takes approx 64 multiplies and 1024 XORs worstcase, but the periodicity of output bits is extremely well-understood in the context of linear-feedback shift registers.

package com.example.math;

public class QuickHash {
    interface Hasher
    {
        public int hash(int n); 
    }
    static class MultiplicativeHasher1 implements Hasher
    {
        /* multiplicative random number generator
         * from L'Ecuyer is x[n+1] = 1223106847 x[n] mod (2^32-5)
         * http://dimsboiv.uqac.ca/Cours/C2012/8INF802_Hiv12/ref/paper/RNG/TableLecuyer.pdf
         */
        final static long a = 1223106847L;
        final static long m = (1L << 32)-5;
        /*
         * iterative step towards computing mod m
         *   (j*(2^32)+k) mod (2^32-5)
         * = (j*(2^32-5)+j*5+k) mod (2^32-5)
         * = (j*5+k) mod (2^32-5)
         * repeat twice to get a number between 0 and 2^31+24 
         */
        private long quickmod(long x)
        {
            long j = x >>> 32;
            long k = x & 0xffffffffL;
            return j*5+k;
        }
        // treat n as unsigned before computation
        @Override public int hash(int n) {
            long h = a*(n&0xffffffffL);
            long h2 = quickmod(quickmod(h));            
            return (int) (h2 >= m ? (h2-m) : h2);
        }       
        @Override public String toString() { return getClass().getSimpleName(); } 
    }

    /** 
     * computes (2^n) mod P where P is the polynomial in GF2
     * with coefficients 2^(k+1) represented by the bits k=31:0 in "poly";
     * coefficient 2^0 is always 1
     */
    static class GF32Hasher implements Hasher
    {
        static final public GF32Hasher CRC32 = new GF32Hasher(0x82608EDB, 32);

        final private int poly;
        final private int ofs;
        public GF32Hasher(int poly, int ofs) {
            this.ofs = ofs;
            this.poly = poly;
        }

        static private long uint(int x) { return x&0xffffffffL; }
        // modulo GF2 via repeated subtraction
        int mod(long n) {
            long rem = n;
            long q = uint(this.poly);
            q = (q << 32) | (1L << 31);
            long bitmask = 1L << 63;
            for (int i = 0; i < 32; ++i, bitmask >>>= 1, q >>>= 1)
            {
                if ((rem & bitmask) != 0)
                    rem ^= q;
            }
            return (int) rem;
        }
        int mul(int x, int y)
        {
            return mod(uint(x)*uint(y));
        }
        int pow2(int n) {
            // compute 2^n mod P using repeated squaring
            int y = 1;
            int x = 2;
            while (n > 0)
            {
                if ((n&1) != 0)
                    y = mul(y,x);
                x = mul(x,x);
                n = n >>> 1;
            }
            return y;
        }
        @Override public int hash(int n) {
            return pow2(n+this.ofs);
        }
        @Override public String toString() { 
            return String.format("GF32[%08x, ofs=%d]", this.poly, this.ofs); 
        }
    }
    static class QuickHasher implements Hasher
    {
        @Override public int hash(int n) {
            return (int) ((131111L*n)^n^(1973*n)%7919);
        }
        @Override public String toString() { return getClass().getSimpleName(); }       
    }

    // adapted from http://www.w3.org/TR/PNG-CRCAppendix.html
    static class CRC32TableHasher implements Hasher
    {
        final private int table[];
        static final private int polyval = 0xedb88320;

        public CRC32TableHasher() 
        {           
            this.table = make_table();
        }

        /* Make the table for a fast CRC. */
        static public int[] make_table()
        {
            int[] table = new int[256]; 
            int c;
            int n, k;

            for (n = 0; n < 256; n++) {
                c = n;
                for (k = 0; k < 8; k++) {
                    if ((c & 1) != 0)
                        c = polyval ^ (c >>> 1);
                    else
                        c = c >>> 1;
                }
                table[n] = (int) c;
            }
            return table;
        }

        public int iterate(int state, int i)
        {
            return this.table[(state ^ i) & 0xff] ^ (state >>> 8);
        }
        @Override public int hash(int n) {
            int h = -1;
            h = iterate(h, n >>> 24); 
            h = iterate(h, n >>> 16); 
            h = iterate(h, n >>> 8); 
            h = iterate(h, n); 
            return h ^ -1;
        }
        @Override public String toString() { return getClass().getSimpleName(); } 
    }

    static class TrigHasher implements Hasher
    {       
        @Override public String toString() { return getClass().getSimpleName(); }
        @Override public int hash(int n) { 
            double s = Math.sin(n);
            return (int) Math.floor((1<<31)*s);
        }       
    }

    private static void test(Hasher hasher) {
        System.out.println(hasher+":");
        for (int i = 0; i < 64; ++i)
        {
            int h = hasher.hash(i);
            System.out.println(String.format("%08x -> %08x   %%2 = %d", 
                    i,h,(h&1)));
        }
        for (int i = 0; i < 256; ++i)
        {
            System.out.print(hasher.hash(i) & 1);
        }
        System.out.println();       
        analyzeBits(hasher);
    }

    private static void analyzeBits(Hasher hasher) {
        final int N = 65536;
        final int maxrunlength=32;
        int[][] runs = {new int[maxrunlength], new int[maxrunlength]};
        int[] count = new int[2];
        int prev = -1;
        System.out.println("Run length test of "+N+" bits");
        for (int i = 0; i < maxrunlength; ++i)
        {
            runs[0][i] = 0;
            runs[1][i] = 0;
        }
        int runlength_minus1 = 0;
        for (int i = 0; i < N; ++i)
        {
            int b = hasher.hash(i) & 0x1;
            count[b]++;
            if (b == prev)
                ++runlength_minus1;
            else if (i > 0)
            {
                ++runs[prev][runlength_minus1];
                runlength_minus1 = 0;
            }
            prev = b;
        }
        ++runs[prev][runlength_minus1];

        System.out.println(String.format("%d zeros, %d ones", count[0], count[1]));
        for (int i = 0; i < maxrunlength; ++i)
        {
            System.out.println(String.format("%d runs of %d zeros, %d runs of %d ones", runs[0][i], i+1, runs[1][i], i+1));         
        }
    }

    public static void main(String[] args) {
        Hasher[] hashers = {
            new MultiplicativeHasher1(), 
            GF32Hasher.CRC32, 
            new QuickHasher(),
            new CRC32TableHasher(),
            new TrigHasher()
        };
        for (Hasher hasher : hashers)
        {
            test(hasher);
        }
    }
}
Jason S
  • 184,598
  • 164
  • 608
  • 970
-1

The simplest hash for int value is the int value.

See Java Integer class

public int hashCode()  
public static int hashCode(int value) 

Returns:

a hash code value for this object, equal to the primitive int value represented by this Integer object.

Paul Verest
  • 60,022
  • 51
  • 208
  • 332