7

A popular answer for generating a hashing function in JS is given in Simple (non-secure) hash function for JavaScript? and Generate a Hash from string in Javascript

One example of the code is:

String.prototype.hashCode = function() {
    var hash = 0;
    if (this.length == 0) {
        return hash;
    }
    for (var i = 0; i < this.length; i++) {
        var char = this.charCodeAt(i);
        hash = ((hash<<5)-hash)+char;
        hash = hash & hash; // Convert to 32bit integer
    }
    return hash;
}

One line that doesn't make sense to me is hash = ((hash<<5)-hash)+char;

Could someone please explain WHY this is done? I gather that we are doing a 5 bit left shift on the hash. Is there any reason why it is 5 bits and not 4 or 6? Also why do we then minus the hash and add the char?

TomDane
  • 1,010
  • 10
  • 25
  • Pretty sure it doesn't matter - it's just an example of one way to transform a string of characters into a hash. You could use any algorithm you like, using whatever shifts, additions, subtractions, etc you feel like. – CertainPerformance Aug 22 '18 at 05:27

1 Answers1

8

(hash << 5) is (hash * 32), so ((hash << 5) - hash) is (hash * 31). And the reason for multiplication by 31 is described in the answers to question Why does Java's hashCode() in String use 31 as a multiplier?

So, if this is changed to (hash * 31), the result is the same. Possibly (hash << 5) - hash is slightly faster, as shift / subtraction might be faster than multiplication. However, if that's really the case depends on many factors (whether JIT compilation is used, and optimizations in the JIT, and even the processor). So I assume the author of the code tested it, and found it to be faster for his case.

Thomas Mueller
  • 48,905
  • 14
  • 116
  • 132
  • Another answer that helped me understand is https://crypto.stackexchange.com/a/8534. In particular, that bitshifts are so widely used because they promote good diffusion. – TomDane Aug 23 '18 at 13:15
  • 1
    Yes, and multiplication by 31 is basically two bit shifts (by 5 and by 1) and subtraction. Even thought, other hash algorithms like Murmur have better diffusion / confusion / avalanche effect. – Thomas Mueller Aug 23 '18 at 13:33
  • In JavaScript, the `hash * 31` and the `hash << 5 - hash` versions give different results. When I ran a conformance test with a 250K word dictionary I got 7764 collisions with the former version, and only 9 collisions with the latter bit-shifting version. The reason for the discrepancy might be how JavaScript handles `Number` overflow and `Number.MAX_VALUE` (but maybe the 2 versions of the above algorithm run the same in Java?) – tonethar Mar 08 '22 at 21:24
  • PS - I tried out different bit-shifts to see how the results would change - `<< 3` gave 9910 collisions, `<< 4` gave 1359 collisions, `<< 6` gave 9 collisions (the same as `<< 5`) – tonethar Mar 08 '22 at 21:35
  • @tonethar you didn't specify which data you have used for your test. – Thomas Mueller Mar 10 '22 at 08:58
  • https://raw.githubusercontent.com/words/an-array-of-english-words/master/index.json – tonethar Mar 10 '22 at 14:18
  • @tonethar yes that makes sense! The words have a very similar length and are relatively similar ("difference" between words is not that huge), so it does make sense that when shifting less you would get more collisions. The hash algorithm used here is not the best really: Murmur and similar hash algorithms would result in less collisions. – Thomas Mueller Mar 11 '22 at 17:16