Hash Functions are incredibly useful and versatile. In general, they are used to map a space to one much smaller space. Of course that means that two objects may hash to the same value (collision), but this is because you are reducing the space (pigeonhole principle). The efficiency of the function largely depends on the size of the hash space.
It comes as a surprise then that a lot of Java hashCode functions are using multiplication to produce the hash code of a new object as e.g. follows (creating-a-hashcode-method-java)
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((email == null) ? 0 : email.hashCode());
result = prime * result + (int) (id ^ (id >>> 32));
result = prime * result + ((name == null) ? 0 : name.hashCode());
return result;
}
If we want to mix two hashcodes in the same range, xor should be much better than addition and is I think traditionally used. If we wanted to increase the space, shifting by some bytes and then xoring would still imho make sense. I guess multiplying by 31 is almost the same as shifting one hash by 1 and then adding but it should be much less efficient...
As it is the recommended approach though, I think I am missing something. So my question is why would this be?
Notes:
- I am not asking why we use a prime. It is pretty clear that if we used multiplication, we should go with a prime. However multiplying by any number, even a prime, should still be suboptimal to xor. That is why e.g. all these other non-cryptographic hash functions - as well as most cryptographic - use xor and not multiplications...
- I have indeed no indication (apart from all those well known hash functions) xor would be better. In fact just by the fact it is so widely accepted, I suspect it should be as good and in practice better to multiply by a prime and sum. I am asking why this is...
- The
int
type in Java can be used to represent any whole number from -2147483648 to 2147483647. - Sometimes the hashcode of an object may be its memory address (which makes sense and is efficient in a lot of situations) (if inherited from e.g. object)