0

Possible Duplicate:
why do String.hashCode() in java is not implemented in a way with less conflicts?

For non-cryptographic hashes, how does Java's String.hashCode() perform?

Mostly I'm concerned about collisions.

Thanks.

Community
  • 1
  • 1
Kevin Meredith
  • 41,036
  • 63
  • 209
  • 384
  • I've rephrased question since Scala [uses Java strings](http://stackoverflow.com/a/6560090/298389). – om-nom-nom Jan 05 '13 at 01:40
  • This is probably impossible to answer without you telling us what strings you are interested in. Over 10-char strings, all reasonable 32-bit hashes have the same number of collisions, namely 256^10-256^4. – Pascal Cuoq Jan 05 '13 at 01:43
  • As expected, quite well. You can read the source code. Collisions are rare, even when normal ASCII is more 6 bit per byte, and hashCode uses `*31`. – Joop Eggen Jan 05 '13 at 01:44

2 Answers2

4

You seem to be misunderstanding what .hashCode() is for with regards to Java, and more specifically, the .equals()/.hashCode() contract specified by java.lang.Object.

The only part of the contract of matter to anyone is that if two objects are equal with regards to .equals(), then they must have the same hash code as returned by .hashCode(). There is no other obligation to that contract.

It is therefore perfectly legal to write a custom .hashCode() implementation like this, even though this is as suboptimal as one can think of:

@Override
public int hashCode()
{
    // Legal, but useless
    return 42;
}

Of course, JDK developers would never be that thick, and .hashCode() implementations for builtin types (including String) are good enough that you do not even need to worry about collisions. Even then, this implementation will more than likely vary from one JDK implementation to another, and so will its "cryptographic value".

But that's not the point.

The most important thing to consider is that .hashCode() has nothing to do with cryptography at all. Its only obligation is to obey the contract defined by java.lang.Object.

fge
  • 119,121
  • 33
  • 254
  • 329
0

It's pretty good as a general purpose hash function. i.e. you shouldn't usually worry about it.

In particular:

  • It is fast, to the extent that it probably produces hashes as the CPU can read the String from memory (i.e. you usually can't get better without skipping large parts of the String). It does just one multiply and one add per character in the String.
  • For typical sets of random Strings, it produces well-distributed hashes over the entire int range.

Obviously, it is not a cryptographic hash function, so don't use it for that. Also, be aware that you likely will get hash collisions as it is producing a 32-bit hash. So you just need to design your algorithms to take that into account.

mikera
  • 105,238
  • 25
  • 256
  • 415
  • Don't think I'm going to bully you, but your posts says ... uhm, nothing: *you likely get collisions*, *you shouldn't worry about* and so on. Instead of saying those general things, can you, please, say where is the point, when I need to start worrying about? How likely (exactly) hash collisions occurs? – om-nom-nom Jan 05 '13 at 01:55
  • @om-nom-nom: if you want to know the exact likelihood of collisions, then you need to describe the distribution of Strings. You also need to describe your use case. These are different questions from what was asked. I can tell you that Java's hashCode is optimal, in the sense that it produces the minimal number of collisions possible for a 32-bit hashCode given totally random Strings. – mikera Jan 05 '13 at 02:00
  • @mikera you're right, even Integer.MIN_VALUE https://stackoverflow.com/questions/74435861/a-java-string-with-hashcode-equal-to-integer-min-value – caduceus Feb 11 '23 at 09:19