Is hashcode() a good way to compute an unique token from a list of informations?

Question

I have an

HashMap<String,AnObject>

and I'd like to give the string key a value from some infos the AnObject value contains. Suppose AnObject is made this way:

public class AnObject(){
     public String name;
     public String surname;
}

Is it correct to assign the key to:

String.valueOf(o.name.hashcode()+o.surname.hashcode());

? Or Is there a better way to compute a String hash code from a value list?

If the fields used to construct the key are always unique, you'll be OK. — NickJ, Jun 03 '14 at 10:31
as long as the fields are unique, then yes. Even a minor difference will result in a very different hash (aaaaaa vs aaaaab). — Kyte, Jun 03 '14 at 10:34
@Kyte: That's simply not true. I can find examples of different strings with the same hash code if you really want me to... — Jon Skeet, Jun 03 '14 at 10:35
I looks like the *object itself* should be the ""key"". Give it a proper hashCode+equals implementation, then your issues should be solved. — Marco13, Jun 03 '14 at 10:37
@JonSkeet Hash collisions should *rarely* occur (see http://en.wikipedia.org/wiki/Collision_attack). For what Phate is trying to do he should be fine, although he may want to use a cryptographic hash — Kyte, Jun 03 '14 at 10:39
@Kyte they are trivial to find intentionally. Just hash 2^32 strings. It should take an hour at most to hash 2^32 short strings on a modern computer, and probably much less (ie. minutes) — user253751, Jun 03 '14 at 10:39
@Kyte: "Rare" and "never" are not the same thing. A hash code is simply not an appropriate key, and your claim that different strings *will* result in different hashes is untrue. We have no indication of what this key will be used for - imagine if this is as a way of finding private information; using the hash could very easily be a very significant security hole. — Jon Skeet, Jun 03 '14 at 10:40
@Kyte, Strings "FB" and "Ea" have the same hashcode, and there are many more like them. — Zeeshan, Jun 03 '14 at 10:40

Jon Skeet · Answer 1 · 2014-06-03T10:45:36.053

No, absolutely not. hashCode() is not guaranteed to be unique.

The rules of a hash code are simple:

Two equal values must have the same hash code
Two non-equal values will ideally have different hash codes, but can have the same hash code. In particular, there are only 2³² possible values to return from hashCode(), but more than 2³² possible strings, making uniqueness impossible.
The hash code of an object should not change unless some equality-sensitive aspect of it changes. Indeed, it's generally a good idea to make types implementing value equality immutable, at least in equality-sensitive aspects. Otherwise you can easily find that you can't look up an entry using the exact same object reference that you previously used for the key!

Hash codes are an optimization technique to make it quick to find a "probably small" set of candidate values equal to some target, which you then iterate through with a rigorous equality check to find whether any of them is actually equal to the target. That's what lets you quickly look something up by key in a hash-based collection. The key isn't the hash itself.

If you need to create a key from two strings, you're going to basically have to make it from those two strings (with some sort of delimiter so you can tell the difference between {"a", "bc"} and {"ab", "c"} - understanding that the delimiter itself might appear in the values if you're not careful).

See Eric Lippert's blog post on the topic for more information; that's based on .NET rather than Java, but they all apply. It's also worth understanding that the semantics of hashCode aren't necessarily the same as those of a cryptographic hash. In particular, it's fine for the result of hashCode() to change if you start a new JVM but create an object with the same fields - no-one should be persisting the results of hashCode. That's not the case with something like SHA-256, which should be permanently stable for a particular set of data.

Examples of strings that all have a hashcode of 0: http://stackoverflow.com/questions/2310498/why-doesnt-strings-hashcode-cache-0 — assylias, Jun 03 '14 at 10:38
One additional (though very important rule): the hash value should be stable. Programmers often compute the hash out of every internal field, this can cause problems (using the objects as keys in a hash map, then altering an unimportant field and no longer finding that object in the hash map afterwards). Choose the fields used for computing the hash wisely (final fields being the best choise). — Peter Walser, Jun 03 '14 at 10:42
@PeterWalser: It should be stable within a particular context. It doesn't need to be stable between process executions etc. In particular, you shouldn't be storing the results of hashCode. Will edit something into the answer thouhg. — Jon Skeet, Jun 03 '14 at 10:43

score 1 · Answer 2 · answered Jun 03 '14 at 10:36

The hash code for String is lossy; many String values will result in the same hash code. An integer has 32 bit positions and each position has two values. There's no way to map even just the 32-character strings (for instance) (each character having lots of possibilities) into 32 bits without collisions. They just won't fit.

If you want to use arbitrary precision arithmetic (say, BigInteger), then you can just take each character as an integer and concatenate them all together.

score 0 · Answer 3 · answered Jun 03 '14 at 10:34

0

No, hashCode() (BTW pay attention on case of letter C) does not guarantee uniqueness. You can have a lot of objects that produce the same hash code.

If you need unique identifier use class java.util.UUID.

answered Jun 03 '14 at 10:34

AlexR

114,158
16
130
208

Is hashcode() a good way to compute an unique token from a list of informations?

3 Answers3