48

The method hashCode() in class Enum is final and defined as super.hashCode(), which means it returns a number based on the address of the instance, i.e., a random number from the programmer's POV.

Defining it e.g. as ordinal() ^ getClass().getName().hashCode() would be deterministic across different JVMs. It would even work a bit better, since the least significant bits would "change as much as possible": e.g., for an enum containing up to 16 elements and a HashMap of size 16, there would be no collisions at all (sure, using an EnumMap is better, but it's sometimes not possible, e.g. there's no ConcurrentEnumMap). With the current definition you have no such guarantee, do you?
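For illustration, a minimal sketch of the suggested definition (the helper class is hypothetical; `Enum.hashCode()` is final, so this can't actually replace it):

```java
final class EnumHashes {
    // The deterministic hash proposed above: XOR-ing ordinal() with a
    // per-class constant keeps the low bits distinct for all constants of
    // one enum class, so up to 16 constants cannot collide in a HashMap
    // of capacity 16.
    static int deterministicHash(Enum<?> e) {
        return e.ordinal() ^ e.getClass().getName().hashCode();
    }
}
```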

Summary of the answers

Using Object.hashCode() compares to a nicer hashCode like the one above as follows:

  • PROS
    • simplicity
  • CONS
    • speed
    • more collisions (for any size of a HashMap)
    • non-determinism, which propagates to other objects making them unusable for
      • deterministic simulations
      • ETag computation
      • hunting down bugs depending e.g. on a HashSet iteration order

I'd personally prefer the nicer hashCode, but IMHO no single reason weighs much, except maybe speed.

UPDATE

I was curious about the speed and wrote a benchmark with surprising results. For the price of a single field per class you can get a deterministic hash code which is nearly four times faster. Storing the hash code in a field of each instance would be even faster, although negligibly so.

The explanation why the standard hash code is not much faster is that it can't be the object's address, as objects get moved by the GC.
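A minimal sketch of the "single field per class" idea, with made-up names (the real benchmark code may differ):

```java
enum Color {
    RED, GREEN, BLUE;

    // The one extra field: computed once per enum class at initialization.
    private static final int CLASS_HASH = Color.class.getName().hashCode();

    // Enum.hashCode() is final, so this has to be a separate method that
    // callers invoke explicitly; per call it costs a field read and an XOR.
    int stableHash() {
        return ordinal() ^ CLASS_HASH;
    }
}
```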

UPDATE 2

There are some strange things going on with hashCode performance in general. Even once I understand them, there's still the open question why System.identityHashCode (reading from the object header) is way slower than accessing a normal object field.

pbh101
maaartinus
  • 1
    I don't see a problem with the default `hashCode()` implementation. And you should not need it to be deterministic across JVMs. – Bozho Feb 03 '11 at 10:49
  • 1
    Well, they *have* chosen to be so for String (and the primitive wrapper types). – aioobe Feb 03 '11 at 10:52
  • Is there anything wrong with using `Collections.synchronizedMap(new EnumMap(...));` (as recommended at http://download.oracle.com/javase/6/docs/api/java/util/EnumMap.html)? How would you go about implementing a `ConcurrentEnumMap`? – Jean Hominal Feb 03 '11 at 10:58
  • The `ConcurrentEnumMap` was just an example. Take another one: there's no `ImmutableEnumMap` in Guava. And no, Collections.synchronizedMap is not always as good as a ConcurrentMap. – maaartinus Feb 03 '11 at 11:04
  • 3
    I think the author's point is that you could find yourself comparing two instances of the same Enum value on different virtual machines, where each one would have a different memory address and thus a different hashCode(). Some answers seem to say that this can't happen, but have those people tried it? With all the features of Java EE and deployments on multiple hosts, can you prove that this won't cause a problem? – GlenPeterson Jan 17 '13 at 20:31
  • 1
    One additional pro for the Enum hashCode and ordinal being the same is the use of the Enum hashCode in ETag (HTTP response header) calculation in a distributed environment. (The ETag needs to be consistent across all the machines in the deployment, otherwise it effectively loses its caching functionality.) – mavarazy Aug 28 '13 at 10:48
  • Bozho's comment has it right: hashCode has no obligation to return the same value outside the JVM process. The default hashCode offers excellent entropy, costs significantly less than any other solution, and complies perfectly with equality by reference. On CHM: keep in mind that both HashMap and CHM do extra internal hashing that takes into consideration more bits than just the lower ones. – bestsss Jun 22 '14 at 02:27
  • @mavarazy, you should never use hashCode to compute persistent hash codes based on the content. If you need anything like a fast, stable hash with good entropy, look at murmur3. (Security hashes like SHA/MD5 and so on can be used, but they are slower.) – bestsss Jun 22 '14 at 02:29
  • @bestsss Yes, a normal `hashCode` is a mess, weak, conflicting, slow, and non-persistent. But with a large object graph, it's a PITA to define another universally usable method, especially when 3rd party objects are mixed in. – maaartinus Jun 22 '14 at 11:32
  • I'm getting burnt by this at the moment as Kotlin has the same implementation. (Maybe it just creates Java enums.) Trying to create a deterministic simulation across invocations. It works when I run the same code every time, but after multi-threading it to speed it up I get non-deterministic results. I suspect it's map iteration order that's burning me, but haven't cracked the cause yet. – Graham Lea Jun 09 '18 at 11:45
  • @GrahamLea I'd suspect the multithreading itself first, as any race can give you problems. In my experiment, Java enum `hashCode` was stable across runs when the same code was executed, which cost me even more time as I suspected something else. So Sun made it non-deterministic, but not properly non-deterministic. That's sad. – maaartinus Jun 09 '18 at 14:43
  • I found the cause of my problems. As suspected, the iteration order of HashSets that use Enums as elements is non-deterministic, though it may appear deterministic across multiple single-threaded runs. The __solution__ is to use LinkedHashSet instead, which has a deterministic iteration order unrelated to the hashCode (see the sketch below). – Graham Lea Jun 10 '18 at 09:34
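For illustration, the fix from the last comment amounts to swapping the set implementation (the enum and class names are invented):

```java
import java.util.LinkedHashSet;
import java.util.Set;

enum Day { MON, TUE, WED }

public class IterationOrderDemo {
    public static void main(String[] args) {
        // A HashSet<Day> iterates in an order derived from identity hash
        // codes, which can change between JVM runs; LinkedHashSet iterates
        // in insertion order, which is deterministic.
        Set<Day> days = new LinkedHashSet<>();
        days.add(Day.WED);
        days.add(Day.MON);
        days.add(Day.TUE);
        System.out.println(days); // always [WED, MON, TUE]
    }
}
```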

7 Answers

25

The only reason I can imagine for using Object's hashCode() and for making it final is to make me ask this question.

First of all, you should not rely on such mechanisms for sharing objects between JVMs. That's simply not a supported use case. When you serialize / deserialize you should rely on your own comparison mechanisms or only "compare" the results against objects within your own JVM.

The reason for letting an enum's hashCode be implemented as Object's hash code (based on identity) is that, within one JVM, there will be only one instance of each enum object. This is enough to ensure that such an implementation makes sense and is correct.

You could argue: "Hey, String and the wrappers for the primitives (Long, Integer, ...) all have well-defined, deterministic specifications of hashCode! Why don't the enums have it?" Well, to begin with, you can have several distinct string references representing the same string, which means that using super.hashCode would be an error, so these classes necessarily need their own hashCode implementations. For these core classes it made sense to let them have well-defined, deterministic hashCodes.
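For comparison, the deterministic formula that `String.hashCode()` documents in its Javadoc can be reproduced exactly (the helper class below is written only for illustration):

```java
final class StringHashDemo {
    // Reimplementation of the documented String.hashCode() formula:
    //     s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
    static int stringHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // Same value on every JVM, and equal to "enum".hashCode().
        System.out.println(stringHash("enum") == "enum".hashCode()); // true
    }
}
```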

Why did they choose to solve it like this?

Well, look at the requirements of the hashCode implementation. The main concern is to make sure that each object returns a distinct hash code (unless it is equal to another object). The identity-based approach is super efficient and guarantees this, while your suggestion does not. This requirement is apparently stronger than any "convenience bonus" about easing up on serialization etc.

aioobe
  • +1 Good answer, although to a different question. You're telling me a reason why it's OK to do it this way, but I was asking why this way was chosen. – maaartinus Feb 03 '11 at 11:10
  • I agree. It's not an easy question to answer. Updated my answer with my best bet. – aioobe Feb 03 '11 at 11:17
  • 1
    "within one JVM there will only be one instance of each enum object" - are you sure? What if the same enum class was loaded by different classloaders? – Grzegorz Oledzki Feb 03 '11 at 11:24
  • 1
    @Oledzki: Good point! But then the enums are not equal anymore, because equals() is implemented in Java 6 by the == operator, thereby adhering to the hashCode contract. – Daniel Feb 03 '11 at 12:14
  • @Daniel: agreed. However the single sentence in aioobe's answer is not true. I more or less agree with the rest. – Grzegorz Oledzki Feb 03 '11 at 22:35
  • 1
    @GrzegorzOledzki, *What if the same enum class was loaded by different classloaders?* -- then they are entirely different classes, and their java.lang.Class objects follow **exactly** the same identity-based hashCode/equals; just because their names might be the same, they are not the same classes. `Class.getName()` is just a human-readable name and doesn't say much about the internal representation at all. In short, classes from different class loaders may share names, but that's irrelevant. – bestsss Jun 22 '14 at 03:03
  • 1
    *why this way was chosen.* - it's the best impl there is: a random number with good entropy (it can mix different enums), it doesn't require extra space, and it fits all requirements perfectly. Actually, anything else would be subpar. On a side note: enums of the same type may not have the same class, as overriding methods results in brand-new classes. – bestsss Jun 22 '14 at 03:31
11

I think that the reason they made it final is to avoid developers shooting themselves in the foot by overriding it with a suboptimal (or even incorrect) hashCode.

Regarding the chosen implementation: it's not stable across JVMs, but it's very fast, avoids collisions, and doesn't need an additional field in the enum. Given the normally small number of instances of an enum class and the speed of the equals method, I wouldn't be surprised if the HashMap lookup time were higher with your algorithm than with the current one, due to its additional complexity.

JB Nizet
2

I've asked the same question, because I did not see this one: why does hashCode() in Enum refer to the Object hashCode() implementation instead of the ordinal() function?

I encountered it as a problem when defining my own hash function for an object that relies on an enum's hashCode as one of its components. When checking values in a Set of objects returned by the function, I checked them in an order which I expected to be stable, since I define the hashCode myself, and so I expected elements to fall on the same nodes of the tree. But since the hashCode returned by the enum changes from run to run, this assumption was wrong, and the test could fail once in a while.

So, when I figured out the problem, I started using ordinal instead. I am not sure everyone writing a hashCode for their object realizes this.

So basically, you can't define your own deterministic hashCode while relying on an enum's hashCode; you need to use ordinal instead.
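In code, the workaround looks roughly like this (the class and its fields are invented for the example):

```java
enum Status { NEW, PAID, SHIPPED }

// Hypothetical value class whose hash must be stable across JVM runs.
final class Order {
    private final Status status;
    private final long id;

    Order(Status status, long id) {
        this.status = status;
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Order)) return false;
        Order other = (Order) o;
        return status == other.status && id == other.id;
    }

    @Override
    public int hashCode() {
        // status.ordinal() is the same in every run; status.hashCode() is not.
        return 31 * status.ordinal() + Long.hashCode(id);
    }
}
```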

P.S. This was too big for a comment :)

mavarazy
1

The JVM enforces that, for an enum constant, only one object will exist in memory. There is no way you could end up with two distinct instances of the same enum constant within a single VM: not with reflection, and not across the network via serialization/deserialization.

That being said, since it is the only object to represent this constant, it doesn't matter that its hash code is its address, since no other object can occupy the same address at the same time. It is guaranteed to be unique and "deterministic" (in the sense that within one VM, all references to the constant point to the same object, whatever its address is).

pnt
  • 4
    The same comment as to aioobe's answer. You wrote: "The JVM enforces that for an enum constant, only one object will exist in memory", which doesn't seem true when the same enum class is loaded by different classloaders. – Grzegorz Oledzki Feb 03 '11 at 11:29
  • You're wrong. The number of objects may exceed 2**32, so the hashCode can't be unique. – maaartinus Feb 03 '11 at 11:37
    @Grzegorz: As far as I'm aware, once a class is loaded by a class loader and, within the same application, another one tries to look for the class, it'll just use the already loaded one and not load it again. @maaartinus: I agree, but that is a flaw of hashcode in general, one that cannot be avoided in the standard implementations of the structures, since they use `hashCode()`. This is somewhat mitigated by the fact that 2^32 or more objects will rarely reside in memory, in a single hashmap, I guess. Custom implementations that rely on external hashing solve this by using longer data types. – pnt Feb 03 '11 at 19:39
  • 3
    That is true only if the same classloader tries to load the class again. But you might have multiple (independent) classloaders in the application and one might not be aware of another. – Grzegorz Oledzki Feb 03 '11 at 22:33
  • 1
    I have encountered two different enum instances in my application, so I can affirm that the above comment regarding classloaders is correct. – JohnyTex Feb 18 '16 at 09:06
0

One more reason I could imagine it is implemented like this is the requirement for hashCode() and equals() to be consistent, together with the design goal that enums should be simple to use and compile-time constant (usable as `case` constants). This also makes it legal to compare enum instances with `==`, and you simply wouldn't want equals to behave differently from `==` for enums. This again ties hashCode to the default, reference-based Object.hashCode() behavior.

As said before, I also don't expect equals() and hashCode() to consider two enum constants from different JVMs as being equal. When talking about serialization: for instance fields typed as enums, the default binary serializer in Java has a special behaviour that serializes only the name of the constant; on deserialization, the reference to the corresponding enum value in the deserializing JVM is re-created. JAXB and other XML-based serialization mechanisms work in a similar way. So: just don't worry.
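A self-contained sketch demonstrating that special serialized form (the enum is made up):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

enum Suit { HEARTS, SPADES, DIAMONDS, CLUBS }

public class EnumSerializationDemo {
    public static void main(String[] args) throws Exception {
        // Serialize: only the constant's name is written to the stream.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(Suit.HEARTS);
        }
        // Deserialize: the name is resolved back to the one instance
        // living in this JVM, so reference equality holds.
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            Suit suit = (Suit) in.readObject();
            System.out.println(suit == Suit.HEARTS); // true
        }
    }
}
```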

Mirko Klemm
0

As long as we can't send an enum object¹ to a different JVM, I see no reason for putting such a requirement on enums (and objects in general).


¹ I thought it was clear enough: an object is an instance of a class. A serialized object is a sequence of bytes, usually stored in a byte array. I was talking about an object.

Andreas Dolk
  • 2
    We can send an object to a different JVM by serialization. You can also create the same object there. But you can't expect the same hashCode, which is an unnecessary non-determinism. – maaartinus Feb 03 '11 at 11:01
  • @maaartinus - serialization doesn't send *the* object but some data that can be used to recreate a more or less equivalent object. We can't, and that was the point, compare two objects from different JVMs with `obj1.equals(obj2)`. – Andreas Dolk Feb 03 '11 at 11:48
  • That's true. But I didn't say we could. I know that hashCode defined this way obeys the contract, this wasn't my point. – maaartinus Feb 03 '11 at 11:55
0

There is no requirement for hash codes to be deterministic between JVMs and no advantage gained if they were. If you are relying on this fact you are using them wrong.

As only one instance of each enum value exists, Object.hashCode() is guaranteed never to collide, is good code reuse, and is very fast.

If equality is defined by identity, then Object.hashCode() will always give the best performance.

The determinism of other hash codes is just a side effect of their implementation. As their equality is usually defined by field values, mixing in non-deterministic values would be a waste of time.

OrangeDog