
The javadocs state that

Note that in most cases, for two instances of class Double, d1 and d2, the value of d1.equals(d2) is true if and only if

    d1.doubleValue() == d2.doubleValue()

also has the value true. However, there are two exceptions:

  • If d1 and d2 both represent Double.NaN, then the equals method returns true, even though Double.NaN==Double.NaN has the value false.
  • If d1 represents +0.0 while d2 represents -0.0, or vice versa, the equal test has the value false, even though +0.0==-0.0 has the value true.

This definition allows hash tables to operate properly.

Why are these two exceptions necessary to "allow hash tables to operate properly"?

Equivalently: how would hash tables fail to work with a different definition of Double?
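For concreteness, here is a quick sketch of the two exceptions in action (my own demo, not from the Javadoc; the class name is just for illustration):

    public class DoubleEqualsDemo {
        public static void main(String[] args) {
            Double nan1 = Double.NaN;   // autoboxes one NaN
            Double nan2 = Double.NaN;   // autoboxes another NaN
            System.out.println(Double.NaN == Double.NaN);  // false (IEEE-754 semantics)
            System.out.println(nan1.equals(nan2));         // true  (first exception)

            Double posZero = +0.0;
            Double negZero = -0.0;
            System.out.println(+0.0 == -0.0);              // true  (IEEE-754 semantics)
            System.out.println(posZero.equals(negZero));   // false (second exception)
        }
    }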

Toby Speight

3 Answers


Why are these two exceptions necessary to "allow hash tables to operate properly"?

In fact, that statement is a bit misleading. It would be more accurate to say that those exceptions in the definition of Double.equals() are necessary for the chosen implementation of Double.hashCode() to be consistent with equals(). That characteristic is indeed relevant to the Java platform library's hash implementations, and you'll find a great deal of verbiage devoted to that topic generally, both on SO and elsewhere.

Since the general topic of hashCode() / equals() consistency is so well covered, I'll focus on how they apply to class Double. There are several details in this area that it is necessary to understand:

  • Java Double is the wrapper class for double, and double is defined in terms of IEEE-754 binary double precision format.
  • In IEEE-754, positive and negative zero are distinct values. Even though they compare equal to each other, they have different bit patterns, and they are distinguishable by some of their other properties. This is useful and desirable for some purposes.
  • On the other hand, although IEEE-754 defines several "not a number" (NaN) bit patterns, Java uses only one of them.
  • IEEE-754 specifies that its special NaN values compare unequal to every value, including themselves. This is one of their distinguishing features.
  • Double.hashCode() is defined in terms of arithmetic operations on the bit pattern of the wrapped double.

Because Double.hashCode() is computed from the wrapped double's bit pattern, and the bit patterns of positive and negative zero differ, the hash codes of Double(+0.0) and Double(-0.0) differ. That would make this hashCode() implementation inconsistent with equals() if two such Double instances compared equal. Therefore, Double.equals() is defined so that they do not compare equal. That was not the only alternative: hashCode() could have instead been defined so that the two flavors of zero had the same hash code.
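To check that claim concretely, here is a small sketch of mine that computes the hash the way the Double.hashCode() Javadoc describes it (the docHash helper and the class name are mine, for illustration only):

    import static java.lang.Double.doubleToLongBits;

    public class ZeroHashDemo {
        // The formula documented for Double.hashCode():
        // (int)(bits ^ (bits >>> 32)), where bits = doubleToLongBits(value)
        static int docHash(double value) {
            long bits = doubleToLongBits(value);
            return (int) (bits ^ (bits >>> 32));
        }

        public static void main(String[] args) {
            System.out.println(Long.toHexString(doubleToLongBits(+0.0))); // 0
            System.out.println(Long.toHexString(doubleToLongBits(-0.0))); // 8000000000000000

            System.out.println(docHash(+0.0));   // 0
            System.out.println(docHash(-0.0));   // -2147483648

            // Different hash codes, so equals() must also report the two zeros
            // unequal to keep hashCode() consistent with equals().
            System.out.println(Double.valueOf(+0.0).equals(Double.valueOf(-0.0))); // false
        }
    }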

On the flip side, because Java provides only one NaN value of type double, with its specific bit pattern, Double instances representing that value yield the same hash code. Although again this follows from the chosen implementation of hashCode(), there would be no easy way to mirror IEEE-754 equality semantics in class Double, because doing so would need to violate an even more important invariant than equals() being consistent with hashCode(): equals being reflexive. That is, it is expected always to be the case that for any non-null reference a, a.equals(a) is true.
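Here is another quick sketch of mine (again, the class name is illustrative only) showing the reflexivity point, and that distinct boxed NaNs agree on equals() and hashCode():

    public class NaNReflexivityDemo {
        public static void main(String[] args) {
            Double d = Double.NaN;                                   // one boxed NaN
            System.out.println(d.doubleValue() == d.doubleValue()); // false: IEEE-754 NaN != NaN
            System.out.println(d.equals(d));                         // true:  equals() stays reflexive

            // Distinct boxed NaNs still agree on equals() and hashCode(),
            // so a NaN key remains findable in a hash-based collection.
            Double d2 = Double.valueOf(Double.NaN);
            System.out.println(d.equals(d2));                        // true
            System.out.println(d.hashCode() == d2.hashCode());       // true
        }
    }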

John Bollinger
  • Do you know of a reason that `hashCode()` isn't defined in that way? In particular couldn't `hashCode()` just have an explicit check for signed zeros instead? – Alexander Katz Mar 13 '17 at 17:50
  • (Or maybe more simply write `doubleToLongBits()` to return the same value for both signed zeros as @RealSkeptic suggests) – Alexander Katz Mar 13 '17 at 18:00
  • @AlexanderKatz, `doubleToLongBits()` *must not* return the same value for the two differently signed `double` zeros, because that's inconsistent with the definition and purpose of that public method. More generally, I am not privy to the designers' thoughts, but their design feels very natural to me -- objects' `equals()` methods in general implement more of an *equivalence* than an arithmetic equality. Furthermore, it is desirable to be able to store both positive and negative zero in the same set, and to use both as keys to the same map. – John Bollinger Mar 13 '17 at 18:37
  • Why doesn't that restriction on `doubleToLongBits()` apply to NaNs? Is there a particular use case in which both signed zeros would be used as keys in the same hash? – Alexander Katz Mar 13 '17 at 22:42
  • @AlexanderKatz, because historically, Java made no distinction at all, for any purpose, between different `NaN`s. There was no way to tell them apart until `Double.doubleToRawLongBits()` was added in v1.3, which these days you might retronym as Java 3. This was well after the design for `equals()` and `hashCode()` was set, and even now it's pretty reasonable. – John Bollinger Mar 14 '17 at 14:09
  • As for using both signed zeroes in the same hash, here's a contrived example: consider a hash that maps the function *y* = 1 / *x*. Each key represents an *x*, and the corresponding value represents the corresponding *y*. Since 1 / *x* tends to positive infinity when *x* approaches 0 from the right, and to negative infinity when *x* approaches zero from the left, it would be plausible to map +0.0 to positive infinity, and -0.0 to negative infinity (sketched after these comments). – John Bollinger Mar 14 '17 at 14:13
  • @AlexanderKatz, in any case, you're asking the wrong question. If you accept that it is useful (at least to some people) to have distinct positive and negative floating-point zeroes, then it should be possible to enroll both in the same hash *because they are distinct*. That they have the peculiar property of being equal (in the `==` sense) despite being distinct is irrelevant. – John Bollinger Mar 14 '17 at 14:23
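A sketch of that contrived 1 / x map from the comments above (my own illustration; the class name is hypothetical):

    import java.util.HashMap;
    import java.util.Map;

    public class ReciprocalMapDemo {
        public static void main(String[] args) {
            // Map each x to y = 1 / x; the two zeros are useful, distinct keys.
            Map<Double, Double> reciprocal = new HashMap<>();
            reciprocal.put(+0.0, Double.POSITIVE_INFINITY); // limit from the right
            reciprocal.put(-0.0, Double.NEGATIVE_INFINITY); // limit from the left

            System.out.println(reciprocal.get(+0.0)); // Infinity
            System.out.println(reciprocal.get(-0.0)); // -Infinity

            // Because Double(+0.0) and Double(-0.0) are not equals(), they occupy
            // separate entries; if they compared equal, the second put() would
            // have overwritten the first.
            System.out.println(reciprocal.size());    // 2
        }
    }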

The first bullet point is necessary to meet the contract between equals and hashCode (which is what Hashtable/HashMap rely on). Specifically, any object must be equal to itself, per the equals Javadoc:

The equals method implements an equivalence relation on non-null object references:

  • It is reflexive: for any non-null reference value x, x.equals(x) should return true.

Since a hash code is required to be consistent with the behavior of the equals method, this applies to hashCode as well.

If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

Therefore, at a minimum, the same NaN Double object must be equal to itself. Technically, it wouldn't be required that different instances of Double with the NaN value be equal to each other. My guess is they made all NaN values equal because it would be confusing if they weren't, and inconsistent with the fact that Double is basically a value type, so identity equality would be inappropriate.
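To make the hash-table connection concrete, here is a small sketch of mine (class name chosen only for illustration) showing that a boxed NaN key can be looked up again precisely because equals() treats NaN as equal to NaN:

    import java.util.HashMap;
    import java.util.Map;

    public class NaNKeyDemo {
        public static void main(String[] args) {
            Map<Double, String> map = new HashMap<>();
            map.put(Double.NaN, "not a number");   // the key is one boxed NaN

            // This lookup boxes a *different* Double wrapping NaN. It succeeds
            // only because Double.equals() treats NaN as equal to NaN (and
            // hashCode() agrees); with ==-style semantics a fresh NaN instance
            // could never find the entry again.
            System.out.println(map.get(Double.NaN)); // not a number
        }
    }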

I can't think of a reason why the second exception is necessary to allow hash tables to operate properly, however.

M. Justin

These exceptions are necessary because the behavior of Double as a key differs from its behavior in calculations. The bit patterns for plus and minus zero differ, so for key purposes the two zeros cannot be equal, at least not easily. The bit pattern for NaN is just the one pattern, so for key purposes it cannot fail to be equal to itself.
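A small sketch of mine showing the bit patterns in question (the NaN payload used here is arbitrary, chosen only to illustrate that doubleToLongBits() collapses all NaNs to one canonical pattern):

    public class BitPatternDemo {
        public static void main(String[] args) {
            // The two zeros have distinct bit patterns...
            System.out.println(Long.toHexString(Double.doubleToLongBits(+0.0))); // 0
            System.out.println(Long.toHexString(Double.doubleToLongBits(-0.0))); // 8000000000000000

            // ...while doubleToLongBits() collapses every NaN to the one canonical
            // pattern; doubleToRawLongBits() preserves the payload bits.
            double someNaN = Double.longBitsToDouble(0x7ff8000000000123L); // arbitrary quiet NaN
            System.out.println(Long.toHexString(Double.doubleToLongBits(someNaN)));    // 7ff8000000000000
            System.out.println(Long.toHexString(Double.doubleToRawLongBits(someNaN))); // 7ff8000000000123
        }
    }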

Lew Bloch
  • Why? What does the bit pattern matter? You can declare any equivalence relation and it will work in a hash table. For example, if you define strings to be equal IFF their lowercase representation is equal, you can use them for hash tables perfectly well despite their bit contents being different. – RealSkeptic Mar 13 '17 at 16:22
  • Wouldn't the bit pattern be important since the implementation of `hashCode()` uses the value of `doubleToLongBits()` to generate the hash as `(int)(v^(v>>>32))`? – azurefrog Mar 13 '17 at 16:42
  • @azurefrog They probably matched the `hashCode` to the `equals`. But they could define `hashCode` *and* `equals` differently. – RealSkeptic Mar 13 '17 at 16:44
  • 1
    Also, there are many bit patterns for NaN. It just happens that they wrote the `doubleToLongBits` method so that it returns the same value for all NaNs. Which means that the actual bit pattern of the value has no meaning at all. – RealSkeptic Mar 13 '17 at 16:50
  • Doesn't Java use just one bit pattern for a canonical `NaN`? In any case, _as a key_ it makes sense for `NaN` to be `equals` to `NaN`. Also, the primitive wrappers all differ in semantics from their corresponding primitives. For one thing, primitives don't have `equals` or `hashCode` methods, so the wrappers have no way to be consistent with such methods. – Lew Bloch Mar 13 '17 at 18:50