5

I want two instances of the Pair class to be treated as equal if "first" and "second" are swapped, i.e. Pair(1,2) == Pair(2,1) should evaluate true.

    static class Pair {
        int first;
        int second;

        Pair(int first, int second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public boolean equals(Object obj) {
            if (obj == this) {
                return true;
            }
            if (obj == null || obj.getClass() != this.getClass()) {
                return false;
            }

            Pair that = (Pair) obj;
            return (this.first == that.first && this.second == that.second)
                    || (this.second == that.first && this.first == that.second);
        }

        @Override
        public int hashCode() {
            return Objects.hash(first, second) + Objects.hash(second, first);

        }
    }

I have come up with this hashcode(), and it seems to work, but is there an better / generally accepted way of doing this?

James
  • 51
  • 2
  • 4
    Any commutative operation combining the hashes of the two objects will work, e.g. `first.hashCode() ^ second.hashCode()`. Invertible operations like `^` and `+` will be better than operations like `&` and `|`, because the latter don't preserve uniformity. – kaya3 Feb 07 '20 at 12:26
  • 2
    Also keep in mind that you could just change the constructor so that `this.first = Math.min(first, second);` and `this.second = Math.max(first, second);`. That way your `equals` and `hashCode` methods can be much simpler. – kaya3 Feb 07 '20 at 12:32

1 Answers1

4

A simpler hash code without overhead of Objects.hash() would be:

@Override
public int hashCode() {
  return first + second;
}

It's similar to what Integer.hashCode() which returns the Integer value.

Karol Dowbecki
  • 43,645
  • 9
  • 78
  • 111
  • 2 + 1 and 3 + 0 would be the same, but are different – FazoM Feb 07 '20 at 12:31
  • 3
    @FazoM `hashCode()` is not an equality check, that's `equal()` job. These methods have contract and hash codes being equal does not mean object equality. – Karol Dowbecki Feb 07 '20 at 12:33
  • 2
    `first ^ second` would do slightly better, losing one overflow bit less on information. However 2^1 just happens equal to 3^0. But not 2^4 and 3^3. – Joop Eggen Feb 07 '20 at 12:35
  • OK, but not equal objects should produce different hash codes, otherwise you couldn't put them together in HashSet – FazoM Feb 07 '20 at 12:36
  • But in this case is there a problem that the hashcodes for Pair(1,1) and Pair(2,0) will be identical? Seems like it's ok actually, from the docs... It is not required that if two objects are unequal * according to the {@link java.lang.Object#equals(java.lang.Object)} * method, then calling the {@code hashCode} method on each of the * two objects must produce distinct integer results. However, the * programmer should be aware that producing distinct integer results * for unequal objects may improve the performance of hash tables. – James Feb 07 '20 at 12:36
  • If you think it is not a problem just do: return 0; and move on xD – FazoM Feb 07 '20 at 12:36
  • @FazoM you are mistaken. Try it! Write a class whose hashCode impl is just 'return 1;', but with a proper equals. It'll work just fine with a hashset. The only thing you lose if some (or all) of the objects inside return the same hashcode, is efficiency. If you can make this hashCode algorithm avoid clashes more, great! But without stating more info about what kind of numbers are likely here, 3/0 and 2/1 are as likely as any other pair, and given that only 2^32 hashcodes exist for 2^63 different possible Pair instances, clashes are mathematically impossible to avoid. – rzwitserloot Feb 07 '20 at 12:41
  • The hashCode in the original question would give something like 31² + 32*(fist+second) which is even worse. – Joop Eggen Feb 07 '20 at 12:42
  • *"The only thing you lose ... is efficiency"* Yes, but the only thing you **gain** by using a HashSet instead of an ArrayList is efficiency (for certain operations like `contains`). So the fact that you are overriding `hashCode` for use in a HashSet means that efficiency is a concern. If your hashes are all the same, then a HashSet is just a slower ArrayList without indices. – kaya3 Feb 07 '20 at 12:42
  • 2
    @FazoM read https://vanilla-java.github.io/2018/08/15/Looking-at-randomness-and-performance-for-hash-codes.html – Karol Dowbecki Feb 07 '20 at 12:49
  • 1
    @JoopEggen it’s funny how you contradict yourself in the same comment by bringing counter examples. No, `first ^ second` is not better, it’s *worse* than `first + second`. – Holger Feb 14 '20 at 14:33
  • Hi @Holger 3+5 mixes up bits (as does xoring) but adds a most-significant bit for 8. I think you find that slightly better. However it might lose a bit information due to sign overflow, so I find ^ very very itsy bitsy better. _I should not have brought up the argument._ As 3*5 it is more or less irrelevant. Or can you explain? Often you were able to correct me and enlighten me positively. – Joop Eggen Feb 14 '20 at 14:47
  • 1
    @JoopEggen how does “sign overflow” affect the quality of the result? Real life hash maps use a power of two for the table size, which will just use the lower bits, which are unaffected by the overflow (for other sizes, a correct modulo would still work). For one bit, it’s obvious, plus and xor have the same result. For two bits, the lower bit may produce a carry bit which flips the upper bit. Since this happens for each combination of upper bits, the distribution of the result values does not change. The upper bit’s carry bit gets dropped, just like xor doesn’t generate one in the first place – Holger Feb 14 '20 at 15:06
  • 2
    @JoopEggen So for an equal distribution of both inputs over the entire value space, both produce an equal result distribution over the value space. It’s easy to show that this applies to all bit sizes. But in practice, smaller numbers are more likely than larger numbers and application domains have ranges which are not a power of two. In these cases (the relevant cases), xor doesn’t use the upper bits at all, which makes it worse than plus. – Holger Feb 14 '20 at 15:07
  • @Holger, one bit though (factor 2). Multiplication `*` smears wider, but has also its mathematical problems. But okay, you are right: **+ has in general has a slightly better behavior than ^**. – Joop Eggen Feb 14 '20 at 15:17