10

I'm overriding the equals and hashcode methods for a simple container object for two ints. Each int reflects the index of another object (it doesn't matter what that object is). The point of the class is to represent a connection between the two objects.

The direction of the connection doesn't matter, therefore the equals method should return true regardless of which way round the two ints are in the object E.g.

connectionA = new Connection(1,2);
connectionB = new Connection(1,3);
connectionC = new Connection(2,1);

connectionA.equals(connectionB); // returns false
connectionA.equals(connectionC); // returns true

Here is what I have (modified from the source code for Integer):

public class Connection {
    // Simple container for two numbers which are connected.
    // Two Connection objects are equal regardless of the order of from and to.

    int from;
    int to;

    public Connection(int from, int to) {
        this.from = from;
        this.to = to;
    }

    // Modifed from Integer source code
    @Override
    public boolean equals(Object obj) {
        if (obj instanceof Connection) {
            Connection connectionObj = (Connection) obj;
            return ((from == connectionObj.from && to == connectionObj.to) || (from == connectionObj.to && to == connectionObj.from));
        }
        return false;
    }

    @Override
    public int hashCode() {
        return from*to;
    }
}

This does work however my question is: Is there a better way to achieve this?

My main worry is with the hashcode() method will return the same hashcode for any two integers which multiply to equal the same number. E.g.

3*4 = 12
2*6 = 12 // same!

The documentation, http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/Object.html#hashCode(), states that

It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.

If anyone can see a simple way of reducing the number of matching hashcodes then I would be appreciative of an answer.

Thanks!

Tim

PS I'm aware that there is a java.sql.Connection which could cause some import annoyances. The object actually has a more specific name in my application but for brevity I shortened it to Connection here.

Twice Circled
  • 1,410
  • 1
  • 15
  • 23

6 Answers6

6

Three solutions that would "work" have been proposed. (By work, I mean that they satisfy the basic requirement of a hashcode ... that different inputs give different outputs ... and they also satisfy the OP's additional "symmetry" requirement.)

These are:

   # 1
   return from ^ to;

   # 2
   return to*to+from*from;

   # 3
   int res = 17;
   res = res * 31 + Math.min(from, to);
   res = res * 31 + Math.max(from, to);
   return res;

The first one has the problem that the range of the output is bounded by the range of the actual input values. So for instance if we assume that the inputs are both non-negative numbers less or equal to 2i and 2j respectively, then the output will be less or equal to 2max(i,j). That is likely to give you poor "dispersion"1 in your hash table ... and a higher rate of collisions. (There is also a problem when from == to!)

The second and third ones are better than the first, but you are still liable to get more collisions than is desirable if from and to are small.


I would suggest a 4th alternative if it is critical that you minimize collisions for small values of from and to.

  #4
  int res = Math.max(from, to);
  res = (res << 16) | (res >>> 16);  // exchange top and bottom 16 bits.
  res = res ^ Math.min(from, to);
  return res;

This has the advantage that if from and to are both in the range 0..216-1, you get a unique hashcode for each distinct (unordered) pair.


1 - I don't know if this is the correct technical term for this ...

Stephen C
  • 698,415
  • 94
  • 811
  • 1,216
  • One slight problem with your #4 is that some hash tables may map hash codes to slot in such a way as to undo the work you did separating the values. I'd suggest computing `bigprime1*(from+to)-bigprime2*min(from,to)`. No need to compute both max and min, since sum-max=min. XOR seems popular in hashing, but in a language with defined overflow semantics I don't know that it has any meaningful advantages over addition. – supercat Jan 15 '14 at 04:45
5

This is widely accepted approach:

@Override
public int hashCode() {
    int res = 17;
    res = res * 31 + Math.min(from, to);
    res = res * 31 + Math.max(from, to);
    return res;
}
Nikolay Kuznetsov
  • 9,467
  • 12
  • 55
  • 101
2

i think, something like

@Override
public int hashCode() {
    return to*to+from*from;
}

is good enough

infthi
  • 537
  • 1
  • 5
  • 19
1

Typically I use XOR for hashcode method.

@Override
public int hashCode() {
    return from ^ to;
}
Jayamohan
  • 12,734
  • 2
  • 27
  • 41
0

I wonder why nobody offered the usually best solution: Normalize your data:

 Connection(int from, int to) {
      this.from = Math.min(from, to);
      this.to = Math.max(from, to);
 }

If it's impossible, then I'd suggest something like

 27644437 * (from+to) + Math.min(from, to)
  • By a using a multiplier different from 31, you avoid collisions like in this question.
  • By using a big multiplier you spread the numbers better.
  • By using an odd multiplier you ensure that the multiplication is bijective (i.e., no information gets lost).

  • By using a prime you gain nothing at all, but everyone does it and it has no disadvantage.

Community
  • 1
  • 1
maaartinus
  • 44,714
  • 32
  • 161
  • 320
  • It may also be worth noting that your second indicated structure is bijective with respect to x for all pairs of the form `(x, x)` and is nearly bijective for pairs of the form `(x,x+delta)`. By contrast, forms that multiply one value by 31 and add the other will always yield a result that's a multiple of 32 when used with two equal values, 64 when used with four, 128 when used with eight, 256 when used with 16, etc. – supercat Aug 05 '14 at 16:04
  • @supercat `27644437 * x + x = 0x1A5D216 * x` which is always even, so it can't be bijective. But it's as good as it can as it's no multiple of 4. I've been always claiming that 31 is a terrible multiplier while I didn't think about `hash(x, x)` being a multiple of 32. – maaartinus Aug 05 '14 at 17:33
  • It ends up being 27644437*(x+x)+x, which is odd. As for the merits of 31, it wouldn't be so bad if each step computed 31*prev-x rather than 31*prev+x. The would of course make every (x,-x) value hash to a multiple of 32, but those are probably much less common that (x,x). What's really horrible is x^y. That maps every (x,x) to zero, and is almost as bad with (x,x+1), mapping half of those to 1, a quarter to 3, an eighth to 7, etc. What's absurd is that x+y is generally no more expensive than x^y, but people have some odd attachment to xor. BTW, `from+to` might be a decent hash. – supercat Aug 05 '14 at 17:45
  • @supercat I see; I was thinking about the 27644437 multiplier rather than about my above formula. Agreed, replacing addition by subtraction would be better; probably similar to replacing 31 by 33. Agreed again, using XOR is only good if you know what you're doing. Using it as in [Map.Entry](http://docs.oracle.com/javase/7/docs/api/java/util/Map.Entry.html#hashCode()) is hard to beat, but an even more stupid example is [BitSet](http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#hashCode()). The latter even ignores some bits completelly when `i+1` is a power of two. – maaartinus Aug 06 '14 at 04:57
  • I think the `Map.Entry` is worse; since `Set` provides no way of retrieving the *particular* `Value` object associated with a key, it's often useful to create a map where *every* item's key and value are identical. A hash function which expends effort on every item but always ends up returning zero is probably the worst *legitimate* hash function possible. – supercat Aug 06 '14 at 15:09
  • @supercat Right, and sometimes you need its `EntrySet` and you're toasted. One solution would be wrapping the value into something multiplying the hashCode by 2 or 3 (I guess both `x^(2*x)` and `x^(3*x)` behave rather sane; whatever you do, the LSb of something will be constant). Agreed that `BitSet#hashCode` is mostly harmless as nobody really needs it. – maaartinus Aug 06 '14 at 15:18
  • 1
    I wouldn't say BitSet#hashCode is harmless because nobody needs it, but rather that would seem unlikely to result in massively degenerate behavior. A million-item hash table in which every single item collides with exactly one other should be considered well-hashed even though it has a "100% collision rate". One where a third of the items all yield the same hash value would be poorly hashed, even if all other items had distinct hash values (and it thus had 666,667 distinct values as compared with the 500,000 distinct values in the first table). – supercat Aug 06 '14 at 15:37
  • 1
    I would suspect that in typical usage, something like `BitSet@hashCode` would have many more collisions than would a slightly-tweaked version of the algorithm, but I wouldn't expect collisions to form large clusters the way xor may do in many real-world situations, nor to yield *completely* degenerate behavior as would occur with a table that mapped each item to an equal item. – supercat Aug 06 '14 at 15:43
0

Java 1.7+ have Objects.hash

@Override
public int hashCode() {
    return Objects.hash(from, to);
}
answer42
  • 65
  • 5