57

This is odd. A co-worker asked about the implementation of myArray.hashCode() in Java. I thought I knew, but then I ran a few tests. Check the code below. The odd thing I noticed is that the results of the first System.out.println changed after I modified the class. It's almost as if it's reporting a memory address, and modifying the class moved the address or something. Just thought I would share.

int[] foo = new int[100000];
java.util.Random rand = new java.util.Random();

for(int a = 0; a < foo.length; a++) foo[a] = rand.nextInt();

int[] bar = new int[100000];
int[] baz = new int[100000];
int[] bax = new int[100000];
for(int a = 0; a < foo.length; a++) bar[a] = baz[a] = bax[a] = foo[a];

System.out.println(foo.hashCode() + " ----- " + bar.hashCode() + " ----- " + baz.hashCode() +  " ----- " + bax.hashCode());

// returns 4097744 ----- 328041 ----- 2083945 ----- 2438296
// Consistently unless you modify the class.  Very weird
// Before adding the comments below it returned this:
// 4177328 ----- 4097744 ----- 328041 ----- 2083945


System.out.println("Equal ?? " +
  (java.util.Arrays.equals(foo, bar) && java.util.Arrays.equals(bar, baz) &&
  java.util.Arrays.equals(baz, bax) && java.util.Arrays.equals(foo, bax)));
Gunnar
souLTower

4 Answers

103

Java arrays inherit hashCode from Object, which means the hash code depends on the reference (the array's identity), not on its contents. To get a hash code based on the contents of the array, use Arrays.hashCode.

Beware, though: it's a shallow implementation, so nested arrays are still hashed by reference. A deep implementation is also available: Arrays.deepHashCode.
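As a minimal sketch of the difference (the class name ContentHashDemo and the helper contentHash are made up for illustration), the following compares the inherited identity-based hash with Arrays.hashCode and Arrays.deepHashCode. The identity-based comparisons carry no expected output because distinct objects are allowed, though unlikely, to share a hash code.

```java
import java.util.Arrays;

// Hypothetical demo class; only the java.util.Arrays calls are real JDK API.
class ContentHashDemo {
    // Content-based hash of a flat int array.
    static int contentHash(int[] a) {
        return Arrays.hashCode(a);
    }

    public static void main(String[] args) {
        int[] foo = {1, 2, 3};
        int[] bar = {1, 2, 3};

        // Inherited from Object: depends on identity, not contents.
        System.out.println(foo.hashCode() == bar.hashCode());
        // Content-based: equal contents give equal hashes.
        System.out.println(contentHash(foo) == contentHash(bar)); // true

        int[][] nestedA = {{1, 2}, {3}};
        int[][] nestedB = {{1, 2}, {3}};

        // Shallow: each inner array contributes its identity-based hashCode().
        System.out.println(Arrays.hashCode(nestedA) == Arrays.hashCode(nestedB));
        // Deep: recurses into the inner arrays and hashes their contents.
        System.out.println(Arrays.deepHashCode(nestedA) == Arrays.deepHashCode(nestedB)); // true
    }
}
```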

Tom
MahdeTo
  • Thanks for this answer, but why don't Java arrays override the hashCode (and toString) methods by default? Is there any good reason? – Krzysztof Kaczor May 07 '13 at 21:05
  • Because hashCode needs to be fast to be useful (as it is mostly used to prevent an expensive call of .equals), and even a shallow value hashCode on an array could potentially be very slow. A hashCode that is basically random does not hurt, it just provides no advantage. Lesser of two evils. – Torque Nov 28 '13 at 03:31
  • @Torque It only doesn't hurt if equals() is crappy in the same way. Normally a hashCode that is 'basically random' would be a serious problem, because if equals is true then hashCode must be the same. A constant would be better than random. – Mark Nov 10 '20 at 12:29
  • I wrote the comment a long time ago, so I can't say what I was thinking at the time, but I was probably referring to the system hash code, which returns a number not based on the object's fields (so 'essentially random'), but which of course stays the same for the object between calls. I obviously worded the comment poorly. – Torque Nov 11 '20 at 12:39
  • @Mark Yet .equals() on arrays is "crappy in the same way": `bar.equals(baz)` uses reference equality. The problem happens when people use `Arrays.equals(foo, bar)` to determine equality and then _don't_ use `Arrays.hashCode(foo)` for hashCode. If you're using methods in `java.util.Arrays` for equality, you need to use them for hashCode too. – Daniel Martin Mar 24 '22 at 13:57
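One way to keep equality and hashing consistent, as the last comment recommends, is to wrap the array in a small key class. This is only a sketch: IntArrayKey and IntArrayKeyDemo are hypothetical names, not JDK classes, and the defensive copy is a design choice made here so that later writes to the caller's array cannot change the key.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper (not part of the JDK) that pairs Arrays.equals with Arrays.hashCode.
final class IntArrayKey {
    private final int[] values;

    IntArrayKey(int[] values) {
        this.values = values.clone(); // defensive copy: the key no longer tracks the caller's array
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof IntArrayKey
                && Arrays.equals(values, ((IntArrayKey) other).values);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(values); // content-based, consistent with equals above
    }
}

class IntArrayKeyDemo {
    public static void main(String[] args) {
        Map<IntArrayKey, String> wrapped = new HashMap<>();
        wrapped.put(new IntArrayKey(new int[] {1, 2, 3}), "found");
        System.out.println(wrapped.get(new IntArrayKey(new int[] {1, 2, 3}))); // found

        // Raw arrays as keys fall back to reference equality, so a second array never matches.
        Map<int[], String> raw = new HashMap<>();
        raw.put(new int[] {1, 2, 3}, "found");
        System.out.println(raw.get(new int[] {1, 2, 3})); // null
    }
}
```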
6

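Arrays use the default hash code, which is tied to the object's identity. It is often described as being based on the memory location, but it isn't necessarily the memory location: it is only an int, so not every memory address would fit, and the JVM is free to derive it some other way. You can see that arrays use the default by also printing the result of System.identityHashCode(foo).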

Under equals(), arrays are only equal if they are the same, identical array. So array hash codes will, in general, only be equal if they come from the same, identical array.
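A small sketch of both points, assuming nothing beyond the standard library (the class IdentityHashDemo and the helper usesIdentityHash are illustrative names):

```java
// Hypothetical demo class; System.identityHashCode is the real JDK method.
class IdentityHashDemo {
    // True when the object's hashCode() is the default, identity-based one.
    static boolean usesIdentityHash(Object o) {
        return o.hashCode() == System.identityHashCode(o);
    }

    public static void main(String[] args) {
        int[] foo = {1, 2, 3};
        int[] copy = foo.clone(); // same contents, different object
        int[] alias = foo;        // same object, second reference

        System.out.println(usesIdentityHash(foo));                 // true
        System.out.println(foo.equals(copy));                      // false
        System.out.println(foo.equals(alias));                     // true
        System.out.println(foo.hashCode() == alias.hashCode());    // true
    }
}
```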

erickson
1

The default implementation of Object.hashCode() is often described as returning the pointer value of the object, although this is implementation dependent. For instance, a 64-bit JVM might take the pointer and XOR the high- and low-order words together. Subclasses are encouraged to override this behavior where it makes sense.

However, it does not make sense to build equality comparisons into mutable arrays. If an element changes, two arrays that were equal no longer are, and a content-based hash code would change along with them. To maintain the invariant that the same array always returns the same hashCode no matter what happens to its elements, arrays do not override the default hashCode behavior.

Note that java.util.Arrays provides hashCode() and deepHashCode() implementations for when hashing based on the contents of the array, rather than the identity of the array itself, is important.
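To make the invariant concrete, here is a short sketch (MutationHashDemo is an illustrative name) showing that writing to an element leaves the inherited hashCode unchanged while the content-based Arrays.hashCode moves with the contents:

```java
import java.util.Arrays;

// Hypothetical demo class illustrating hash stability under mutation.
class MutationHashDemo {
    public static void main(String[] args) {
        int[] foo = {1, 2, 3};

        int identityBefore = foo.hashCode();        // inherited from Object
        int contentBefore = Arrays.hashCode(foo);   // computed from the elements

        foo[0] = 42; // mutate one element in place

        // The inherited hash belongs to the object, so it survives the write.
        System.out.println(foo.hashCode() == identityBefore);       // true
        // The content hash is recomputed from the new elements.
        System.out.println(Arrays.hashCode(foo) == contentBefore);  // false
    }
}
```

This is why an array stored in a HashSet under its own hashCode stays findable after its elements change, whereas a key hashed by content would not be.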

James
0

I agree with using java.util.Arrays.hashCode (or Google Guava's generic wrapper, Objects.hashCode), but be aware that this can cause issues if you are using Terracotta - see this link

Carl Pritchett