I have the following code:

class C
{
    String n;

    C(String n)
    {
        this.n = n;
    }

    public String getN() { return n; }

    @Override
    public boolean equals(Object obj)
    {
        return this.getN().equals(((C)obj).getN());
    }
}

List<C> cc = Arrays.asList(new C("ONE"), new C("TWO"), new C("ONE"));

System.out.println(cc.parallelStream().distinct().count());

but I don't understand why distinct returns 3 and not 2.

Stuart Marks
xdevel2000
  • 6
    Aha, you're experimenting with Java 8. Try also overriding `hashCode()` in class `C`. If two `C` objects are equal, then their hash codes must be the same. – Jesper Jan 24 '14 at 13:17
  • 2
    Put a breakpoint inside the overridden `equals` and see whether `distinct` calls it. – Yasser Zamani Jan 24 '14 at 13:27
  • 2
    @Jesper, I did not see any mention of `hashCode` in the documentation at http://download.java.net/jdk8/docs/api/java/util/stream/Stream.html#distinct-- – Yasser Zamani Jan 24 '14 at 13:28
  • 5
    But it seems to check `hashCode` first, since that is usually cheaper than an `equals` check. The general contract says that two equal objects must have the same hash code, so it is valid to perform the `equals` check only when the hash codes match. I believe that is exactly what `distinct` does. – ksmonkey123 Apr 25 '15 at 08:22
  • *but I don't understand why `distinct` returns 3 and not 2.*, `distinct` should be `count`. – Jason Law Jan 29 '20 at 14:27

1 Answer

You also need to override the hashCode method in class C. For example:

@Override
public int hashCode() {
    return n.hashCode();
}

When two C objects are equal, their hashCode methods must return the same value.

The API documentation for interface Stream does not mention this, but it's well-known that if you override equals, you should also override hashCode. The API documentation for Object.equals() mentions this:

Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.
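To illustrate why that contract matters for hash-based collections, here is a minimal sketch. The class `Broken` and its per-instance counter are hypothetical and exist only to make the effect deterministic: `equals` compares names, while `hashCode` deliberately returns a different value for every instance. A `HashSet` then keeps both objects even though they are equal.

```java
import java.util.HashSet;
import java.util.Set;

public class BrokenContractDemo {

    // Hypothetical class: equals() compares names, but hashCode() hands out
    // a different value per instance, deliberately breaking the contract.
    static class Broken {
        private static int next = 0;
        private final String name;
        private final int hash = next++;

        Broken(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object obj) {
            return obj instanceof Broken && name.equals(((Broken) obj).name);
        }

        @Override
        public int hashCode() {
            return hash;
        }
    }

    // Adds two equal objects to a HashSet and reports how many it kept.
    static int sizeOfSet() {
        Set<Broken> set = new HashSet<>();
        set.add(new Broken("ONE"));
        set.add(new Broken("ONE"));
        return set.size();
    }

    public static void main(String[] args) {
        Broken a = new Broken("ONE");
        Broken b = new Broken("ONE");
        System.out.println(a.equals(b)); // true
        System.out.println(sizeOfSet()); // 2, although the elements are equal
    }
}
```

The set compares hash codes before it ever calls `equals`, so objects with different hash codes are never recognized as duplicates.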

Stream.distinct() evidently relies on the objects' hash codes: once hashCode is implemented as shown above, the code prints the expected result, 2.
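For reference, here is a complete sketch of the class from the question with `hashCode` added. The wrapper class `DistinctDemo`, the helper `countDistinct`, and the type check in `equals` are additions of this sketch, not part of the original code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class DistinctDemo {

    // The class from the question, with hashCode() added. The type check in
    // equals() is an extra safeguard that the original code did not have.
    static class C {
        private final String n;

        C(String n) {
            this.n = n;
        }

        public String getN() {
            return n;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (!(obj instanceof C)) return false; // also rejects null
            return Objects.equals(n, ((C) obj).n);
        }

        @Override
        public int hashCode() {
            // Equal objects have equal n, hence equal hash codes.
            return Objects.hashCode(n);
        }
    }

    // Runs the pipeline from the question and returns the count.
    static long countDistinct() {
        List<C> cc = Arrays.asList(new C("ONE"), new C("TWO"), new C("ONE"));
        return cc.parallelStream().distinct().count();
    }

    public static void main(String[] args) {
        System.out.println(countDistinct()); // prints 2
    }
}
```

Deriving both `equals` and `hashCode` from the same field is what keeps the two methods consistent with each other.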

Jesper
  • 1
    Internally, the stream implementation is likely using a HashSet, which makes use of the `hashCode` method. – GameSalutes Jun 27 '18 at 16:44
  • 2
    +1, the rule you mention is very important: don't override `equals` or `hashCode` unless you override both. This should probably cause a compile-time error. – Bill K May 09 '19 at 17:11