10

This is what i am doing:

List scores = Stream.concat(oldEntries.stream(), newEntries.stream())
                    .sorted()
                    .distinct()
                    .limit(maxSize)
                    .collect(Collectors.toList());

I am expecting a sorted list without any duplicates, but some times there is duplicate in the list.

I have override the hashCode and equals method, I have also observed that these methods are returning the correct value every time. Can any one see what is wrong with my stream?

This is my equals() and hashCode() They are auto generated by IDEA :

..
private int userId;
private int levelId;
private int score;

@Override
public boolean equals(Object o) {

    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;

    Score score = (Score) o;

    if (userId != score.userId) return false;
    return levelId == score.levelId;

}

@Override
public int hashCode() {
    int result = userId;
    result = 31 * result + levelId;
    return result;
}

public int compareTo(Score other) {

    if (other == null) {
        return 1;
    } else {
        return Integer.compare(other.score, this.score);
    }
}

 ..
Stefan Zobel
  • 3,182
  • 7
  • 28
  • 38
MoienGK
  • 4,544
  • 11
  • 58
  • 92

2 Answers2

10

Your stream is first ordered according to compareTo, i.e. using score.

It's then "distinctified" using equals(), i.e. using userId and levelId. According to the javadoc:

For ordered streams, the selection of distinct elements is stable (for duplicated elements, the element appearing first in the encounter order is preserved.) For unordered streams, no stability guarantees are made.

Example:

score 1, user 2, level 3
score 3, user 2, level 3
score 1, user 3, level 1

After sorting...

score 1, user 2, level 3
score 1, user 3, level 1
score 3, user 2, level 3

Distinct now does nothing, because the elements are not equal according to user/level. This can result in "duplicate" elements, because you're ordering the stream based on one thing, but determining equality by an entirely different thing.

Kayaman
  • 72,141
  • 5
  • 83
  • 121
  • Is there any documentation of `distinct()` which covers about the `following equals elements are dropped` part? – ByeBye Sep 12 '17 at 13:25
  • You said :"Distinct now does nothing, because the elements are not equal according to user/level." but the elements are equal according to user/level. I am still confused. if distinct is using equals, then it should be able to find the two equal elements – MoienGK Sep 12 '17 at 13:36
  • 3
    @MoienGK since the stream is a sorted stream, it's assumed that equal elements follow each other. In the example shown this is not the case, and distinct fails to remove the row with score 3, because it's not equal to the row **before** it. – Kayaman Sep 12 '17 at 13:39
  • Kayaman is right, it seems that for ordered stream distinct use some acceleration to computing and not checking all elements but in pairs. Unfortunately it is not described properly in javadoc of `distinct()` – ByeBye Sep 12 '17 at 13:44
  • 3
    @ByeBye what do you mean? It's described quite clearly. – Kayaman Sep 12 '17 at 13:45
  • @Kayaman This is my usecase and I can not change it. is there any other workaround for distinct using streams? – MoienGK Sep 12 '17 at 13:45
  • 1
    @MoienGK use distinct first, than sort – ByeBye Sep 12 '17 at 13:46
  • 1
    @Kayaman `(for duplicated elements, the element appearing first in the encounter order is preserved.)` for me it means, that always FIRST element will be preserved, not that it will be checking equality with it neighbours, when in unordered stream it can be random – ByeBye Sep 12 '17 at 13:47
  • 1
    @ByeBye thats the point, implementations are free to do this small improvements *assuming* your contracts are correct as far as I can see. – Eugene Sep 12 '17 at 19:51
  • 2
    The effectiveness of the optimization is demonstrated in [this answer](https://stackoverflow.com/a/43807329/2711488), but note that such problems with a natural order that is inconsistent with `equals` may occur even without that optimization, e.g. when using `distinct()` without `sort()`, as the `HashMap` used behind the scenes uses the natural order to resolve hash collisions ([see here](https://stackoverflow.com/q/35164645/2711488))… – Holger Sep 12 '17 at 21:40
4

It is a bug.

The documentation of Stream.distinct() simply says:

Returns a stream consisting of the distinct elements (according to Object.equals(Object)) of this stream.

For ordered streams, the selection of distinct elements is stable (for duplicated elements, the element appearing first in the encounter order is preserved.) For unordered streams, no stability guarantees are made.

There is no requirement that for ordered streams the equal objects should come right after each other (consecutively). However the implementation seems to assume they do. What the documentation means is that the first occurrence of user 2, level 3 should be preserved and the second occurrence discarded.

According to the Java bug database the bug exists up to Java 13 and remains unresolved.

Links

Ole V.V.
  • 81,772
  • 15
  • 137
  • 161
  • 2
    Good find. It's a catch-22 since as an implementation detail it can't *really* be solved by documenting that `hashCode/equals/compareTo` must be consistent. – Kayaman Apr 25 '21 at 08:27