I was experimenting with Java Streams and distinct() and came across a scenario where, if I modify an object of the Stream after distinct() has run, the final result contains duplicate items.

Does it make sense for an object to move on to the next operation before distinct() has finished? How can I ensure uniqueness without iterating through the entire list first?

Note: Lombok's @Data annotation includes @EqualsAndHashCode, so the Dto class automatically gets generated equals() and hashCode() methods.

package br.com.marcusvoltolim;

import lombok.AllArgsConstructor;
import lombok.Data;
import reactor.core.publisher.Flux;

import java.util.*;
import java.util.stream.Collectors;

@Data
@AllArgsConstructor
class Dto {

    private Long id;

}

public class Main {

    public static void main(String[] args) {
        Dto dto0 = new Dto(0L);
        Dto dto1 = new Dto(1L);
        Dto dto2 = new Dto(1L);

        List<Dto> list = Arrays.asList(dto0, dto1, dto2);

        System.out.println("Original list: " + list);
        System.out.println("List with only distinct: " + list.stream().distinct().collect(Collectors.toList()));

        List<Dto> streamList = list.stream()
            .distinct()
            .map(dto -> {
                if (dto.getId() == 1L) {
                    dto.setId(3L);
                }
                return dto;
            })
            .collect(Collectors.toList());

        System.out.println("Java Stream with map after distinct: " + streamList);
    }

}

Result:

Original list: [Dto(id=0), Dto(id=1), Dto(id=1)]
List with only distinct: [Dto(id=0), Dto(id=1)]
Java Stream with map after distinct: [Dto(id=0), Dto(id=3), Dto(id=3)]

I expected the result: [Dto(id=0), Dto(id=3)]

Martin Tarjányi
Marcus Voltolim
  • You need to implement `.equals(...)` method in your class. - Does this answer your question? [Java (Equals Method)](https://stackoverflow.com/questions/34411932/java-equals-method) – blurfus Feb 17 '23 at 21:36
  • @blurfus Lombok's `@Data` annotation already generates it. – Marcus Voltolim Feb 17 '23 at 21:41
  • All set operations are reliant on hashCode values never changing. See [here](https://stackoverflow.com/questions/11807233/does-default-hashcode-change-on-object-mutation). If they do, expect sets not to work properly. – DuncG Feb 17 '23 at 21:45
  • @DuncG, I understand, but if distinct() generates a new distinct Stream, how would a subsequent operation influence the result? – Marcus Voltolim Feb 17 '23 at 21:49
  • The map step changes the second entry in the set, so the 3rd element is now distinct from the first 2. Hence 3 entries. Without the map step, the 3rd element is a duplicate. – DuncG Feb 17 '23 at 21:56
  • Exactly, but the map is executed after the distinct() which in theory should generate a stream with unique elements, right? The JavaDoc: Returns a "new" stream consisting of the distinct elements (according to Object.equals(Object)) of this stream. – Marcus Voltolim Feb 17 '23 at 21:58
  • No, you are seeing that distinct #n is before map #n, not that distinct #1 2 3 is evaluated before map #1 2 3 – DuncG Feb 17 '23 at 22:03
  • @DuncG, now I understand: distinct() uses a HashSet with the hashCode of the objects to optimize the operation. – Marcus Voltolim Feb 17 '23 at 22:57

1 Answer

To get your "expected" behaviour, the map step must not mutate elements that have already passed through distinct().

You have:

if (dto.getId() == 1L) {
    dto.setId(3L); // <--- Changing the contents
}

So you are changing the value, and with it the result of equals() and hashCode(), of an element that distinct() has already recorded.
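To make the element-by-element ordering visible, here is a minimal sketch that records when each element leaves distinct() and when it is mapped. The class name `DistinctTrace`, the `trace()` helper, and the hand-written `Dto` (standing in for the Lombok-generated class) are my own assumptions for illustration, and the interleaving shown is what a sequential stream does on the JDK versions I am aware of.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

public class DistinctTrace {

    // Hand-written stand-in for the Lombok @Data class from the question.
    static class Dto {
        private Long id;
        Dto(Long id) { this.id = id; }
        Long getId() { return id; }
        void setId(Long id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof Dto && Objects.equals(id, ((Dto) o).id);
        }
        @Override public int hashCode() { return Objects.hashCode(id); }
        @Override public String toString() { return "Dto(id=" + id + ")"; }
    }

    // Hypothetical helper: logs each element as it leaves distinct() and as it is mapped.
    static List<String> trace() {
        List<String> events = new ArrayList<>();
        List<Dto> list = Arrays.asList(new Dto(0L), new Dto(1L), new Dto(1L));
        list.stream()
            .distinct()
            .peek(dto -> events.add("passed distinct: " + dto))
            .map(dto -> {
                if (dto.getId() == 1L) {
                    // Mutates an element that distinct() has already stored in its internal set.
                    dto.setId(3L);
                }
                events.add("mapped: " + dto);
                return dto;
            })
            .collect(Collectors.toList());
        return events;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```

The log alternates between "passed distinct" and "mapped" lines: the second element is already changed to id=3 by the time the third element (still id=1) is compared against the set, so the third element no longer equals anything distinct() has seen and passes through as well.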

For it to work correctly, you have to really MAP and not MODIFY:

if (dto.getId() == 1L) {
    return new Dto(3L); // <---- Map to new object
} else {
    return new Dto(dto.getId()); // <---- Map to new object
}
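Put together as a complete pipeline, this could look like the sketch below. It is only an illustration: the class name `DistinctFixed`, the `remap()` helper, and the immutable hand-written `Dto` (used here instead of the Lombok class so the example compiles on its own) are assumptions of mine, not part of the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

public class DistinctFixed {

    // Immutable stand-in for the Dto from the question: no setter, so it cannot be mutated mid-stream.
    static final class Dto {
        private final Long id;
        Dto(Long id) { this.id = id; }
        Long getId() { return id; }
        @Override public boolean equals(Object o) {
            return o instanceof Dto && Objects.equals(id, ((Dto) o).id);
        }
        @Override public int hashCode() { return Objects.hashCode(id); }
        @Override public String toString() { return "Dto(id=" + id + ")"; }
    }

    // Hypothetical helper: removes duplicates, then maps each survivor to a NEW object.
    static List<Dto> remap(List<Dto> source) {
        return source.stream()
            .distinct()
            .map(dto -> dto.getId() == 1L ? new Dto(3L) : new Dto(dto.getId()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Dto> list = Arrays.asList(new Dto(0L), new Dto(1L), new Dto(1L));
        System.out.println(remap(list)); // prints [Dto(id=0), Dto(id=3)]
    }
}
```

Because the objects held by distinct() are never modified, the second Dto(id=1) is still recognised as a duplicate. Note that if the mapping itself can turn different inputs into equal outputs, distinct() would have to come after map() to remove those.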
Bill Mair