
I have a list of objects of type Person and I want to get rid of elements that have the same name, using streams. I have found on the internet a suggestion to use a Wrapper class and my code looks like this so far:

List<Person> people = Arrays.asList(new Person("Kowalski"),
                                    new Person("Nowak"),
                                    new Person("Big"),
                                    new Person("Kowalski"));

List<Person> distPeople = people.stream()
        .map(Wrapper::new)
        .distinct()
        .map(Wrapper::unwrap)
        .collect(Collectors.toList());

The documentation says that distinct():

Returns a stream consisting of the distinct elements (according to Object.equals(Object)) of this stream.

Implementation of Wrapper that doesn't work (I get the same stream with two Kowalski):

public class Wrapper
{
    private final Person person;

    Wrapper(Person p)
    {
        person = p;
    }

    public Person unwrap()
    {
        return person;
    }

    public boolean equals(Object other)
    {
        if(other instanceof Wrapper)
            return ((Wrapper) other).person.getName().equals(person.getName());
        else
            return false;
    }
}

Implementation of Wrapper class works after adding this:

@Override
public int hashCode()
{
    return person.getName().hashCode();
}

Can someone explain why after overriding hashCode() in the Wrapper class distinct() works?

Stefan Zobel
hdw3
    Because `distinct()` likely uses a `HashSet` for best performance. You should *always* implement [`hashCode()`](https://docs.oracle.com/javase/8/docs/api/java/lang/Object.html#hashCode--) when you implement [`equals()`](https://docs.oracle.com/javase/8/docs/api/java/lang/Object.html#equals-java.lang.Object-). – Andreas Jun 26 '18 at 18:29
    Irrespective of whether you are doing this with your objects, `equals` and `hashCode` should always be overridden together so as to yield consistent results. Otherwise you are violating the contract of `Object`. – Andy Turner Jun 26 '18 at 18:41

3 Answers


From the Javadoc of equals():

It is generally necessary to override the hashCode method whenever equals method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.

Please read the details about contract here

Thiru

The answer lies in the class DistinctOps. Its makeRef method returns a ReferencePipeline containing only the distinct elements, and it uses a LinkedHashSet in a reduce operation to collect them. Note that LinkedHashSet extends HashSet, which uses a HashMap to store its elements. For a HashMap to work properly, hashCode() must honor the contract between hashCode() and equals(). You therefore have to provide an implementation of hashCode() for Stream#distinct() to work properly.
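The idea can be illustrated with a minimal sketch. This is not the JDK's actual code; the class name OrderedDistinctSketch and the method distinctInOrder are made up for illustration. It only shows how a LinkedHashSet drops duplicates while keeping encounter order, which works here because String implements both equals() and hashCode():

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class OrderedDistinctSketch {

    // Hypothetical helper: copies the input into a LinkedHashSet, which keeps
    // the first occurrence of each element (as decided by hashCode() and
    // equals()) in insertion order, then copies the result back to a list.
    static <T> List<T> distinctInOrder(List<T> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Kowalski", "Nowak", "Big", "Kowalski");
        System.out.println(distinctInOrder(names)); // prints [Kowalski, Nowak, Big]
    }
}
```

The same helper would keep both "Kowalski" wrappers if the element type overrode equals() without hashCode(), because the set would rarely get as far as calling equals() on them.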

Prashant

The distinct() operation uses a HashSet internally to check whether it already processed a certain element. The HashSet in turn relies on the hashCode() method of its elements to sort them into buckets.

If you don't override the hashCode() method, it falls back to its default, which returns a value derived from the object's identity. That value usually differs between two objects even though they are the same according to equals(). Thus the HashSet usually puts them into different buckets and can no longer determine that they're the 'same' object.
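The effect can be reproduced without streams. The following is a small sketch with hypothetical names (HashContractDemo, EqualsOnly, EqualsAndHash), using a plain String name in place of the Person class from the question:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HashContractDemo {

    // Overrides equals() only; hashCode() is inherited from Object.
    static final class EqualsOnly {
        final String name;
        EqualsOnly(String name) { this.name = name; }

        @Override
        public boolean equals(Object other) {
            return other instanceof EqualsOnly && ((EqualsOnly) other).name.equals(name);
        }
    }

    // Overrides both methods, so equal objects report equal hash codes.
    static final class EqualsAndHash {
        final String name;
        EqualsAndHash(String name) { this.name = name; }

        @Override
        public boolean equals(Object other) {
            return other instanceof EqualsAndHash && ((EqualsAndHash) other).name.equals(name);
        }

        @Override
        public int hashCode() {
            return name.hashCode();
        }
    }

    // Number of elements a HashSet keeps when hashCode() is not overridden.
    static int countEqualsOnly(List<String> names) {
        Set<EqualsOnly> set = new HashSet<>();
        for (String n : names) set.add(new EqualsOnly(n));
        return set.size();
    }

    // Number of elements a HashSet keeps when both methods are overridden.
    static int countEqualsAndHash(List<String> names) {
        Set<EqualsAndHash> set = new HashSet<>();
        for (String n : names) set.add(new EqualsAndHash(n));
        return set.size();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Kowalski", "Nowak", "Big", "Kowalski");
        System.out.println("equals only:         " + countEqualsOnly(names));
        System.out.println("equals and hashCode: " + countEqualsAndHash(names));
    }
}
```

With both methods overridden the set holds 3 elements. With equals() alone it is expected to hold all 4 in practice, since the two "Kowalski" objects would only be compared with equals() if their identity hash codes happened to coincide.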

Floern