-1

In Java let's say I have a class called Person. It has four properties:

  1. Long personId
  2. String name
  3. int age
  4. List<String> petNames

Let's say I have an array list variable of people called peopleList:

  1. personId: 1, name: "Tim", age: 28, petNames: [Brix]
  2. personId: 1, name: "Tim", age: 28, petNames: [Brix, Cowboy]
  3. personId: 1, name: "Tim", age: 28, petNames: [Brix, Cowboy, Fido]
  4. personId: 2, name: "Jamie", age: 19, petNames: []
  5. personId: 3, name: "Fred", age: 23, petNames: []

I override hashCode and equals of Person class to be this:

@Override
public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + personId.intValue();
    return result;
}

@Override
public boolean equals(Object obj) {
    System.out.println("Person equals method");
    if (this == obj)
        return true;
    if (obj == null)
        return false;
    if (getClass() != obj.getClass())
        return false;
    Person other = (Person) obj;
    if (personId != other.personId)
        return false;
    return true;
}

Then I use peopleList.stream().distinct().collect(Collectors.toList()); to remove duplicates. My original assumption was wrong: it selects the first occurrence of Tim, the one with only one pet, not the instance with all three pets. My original question, though, is: which instance does distinct() choose? That has been answered below.

Holger
Adam
  • 3
    From the _javadoc_ of method [equals](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/Object.html#equals(java.lang.Object)) in class `java.lang.Object`: _equal objects must have equal hash codes_ – Abra Dec 13 '22 at 05:11
  • Thanks Abra for helping, can you expand on that a little? – Adam Dec 13 '22 at 05:16
  • 1
    [*What issues should be considered when overriding equals and hashCode in Java?*](https://stackoverflow.com/q/27581/642706) – Basil Bourque Dec 13 '22 at 05:18
  • 2
    **Caution:** `if (personId != other.personId)` will fail for most Number instances. Always compare objects using the `equals` method. If one or both of them might be null, use [Objects.equals](https://docs.oracle.com/en/java/javase/19/docs/api/java.base/java/util/Objects.html#equals(java.lang.Object,java.lang.Object)). – VGR Dec 13 '22 at 15:29
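
To illustrate VGR's caution about comparing Long objects with != or ==, here is a minimal sketch. The class and method names are made up for this illustration. It contrasts a reference comparison with Objects.equals. Whether the reference comparison returns true depends on whether the JVM happens to hand back the same cached Long object, so no output is claimed for that line.

```java
import java.util.Objects;

final class LongComparisonDemo {

    // Reference comparison: true only if both variables point to the same Long object.
    static boolean sameReference(Long a, Long b) {
        return a == b;
    }

    // Value comparison that also tolerates null on either side.
    static boolean sameValue(Long a, Long b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        // Two boxed Long values holding the same number, outside the small range
        // that Long.valueOf is documented to always cache (-128 to 127).
        Long first = Long.valueOf(100_000L);
        Long second = Long.valueOf(100_000L);

        // Not guaranteed either way: depends on boxing/caching behaviour of the JVM.
        System.out.println("sameReference = " + sameReference(first, second));
        // True, because the numeric values match.
        System.out.println("sameValue     = " + sameValue(first, second));
    }
}
```

In the question's equals method, replacing personId != other.personId with !Objects.equals(personId, other.personId) would make the comparison value-based.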

2 Answers

2

distinct uses equals method

Did you read the documentation, the Javadoc for Stream#distinct? Programming by documentation usually works better than programming by intuition.

The first sentence says:

Returns a stream consisting of the distinct elements (according to Object.equals(Object)) of this stream.

Did you override the equals method? An edit to your Question says you did indeed.

In your equals method, you compare only the member field personId, of type Long. So why would you expect the list of pet names to be considered?

Similarly, if the three Tim objects have different ages such as 28, 48, and 98, that too is irrelevant. Your code says to consider them to be the same as long as the personId number is the same.

You told the JVM to examine only the id field. So if two Person objects have the same 64-bit integer number in their personId field, they are considered equal. If the two numbers differ, the two Person objects are not equal.

As to your more general question about which of two or more equal objects encountered in the stream are kept as the result of distinct: Again, read the documentation.

For ordered streams, the selection of distinct elements is stable (for duplicated elements, the element appearing first in the encounter order is preserved.) For unordered streams, no stability guarantees are made.

So:

  • If you have an ordered stream, the first object wins. Any duplicate objects that follow are eliminated.
  • If the stream is not ordered, then any of the objects may win. You should not depend on any particular one to win.

You do not disclose what kind of stream you are using. So we cannot know if it is ordered or not. So we can provide no further insight.
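
If you are unsure whether a given stream is ordered, one way to check is to ask its spliterator for the ORDERED characteristic. The sketch below is only an illustration with a made-up helper name (isOrdered). It shows that a stream from an ArrayList reports an encounter order while a stream from a HashSet does not. Intermediate operations such as map() keep an existing encounter order, so a stream obtained from a List and then mapped is still ordered.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Spliterator;

final class StreamOrderCheck {

    // Reports whether a stream created from the given collection has a defined encounter order.
    static boolean isOrdered(Collection<?> source) {
        return source.stream().spliterator().hasCharacteristics(Spliterator.ORDERED);
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(List.of("Brix", "Cowboy", "Fido"));

        System.out.println("ArrayList stream ordered? " + isOrdered(list));                // true
        System.out.println("HashSet stream ordered?   " + isOrdered(new HashSet<>(list))); // false
    }
}
```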

equals & hashCode must share same logic

The implementations of equals and hashCode should always use the same logic. The hashCode method should always be overridden along with equals to maintain the general contract between them, which is: equal objects must have equal hash codes.

If for equality you compare the personId field, then your hash code should be based on the personId field value.

You did this correctly in your code. But, you could do so more simply by using Objects.hash( this.id ). See this alternate implementation of your code.

package work.basil.example.distinct;

import java.util.List;
import java.util.Objects;

public final class Person
{
    private final Long id;
    private final String name;
    private final int age;
    private final List < String > petNames;

    public Person ( Long id , String name , int age , List < String > petNames )
    {
        this.id = id;
        this.name = name;
        this.age = age;
        this.petNames = petNames;
    }

    public Long id ( ) { return id; }

    public String name ( ) { return name; }

    public int age ( ) { return age; }

    public List < String > petNames ( ) { return petNames; }

    @Override
    public boolean equals ( final Object o )
    {
        if ( this == o ) { return true; }
        if ( o == null || getClass() != o.getClass() ) { return false; }
        Person person = ( Person ) o;
        return id.equals( person.id );
    }

    @Override
    public int hashCode ( )
    {
        return Objects.hash( this.id );
    }

    @Override
    public String toString ( )
    {
        return "Person[" +
                "id=" + this.id + ", " +
                "name=" + this.name + ", " +
                "age=" + this.age + ", " +
                "petNames=" + this.petNames + ']';
    }
}

The issue of equals and hashCode needing to share the same logic is covered in the documentation, in the Java literature, and on Stack Overflow extensively. Search to learn more. Start here.

Example

Here is an example app using that class above.

package work.basil.example.distinct;

import java.util.List;

public class App
{
    public static void main ( String[] args )
    {
        List < Person > persons =
                List.of(
                        new Person( 1L , "Tim" , 28 , List.of( "Brix" ) ) ,
                        new Person( 1L , "Tim" , 28 , List.of( "Brix" , "Cowboy" ) ) ,
                        new Person( 1L , "Tim" , 28 , List.of( "Brix" , "Cowboy" , "Fido" ) ) ,
                        new Person( 2L , "Jamie" , 19 , List.of() ) ,
                        new Person( 3L , "Fred" , 23 , List.of() )
                );
        List < Person > personsDistinct = persons.stream().distinct().toList();

        System.out.println( "persons = " + persons );
        System.out.println( "personsDistinct = " + personsDistinct );
    }
}

When run:

persons = [Person[id=1, name=Tim, age=28, petNames=[Brix]], Person[id=1, name=Tim, age=28, petNames=[Brix, Cowboy]], Person[id=1, name=Tim, age=28, petNames=[Brix, Cowboy, Fido]], Person[id=2, name=Jamie, age=19, petNames=[]], Person[id=3, name=Fred, age=23, petNames=[]]]

personsDistinct = [Person[id=1, name=Tim, age=28, petNames=[Brix]], Person[id=2, name=Jamie, age=19, petNames=[]], Person[id=3, name=Fred, age=23, petNames=[]]]

record

Tip: If you want all the member fields to be considered automatically for equals & hashCode, and the main purpose of your class is to communicate data transparently and immutably, define your class as a record.

In a record, by default, the compiler implicitly creates the constructor, getters, equals & hashCode, and toString.

Furthermore, you can define a record locally as well as define it as a nested class or as a separate class.

record Person( Long id , String name , int age , List < String > petNames ) { }
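
As a rough sketch of what that means for distinct (the wrapper class and method names here are only for illustration): because a record's generated equals and hashCode cover every component, two Person records with the same id but different petNames are not duplicates, and distinct keeps both.

```java
import java.util.List;

final class RecordDistinctDemo {

    // Same shape as the record above; equals/hashCode are generated from all four components.
    record Person(Long id, String name, int age, List<String> petNames) { }

    static List<Person> distinctPeople(List<Person> people) {
        return people.stream().distinct().toList();
    }

    public static void main(String[] args) {
        List<Person> people = List.of(
                new Person(1L, "Tim", 28, List.of("Brix")),
                new Person(1L, "Tim", 28, List.of("Brix", "Cowboy")),
                new Person(1L, "Tim", 28, List.of("Brix")),   // equal to the first element in every component
                new Person(2L, "Jamie", 19, List.of()));

        // Three elements remain: both pet-list variants of Tim, plus Jamie.
        System.out.println(distinctPeople(people));
    }
}
```

So a record is a good fit only if "same person" really means "all fields equal". If identity is the id alone, keep an explicit equals and hashCode based on the id.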
Basil Bourque
  • I thought my hashCode is based on the personId field... can you tell me how it is not? More importantly, since it looks like it chooses the instance either by first if ordered or randomly if unordered, then how can I specify stream().distinct() to choose the instance with the most pets so that I can always get a consistent result? – Adam Dec 13 '22 at 05:46
  • 1
    @Adam Oops, I misread your `hashCode`. I'm working on some more example code, then I will edit that section. – Basil Bourque Dec 13 '22 at 05:47
  • Okay thanks Basil. And regarding this: "You do not disclose what kind of stream you are using." I am using java.util.stream, if that helps answer? To be more specific, I am using hibernate, and fetching a DTO projection with a To Many Association, where I use tuples, and use resultList.stream().map().distinct().collect(Collectors.toList()). Basically what this guy did at 2:05: https://www.youtube.com/watch?v=5oTH_Slettc – Adam Dec 13 '22 at 05:57
  • 1
    @Adam Be careful with names. [`java.util.stream`](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/stream/package-summary.html) is a package, not a class/interface. You meant [`java.util.stream.Stream`](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/stream/Stream.html). – Basil Bourque Dec 13 '22 at 06:04
  • 1
    @Adam Furthermore, `Stream` is an interface, not a concrete class. Your code must use, directly or indirectly, a concrete class that implements that interface. It is up to you to know whether that implementation is "ordered" or not, per the Javadoc discussed above, if you want to know which of the multiple equal objects will be chosen by the `distinct` method. – Basil Bourque Dec 13 '22 at 06:09
  • Thank you Basil. Can you point me in the direction of how to make the distinct method choose the instance with the most pets? – Adam Dec 13 '22 at 06:17
  • @Adam Your last Comment makes no sense. If the size of `petNames` matters, then the three objects with a `personId` of `1` are *not* equal. So calling `distinct` is no longer relevant or useful. You *could* work on ensuring the stream is ordered, with ordering first by `personId` and then by the size of `petNames` to put first the `Person` object with the longest `petNames`. But I suspect at this point we have ventured into an [XY problem](https://en.wikipedia.org/wiki/XY_problem). – Basil Bourque Dec 13 '22 at 06:24
  • I am querying my database with a join, and the data tables have more fields than I want for this particular server response, so I am using data transfer objects. I am selecting only four fields: person_id, name, age, and pets, which is the join to my PETS table. As a result I have 5 rows returned, and my code, which I am leaving out of this post, is creating those three instances for Tim. I want to get only the last instance, the one that has all three pets added to his pets array. I am curious why distinct was recommended for this here: https://www.youtube.com/watch?v=5oTH_Slettc – Adam Dec 13 '22 at 06:32
  • Very good coverage! Instead of saying, *The implementations of `equals` and `hashCode` should always use the same logic.*, I would say, *The `hashCode` method should always be overridden along with `equals` to maintain the general contract between them, which is: equal objects must have equal hash codes.*. – Arvind Kumar Avinash Dec 13 '22 at 08:33
  • 1
    @ArvindKumarAvinash Thank you. I copy-pasted your well-said sentence. – Basil Bourque Dec 13 '22 at 18:26
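
For the follow-up raised in the comments above (keeping, for each id, the instance with the most pets), one possible approach that does not rely on distinct at all is to group by id and merge duplicates explicitly. This is a sketch of a different technique, not something proposed in the answer. The class name, method name and record are assumptions for illustration. A LinkedHashMap is used so the result keeps the order in which each id first appears.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

final class MostPetsPerId {

    // Hypothetical DTO with the four fields from the question.
    record Person(Long id, String name, int age, List<String> petNames) { }

    // Keeps one Person per id: the one whose petNames list is longest.
    static List<Person> keepMostPets(List<Person> people) {
        Map<Long, Person> byId = people.stream()
                .collect(Collectors.toMap(
                        Person::id,                       // key: the id that defines "same person"
                        Function.identity(),              // value: the Person itself
                        BinaryOperator.maxBy(             // on a duplicate id, keep the one with more pets
                                Comparator.comparingInt((Person p) -> p.petNames().size())),
                        LinkedHashMap::new));             // preserve first-seen order of ids
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        List<Person> people = List.of(
                new Person(1L, "Tim", 28, List.of("Brix")),
                new Person(1L, "Tim", 28, List.of("Brix", "Cowboy")),
                new Person(1L, "Tim", 28, List.of("Brix", "Cowboy", "Fido")),
                new Person(2L, "Jamie", 19, List.of()),
                new Person(3L, "Fred", 23, List.of()));

        keepMostPets(people).forEach(System.out::println);
    }
}
```

Unlike distinct, the merge rule here is spelled out in the code, so the result does not depend on which duplicate happens to be encountered first.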
0

You could try implementing the Comparable interface and calling stream.sorted() first, so that instances with the same personId are ordered by the size of their petNames list before distinct() runs.

Code Example

public class Person implements Comparable<Person>

Override the compareTo method:

@Override
public int compareTo(Person person) {
    if (this.getPersonId().equals(person.getPersonId())) {
        if (person.getPetNames().size() > this.getPetNames().size()) {
            return 1;
        } else{
            return -1;
        }
    }
    return 0;
}

Finally:

peopleList.stream().sorted().distinct().collect(Collectors.toList())

thanks

  • 1
    Under the hood, the stream's distinct implementation uses a HashSet to detect duplicates, and HashSet itself deduplicates by means of a HashMap – ZhenHong Fan Dec 13 '22 at 06:53
  • Sorry for deleting my comment, I was trying to edit it. This is my edited comment: How do I know my stream is ordered? Sure the instances within are ordered thanks to the sorted method, but is that all this means or is there something else that specifies it such that each insertion of the stream is inserted with order? – Adam Dec 13 '22 at 06:54
  • Can you expand on that a little? Or give me a link or two to read on what you are saying? Thank you! Very interesting! – Adam Dec 13 '22 at 06:58
  • 1
    [trying-to-understand-distinct-for-streams-in-java8](https://stackoverflow.com/questions/34562359/trying-to-understand-distinct-for-streams-in-java8) Maybe this article will help you – ZhenHong Fan Dec 13 '22 at 07:20
  • 1
    This `compareTo` method is broken. Even if you fix the method, it’s a strongly discouraged idea, to have a natural order that contradicts equality. For example, since your `compareTo` method says that all objects with different ids are equal, the `sorted().distinct()` approach can remove arbitrary elements if an element with a different id exists. – Holger Dec 13 '22 at 09:20
  • Thank you very much for your comments. It seems that my method still needs to be discussed. Please forgive me for not understanding what you said `the sorted().distinct() approach can remove arbitrary elements if an element with a different id exists.` How does this work, or under what circumstances, can you be more specific? thank you – ZhenHong Fan Dec 14 '22 at 01:40
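
On the last two comments: the compareTo shown in this answer returns 0 for any two people with different ids, and returns -1 in both directions for two people with the same id and equally long pet lists, so it does not define a consistent ordering for sorted() to work with. A possible alternative, sketched below under the assumption that equality stays id-based and ids are never null, is to leave natural ordering alone and pass an explicit Comparator that orders by id and then by pet count descending. The sorted stream is ordered, so distinct keeps the first element of each id, which is then the one with the most pets. All names here are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

final class SortedThenDistinct {

    // Hypothetical stand-in for the question's Person: equality is based on id only.
    static final class Person {
        final Long id;
        final List<String> petNames;

        Person(Long id, List<String> petNames) {
            this.id = id;
            this.petNames = petNames;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Person && Objects.equals(id, ((Person) o).id);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(id);
        }
    }

    // Orders by id, then by pet count descending, so the fullest instance of each id comes first.
    static final Comparator<Person> BY_ID_THEN_MOST_PETS =
            Comparator.comparing((Person p) -> p.id)
                    .thenComparing(Comparator.comparingInt((Person p) -> p.petNames.size()).reversed());

    static List<Person> distinctKeepingMostPets(List<Person> people) {
        return people.stream()
                .sorted(BY_ID_THEN_MOST_PETS)   // yields an ordered stream
                .distinct()                     // ordered stream: first of each group of equals is kept
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> people = List.of(
                new Person(1L, List.of("Brix")),
                new Person(1L, List.of("Brix", "Cowboy")),
                new Person(1L, List.of("Brix", "Cowboy", "Fido")),
                new Person(2L, List.of()),
                new Person(3L, List.of()));

        for (Person p : distinctKeepingMostPets(people)) {
            System.out.println("id=" + p.id + ", petNames=" + p.petNames);
        }
    }
}
```

Note that this changes the order of the result to ascending id, which may or may not matter for the response being built.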