distinct uses equals method
Did you read the documentation, the Javadoc for Stream#distinct? Programming by documentation usually works better than programming by intuition.
The first sentence says:
Returns a stream consisting of the distinct elements (according to Object.equals(Object)) of this stream.
Did you override the equals method? An edit to your Question says you did indeed.
In your equals method, you compare only the member field personId, of type Long. So why would you expect the list of pet names to be considered?
Similarly, if the three Tim objects have different ages such as 28, 48, and 98, that too is irrelevant. Your code says to consider them to be the same as long as the personId number is the same.
You told the JVM to examine only the id field. So if two Person objects have the same 64-bit integer number in their personId field, they are considered equal. If the two numbers differ, the two Person objects are not equal.
As to your more general question about which of two or more equal objects encountered in the stream is kept as the result of distinct: Again, read the documentation.
For ordered streams, the selection of distinct elements is stable (for duplicated elements, the element appearing first in the encounter order is preserved.) For unordered streams, no stability guarantees are made.
So:
- If you have an ordered stream, the first object wins. Any duplicate objects that follow are eliminated.
- If the stream is not ordered, then any of the objects may win. You should not depend on any particular one to win.
You do not disclose what kind of stream you are using. So we cannot know if it is ordered or not. So we can provide no further insight.
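As a minimal sketch of the ordered case, consider a stream obtained from a List, which is ordered. The Item class and its label field are hypothetical stand-ins for your Person class, invented here only so you can see which of the equal objects survives.

```java
import java.util.List;
import java.util.Objects;

public class DistinctOrderDemo
{
    // Hypothetical stand-in for Person: equality is based on the id field alone.
    static final class Item
    {
        final Long id;
        final String label;

        Item ( Long id , String label )
        {
            this.id = id;
            this.label = label;
        }

        @Override
        public boolean equals ( final Object o )
        {
            if ( this == o ) { return true; }
            if ( o == null || getClass() != o.getClass() ) { return false; }
            return Objects.equals( this.id , ( ( Item ) o ).id );
        }

        @Override
        public int hashCode ( )
        {
            return Objects.hash( this.id );
        }
    }

    // A List-backed stream is ordered, so distinct() keeps the first of each group of equal elements.
    static List < String > survivingLabels ( )
    {
        List < Item > items =
                List.of(
                        new Item( 1L , "first" ) ,
                        new Item( 1L , "second" ) ,
                        new Item( 2L , "third" )
                );
        return items.stream().distinct().map( item -> item.label ).toList();
    }

    public static void main ( String[] args )
    {
        System.out.println( survivingLabels() );  // Prints: [first, third]
    }
}
```

If the source were unordered, such as a stream from a HashSet, you should not rely on "first" being the survivor.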
equals & hashCode must share same logic
The implementations of equals and hashCode should always use the same logic. The hashCode method should always be overridden along with equals to maintain the general contract between them, which is: equal objects must have equal hash codes.
If for equality you compare the personId field, then your hash code should be based on the personId field value.
You did this correctly in your code. But, you could do so more simply by using Objects.hash( this.id ). See this alternate implementation of your code.
package work.basil.example.distinct;

import java.util.List;
import java.util.Objects;

public final class Person
{
    private final Long id;
    private final String name;
    private final int age;
    private final List < String > petNames;

    public Person ( Long id , String name , int age , List < String > petNames )
    {
        this.id = id;
        this.name = name;
        this.age = age;
        this.petNames = petNames;
    }

    public Long id ( ) { return id; }
    public String name ( ) { return name; }
    public int age ( ) { return age; }
    public List < String > petNames ( ) { return petNames; }

    // Equality considers only the id field. Name, age, and pet names are ignored.
    @Override
    public boolean equals ( final Object o )
    {
        if ( this == o ) { return true; }
        if ( o == null || getClass() != o.getClass() ) { return false; }
        Person person = ( Person ) o;
        return Objects.equals( this.id , person.id );  // Null-safe comparison.
    }

    // Hash code is based on the same single field used by equals.
    @Override
    public int hashCode ( )
    {
        return Objects.hash( this.id );
    }

    @Override
    public String toString ( )
    {
        return "Person[" +
                "id=" + this.id + ", " +
                "name=" + this.name + ", " +
                "age=" + this.age + ", " +
                "petNames=" + this.petNames + ']';
    }
}
The issue of equals and hashCode needing to share the same logic is covered in the documentation, in the Java literature, and on Stack Overflow extensively. Search to learn more.
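To see the contract at work, here is a small sketch using a HashSet, which relies on both methods. The Key class and its note field are hypothetical, invented only for this illustration. Two objects with the same id report the same hash code, so the set treats the second one as a duplicate.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class HashContractDemo
{
    // Hypothetical class: equals and hashCode both look at the id field and nothing else.
    static final class Key
    {
        final Long id;
        final String note;

        Key ( Long id , String note )
        {
            this.id = id;
            this.note = note;
        }

        @Override
        public boolean equals ( final Object o )
        {
            if ( this == o ) { return true; }
            if ( o == null || getClass() != o.getClass() ) { return false; }
            return Objects.equals( this.id , ( ( Key ) o ).id );
        }

        @Override
        public int hashCode ( )
        {
            return Objects.hash( this.id );
        }
    }

    // Equal objects must produce equal hash codes.
    static boolean equalObjectsShareHash ( )
    {
        Key a = new Key( 1L , "a" );
        Key b = new Key( 1L , "b" );
        return a.equals( b ) && ( a.hashCode() == b.hashCode() );
    }

    // A HashSet rejects the second object with id 1 because it is equal to the first.
    static int sizeAfterAddingDuplicates ( )
    {
        Set < Key > keys = new HashSet <>();
        keys.add( new Key( 1L , "a" ) );
        keys.add( new Key( 1L , "b" ) );
        keys.add( new Key( 2L , "c" ) );
        return keys.size();
    }

    public static void main ( String[] args )
    {
        System.out.println( equalObjectsShareHash() );      // Prints: true
        System.out.println( sizeAfterAddingDuplicates() );  // Prints: 2
    }
}
```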
Example
Here is an example app using that class above.
package work.basil.example.distinct;

import java.util.List;

public class App
{
    public static void main ( String[] args )
    {
        List < Person > persons =
                List.of(
                        new Person( 1L , "Tim" , 28 , List.of( "Brix" ) ) ,
                        new Person( 1L , "Tim" , 28 , List.of( "Brix" , "Cowboy" ) ) ,
                        new Person( 1L , "Tim" , 28 , List.of( "Brix" , "Cowboy" , "Fido" ) ) ,
                        new Person( 2L , "Jamie" , 19 , List.of() ) ,
                        new Person( 3L , "Fred" , 23 , List.of() )
                );
        List < Person > personsDistinct = persons.stream().distinct().toList();
        System.out.println( "persons = " + persons );
        System.out.println( "personsDistinct = " + personsDistinct );
    }
}
When run:
persons = [Person[id=1, name=Tim, age=28, petNames=[Brix]], Person[id=1, name=Tim, age=28, petNames=[Brix, Cowboy]], Person[id=1, name=Tim, age=28, petNames=[Brix, Cowboy, Fido]], Person[id=2, name=Jamie, age=19, petNames=[]], Person[id=3, name=Fred, age=23, petNames=[]]]
personsDistinct = [Person[id=1, name=Tim, age=28, petNames=[Brix]], Person[id=2, name=Jamie, age=19, petNames=[]], Person[id=3, name=Fred, age=23, petNames=[]]]
record
Tip: If you want all the member fields to be considered automatically for equals & hashCode, and the main purpose of your class is to communicate data transparently and immutably, define your class as a record.
In a record, by default, the compiler implicitly creates the constructor, accessor methods (getters), equals & hashCode, and toString.
Furthermore, you can define a record locally as well as define it as a nested class or as a separate class.
record Person( Long id , String name , int age , List < String > petNames ) { }
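Be aware that this changes what distinct considers a duplicate. As a rough sketch, with sample data invented for illustration and assuming Java 16 or later: because the record compares every field, including petNames, two Tim objects with different pet lists are no longer equal. Only objects matching in all fields are collapsed.

```java
import java.util.List;

public class RecordDistinctDemo
{
    // Nested record: equals and hashCode are derived from all four components.
    record Person( Long id , String name , int age , List < String > petNames ) { }

    // Counts how many elements survive distinct() on the sample data.
    static int distinctCount ( )
    {
        List < Person > persons =
                List.of(
                        new Person( 1L , "Tim" , 28 , List.of( "Brix" ) ) ,
                        new Person( 1L , "Tim" , 28 , List.of( "Brix" ) ) ,             // Same in every field: a true duplicate.
                        new Person( 1L , "Tim" , 28 , List.of( "Brix" , "Cowboy" ) ) ,  // Same id, different pets: not equal.
                        new Person( 2L , "Jamie" , 19 , List.of() )
                );
        return persons.stream().distinct().toList().size();
    }

    public static void main ( String[] args )
    {
        System.out.println( distinctCount() );  // Prints: 3
    }
}
```

So a record suits you only if you want equality across all fields. If you want equality on the id alone, keep a conventional class with your own equals & hashCode.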