Java Principle of sort when using TreeSet

Question

when use iterator to traversal TreeSet, and of course the hashcode() and equals() is overrides, how does the treeset sort all elements as a certain order when the program is running? I mean, does the sort occur when the program is running at the "Iterator iterator = set.iterator();"?

Here is a example:

@Test
    public void test(){
        TreeSet set = new TreeSet();
        set.add(new Student2("Sam",97.8));
        set.add(new Student2("Joe",95.8));
        set.add(new Student2("Ben",99));
        set.add(new Student2("Chandler",93));
        set.add(new Student2("Ross",100));

        Iterator iterator = set.iterator();
        for (int i = 0; i < 3; i++) {
            iterator.hasNext();
            System.out.println(iterator.next());

        }

    }

public class Student2 implements Comparable {
    private String name;
    private double score;

    public Student2(String name, double score) {
        this.name = name;
        this.score = score;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public double getScore() {
        return score;
    }

    public void setScore(double score) {
        this.score = score;
    }

    @Override
    public String toString() {
        return "Student2{" +
                "name='" + name + '\'' +
                ", score=" + score +
                '}';
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        if (!super.equals(o)) return false;
        Student2 student2 = (Student2) o;
        return Double.compare(student2.score, score) == 0 && Objects.equals(name, student2.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(super.hashCode(), name, score);
    }

    @Override
    public int compareTo(Object o) {
        if(this != null){
            if(o instanceof Student2){
                Student2 s = (Student2) o;
                int i = (int) (s.score - this.score);
                return i;
            }else{
                throw new RuntimeException("Wrong type");
            }
        }
        return 0;
    }
}

Look up how red black trees work. That's how `TreeMap`s are implemented, and `TreeSet`s are based on `TreeMap`s. — Sweeper, Feb 01 '21 at 03:41
Score: **D**. Cause: [What is a raw type and why shouldn't we use it?](https://stackoverflow.com/q/2770321/5221149) — Andreas, Feb 01 '21 at 04:55
*"of course the hashcode() and equals() is overrides"* Doesn't matter, since `TreeSet` won't ever call those methods. — Andreas, Feb 01 '21 at 04:56
*"how does the treeset sort all elements as a certain order"* If you **read the documentation**, i.e. the javadoc of [`TreeSet`](https://docs.oracle.com/javase/8/docs/api/java/util/TreeSet.html), it says: "A `NavigableSet` implementation based on a `TreeMap`. The elements are ordered using their natural ordering, ..." --- Natural ordering is the order defined by `Comparable`, which you have. --- `TreeMap` documentation says "A Red-Black tree ...", which you can [look up](https://www.google.com/search?q=Red-Black+tree) to see how those work and keep the values sorted. — Andreas, Feb 01 '21 at 05:00
Be careful with your code formatting. Use back-ticks, not single-quote marks. I fixed it this time. — Basil Bourque, Feb 01 '21 at 05:03

Basil Bourque · Answer 1 · 2021-02-01T07:13:33.370

In Java 16 and later, we can shorten your Student class by making it a record. Not the point of this Answer, but using record makes the example code briefer. In a record, the compiler implicitly creates a constructor, getters, equals & hashCode, and toString.

public record Student ( String name , double score ) {}

The TreeSet class does not actually use the hashCode & equals methods you mention in your Question, as commented by Andreas. You must carefully read the TreeSet and Comparable Javadoc.

To work with TreeSet, you must either (a) make your class implement Comparable or (b) you must pass a Comparator implementation when instantiating the set. We will take the first approach, implementing Comparable on our Student record.

That Javadoc for TreeSet and Comparable explains that your compareTo method must be “consistent with equals”. What does that mean? If two objects being compared are named a and b, then “consistent with equals” means that where a.equals(b) returns true, so must a.compareTo(b) == 0 return true.

To quote the Javadoc for Comparable:

The natural ordering for a class C is said to be consistent with equals if and only if e1.compareTo(e2) == 0 has the same boolean value as e1.equals(e2) for every e1 and e2 of class C. Note that null is not an instance of any class, and e.compareTo(null) should throw a NullPointerException even though e.equals(null) returns false.

It is strongly recommended (though not required) that natural orderings be consistent with equals. This is so because sorted sets (and sorted maps) without explicit comparators behave "strangely" when they are used with elements (or keys) whose natural ordering is inconsistent with equals. In particular, such a sorted set (or sorted map) violates the general contract for set (or map), which is defined in terms of the equals method.

For example, if one adds two keys a and b such that (!a.equals(b) && a.compareTo(b) == 0) to a sorted set that does not use an explicit comparator, the second add operation returns false (and the size of the sorted set does not increase) because a and b are equivalent from the sorted set's perspective.

Virtually all Java core classes that implement Comparable have natural orderings that are consistent with equals. One exception is java.math.BigDecimal, …

The default implementation on a record for equals is to compare the equality of each contained object. So our compareTo should also account for all of the objects contained in our record. In this example, that means two member fields name & score.

In contrast, your code violated this rule of compareTo being consistent with equals because your compareTo looks only at score while your equals compares score and name.

We could write a compareTo method in a style to that seen in your question. In modern Java, it makes more sense to use the streams with method references seen in this next code example. If you are not comfortable with this style, use your old-school style — just be sure to include both score and name in your comparison.

package org.example;

import java.util.Comparator;

public record Student ( String name , double score ) implements Comparable < Student >
{
    @Override
    public int compareTo ( Student o )
    {
        return
                Comparator
                        .comparing( Student :: score )
                        .thenComparing( Student :: name )
                        .compare( this , o );
    }
}

The code above might look a tad inefficient as we instantiate a new Comparator on each call to compareTo. Firstly, for a few number of items in your collection, the performance cost is likely immaterial. Secondly, I would guess that the compiler or JIT will optimize that away — though I am not sure so perhaps someone might care to post a Comment. If concerned about performance, you could store the Comparator object in a static field.

And now we move on to the code for using that Student class.

Your example code:

        TreeSet set = new TreeSet();
        set.add(new Student2("Sam",97.8));
        set.add(new Student2("Joe",95.8));
        set.add(new Student2("Ben",99));
        set.add(new Student2("Chandler",93));
        set.add(new Student2("Ross",100));

        Iterator iterator = set.iterator();
        for (int i = 0; i < 3; i++) {
            iterator.hasNext();
            System.out.println(iterator.next());

        }

…has a few problems.

You hard-coded a limit of 3 in your for loop rather than soft-code for the size of the collection of Student objects. Perhaps you did indeed want only the bottom three objects. But I will assume that was a mistake, and you want all the objects to be reported.

You call iterator.hasNext();. This serves no purpose in your example. Just delete it.

You defined your set as a raw type rather than using generics. In modern Java your line TreeSet set = new TreeSet(); should be TreeSet < Student > set = new TreeSet<>() ;. This tells the compiler that we intend to store Student objects in this collection, and only Student objects. If we try to store an Elephant or Invoice, the compiler complains.

And I suggest having more specific naming in your variables. So students rather than set.

You define your collection of students as TreeSet. No need to do so. Your example code does not explicitly call methods that exist only on TreeSet. Your code only makes the assumption that the set be a NavigableSet. So use the more general interface rather than restricting yourself to the more specific concrete TreeSet.

NavigableSet < Student > students = new TreeSet <>();
students.add( new Student( "Sam" , 97.8 ) );
students.add( new Student( "Joe" , 95.8 ) );
students.add( new Student( "Ben" , 99 ) );
students.add( new Student( "Chandler" , 93 ) );
students.add( new Student( "Ross" , 100 ) );

Iterator iterator = students.iterator();
for ( int i = 0 ; i < students.size() ; i++ )
{
    System.out.println( iterator.next() );
}

By the way, for less code, I would be tempted to use Set.of syntax. The Set.of method returns an unmodifiable set. So we feed that to the constructor of our TreeSet.

NavigableSet < Student > students = new TreeSet <>(
        Set.of(
                new Student( "Sam" , 97.8 ) ,
                new Student( "Joe" , 95.8 ) ,
                new Student( "Ben" , 99 ) ,
                new Student( "Chandler" , 93 ) ,
                new Student( "Ross" , 100 )
        )
);

Iterator iterator = students.iterator();
for ( int i = 0 ; i < students.size() ; i++ )
{
    System.out.println( iterator.next() );
}

No need to explicitly instantiate the iterator. We could more simply use the for-each syntax in modern Java.

NavigableSet < Student > students = new TreeSet <>(
        Set.of(
                new Student( "Sam" , 97.8 ) ,
                new Student( "Joe" , 95.8 ) ,
                new Student( "Ben" , 99 ) ,
                new Student( "Chandler" , 93 ) ,
                new Student( "Ross" , 100 )
        )
);

for ( Student student : students )
{
    System.out.println( student );
}

When run.

Student[name=Chandler, score=93.0]
Student[name=Joe, score=95.8]
Student[name=Sam, score=97.8]
Student[name=Ben, score=99.0]
Student[name=Ross, score=100.0]

As for your question on:

how does the treeset sort all elements as a certain order when the program is running? I mean, does the sort occur when the program is running at the "Iterator iterator = set.iterator();"?

At first thought, does it matter? All we really care is that the TreeSet class deliver on the promises made by the NavigableSet contract. If we ask for the first one or last one, or ask to iterate, the results should be in the sorted order defined for our particular set’s objects Comparable or Comparator implementation. Generally, we should not care about the internal details of how TreeSet arranges its contents.

But if you really want an answer, read the add method on TreeSet. The Javadoc says it throws ClassCastException if the object cannot be compared with the elements currently in the set. So we know the comparison is being made when first adding an object to the collection. We can presume a structure is used internally to maintain that order.

If you really care about the details, look at the source code in OpenJDK project for the current implementation. But keep in mind that other implementations of Java are free to write a different implementation of that class.

Java Principle of sort when using TreeSet

1 Answers1

Linked