4

First of all I know that this issue was described in many other threads. However I was not able to find and answer to the question, why this error is not always thrown?

Let me describe what I mean. I have written some sample code to illustrate this :

public class Mushroom {

    public int size;

    public Mushroom(int size) {
        this.size = size;
    }

    @Override
    public boolean equals(Object obj) {
        //this is intentionally false - read in description
        return false;
    }
}

dsa

public class MushroomComparator implements Comparator<Mushroom> {

    @Override
    public int compare(Mushroom o1, Mushroom o2) {
        // here is the code which breaks the contract
         if (o1.size < o2.size){
             return 1;
         }else if(o1.size >o2.size){
             return -1;
         }
         return 1;
    }

}

and finally test for comparison:

public class ComparisonTest {
    public static void main(String[] args) {
//      System.setProperty("java.util.Arrays.useLegacyMergeSort", "true");
    List<Mushroom> forest = new ArrayList<>();

        for(int i =0; i<18; i++){
            Mushroom mushroom1 = new Mushroom(1);
            Mushroom mushroom2 = new Mushroom(3);
            Mushroom mushroom3 = new Mushroom(2);

            forest.add(mushroom1);
            forest.add(mushroom2);
            forest.add(mushroom3);

        }
        Collections.sort(forest, new MushroomComparator());
    }
}

During runtime we will receive this described issue

java.lang.IllegalArgumentException: Comparison method violates its general contract!

According to documentation of Collections.sort method :

(optional) if the implementation detects that the natural ordering of the list elements is found to violate the Comparable contract

So let's answer the question what is this contract in documentation of Comparable (in my example I use Comparator, but from the documentation it should fulfill the same requirements)

The implementor must ensure sgn(x.compareTo(y)) == -sgn(y.compareTo(x)) for all x and y

This rule I intentionally break to get error about which I'm writing about. This is one of the rule which implementation of compareTo method should follow. The rest of them are described in documentation, but generally speaking as I understand well documentation equivalence relation should be fulfilled

It follows immediately from the contract for compareTo that the quotient is an equivalence relation on C, and that the natural ordering is a total order on C

What is now what really bothers me is a result of change to iterations number in my test method. If you change number from 18 to 22 then ... exception will not be thrown. This exception is described as Optional so it does mean that sometimes this exception will be thrown, and sometimes no. I was not diving deeply into new implementation of sorting algorithm (TimSort) (change of sort agorithm since Java 7). I understand that it could be mabye some CPU consuming to check every set of data for breaking comparator contract, but what is an intetion to sometimes shows that and sometimes no. It could be really missleading.

Morover I can change the implementation of compare method to simply .. return 1. According to the documentation it should break the contract. But it isn't. In documentation stays also something about the contract with equals method but it not really needed

It is strongly recommended, but not strictly required that (x.compareTo(y)==0) == (x.equals(y))

And to check it in my example I've implemented equals method to return always false. With this I was sure that equals method doesn't impose on breaking comparator contract.

Now my question is : What is really contract of comparator (in context of breaking it) ? Why for some set of data this exception is thrown and for other not ? Maybe I'm missing something ? And last but not least - which rules exatcly need to be broken to throw this exception ?

Just to note, solution for this, could be turning off this new implementation of sort algorithm what is described here and commented in my sample code.

Community
  • 1
  • 1
Artur Skrzydło
  • 1,135
  • 18
  • 37
  • Was it a typo that your `compare()` returns 1 but not 0 when the `size` of the both `Mushroom` are the same ? (the last line of code). – leeyuiwah Jan 28 '17 at 13:30
  • 3
    @leeyuiwah: Fairly sure that's intentionally there to break the contract. – T.J. Crowder Jan 28 '17 at 13:31
  • Why does your `equals` always return false? – Chetan Kinger Jan 28 '17 at 13:31
  • @T.J. Crowder -- Oh I see. Thanks! Perhaps should have a comment in the code. – leeyuiwah Jan 28 '17 at 13:33
  • `equals` return always false to show that compare and equals contracts is not mandatory. It is written in the my question at the bottom – Artur Skrzydło Jan 28 '17 at 13:44
  • You say "Morover I can change the implementation of compare method to simply .. return 0. According to the documentation it should break the contract." -- why do you think that this breaks the contract? Always returning 0 just means that every element should sort equal to every other element. Not useful but it doesn't break the contact either. – Erwin Bolwidt Jan 28 '17 at 13:50
  • ok, zero was bad example, but the same result is with return 1; I've updated my question – Artur Skrzydło Jan 28 '17 at 13:56

2 Answers2

8

why this error is not always thrown

Because it's not the job of Collections.sort to validate your comparison method. Its job is simply to implement the sort. But the logic of doing so can reveal an invalid comparison method as a by-product of its logic in certain conditional branches. And in that case it makes sense to throw rather than attempting to continue to sort with an invalid comparison method.

What is really contract of comparator (in context of breaking it) ?

That if you break the contract, sort may not function correctly, or at all. (Not that it will validate your comparison method.)

Why for some set of data this exception is thrown and for other not ?

If the sort doesn't happen to follow a path that leads to a logical flaw that the sorting code can detect, it won't know to throw. But the TimSort being used does happen to go down a logical branch which reveals the invalid comparison method, so it does throw.

T.J. Crowder
  • 1,031,962
  • 187
  • 1,923
  • 1,875
  • of course, I agree that this method is not going to validate my compare method. But why exception is thrown so let's say unpredictable ? I understand that my comparison method shouldn't be based on check in some sort method but why this sort method sometimes detect breaking contract , sometimes not. In question 'What is really contract' i mean which rules violation exactly cause breaking the contract – Artur Skrzydło Jan 28 '17 at 14:06
  • @ArturSkrzydło: It's purely down to the implementation of the sort. I'd look at the source code if I were you, it's in `src.zip` in your JDK installation directory: `java/util/TimSort.java`. Somewhere along the way it figures out that it's gotten inconsistent answers from your comparison method, and so it throws the error. I don't really see that the exact details matter, but if they do to you, then follow the code through in your debugger. At the end of the day, the comparison method's breaking the contract, and so the results are going to be wrong or there's going to be an error. – T.J. Crowder Jan 28 '17 at 14:52
  • I think that @MikeNakis in his answer explained the way how TimSort algorithm works. It matters for me for the following reason. As a developer I've written some complex compare method. I haven't noticed that it breaks the contract (but i'm still not sure which rules of contract needs to be broken to cause this exception). QA team was testing application and every test passed. On production it was working but only one customer has so specifi data set that this exception occured. So should i always just in case catch IllegalArgumentException when sorting ? – Artur Skrzydło Jan 28 '17 at 15:18
  • 2
    @ArturSkrzydło: No, I'd suggest you write tests for your comparison functions. This is exactly what unit testing is designed to catch. – T.J. Crowder Jan 28 '17 at 15:27
  • i assume that you mean that i should test equivalence relation for compare method to be sure that this exception will never appear. that sounds reasonable. Thanks. – Artur Skrzydło Jan 28 '17 at 15:32
  • @ArturSkrzydło: Yeah -- well, all the possible outcomes, really. Ensure that the comparison really is returning the right value for a (very) broad range of pairs of compared objects. And if you happen to miss out one, and run into it in production as you have, then add it to your test suite, see that you're now getting a test failure, and then fix the bug and see that the test succeeds. :-) – T.J. Crowder Jan 28 '17 at 15:33
3

The exception will in fact always be thrown if Collections.sort() gets the chance to detect that your comparator misbehaves. This happens in certain rare cases where the sort() function knows that a certain item should fall within a certain range, because the comparator previously indicated so, and now it is invoking your comparator and it is being told that the item should fall outside of that range. But sort() is quite complex, and it tries to do as little work as possible, so quite often the conditions for detecting this misbehavior do not happen.

If sort() was to always throw the exception, then before starting to sort your list it would have to pre-emptively invoke your comparator N^2 (that's N squared) times so as to make sure that it always honors its contract for all permutations of pairs of entries in your list. That would be insanely inefficient.

The contract that the comparator must honor is laid out in the documentation: (https://docs.oracle.com/javase/7/docs/api/java/util/Comparator.html)

  • The implementor must ensure that sgn(compare(x, y)) == -sgn(compare(y, x)) for all x and y.

  • The implementor must also ensure that the relation is transitive: ((compare(x, y)>0) && (compare(y, z)>0)) implies compare(x, z)>0.

  • Finally, the implementor must ensure that compare(x, y)==0 implies that sgn(compare(x, z))==sgn(compare(y, z)) for all z.

(where sgn(x) is the 'Math.signum()' function.)

Mike Nakis
  • 56,297
  • 11
  • 110
  • 142
  • ok, so I think that my assumption was correct, that this is because of performance reasons. Anyway it is really tought to discover, and some of the error may appear only after been released into production . What is also really importatnt for me is to know what rules exactly needs to be broken to raise this exception, because as I described in my question, when i return 1 (which break the rule) then exception won't appear. – Artur Skrzydło Jan 28 '17 at 15:28
  • @ArturSkrzydło I guess you have no option but to write a unit test for your comparator. Luckily, that should generally be easy. I amended my answer with a description of the contract. (But I guess you had already found that.) – Mike Nakis Jan 28 '17 at 18:08
  • yes I've found this, however in my example I've written that if I just return -1; instead of a logic which returns three different values, then exception doesn't appear. And returnin -1 in all cases obviously break the contract. Maybe this change also affect somehow algorithm, but this is exatcly why i desribe this exception missleading – Artur Skrzydło Jan 29 '17 at 08:37
  • Even calling the comparator `N²` times (to test all pairings) might still be insufficient to detect a faulty comparator, as that didn’t check whether the comparator consistently returns the same answer when being queried multiple times for the same pair of objects. Even if it does, the check only verified the consistency for the particular list, not for every possible object you could potentially pass to the comparator. – Holger Jun 14 '17 at 08:02