2

I want to apply a custom sorting comparator as follows:

 myString.sortWith{ case (c1,c2) => c1.compareTo(c2) <= 0}

This should sort the characters of the string by their codepoint values.

However it does not always work.. Consider a simple string cab312:

val str = "cab312"

str.sortWith{ case (c1,c2) => c1.compareTo(c2) <= 0}
res0: String = 123abc

That works fine. Consider a more complex string:

scala> val str = "TOADS POOLS hoppin good service & repair"
str: String = TOADS POOLS hoppin good service & repair

scala> str.sortWith{ case (c1,c2) => c1.compareTo(c2) <= 0}
java.lang.IllegalArgumentException: Comparison method violates its general contract!
  at java.util.TimSort.mergeHi(TimSort.java:899)
  at java.util.TimSort.mergeAt(TimSort.java:516)
  at java.util.TimSort.mergeCollapse(TimSort.java:441)
  at java.util.TimSort.sort(TimSort.java:245)
  at java.util.Arrays.sort(Arrays.java:1438)
  at scala.collection.SeqLike$class.sorted(SeqLike.scala:648)
  at scala.collection.immutable.StringOps.sorted(StringOps.scala:29)
  at scala.collection.SeqLike$class.sortWith(SeqLike.scala:601)
  at scala.collection.immutable.StringOps.sortWith(StringOps.scala:29)
  ... 32 elided

So .. we get a java.lang.IllegalArgumentException: Comparison method violates its general contract!

What is going on here? How can the same comparator both succeed and fail.. Is this a bug in the timsort - and in any case is there a workaround?

WestCoastProjects
  • 58,982
  • 91
  • 316
  • 560
  • @RameshMaharjan Please read more carefully the other entry: for that one the comparator provided was not transitive. Whereas for the above case it is. – WestCoastProjects Jul 21 '18 at 02:44
  • 1
    @MarkRotteveel I don't see how this is a duplicate of the linked question. Java's comparator produces signed integers, whereas Scala's `sortWith` requires one boolean. It was exactly this boolean that was counter-intuitive. If the OP used comparator-like `Ordering` instead, the problem wouldn't have occurred at all. – Andrey Tyukin Jul 21 '18 at 11:57
  • @MarkRotteveel There is nowhere in that other question/answer mentioning the nuance of integer comparisons needing to be `<` and not `<=` . It is a "liberal" decision to close this question. – WestCoastProjects Jul 21 '18 at 16:15
  • Huh... Apparently, I can un-dupehammer it single-handedly... Nice to know. – Andrey Tyukin Jul 21 '18 at 16:28

1 Answers1

3

The documentation states unambiguously that sortWith expects a single function lt which returns true if and only if the first operand preceeds (is strictly Less Than) the second operand:

lt the comparison function which tests whether its first argument precedes its second argument in the desired ordering.

Your c1.compareTo(c2) <= 0 returns true for elements that are equal, and therefore violates the contract of lt. Changing <= to < eliminates the issue:

str.sortWith{ case (c1,c2) => c1.compareTo(c2) < 0}
//"      &ADLOOOPSSTacdeeeghiiinoooppprrrsv"
Andrey Tyukin
  • 43,673
  • 4
  • 57
  • 93
  • Nice catch! So the reason it worked with the first string is that there were no character duplicates. – WestCoastProjects Jul 21 '18 at 02:45
  • @javadba I think you should be able to brute-force an example where there are duplicates, but no exception is thrown. It could happen that the algorithm doesn't run into any invalid states even with duplicate characters and invalid `lt`. – Andrey Tyukin Jul 21 '18 at 02:47