5

I've got this class:

class Column implements Comparable<Column> {
  private final float startX;
  private final float endX;

  public Column(float startX, float endX) {
    this.startX = startX;
    this.endX = endX;
  }

  public boolean isSameColumn(Column c) {
    return c.startX <= this.startX && this.startX < c.endX || this.startX <= c.startX && c.startX < this.endX;
  }

  @Override
  public int hashCode() {
    return Objects.hash(startX, endX);
  }

  @Override
  public boolean equals(Object obj) {
    if (obj == null) return false;
    if (getClass() != obj.getClass()) return false;
    final Column other = (Column) obj;
    return this.startX == other.startX && this.endX == other.endX;
  }

  @Override
  public int compareTo(Column o) {
    return isSameColumn(o) ? 0 : Float.compare(this.startX, o.startX);
  }
}

It seems to me that the compareTo method complies with the Comparator contract, even if it is not consistent with equals (on purpose) - that should not be a problem according to the javadoc:

It is generally the case, but not strictly required that (compare(x, y)==0) == (x.equals(y))

However, I sometimes get:

java.lang.IllegalArgumentException: Comparison method violates its general contract!

For example, the code below throws an exception (also on ideone - it's a bit long but most of it is test data).

Also note that running sorted() before distinct() in the stream solves the problem.

public static void main (String[] args) {
  String points = "54.199997, 88.399216, 135.2, 250.09616, 517.1, 544.6972, 54.199997, 88.399216, 108.2, 120.79874, 135.2, 168.19669, 178.7, 207.49712, " +
          "135.2, 168.19669, 370.1, 391.69785, 108.2, 120.79874, 135.2, 189.49884, 135.2, 180.7966, 370.1, 391.69785, 108.2, 115.39928, 135.2, 215.59856, " +
          "135.2, 172.69742, 377.9, 391.6986, 517.1, 544.6972, 54.199997, 88.399216, 108.2, 120.79874, 135.2, 212.59627, 135.2, 172.69742, 370.1, 391.69785, " +
          "108.2, 120.79874, 135.2, 212.59627, 135.2, 172.69742, 374.0, 391.69824, 108.2, 120.79874, 135.2, 212.59627, 135.2, 172.69742, 374.0, 391.69824, " +
          "108.2, 115.39928, 135.2, 215.59856, 135.2, 172.69742, 377.9, 391.6986, 517.1, 544.6972, 54.199997, 88.399216, 108.2, 120.79874, 135.2, 189.49884, " +
          "135.2, 180.7966, 374.0, 391.69824, 517.1, 544.6972, 54.199997, 88.399216, 108.2, 120.79874, 135.2, 212.59627, 135.2, 172.69742, 374.0, 391.69824, " +
          "108.2, 120.79874, 135.2, 217.09616, 135.2, 170.59763, 377.9, 391.6986, 108.2, 120.79874, 135.2, 208.99689, 135.2, 163.39717, 374.0, 391.69824, " +
          "108.2, 120.79874, 135.2, 214.6982, 135.2, 172.69742, 374.0, 391.69824, 108.2, 120.79874, 135.2, 189.49884, 135.2, 180.7966, 374.0, 391.69824, " +
          "135.2, 250.09616, 517.1, 544.6972, 108.2, 120.79874, 135.2, 198.79779, 135.2, 187.69778, 377.9, 391.6986, 108.2, 115.39928, 135.2, 215.59856, " +
          "135.2, 172.69742, 377.9, 391.6986, 517.1, 544.6972, 54.199997, 88.399216, 108.2, 121.69865, 135.2, 191.89688, 135.2, 214.39604, 370.1, 391.69785, " +
          "108.2, 120.79874, 135.2, 217.39865, 135.2, 175.69711, 374.0, 391.69824, 517.1, 544.6972, 54.199997, 88.399216, 108.2, 120.79874, 135.2, 219.49155, " +
          "135.2, 163.39717, 374.0, 391.69824, 108.2, 120.79874, 135.2, 201.19598, 135.2, 172.69742, 374.0, 391.69824, 108.2, 120.79874, 135.2, 218.59897, " +
          "135.2, 189.49573, 370.1, 391.69785, 108.2, 120.79874, 135.2, 218.59897, 135.2, 189.49573, 370.1, 391.69785, 108.2, 120.79874, 135.2, 209.5979, " +
          "135.2, 163.39717, 370.1, 391.69785, 108.2, 120.79874, 135.2, 174.19609, 135.2, 172.69742, 370.1, 391.69785, 108.2, 120.79874, 135.2, 174.19609, " +
          "135.2, 172.69742, 374.0, 391.69824, 108.2, 120.79874, 135.2, 189.49884, 135.2, 180.7966, 374.0, 391.69824, 108.2, 120.79874, 135.2, 214.39862, " +
          "135.2, 178.9956, 374.0, 391.69824, 108.2, 120.79874, 135.2, 161.59735, 135.2, 196.39772, 374.0, 391.69824, 517.1, 544.6972, 54.199997, 88.399216, " +
          "108.2, 115.39928, 135.2, 215.59856, 135.2, 172.69742, 377.9, 391.6986, 108.2, 115.39928, 135.2, 215.59856, 135.2, 172.69742, 377.9, 391.6986, " +
          "108.2, 120.79874, 135.2, 195.1961, 135.2, 208.6958, 135.2, 178.0957, 374.0, 391.69824, 517.1, 544.6972, 54.199997, 88.399216, 108.2, 118.99892, " +
          "135.2, 144.1991, 374.0, 391.69824, 517.1, 544.6972, 54.199997, 88.399216, 108.2, 118.99892, 135.2, 193.09724, 374.0, 391.69824, 108.2, 120.79874, " +
          "135.2, 185.59799, 135.2, 189.19801, 377.9, 391.6986, 108.2, 120.79874, 135.2, 202.997, 135.2, 172.69742, 374.0, 391.69824, 108.2, 120.79874, " +
          "135.2, 217.69742, 135.2, 163.39717, 374.0, 391.69824, 135.2, 250.09616, 517.1, 544.6972, 108.2, 120.79874, 135.2, 187.69786, 135.2, 172.69742, " +
          "370.1, 391.69785, 108.2, 115.39928, 135.2, 215.59856, 135.2, 172.69742, 377.9, 391.6986, 108.2, 120.79874, 135.2, 208.99646, 135.2, 182.8964, " +
          "374.0, 391.69824, 108.2, 120.79874, 135.2, 214.6982, 135.2, 172.69742, 374.0, 391.69824, 108.2, 120.79874, 135.2, 178.69823, 135.2, 163.39717, " +
          "374.0, 391.69824, 108.2, 120.79874, 135.2, 178.69844, 135.2, 172.69742, 370.1, 391.69785, 108.2, 120.79874, 135.2, 178.69844, 135.2, 172.69742, " +
          "377.9, 391.6986, 108.2, 120.79874, 135.2, 221.29475, 135.2, 163.39717, 374.0, 391.69824, 108.2, 120.79874, 135.2, 221.29475, 135.2, 163.39717, " +
          "370.1, 391.69785, 524.7, 553.7995, 54.199997, 88.399216, 108.2, 121.69865, 135.2, 191.89688, 135.2, 214.39604, 440.6, 468.19724, 108.2, 125.59826, " +
          "135.2, 179.2979, 187.7, 210.49771, 135.2, 208.69688, 370.1, 391.69785, 108.2, 120.79874, 135.2, 178.69844, 135.2, 172.69742, 374.0, 391.69824, " +
          "517.1, 544.6972, 54.199997, 88.399216, 108.2, 121.69865, 135.2, 191.89688, 135.2, 214.39604, 436.7, 468.19687, 108.2, 118.698944, 135.2, " +
          "196.99606, 135.2, 172.99727, 360.2, 391.69687, 108.2, 120.79874, 135.2, 201.19598, 135.2, 172.69742, 374.0, 391.69824, 108.2, 120.79874, 135.2, " +
          "209.5979, 135.2, 163.39717, 370.1, 391.69785, 108.2, 120.79874, 135.2, 189.49884, 135.2, 180.7966, 374.0, 391.69824, 108.2, 120.79874, 135.2, " +
          "208.69637, 135.2, 175.39758, 377.9, 391.6986, 108.2, 115.39928, 135.2, 215.59856, 135.2, 172.69742, 377.9, 391.6986, 108.2, 115.39928, 135.2, " +
          "215.59856, 135.2, 172.69742, 377.9, 391.6986, 517.1, 544.6972, 54.199997, 88.399216, 108.2, 121.69865, 135.2, 191.89688, 135.2, 214.39604, 436.7, " +
          "468.19687, 108.2, 118.698944, 135.2, 208.09802, 135.2, 162.79724, 364.1, 391.69724, 135.2, 250.09616, 517.1, 544.6972, 108.2, 120.79874, 135.2, " +
          "178.0957, 135.2, 163.39717, 374.0, 391.69824, 108.2, 120.79874, 135.2, 185.59691, 135.2, 163.39717, 370.1, 391.69785, 108.2, 115.39928, 135.2, " +
          "178.69844, 135.2, 172.69742, 374.0, 391.69824, 517.1, 544.6972, 54.199997, 88.399216, 108.2, 125.59826, 135.2, 179.2979, 187.7, 210.49771, 135.2, " +
          "213.79517, 370.1, 391.69785, 517.1, 544.6972, 54.199997, 88.399216, 108.2, 120.79874, 135.2, 214.6982, 135.2, 172.69742, 370.1, 391.69785, 108.2, " +
          "120.79874, 135.2, 212.59627, 135.2, 172.69742, 374.0, 391.69824, 108.2, 120.79874, 135.2, 180.79544, 135.2, 167.29678, 374.0, 391.69824, 108.2, " +
          "120.79874, 135.2, 165.49696, 135.2, 167.29678, 374.0, 391.69824, 108.2, 120.79874, 135.2, 165.49696, 135.2, 167.29678, 374.0, 391.69824, 108.2, " +
          "115.39928, 135.2, 215.59856, 135.2, 172.69742, 377.9, 391.6986, 108.2, 115.39928, 135.2, 215.59856, 135.2, 172.69742, 377.9, 391.6986, 108.2, " +
          "115.39928, 135.2, 190.69446, 135.2, 190.69894, 377.9, 391.6986, 108.2, 115.39928, 135.2, 165.49861, 135.2, 195.19905, 377.9, 391.6986, 517.1, " +
          "544.6972, 54.199997, 88.399216, 108.2, 118.99892, 135.2, 205.99895, 135.2, 195.79625, 374.0, 391.69824, 517.1, 544.6972, 54.199997, 88.399216, " +
          "108.2, 118.99892, 135.2, 201.49861, 374.0, 391.69824, 108.2, 118.698944, 135.2, 201.79861, 135.2, 183.19757, 364.1, 391.69724, 517.1, 544.6972, " +
          "54.199997, 87.499214, 108.2, 125.59826, 135.2, 182.2976, 190.7, 214.39763, 135.2, 215.897, 370.1, 391.69785, 517.1, 544.6972, 54.199997, " +
          "87.499214, 108.2, 118.99892, 135.2, 211.99535, 370.1, 391.69785, 108.2, 121.69865, 135.2, 191.89688, 135.2, 214.39604, 364.1, 391.69724, 108.2, " +
          "121.69865, 135.2, 191.89688, 135.2, 214.39604, 370.1, 391.69785, 108.2, 125.59826, 135.2, 222.19763, 135.2, 215.2975, 370.1, 391.69785, 108.2, " +
          "115.39928, 135.2, 199.99884, 135.2, 193.99792, 377.9, 391.6986, 108.2, 115.39928, 135.2, 175.69751, 135.2, 166.39688, 374.0, 391.69824, 135.2, " +
          "250.09616, 523.1, 544.6978, 108.2, 115.39928, 135.2, 198.797, 135.2, 171.1964, 374.0, 391.69824, 108.2, 115.39928, 135.2, 198.797, 135.2, " +
          "171.1964, 377.9, 391.6986, 108.2, 115.39928, 135.2, 198.797, 135.2, 171.1964, 377.9, 391.6986, 523.1, 544.6978, 54.199997, 87.499214, 108.2, " +
          "118.698944, 135.2, 174.19894, 135.2, 161.59735, 370.1, 391.69785, 108.2, 120.79874, 135.2, 178.69844, 135.2, 180.7966, 377.9, 391.6986, 108.2, " +
          "120.79874, 135.2, 189.49884, 135.2, 180.7966, 370.1, 391.69785, 108.2, 115.39928, 135.2, 160.99742, 135.2, 195.79776, 377.9, 391.6986, 523.1, " +
          "544.6978, 54.199997, 87.499214, 108.2, 120.79874, 135.2, 217.69742, 135.2, 163.39717, 370.1, 391.69785, 108.2, 120.79874, 135.2, 212.59627, " +
          "135.2, 172.69742, 374.0, 391.69824, 108.2, 120.79874, 135.2, 187.09769, 135.2, 163.39717, 377.9, 391.6986, 108.2, 115.39928, 135.2, 215.59856, " +
          "135.2, 172.69742, 377.9, 391.6986, 108.2, 120.79874, 135.2, 195.1961, 135.2, 209.29948, 135.2, 166.09691, 135.2, 173.89784, 135.2, 200.29836, " +
          "370.1, 391.69785, 108.2, 118.698944, 135.2, 181.39537, 135.2, 193.09871, 377.9, 391.6986, 527.0, 544.69824, 54.199997, 87.499214, 108.2, " +
          "125.59826, 135.2, 179.2979, 187.7, 211.39763, 135.2, 208.69688, 370.1, 391.69785, 108.2, 120.79874, 135.2, 212.59627, 135.2, 172.69742, 374.0, " +
          "391.69824, 524.7, 553.7995, 54.199997, 87.499214, 108.2, 120.79874, 135.2, 216.79735, 135.2, 163.39717, 374.0, 391.69824, 108.2, 121.69865, " +
          "135.2, 191.89688, 135.2, 214.39604, 440.6, 468.19724, 517.1, 544.6972";
  String[] pointss = points.split(", ");
  List<Column> columns = new ArrayList<> ();
  for (int i = 0; i < pointss.length; i += 2) {
    columns.add(new Column(Float.parseFloat(pointss[i]), Float.parseFloat(pointss[i+1])));
  }
  List<Column> columnsWithOverlap = columns.stream()
          .distinct()
          .sorted()
          .collect(toList());
  System.out.println(columnsWithOverlap);
}
assylias
  • 321,522
  • 82
  • 660
  • 783
  • 2
    Are you sure this has anything to do with equals and compare? http://stackoverflow.com/questions/8327514/comparison-method-violates-its-general-contract What step through your code and find the 2 points that cause it to throw the excption. – Falmarri Mar 04 '16 at 22:22
  • Is the actual exception "Comparison method violates its general contract!"? Please could you use that in the question :) – Andy Turner Mar 04 '16 at 22:24
  • Already answered: https://stackoverflow.com/questions/8327514/comparison-method-violates-its-general-contract – reegnz Mar 04 '16 at 22:30
  • @reegnz I think my comparator is transitive - can you show an example where it's not? – assylias Mar 04 '16 at 22:31
  • Your isSameColumn method might be faulty, I would start to look there. – reegnz Mar 04 '16 at 22:31
  • @assylias I will try to, just a sec. – reegnz Mar 04 '16 at 22:31
  • Interestingly if you call `Collections.sort(columns);` before streaming the columns no exception is thrown from `TimSort` – beresfordt Mar 04 '16 at 22:48
  • What is also interesting is if you only do the distinct operation, and then sort the resulting list using Collections.sort you still get the issue. – reegnz Mar 04 '16 at 22:56
  • 6
    I don't think your comparator is transitive, i.e. `columnA(1, 4), columnB(3, 7), columnC(6, 9)`. In this case, `columnA.compareTo(columnB) = 0` and `columnB.compareTo(columnC) = 0`, however `columnA.compareTo(columnC) < 0` – fps Mar 04 '16 at 22:57
  • @FedericoPeraltaSchaffner That's a good point indeed. – assylias Mar 04 '16 at 23:01
  • I am currently trying to get the smallest set of columns that can be used to reproduce the issue. – reegnz Mar 04 '16 at 23:24
  • This seems like a pretty big abuse of `compareTo`. The kind of ranges you're dealing with don't lend themselves to being ordered. What is the sort you're trying to achieve, and why do you need them sorted in this way? – jpmc26 Mar 04 '16 at 23:27
  • @jpmc26 The use case is as follows: it's a pdf document containing a table - I can read each word in the table and get its start and end X location (horizontally). The words in the columns are not always aligned so I need to "recreate" the columns by checking the coordinates that overlap.... If that makes sense. – assylias Mar 04 '16 at 23:43
  • 1
    @assylias If that's your use case, you should order columns by the middle point between startX and endX, this is (startX + endX) / 2. Then you could find groups of columns whose middle points lie inside the maximum number of columns. – fps Mar 05 '16 at 00:41

2 Answers2

8

I don't think your comparator is transitive, i.e. columnA(1, 4), columnB(3, 7), columnC(6, 9).

In this case, columnA.compareTo(columnB) = 0 and columnB.compareTo(columnC) = 0, however columnA.compareTo(columnC) < 0.

As for why it fails when sorted() appears before distinct(), who knows what happens when compareTo() is not transitive?

fps
  • 33,623
  • 8
  • 55
  • 110
  • 1
    Probably the comparison that is failing never comes up if the list is not distinct, because the two values that cause it to fail are never next to eachother in the original list in the example. – reegnz Mar 04 '16 at 23:12
  • 1
    @reegnz That might be the explanation, but what if the stream is parallel? – fps Mar 04 '16 at 23:13
  • Afaik, the parallel sort is sort-of deterministic in a divide and conquer manner using forkjoin, so that should not affect it. But I am probably not the right person to answer that yet. :P – reegnz Mar 04 '16 at 23:23
  • @reegnz: you’re right, the divide and conquer strategy is very close to how the merge sort variant (aka TimSort) works sequentially anyway. In either case, the data is subdivided, processed and merged afterwards. But there might be subtle differences regarding where the split is made, so with broken ordering, the result is non-deterministic in both cases. Adding or removing one element can change the entire outcome. – Holger Mar 05 '16 at 08:35
1

Ok, I think I might have found the issue, but not exactly how to solve it. The issue is that the comparator is not transitive. Others have also noted this, and I did also in the comments of the question.

If you swap the values of the Float.compare from this:

return isSameColumn(o) ? 0 : Float.compare(this.startX, o.startX);

to this:

return isSameColumn(o) ? 0 : Float.compare(o.startX, this.startX);

the sort works without problems, but this gives a reverse order.

Now the reason is still not entirely clear for me, but I hope this gives you the nudge in the right direction.

reegnz
  • 837
  • 9
  • 16
  • What is also strange to me is that reversing the sort order brings the issue back sort(Collections.reverseOrder()) – reegnz Mar 05 '16 at 00:34
  • 2
    Also considering the transitivity, it is too easy to bring an example that shows that the comparator is not transitive. see the answer of Federico. – reegnz Mar 05 '16 at 00:36