Why must std::sort compare function return false when arguments are equal?

Question

In std::sort you can supply a third argument which is the basis for how it sorts a list. If you want the first argument to come first, then you return true. If you want the second argument to come first you return false. I have come across the problem that my predicate function supposedly is an "invalid comparator", and I have narrowed it down to the fact that it does not fulfil the following requirement:

if arg1 == arg2, compare function MUST return false.

There have been some terms I have seen such as that std::sort requires "strict weak ordering". Apart from 2 places, all the other pages I get about these topics seem to be technical papers, and I can't understand it. What I can understand of it is that:

In weak ordering some elements "may be tied" with each other.

But to me this is also the meaning of a "partially ordered set", which is:

"there may be pairs of elements for which neither element precedes the other"

Furthermore, I can't understand what the "strict" implies in either of them.

Leaving aside my confusion about order theory terminology, my question is if in the compare function argument 1 and argument 2 are equal, and in which case I don't care which comes before the other (either one coming before would make me happy), why can't I return true to have argument 1 come first?

I was also going to ask how my program actually knows it's an invalid comparator but then I thought it probably just checks to see if arg1 and arg2 are equal when the compare function returns true.

Due note that there are no checks to verify that your comparator fulfills the requirements of strict weak ordering. — Rakete1111, Aug 29 '17 at 01:47
@Rakete1111: What is that supposed to mean? Implementations are free to perform such checks if they wish to do so. In fact, some real-life implementations actually do it. — AnT stands with Russia, Aug 29 '17 at 01:49
In Visual Studio for me it throws an exception saying "invalid comparator". I assumed either it explicitly checks, or something in the sorting algorithm goes wrong and throws it. — Zebrafish, Aug 29 '17 at 01:52
@AnT Really? What I mean is that in some implementations (like gcc/clang) they just assume that the requirement is fulfilled, and if it is not, then you get some weird order. — Rakete1111, Aug 29 '17 at 01:53
@Rakete1111: Not necessarily. If the requirement is not fulfilled, the behavior is undefined. In real-life it can manifest itself in many ways, not limited to just "some weird order". It can also easily lead to out-of-bounds access. Defending against that requires additional checks, which AFAIK GCC/Clang do not implement. So I'd assume that those "weird orders" you observed were generated by pure luck. In real-life it can also simply crash because of out-of-bounds access. — AnT stands with Russia, Aug 29 '17 at 02:03

score 17 · Answer 1 · answered Aug 29 '17 at 01:44

17

The compare function simply models a "less than" operator. Consider how the < operator works for primitive types like int:

int a = 1, b = 2;     a < b == true      a is less than b
int a = 2, b = 1;     a < b == false     a is not less than b, because a is greater than b
int a = 1, b = 1;     a < b == false     a is not less than b, because a equals b

Returning true means you want a to be ordered before b. So return false if that is not the case, either because you want b to be ordered before a, or because their order doesn't matter.

If you return true when the arguments are equal, then you are saying that you want a to come before b and you want b to come before a, which is a contradiction.

answered Aug 29 '17 at 01:44

Oktalist

14,336
3
43
63

I'm not sure why it means that I want that. Is that because in the next step the comparison will be on the same values except they'll be sent to the compare function with the arguments reversed? – Zebrafish Aug 29 '17 at 01:54
3

@Zebrafish: It's by definition. You seem to have things backwards: you don't get to implement this however you want and then somehow tell the algorithm how to use it - the algorithm says "give me a strict ordering predicate" and you implement that. And in a strict ordering, f(a, a) == false when a==a. – GManNickG Aug 29 '17 at 02:29

Terry · Answer 2 · 2021-06-21T08:30:11.700

1. From a code perspective

When elements count is greater than _S_threshold (in STL, default value is 16), std::sort() will use quicksort.

The following code is __unguarded_partition function in quicksort.

  /// This is a helper function...
  template<typename _RandomAccessIterator, typename _Tp, typename _Compare>
    _RandomAccessIterator
    __unguarded_partition(_RandomAccessIterator __first,
             _RandomAccessIterator __last,
             const _Tp& __pivot, _Compare __comp) 
   {
      while (true)
      {
        while (__comp(*__first, __pivot))
          ++__first;
        --__last;
        while (__comp(__pivot, *__last))
          --__last;
        if (!(__first < __last))
          return __first;
        std::iter_swap(__first, __last);
        ++__first;
      }
    }

if arg1 == arg2, compare function return true, then __first iterator will keep moving to the right, and the program will go out of range and cause coredump.

while (__comp(*__first, __pivot))
    ++__first;

so, if arg1 == arg2, compare function MUST return false.

2. From a algorithm logic perspective

If comparison function uses operator<=, then it returns true for equal values. If test to see whether 10B is equivalent to 10A, it naturally uses the comparison function. In this example, that's operator<=. Tt checks to see whether this expression is true:

!(10A<= 10B)&&!(10B<= 10A) //test 10Aand 10B for equivalence

Well, 10A and 10B are both 10, so it's clearly true that 10A <= 10B. Equally clearly, 10B <= 10A. The above expression thus simplifies to

!(true)&&!(true)

and that simplifies to

false && false

which is simply false. That is, the test concludes that 10A and 10B are not equivalent, hence not the same. Furthermore, any comparison function where equal values return true will do the same thing. Equal values are, by definition, not equivalent! Isn't that cool?

You also can see << Effective STL>>, Item21: Always have comparison functions return false for equal values.

Benjamin Lindley · Answer 3 · 2017-08-29T02:46:56.140

The algorithm which std::sort uses is unspecified. By using a comparison function which does not meet the requirements set by the standard, you break the algorithm's assumptions, and cause undefined behavior.

Look what happens in the output of this noisy comparison function which uses <= (which is not a valid comparator) instead of < (which is valid).

http://coliru.stacked-crooked.com/a/34e4cef3280f3728

In the output, I am printing an incrementing variable (for reference to point out when the algorithm goes haywire), followed by the value of the first argument and its position in the vector, and then the second argument and its position in the vector. An example looks like this:

124: 2@12 <= 4@7

Which means this is the 124th invocation of the comparison function, and it is comparing the value 2 at index 12, with the value 4 at index 7.

Things go crazy starting at line 37

37: 0@0 <= 0@-1
38: 0@0 <= 144@-2
39: 0@0 <= 0@-3
40: 0@0 <= 192@-4

It is comparing values that I didn't even insert into the vector (144, 192, etc...) and at indexes outside the range of the vector (negative indexes, in this case).

Changing the comparsion to `<` I'm still seeing lines like `93: 0@35180308044405 <= 0@0`, so something seems fishy with the position calculation — Tor Arne, Aug 08 '19 at 12:30

score 3 · Answer 4 · answered Aug 29 '17 at 05:20

For an explanation to Benjamin Lindley's answer, consider the typical case where std::sort uses quicksort with Hoare type partition scheme. The left side is scanned for values < pivot using compare(value,pivot) to do the compare, while the right side is scanned for values > pivot using compare(pivot, value). Note that quicksort partition may rely on the fact that a left or right scan is stopped when it encounters a value == pivot and doesn't continue scanning past the pivot on that scan. If the user supplied compare function returns true on such a compare (true when value == pivot), the scan could continue beyond boundaries of the array or vector being sorted, which is apparently what happened in Benjamin Lindley's test case.

score 1 · Answer 5 · answered Aug 29 '17 at 02:22

Without going in the depth of the math, 2 variables can be compared using just '<' operator (or '>' if you wish). However '<' is used commonly in explanation of the 'strict weak ordering` and in implementation of sorters.

The idea basically is that in practical programming if a < b == false and b < a == false then a == b.

score 0 · Answer 6 · answered Jun 06 '23 at 11:52

TL;DR: Why must std::sort compare function return false when arguments are equal?

It must return false because there's no need to sort equal arguments.

Explanation

A 2-way comparison operator function checks if you want to return the first number before the second number. So it should return true if yes, or false if no.

std::sort and many other library functions use such an operator function.

eg.

std::sort(my_container.begin(), my_container.end(), [](const MY_CLASS* lhs, const MY_CLASS* rhs) -> bool
{
    return lhs->m_level < rhs->m_level;
});

However, when the comparison function is used as argument in such library functions, it is customary for such library functions to "act" based on a true response by the comparison function. Therefore, and in particular with std::sort, std::sort will do something (ie. perform reordering) when the argument is true.

So in general, when the comparison function is used as an argument of an std lib algorithm it should return true if the algorithm wants it to do something. And vice versa it should return false when it shouldn't do anything.

Why must std::sort compare function return false when arguments are equal?

6 Answers6

Linked

Related