7

Many C++ standard algorithms, like std::sort(), assume that the comparator comp is a strict weak ordering and it cannot be assumed that comp has any another (nice) properties. But many times comp does have more properties than just being a strict weak ordering. In particular, many times comp is a strict total order (so in particular, exactly one of the following is always true for all a and b: comp(a, b), comp(b, a), or a = b). For example, the usual operator<() on float-points, integers, and std::strings are all strict total orders.

By limiting itself to only assuming that comp is a strict weak ordering, does the C++ standard library limit itself to using less than optimal algorithms? Said differently, if the C++ standard algorithms assumed that comparators were strict total orderings instead of just strict weak orderings, then would some standard algorithms be faster than what's currently implemented?

Update: To be more exact about what "strict total ordering" means, let's suppose that the STL assumed that comp (operating on objects of type T) had all of the nice order-theoretic properties that operator<() on ints has. (So if you want, we may also assume that there is also an operator==() defined on objects of type T that works as you'd expect; this assumption is optional and you can make different assumptions if you'd like.) Could any STL algorithms be made faster?

More generally, if the STL made "nicer" assumptions about comp (i.e. assumed more than that comp is a mere strict weak ordering) then could any STL algorithms be made faster?

Matthew K.
  • 464
  • 5
  • 10
  • Do you assume that `a==b` is even a syntactically valid operation? – curiousguy Dec 27 '19 at 22:03
  • It was harder to do before C++20 but with SFINAE an implementation could dispatch to better algorithms based on type traits. – NathanOliver Dec 27 '19 at 22:11
  • @NathanOliver-ReinstateMonica What "trait" would help here? – curiousguy Dec 27 '19 at 22:18
  • @NathanOliver- Reinstate Monica Hypothetically, would it be possible to tell the STL "Hey, `comp` is a strict total order"? Since in general there's no way that a compiler could check that a given comparator is a strict total order (e.g. given a strict weak ordering `comp` on `std::string`s, to check that it's a strict total order the compiler would have to somehow deduce that for all `a` and `b`, exactly one of the following holds: `comp(a, b)`, `comp(b, a)`, or `a = b`, which in general it likely wouldn't be able to accomplish). – Matthew K. Dec 27 '19 at 22:18
  • 1
    @curiousguy cc Matt, before C++20 you'd have to make your own. In C++20 you can check if the return type is `std::strong_ordering` which is the new return type for strict total order comparisons. See https://en.cppreference.com/w/cpp/language/default_comparisons – NathanOliver Dec 27 '19 at 22:25
  • 8
    Note that `operator<` on floats is not a strict ordering due to `NaN`. – n314159 Dec 27 '19 at 22:33
  • @n314159 Thank you for that clarification. – Matthew K. Dec 27 '19 at 22:34
  • 1
    @n314159 You don't have to use these non values though. – curiousguy Dec 27 '19 at 22:35
  • 1
    @NathanOliver-ReinstateMonica "_std::strong_ordering which is the new return type for strict total order comparisons_" How many types, beside integers, can be considered as having "strict total order" `operator<`? – curiousguy Dec 27 '19 at 22:37
  • @MatthewK. "_or `a = b`_" Sorry but `=` is math not C++. Or philosophy. Either way, it's applicability here is questionable. **When are two objects "equal" in a non applicative PL?** – curiousguy Dec 27 '19 at 22:39
  • @curiousguy Lets say yes: `operator==()` is a valid operation and it works as you'd expect. e.g. `a == b` if and only if `!comp(a, b)` and `!comp(b, a)`. Let's say that our objects are plain old data `struct`s and `operator==()` is the default compiler-defined equality operator. (You don't have to assume this. You can make different assumptions if you want.) Could any STL algorithms be made faster by assuming this? (or by assuming whatever assumptions you prefer instead of my assumption?) – Matthew K. Dec 27 '19 at 22:44
  • @MatthewK. There is no clear set of requirements for `==` (beside obviously being an equivalence relation); informally the objects should have the "same value"; but value is absolutely not clearly defined except for simple types. Is float comparison an equality comparison? Some ppl say yes, others say no. – curiousguy Dec 27 '19 at 22:50
  • 1
    @curiousguy a good many of them. Pretty much any type that compares all of its members has a strict total order. Things like vector, string, Chrono::time_point all have a strict total order. – NathanOliver Dec 27 '19 at 23:01
  • @NathanOliver-ReinstateMonica There is no sequence container in the std where comparison is implemented as member-wise comparison. These members are ptrs! – curiousguy Dec 27 '19 at 23:17
  • @NathanOliver-ReinstateMonica I don't think `std::array` counts as a "sequence" container. We are talking formal semantics, that is maths here. The point is that the equality comparison of container only compare each the individual element stored inside it. **Distinct container objects can still be distinguished.** Just because `c1 == c2` doesn't mean I can't tell one from the other (w/o comparing their addresses). – curiousguy Dec 28 '19 at 00:17
  • Algorithm requires what they need. If your comparator is valid for your data, then it will works and be almost optimal in most cases. – Phil1970 Dec 28 '19 at 00:21
  • Do you have a motivation for this question other than academic interest? (Just academic interest is fine, by the way.) When I read the requirements for `Compare`, I tend to give the committee the benefit of the doubt. I assume they started with efficient algorithms, and deduced what was required of the comparison operation. Your question supposes either 1) that the `Compare` requirements came first, and that the algorithms were built around (and limited by) those requirements; or 2) someone has come up with a better algorithm in the interim. Evidence of the latter would be interesting. – JaMiT Dec 28 '19 at 15:24
  • 2
    I don't know if this is helpful, because I don't quite understand it, but Java requires `Comparator`s to have a total ordering, and if you don't do this, [sorting methods may throw an exception](https://stackoverflow.com/questions/24951257/when-does-timsort-complain-about-broken-comparator). This means sorting methods on floating-point types give special treatment to NaN, comparing it differently to normal comparison operators. I'm not sure why Java requires a total ordering though. – Boann Dec 28 '19 at 19:31
  • 2
    @JaMiT I've developed an algorithm that can be implemented more efficiently if the comparator is a total order. So I was wondering if there was a similar situation with any STL algorithms. – Matthew K. Dec 28 '19 at 20:08
  • @Boann How are Java requirements different from C++ requirements? – curiousguy Dec 28 '19 at 21:34
  • 1
    @MatthewK. When is your algo more efficient? In special cases or in the common cases? When there are many equal elements? – curiousguy Dec 28 '19 at 21:35

1 Answers1

1

For example, the usual operator<() on float-points, integers, and std::strings are all strict total orders.

So you are only talking about similarity of state, not true equality (whatever that is in a language with mutable state).

By limiting itself to only assuming that comp is a strict weak ordering, does the C++ standard library limit itself

No. The premise is wrong. The containers and algorithm library (the algorithms generating sorted sequences, those operating on sorted ranges, and the ordered associatives containers) is not limiting itself in any way, by definition: it says explicitly that an equivalence relation, which as far as I know isn't named (let's call it Sim) can be defined in term of the comparison Comp:

Sim (x,y) <=> !Comp(x,y) && !Comp(y,x)

So you have your strict order, just call Sim an "equality" and overload operator== to be defined as Sim.

So the only issue is the silliness of using a binary comparison function, which implies multiple scan of f.ex. strings to determine equality, and not having access to a ternary comparison (like strcmp). If you had access to Sim directly, you still would call Comp than Sim in case of equality, or alternatively Comp then another Comp.

Only when you a priori suspect that "equality" is the most probable result would you use Sim then Comp. This is absurd.

Three way is better for comparing sequences. Go three way.

curiousguy
  • 8,038
  • 2
  • 40
  • 58
  • See also the [Compare requirement](https://en.cppreference.com/w/cpp/named_req/Compare) OP is linking: "Note: comp induces a strict total ordering on the equivalence classes determined by equiv." So the algorithms are working with a strict total ordering since it looks at all equivalent values as if they were equal. – n314159 Dec 27 '19 at 22:42
  • 1
    @n314159 "_looks at all equivalent values as if they were equal_" Which one does that? – curiousguy Dec 27 '19 at 22:45
  • [A lot of them](https://godbolt.org/z/aBXECk) pretty much all of those listed on the linked page regarding the compare requirement. – n314159 Dec 27 '19 at 22:56
  • @n314159 In these tests: if you claim `d1` and `d2` are treated "as if they were equal", do you assert that it's unspecified which one of these values are found in the set, which one `std::min(d1,d2)` returns, etc.? – curiousguy Dec 27 '19 at 23:10
  • No, it is specified (for `min` I know it, for `set` I assume it is). My point is, that these algorithms/containers cannot differntiate those two objects by their values, they only do that by there positions/times they are inserted. I.e. after the first `d1` or `d2` is in the `set` all further `find`s and `insert` do not (and actually cannot) differentiate if they had one or the other as their arguments. So for the observable behavior of those methods it is as if those two objects are equal. – n314159 Dec 27 '19 at 23:50
  • 1
    @n314159 My point is, if you insert `d1` then `d2`, the second does *not* replace the first. – curiousguy Dec 28 '19 at 00:19
  • That is also my point. This would also not happen if we insert `d2` another time. – n314159 Dec 28 '19 at 00:24
  • 1
    When someone use a container like a `std::set`, he should know **(1)** which members he want to compare and **(2)** if a second insert should be ignored or not (in which case a `std::map` or a `std::multiset` might be a better choice depending on the exact need). – Phil1970 Dec 28 '19 at 00:30