45

To my surprise, I ran into another snag like C++20 behaviour breaking existing code with equality operator?.

Consider a simple case-insensitive key type, to be used with, e.g., std::set or std::map:

// Represents case insensitive keys
struct CiKey : std::string {
    using std::string::string;
    using std::string::operator=;

    bool operator<(CiKey const& other) const {
        return boost::ilexicographical_compare(*this, other);
    }
};

Simple tests:

using KeySet   = std::set<CiKey>;
using Mapping  = std::pair<CiKey, int>; // Same with std::tuple
using Mappings = std::set<Mapping>;

int main()
{
    KeySet keys { "one", "two", "ONE", "three" };
    Mappings mappings {
        { "one", 1 }, { "two", 2 }, { "ONE", 1 }, { "three", 3 }
    };

    assert(keys.size() == 3);
    assert(mappings.size() == 3);
}
  • Using C++17, both asserts pass (Compiler Explorer).

  • Switching to C++20, the second assert fails (Compiler Explorer)

    output.s: ./example.cpp:28: int main(): Assertion `mappings.size() == 3' failed.


Obvious Workaround

An obvious work-around is to conditionally supply operator<=> in C++20 mode: Compile Explorer

#if defined(__cpp_lib_three_way_comparison)
    std::weak_ordering operator<=>(CiKey const& other) const {
        if (boost::ilexicographical_compare(*this, other)) {
            return std::weak_ordering::less;
        } else if (boost::ilexicographical_compare(other, *this)) {
            return std::weak_ordering::less;
        }
        return std::weak_ordering::equivalent;
    }
#endif

Question

It surprises me that I ran into another case of breaking changes - where C++20 changes behaviour of code without diagnostic.

On my reading of std::tuple::operator< it should have worked:

3-6) Compares lhs and rhs lexicographically by operator<, that is, compares the first elements, if they are equivalent, compares the second elements, if those are equivalent, compares the third elements, and so on. For non-empty tuples, (3) is equivalent to

if (std::get<0>(lhs) < std::get<0>(rhs)) return true;
if (std::get<0>(rhs) < std::get<0>(lhs)) return false;
if (std::get<1>(lhs) < std::get<1>(rhs)) return true;
if (std::get<1>(rhs) < std::get<1>(lhs)) return false;
...
return std::get<N - 1>(lhs) < std::get<N - 1>(rhs);

I understand that technically these don't apply since C++20, and it gets replaced by:

Compares lhs and rhs lexicographically by synthesized three-way comparison (see below), that is, compares the first elements, if they are equivalent, compares the second elements, if those are equivalent, compares the third elements, and so on

Together with

The <, <=, >, >=, and != operators are synthesized from operator<=> and operator== respectively. (since C++20)

The thing is,

  • my type doesn't define operator<=> nor operator==,

  • and as this answer points out providing operator< in addition would be fine and should be used when evaluating simple expressions like a < b.

  1. Is the behavior change in C++20 correct/on purpose?
  2. Should there be a diagnostic?
  3. Can we use other tools to spot silent breakage like this? It feels like scanning entire code-bases for usage of user-defined types in tuple/pair doesn't scale well.
  4. Are there other types, beside tuple/pair that could manifest similar changes?
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
sehe
  • 374,641
  • 47
  • 450
  • 633
  • 8
    *"my type doesn't define operator<=> nor operator=="* - but `std::string` does, making it a candidate due to the drived-to-base conversion. I believe *all* standard library types that support comparison had their members overhauled. – StoryTeller - Unslander Monica Mar 05 '21 at 17:59
  • 8
    I guess non-virtual destructors are no longer the sole compelling reason to avoid inheriting from standard library containers :/ – StoryTeller - Unslander Monica Mar 05 '21 at 18:04
  • 1
    @StoryTeller-UnslanderMonica: "Never has been." https://quuxplusone.github.io/blog/2018/12/11/dont-inherit-from-std-types/ – Quuxplusone Mar 06 '21 at 05:42
  • Wouldn't the best way to implement case insensitiive strings be through a new type_traits<> ?? – Michaël Roy Mar 06 '21 at 16:43
  • 1
    @Quuxplusone nice write-up. Arguably, also pretty new effects due to CTAD (as well as tangent on the darned initializer_list/{} initialization conundrum), but the premise hasn't changed much indeed. You can't escape the tight coupling with inheritance, which means forfeit any future guarantees as the standard does change. – sehe Mar 06 '21 at 23:39
  • @MichaëlRoy: You mean `char_traits`, but, absolutely not. The _right_ way to implement a case-insensitive string is to just write `class CaseInsensitiveString` — there's no reason for it to be a specialization of `std::basic_string` at all. The second-best way would be to implement a trivial `class CaseInsensitiveChar` with appropriate `operator==` and then use `std::basic_string`; but the second-best way doesn't work with Unicode, so, you should forget it and just use the first best way. – Quuxplusone Mar 07 '21 at 18:31
  • @Quuxplusone ci strings are a sore in the std. One of their most annoying feature is they do nkot share the same storage space as regular strings... I think it would be much more effective to write a case-insensitive map instead. One that accepts a string_view as a parameter for find() would be nice to have to start with.. – Michaël Roy Mar 10 '21 at 14:17
  • @MichaëlRoy "ci strings" simply do not exist in the standard. Heterogenous lookup has been added AFAIR. You can already create a string_view with custom traits. My standard ci-map is generic and takes 3 lines of code (using boost's `ilexicographical_compare`). – sehe Mar 10 '21 at 14:20
  • I know they do not exist. I also know that if they did, I would not be the only one using them. – Michaël Roy Mar 10 '21 at 14:45

3 Answers3

48

The basic problem comes from the facts that your type is incoherent and the standard library didn't call you on it until C++20. That is, your type was always kind of broken, but things were narrowly enough defined that you could get away with it.

Your type is broken because its comparison operators make no sense. It advertises that it is fully comparable, with all of the available comparison operators defined. This happens because you publicly inherited from std::string, so your type inherits those operators by implicit conversion to the base class. But the behavior of this slate of comparisons is incorrect because you replaced only one of them with a comparison that doesn't work like the rest.

And since the behavior is inconsistent, what could happen is up for grabs once C++ actually cares about you being consistent.

A larger problem however is an inconsistency with how the standard treats operator<=>.

The C++ language is designed to give priority to explicitly defined comparison operators before employing synthesized operators. So your type inherited from std::string will use your operator< if you compare them directly.

C++ the library however sometimes tries to be clever.

Some types attempt to forward the operators provided by a given type, like optional<T>. It is designed to behave identically to T in its comparability, and it succeeds at this.

However, pair and tuple try to be a bit clever. In C++17, these types never actually forwarded comparison behavior; instead, it synthesized comparison behavior based on existing operator< and operator== definitions on the types.

So it's no surprise that their C++20 incarnations continue that fine tradition of synthesizing comparisons. Of course, since the language got in on that game, the C++20 versions decided that it was best to just follow their rules.

Except... it couldn't follow them exactly. There's no way to detect whether a < comparison is synthesized or user-provided. So there's no way to implement the language behavior in one of these types. However, you can detect the presence of three-way comparison behavior.

So they make an assumption: if your type is three-way comparable, then your type is relying on synthesized operators (if it isn't, it uses an improved form of the old method). Which is the right assumption; after all, since <=> is a new feature, old types can't possibly get one.

Unless of course an old type inherits from a new type that gained three-way comparability. And there's no way for a type to detect that either; it either is three-way comparable or it isn't.

Now fortunately, the synthesized three-way comparison operators of pair and tuple are perfectly capable of mimicking the C++17 behavior if your type doesn't offer three-way comparison functionality. So you can get back the old behavior by explicitly dis-inheriting the three-way comparison operator in C++20 by deleting the operator<=> overload.

Alternatively, you could use private inheritance and simply publicly using the specific APIs you wanted.

Is the behavior change in c++20 correct/on purpose?

That depends on what you mean by "on purpose".

Publicly inheriting from types like std::string has always been somewhat morally dubious. Not so much because of the slicing/destructor problem, but more because it is kind of a cheat. Inheriting such types directly opens you up to changes in the API that you didn't expect and may not be appropriate for your type.

The new comparison version of pair and tuple are doing their jobs and doing them as best as C++ can permit. It's just that your type inherited something it didn't want. If you had privately inherited from std::string and only using-exposed the functionality you wanted, your type would likely be fine.

Should there be a diagnostic?

This can't be diagnosed outside of some compiler-intrinsic.

Can we use other tools to spot silent breakage like this?

Search for case where you're publicly inheriting from standard library types.

Toby Speight
  • 27,591
  • 48
  • 66
  • 103
Nicol Bolas
  • 449,505
  • 63
  • 781
  • 982
  • Agreed. I'd say the change of behavior is on purpose because that's actually a documented change for 'tuple::operator<=>` and similar. I don't think the library has become more strict (because `std::set` uses exactly the same assumptions about the ordering), which is also why it still works with `std::set` directly. – sehe Mar 05 '21 at 18:18
  • W.r.t the diagnostics, the fact that no concept can check for consistency of the operators is .... only part of the story. It is perhaps equally true that the compiler can warn that "This code used to call `CiKey::operator<` before c++20, but now calls `std::string::operator<=>`. Mmm. I think there have been similar warnings added to c++20 compilers. – sehe Mar 05 '21 at 18:20
  • 1
    This is true, although @sehe would still run into the problem even if they implemented all six comparisons... Which is :sigh: – Barry Mar 05 '21 at 20:17
  • 1
    @Barry: Are you saying that `pair` gives priority to `operator<=>`, even if a user-provided operator would be called instead? – Nicol Bolas Mar 05 '21 at 20:58
  • 8
    @NicolBolas Yeah - if `<=>` is available, it uses `<=>`. Can't really differentiate that you also separately provided `<` in a way that is unrelated to that `<=>`. And if a type provides `<=>`, you _really_ want to use `<=>` to implement `<` (vs `<` twice). – Barry Mar 05 '21 at 21:03
  • 1
    @NicolBolas Perhaps moving the "This happens because" to the beginning of the answer could give it a more positive tone. As written, those first paragraphs give a misleading "hostile angry rant" look that might lead folks to skip it. – kfsone Aug 10 '21 at 17:51
12

Ah! @StoryTeller nailed it with their comment:

"my type doesn't define operator<=> nor operator==" - but std::string does, making it a candidate due to the d[e]rived-to-base conversion. I believe all standard library types that support comparison had their members overhauled.

Indeed, a much quicker work-around is:

#if defined(__cpp_lib_three_way_comparison)
    std::weak_ordering operator<=>(
        CiKey const&) const = delete;
#endif

Success! Compiler Explorer

Better Ideas

Better solution, as hinted by StoryTeller's second comment:

I guess non-virtual destructors are no longer the sole compelling reason to avoid inheriting from standard library containers :/

Would be to avoid inheritance here:

// represents case insensiive keys
struct CiKey {
    std::string _value;

    bool operator<(CiKey const& other) const {
        return boost::ilexicographical_compare(_value, other._value);
    }
};

Of course this requires (some) downstream changes to the using code, but it's conceptually purer and insulates against this type of "standard creep" in the future.

Compiler Explorer

#include <boost/algorithm/string.hpp>
#include <iostream>
#include <set>
#include <version>

// represents case insensiive keys
struct CiKey {
    std::string _value;

    bool operator<(CiKey const& other) const {
        return boost::ilexicographical_compare(_value, other._value);
    }
};

using KeySet   = std::set<CiKey>;
using Mapping  = std::tuple<CiKey, int>;
using Mappings = std::set<Mapping>;

int main()
{
    KeySet keys { { "one" }, { "two" }, { "ONE" }, { "three" } };
    Mappings mappings { { { "one" }, 1 }, { { "two" }, 2 }, { { "ONE" }, 1 },
        { { "three" }, 3 } };

    assert(keys.size() == 3);
    assert(mappings.size() == 3);
}

Remaining Questions

How can we diagnose problems like these. They're so subtle they will escape code review. The situation is exacerbated by there being 2 decades of standard C++ where this worked perfectly fine and predictably.

I guess as a sidenote, we can expect any "lifted" operators (thinking of std::variant/std::optional) to have similar pitfalls when used with user-defined types that inherit too much from standard library types.

sehe
  • 374,641
  • 47
  • 450
  • 633
  • 6
    It does not make sense inheriting from `std::string` when you only want the storage of the data, in which case it would make much more sense to simply have it as a private member. Some of the literature even claims that private inheritance should never be used, and having a private member is the preferred way. As one example, Google C++ style guide states ["If you want to do private inheritance, you should be including an instance of the base class as a member instead."](https://google.github.io/styleguide/cppguide.html#Inheritance) – Mysterious User Mar 05 '21 at 19:21
  • 1
    @MysteriousUser All understood. In this case it wasn't about storage. It was about inheriting the entire interface and substitutability with the base type. Hence why I mention that _"Of course this requires downstream changes to the using code but it's conceptually purer"_. I think the prevailing argument here is de-coupling. Not Incidentally also what brought this up in the first place. – sehe Mar 06 '21 at 00:09
  • 1
    In practice, wouldn't you just define your `std::set` with a suitable comparator template argument instead of using a custom key type? (I generally prefer using custom types as much as possible but in this toy example it really isn't required.) – Konrad Rudolph Mar 06 '21 at 13:12
  • @KonradRudolph I usually prefer it because I find it's making it easier to substitute containers/algorithm when `std::less<>` does the expected thing (eg replacing `set<>` with a `array` + `lower_bound`/`upper_bound`). In fact, I tried fixing by specializing that but `std::tuple`/`std::pair` are specified to use `operator<`, not `std::less<>`. – sehe Mar 06 '21 at 14:02
3

This is not really an answer on the different behaviors of std::string::operator=(), but I must point out that creating case insensitive strings should be done via customization template parameter Traits.

Example:

// definition of basic_string:
template<
    class CharT,
    class Traits = std::char_traits<CharT>,   // <- this is the customization point.
    class Allocator = std::allocator<CharT>
> class basic_string;

The example of case-insensitive string comes almost straight out from cppreference (https://en.cppreference.com/w/cpp/string/char_traits). I've added using directives for case-insensitive strings.

#include <cctype>
#include <cwctype>
#include <iostream>
#include <locale>
#include <string>
#include <version>

template <typename CharT> struct ci_traits : public std::char_traits<CharT>
{
    #ifdef __cpp_lib_constexpr_char_traits
    #define CICE constexpr
    #endif

private:
    using base = std::char_traits<CharT>;
    using int_type = typename base::int_type;

    static CICE CharT to_upper(CharT ch)
    {
        if constexpr (sizeof(CharT) == 1)
            return std::toupper(static_cast<unsigned char>(ch));
        else
            return std::toupper(CharT(ch & 0xFFFF), std::locale{});
    }

public:
    using base::to_int_type;
    using base::to_char_type;

    static CICE bool eq(CharT c1, CharT c2)
    {
        return to_upper(c1) == to_upper(c2);
    }
    static CICE bool lt(CharT c1, CharT c2)
    {
        return to_upper(c1) < to_upper(c2);
    }
    static CICE bool eq_int_type(const int_type& c1, const int_type& c2)
    {
        return to_upper(to_char_type(c1)) == to_upper(to_char_type(c2));
    }
    static CICE int compare(const CharT *s1, const CharT *s2, std::size_t n)
    {
        while (n-- != 0)
        {
            if (to_upper(*s1) < to_upper(*s2))
                return -1;
            if (to_upper(*s1) > to_upper(*s2))
                return 1;
            ++s1;
            ++s2;
        }
        return 0;
    }
    static CICE const CharT *find(const CharT *s, std::size_t n, CharT a)
    {
        auto const ua(to_upper(a));
        while (n-- != 0) {
            if (to_upper(*s) == ua)
                return s;
            s++;
        }
        return nullptr;
    }
    #undef CICE
};

using ci_string = std::basic_string<char, ci_traits<char>>;
using ci_wstring = std::basic_string<wchar_t, ci_traits<wchar_t>>;

// TODO consider constexpr support
template <typename CharT, typename Alloc>
inline std::basic_string<CharT, std::char_traits<CharT>, Alloc> string_cast(
    const std::basic_string<CharT, ci_traits<CharT>, Alloc> &src)
{
    return std::basic_string<CharT, std::char_traits<CharT>, Alloc>{
        src.begin(), src.end(), src.get_allocator()};
}

template <typename CharT, typename Alloc>
inline std::basic_string<CharT, ci_traits<CharT>, Alloc> ci_string_cast(
    const std::basic_string<CharT, std::char_traits<CharT>, Alloc> &src)
{
    return std::basic_string<CharT, ci_traits<CharT>>{src.begin(), src.end(),
                                                    src.get_allocator()};
}

int main(int argc, char**) {
    if (argc<=1)
    {
        std::cout << "char\n";
        ci_string hello = "hello";
        ci_string Hello = "Hello";

        // convert a ci_string to a std::string
        std::string x = string_cast(hello);

        // convert a std::string to a ci_string
        auto ci_hello = ci_string_cast(x);

        if (hello == Hello)
            std::cout << string_cast(hello) << " and " << string_cast(Hello)
                    << " are equal\n";

        if (hello == "HELLO")
            std::cout << string_cast(hello) << " and "
                    << "HELLO"
                    << " are equal\n";
    }
    else
    {
        std::cout << "wchar_t\n";
        ci_wstring hello = L"hello";
        ci_wstring Hello = L"Hello";

        // convert a ci_wstring to a std::wstring
        std::wstring x = string_cast(hello);

        // convert a std::wstring to a ci_wstring
        auto ci_hello = ci_string_cast(x);

        if (hello == Hello)
            std::wcout << string_cast(hello) << L" and " << string_cast(Hello) << L" are equal\n";

        if (hello == L"HELLO")
            std::wcout << string_cast(hello) << L" and " << L"HELLO" << L" are equal\n";
    }
}

You can play with it here: https://godbolt.org/z/5ec5sz

Raghav Sood
  • 81,899
  • 22
  • 187
  • 195
Michaël Roy
  • 6,338
  • 1
  • 15
  • 19
  • 1
    Mmmm. This seems so far beside the point I'm tempted to point out that this is not correct case-folding. If we're going to be "precise" about things, I'd point out you should use proper locale-aware collations with full UNICODE awareness as appropriate. This calls for a party with ICU or something :) Also, can't help but noticing the `` include, but then the sudden lack of interest in using it to create "safish" interfaces. Now I have the urge to do code review again. Bah. – sehe Mar 06 '21 at 21:32
  • Locale-aware collations are only appropriate if the sorted data will be accessed exclusively in one locale, which generally implies it will be neither persisted nor communicated. Even then, code to perform locale-aware comparisons should be avoided outside of libraries whose purpose is exclusively to perform such comparisons, since a programmer cannot hope to be aware of all possible locales in which a program might be used, including some which may not have existed when the code was written. – supercat Mar 06 '21 at 22:02
  • @100% with you on the use of string_view... I only spent enough time for an had oc example, in wat is more a big comment than a solution, unless one wants to elude the problem with operator=() altogether. – Michaël Roy Mar 06 '21 at 22:20
  • @MichaëlRoy Thanks. No, I want to "elude" problems with upgrading existing code-bases to C++20. So, the code constructs are a given. And could be distinct (I often encounter ADTs derived from `boost::variant<...>`. I think these libraries will get a veritable avalanche of issues [like this one](https://stackoverflow.com/questions/65648897/c20-behaviour-breaking-existing-code-with-equality-operator). – sehe Mar 06 '21 at 22:34
  • Regardless @supercat is right: this was not appropriate for the use-case, which was basically my whole point. And, I haven't checked whether the example on cppreference was also wrong, but at least I'd say there's bug in the casts after masking on invocation of `std::toupper`: when `CharT` is signed, it will get undefined behaviour (e.g. when `ch == -128`). Also, I suppose allocator support is par for the course. I'm not sure whether inheriting `comparison_category` is still accurate. I doubt it, actually. It is also unnerving to see `eq_int_type` and similar missing, and a lack of constexpr. – sehe Mar 06 '21 at 23:15
  • I'll update the answer code accordingly. _[I saw now that the bug with the masking didn't exist on cppreference.]_. And [done](https://stackoverflow.com/posts/66508809/revisions) see [godbolt](https://godbolt.org/z/5ec5sz) – sehe Mar 06 '21 at 23:15