
I'd like to compare STL strings that are allocated with different allocators, e.g. an ordinary std::string with a string using a custom STL allocator. Unfortunately, it seems that usual operator==() doesn't work in this case:

// Custom STL allocator to allocate chars for the string class
typedef MyAllocator<char> MyCharAllocator;

// Define an instance of this allocator
MyCharAllocator myAlloc;

// An STL string with custom allocator
typedef std::basic_string
<
    char, 
    std::char_traits<char>, 
    MyCharAllocator
> 
CustomAllocString;

std::string s1("Hello");
CustomAllocString s2("Hello", myAlloc);

if (s1 == s2)  // <--- ERROR: doesn't compile
   ...

In particular, MSVC10 (VS2010 SP1) emits the following error message:

error C2678: binary '==' : no operator found which takes a left-hand operand of type 'std::string' (or there is no acceptable conversion)

So lower-level (and less readable) code like this has to be used instead:

if (strcmp(s1.c_str(), s2.c_str()) == 0)
   ...

(This is particularly annoying when, for example, there are std::vectors of differently-allocated strings, where the usual simple v[i] == w[j] syntax can't be used.)

This doesn't seem right to me: a custom allocator changes the way string memory is requested, but the interface of a string class (including comparison with operator==()) is independent of the particular way a string allocates its memory.

Is there something I am missing here? Is it possible to keep the C++ high-level interface and operator overloads in this case?

Mr.C64

2 Answers


Use std::lexicographical_compare for less-than comparison:

bool const lt = std::lexicographical_compare(s1.begin(), s1.end(),
                                             s2.begin(), s2.end());
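A minimal sketch of wrapping that call so it works for any pair of string types with the same character type. `DemoAlloc` is a hypothetical stand-in for the question's `MyAllocator`, and `DemoString`, `mixed_less` and `mixed_equivalent` are likewise illustrative names. A single less-than call only answers "does a sort before b?", so equivalence needs two calls.

```cpp
#include <algorithm>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical minimal allocator standing in for MyAllocator; it simply
// forwards to ::operator new / ::operator delete.
template <typename T>
struct DemoAlloc {
    using value_type = T;
    template <typename U> struct rebind { using other = DemoAlloc<U>; };
    DemoAlloc() = default;
    template <typename U> DemoAlloc(const DemoAlloc<U>&) {}
    T* allocate(std::size_t n) { return static_cast<T*>(::operator new(n * sizeof(T))); }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <typename T, typename U>
bool operator==(const DemoAlloc<T>&, const DemoAlloc<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const DemoAlloc<T>&, const DemoAlloc<U>&) { return false; }

using DemoString = std::basic_string<char, std::char_traits<char>, DemoAlloc<char>>;

// "a < b" for any two strings with the same character type,
// regardless of their allocator types.
template <typename S1, typename S2>
bool mixed_less(const S1& a, const S2& b) {
    return std::lexicographical_compare(a.begin(), a.end(), b.begin(), b.end());
}

// Equivalence from ordering: neither string sorts before the other.
template <typename S1, typename S2>
bool mixed_equivalent(const S1& a, const S2& b) {
    return !mixed_less(a, b) && !mixed_less(b, a);
}
```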

For equality comparison you can use std::equal:

bool const e = s1.length() == s2.length() &&
               std::equal(s1.begin(), s1.end(), s2.begin());
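Since C++14, std::equal also has a four-iterator overload that returns false when the two ranges have different lengths, so the explicit length check can be folded in. A sketch assuming a C++14 compiler; `DemoAlloc`, `DemoString` and `mixed_equal` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical minimal allocator, used only to get a second string type.
template <typename T>
struct DemoAlloc {
    using value_type = T;
    template <typename U> struct rebind { using other = DemoAlloc<U>; };
    DemoAlloc() = default;
    template <typename U> DemoAlloc(const DemoAlloc<U>&) {}
    T* allocate(std::size_t n) { return static_cast<T*>(::operator new(n * sizeof(T))); }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <typename T, typename U>
bool operator==(const DemoAlloc<T>&, const DemoAlloc<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const DemoAlloc<T>&, const DemoAlloc<U>&) { return false; }

using DemoString = std::basic_string<char, std::char_traits<char>, DemoAlloc<char>>;

// C++14 four-iterator std::equal: returns false if the lengths differ,
// so no separate length() comparison is needed.
template <typename S1, typename S2>
bool mixed_equal(const S1& a, const S2& b) {
    return std::equal(a.begin(), a.end(), b.begin(), b.end());
}
```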

Alternatively, you can just fall back on strcmp, as you suggested (or rather memcmp, since that has the correct semantics: the C++ string is more general than a C string and may contain embedded null characters). Those functions can potentially employ some lower-level magic, such as comparing an entire machine word at a time (though the algorithms above may be specialized the same way). Measure and compare, I'd say. For short strings, the standard library algorithms are at least nicely self-descriptive.
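The difference between strcmp and a length-aware comparison shows up as soon as a string contains an embedded '\0'. This sketch compares through the string's traits class (std::char_traits<char>::compare behaves like memcmp), which is also how basic_string::compare() is specified. `DemoAlloc`, `DemoString`, `traits_equal` and `strcmp_equal` are hypothetical names.

```cpp
#include <cstddef>
#include <cstring>
#include <new>
#include <string>

// Hypothetical minimal allocator, used only to get a second string type.
template <typename T>
struct DemoAlloc {
    using value_type = T;
    template <typename U> struct rebind { using other = DemoAlloc<U>; };
    DemoAlloc() = default;
    template <typename U> DemoAlloc(const DemoAlloc<U>&) {}
    T* allocate(std::size_t n) { return static_cast<T*>(::operator new(n * sizeof(T))); }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <typename T, typename U>
bool operator==(const DemoAlloc<T>&, const DemoAlloc<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const DemoAlloc<T>&, const DemoAlloc<U>&) { return false; }

using DemoString = std::basic_string<char, std::char_traits<char>, DemoAlloc<char>>;

// Length-aware: compares size() first, then all characters via the traits
// class, so an embedded '\0' does not end the comparison early.
template <typename S1, typename S2>
bool traits_equal(const S1& a, const S2& b) {
    return a.size() == b.size() &&
           S1::traits_type::compare(a.data(), b.data(), a.size()) == 0;
}

// The strcmp() workaround from the question: stops at the first '\0'.
template <typename S1, typename S2>
bool strcmp_equal(const S1& a, const S2& b) {
    return std::strcmp(a.c_str(), b.c_str()) == 0;
}
```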


Based on @Dietmar's idea below, you could wrap those functions into a templated overload:

#include <string>
#include <algorithm>

// Equality for strings of the same character type; traits and allocators may differ.
template <typename TChar,
          typename TTraits1, typename TAlloc1,
          typename TTraits2, typename TAlloc2>
bool operator==(std::basic_string<TChar, TTraits1, TAlloc1> const & s1,
                std::basic_string<TChar, TTraits2, TAlloc2> const & s2)
{
    return s1.length() == s2.length() &&
           std::equal(s1.begin(), s1.end(), s2.begin());
}

Usage example (this one relies on GCC's libstdc++ extension __gnu_cxx::malloc_allocator):

#include <ext/malloc_allocator.h>
int main()
{
    std::string a("hello");
    std::basic_string<char, std::char_traits<char>, __gnu_cxx::malloc_allocator<char>> b("hello");
    return a == b;
}
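A portable variant of the usage example, assuming a hypothetical `DemoAlloc` in place of libstdc++'s __gnu_cxx::malloc_allocator. The hypothetical `same_at` helper mirrors the question's v[i] == w[j] case for vectors of differently-allocated strings.

```cpp
#include <algorithm>
#include <cstddef>
#include <new>
#include <string>
#include <vector>

// Hypothetical minimal allocator replacing __gnu_cxx::malloc_allocator.
template <typename T>
struct DemoAlloc {
    using value_type = T;
    template <typename U> struct rebind { using other = DemoAlloc<U>; };
    DemoAlloc() = default;
    template <typename U> DemoAlloc(const DemoAlloc<U>&) {}
    T* allocate(std::size_t n) { return static_cast<T*>(::operator new(n * sizeof(T))); }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <typename T, typename U>
bool operator==(const DemoAlloc<T>&, const DemoAlloc<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const DemoAlloc<T>&, const DemoAlloc<U>&) { return false; }

using DemoString = std::basic_string<char, std::char_traits<char>, DemoAlloc<char>>;

// The mixed-allocator operator== from this answer.
template <typename TChar,
          typename TTraits1, typename TAlloc1,
          typename TTraits2, typename TAlloc2>
bool operator==(std::basic_string<TChar, TTraits1, TAlloc1> const & s1,
                std::basic_string<TChar, TTraits2, TAlloc2> const & s2)
{
    return s1.length() == s2.length() &&
           std::equal(s1.begin(), s1.end(), s2.begin());
}

// Hypothetical helper: the v[i] == w[j] syntax now compiles.
inline bool same_at(const std::vector<std::string>& v, std::size_t i,
                    const std::vector<DemoString>& w, std::size_t j) {
    return v[i] == w[j];
}
```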

In fact, you could define such an overload for most standard containers. You could even template it on a template, but that would be extreme.
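For instance, a sketch of the same overload for std::vector, where the element types must match and only the allocator types may differ (`DemoAlloc` is again a hypothetical allocator):

```cpp
#include <algorithm>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical minimal allocator, used only to get a second vector type.
template <typename T>
struct DemoAlloc {
    using value_type = T;
    template <typename U> struct rebind { using other = DemoAlloc<U>; };
    DemoAlloc() = default;
    template <typename U> DemoAlloc(const DemoAlloc<U>&) {}
    T* allocate(std::size_t n) { return static_cast<T*>(::operator new(n * sizeof(T))); }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <typename T, typename U>
bool operator==(const DemoAlloc<T>&, const DemoAlloc<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const DemoAlloc<T>&, const DemoAlloc<U>&) { return false; }

// Same element type, possibly different allocator types.
template <typename T, typename TAlloc1, typename TAlloc2>
bool operator==(std::vector<T, TAlloc1> const & v1,
                std::vector<T, TAlloc2> const & v2)
{
    return v1.size() == v2.size() &&
           std::equal(v1.begin(), v1.end(), v2.begin());
}
```

When both vectors have the same type, the standard operator== is more specialized than this template and is still the one selected.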

Kerrek SB
  • With a single call to `lexicographical_compare` you can't determine whether two strings are equal, isn't that so? – jrok Oct 09 '12 at 18:15
  • Another approach would be `s1.compare(s2.c_str())` – AFoglia Oct 09 '12 at 18:17
  • Seems like a reasonable workaround, but the question seems to be "Is it possible to keep the C++ high-level interface and operator overloads in this case?", not "how can I compare 2 such strings using other STL functions" – Rollie Oct 09 '12 at 18:33
  • @Rollie: Good point, I added a suggestion for wrapping this based on Dietmar's answer. – Kerrek SB Oct 09 '12 at 19:18
  • Why do you use `TChar1`/`TTraits1` and `TChar2`/`TTraits2` in your template? Shouldn't they be the same? – Mark Ransom Oct 09 '12 at 19:27
  • @MarkRansom: No, why? All I care about is that the char types are the same so that I can compare them. – Kerrek SB Oct 09 '12 at 19:28
  • @Kerrek: That kinda implies that `TChar1 == TChar2`, so why not make it a single parameter? – Xeo Oct 09 '12 at 19:33
  • @KerrekSB: I think the issue is that the traits say how to compare the strings. I'm not sure how relevant that is here. – Mooing Duck Oct 09 '12 at 19:34
  • @Xeo: Does it really? Is that a requirement of the standard? Do you have a reference for that? – Kerrek SB Oct 09 '12 at 19:38
  • @Kerrek: If you mean that `traits::char_type` has to be the same as the `charT` passed to the `std::basic_string`? I think that's rather obvious, isn't it? – Xeo Oct 09 '12 at 19:58
  • @Xeo: You tell me! We can all guess until we're blue in the face, but is this a standard requirement? – Kerrek SB Oct 09 '12 at 20:16
  • It's not spelled out directly (at least I couldn't find it), but since almost all operations on the string are specified in terms of its traits, passing a pointer to the internal character buffer to any trait function wouldn't work if `charT` was different from `traits::char_type`. Also, `operator[]` returns a `reference` which is specified as `value_type&` which in turn is specified as `traits::char_type` and (this may seem inconsistent, but it helps illustrate) `front` returns a `charT&`. – Xeo Oct 09 '12 at 20:21
  • @Xeo: Hm. Well, I agree that you'll almost definitely have `TChar == TTraits::char_type`, but there's no extra cost attached to my code, so I'd rather leave it the way it is. – Kerrek SB Oct 09 '12 at 20:32
  • @Kerrek: But there's readability to be gained. You can just throw out the `enable_if`, since it's implied from `charT` being used for both strings. – Xeo Oct 09 '12 at 20:41
  • @Mr.C64: You're overthinking this. The comparison operator really doesn't care about memory. We're just working with the *type system* here. I can change `TAlloc` into `TreacleTart` if that feels better. It's just a type. – Kerrek SB Oct 09 '12 at 21:34
  • @Xeo and Kerrek: From C++11 21.2@3: "Traits::char_type shall be the same as CharT." – interjay Oct 10 '12 at 15:32
  • @interjay: Thanks, for the life of me I just couldn't find it. :) – Xeo Oct 10 '12 at 17:41

The standard only defines operators for homogeneous string types, i.e., all the template arguments need to match. However, you can define a suitable equality operator in the namespace where the allocator is defined: argument-dependent lookup will find it there. If you choose to implement your own equality operator, it would look something like this:

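#include <algorithm>
#include <string>

// Declared in the namespace of MyCharAllocator so that ADL finds it.
bool operator== (std::string const& s0,
                 std::basic_string<char, std::char_traits<char>, MyCharAllocator> const& s1) {
    return s0.size() == s1.size() && std::equal(s0.begin(), s0.end(), s1.begin());
}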

(plus a few other overloads). Taking this to the next level, it may even be reasonable to define versions of the various relational operators in terms of the container requirements, without restricting the template arguments:

namespace my_alloc {
    template <typename T> class allocator { ... };
    template <typename T0, typename T1>
    bool operator== (T0 const& c0, T1 const& c1) {
        return c0.size() == c1.size() && std::equal(c0.begin(), c0.end(), c1.begin());
    }
    ...
}

Obviously, the operators can be restricted to specific container types, differing only in their allocator template parameters.
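A sketch of that restricted form, assuming the allocator lives in a namespace named my_alloc as in the snippet above; its body and the `contains_key` helper are hypothetical. Both operands must be basic_strings with the same character and traits types; only the allocators may differ. For two strings of the same type, the standard operator== is more specialized and still wins, so there is no ambiguity.

```cpp
#include <algorithm>
#include <cstddef>
#include <new>
#include <string>

namespace my_alloc {

// Hypothetical allocator body; its namespace is what lets ADL find the operators below.
template <typename T>
struct allocator {
    using value_type = T;
    template <typename U> struct rebind { using other = allocator<U>; };
    allocator() = default;
    template <typename U> allocator(const allocator<U>&) {}
    T* allocate(std::size_t n) { return static_cast<T*>(::operator new(n * sizeof(T))); }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <typename T, typename U>
bool operator==(const allocator<T>&, const allocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const allocator<T>&, const allocator<U>&) { return false; }

using string = std::basic_string<char, std::char_traits<char>, allocator<char>>;

// Same character and traits types; allocator types may differ.
// Compares through the traits class, as basic_string::compare() does.
template <typename C, typename Tr, typename A1, typename A2>
bool operator==(std::basic_string<C, Tr, A1> const& s0,
                std::basic_string<C, Tr, A2> const& s1) {
    return s0.size() == s1.size() &&
           Tr::compare(s0.data(), s1.data(), s0.size()) == 0;
}

template <typename C, typename Tr, typename A1, typename A2>
bool operator!=(std::basic_string<C, Tr, A1> const& s0,
                std::basic_string<C, Tr, A2> const& s1) {
    return !(s0 == s1);
}

} // namespace my_alloc

// Hypothetical helper: std::find compares *it == key inside the standard
// library, where only ADL (via my_alloc::allocator) can find the operator.
template <typename It, typename Key>
bool contains_key(It first, It last, Key const& key) {
    return std::find(first, last, key) != last;
}
```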

As for why the standard doesn't define mixed-type comparisons, the main reason is probably that you don't actually want to mix allocators in your program in the first place! That is, if you need to use an allocator, you'd use an allocator type that encapsulates a dynamically polymorphic allocation policy, and always use the resulting allocator type. The reasoning is that otherwise you'd either get incompatible interfaces or need to make everything a template; i.e., you want to retain some level of vocabulary types. Of course, with even just one additional allocator type, you'd have two vocabulary string types: the default instantiation and the instantiation for your special allocation.
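One standard realization of an allocator type with a dynamically polymorphic allocation policy is C++17's std::pmr. A minimal sketch, assuming a C++17 standard library that ships <memory_resource>; `pooled_equals_default` is a hypothetical name. The monotonic_buffer_resource acts as a simple pointer-bumping pool, yet both strings share one type, so the standard operator== applies.

```cpp
#include <cstddef>
#include <memory_resource>
#include <string>

// std::pmr::string is basic_string<char, char_traits<char>,
// std::pmr::polymorphic_allocator<char>>: the allocation policy is picked at
// run time through a std::pmr::memory_resource*, so the type never changes.
inline bool pooled_equals_default() {
    std::byte buffer[256];
    // Hands out memory by bumping a pointer through `buffer`; falls back to
    // the default upstream resource if the buffer runs out.
    std::pmr::monotonic_buffer_resource pool(buffer, sizeof buffer);

    // Long enough to exceed the small-string buffer of common implementations.
    std::pmr::string pooled("a string that is allocated from the pool", &pool);
    std::pmr::string ordinary("a string that is allocated from the pool");  // default resource

    // Same type, so the standard operator== compares them directly.
    return pooled == ordinary;
}
```

Comparing a std::pmr::string with a plain std::string still runs into the original problem, since those remain two distinct instantiations.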

That said, there is another potential reason not to support mixed-type comparison: if operator==() really becomes a comparison between two values, as is the case if the allocators differ, it may give rise to a much broader definition of value equality: should std::vector<T>() == std::deque<T>() be supported? If not, why would comparison between strings with different allocators be special? Of course, the allocator is a non-salient attribute of std::basic_string<C, T, A>, which could be a good reason to ignore it. I'm not sure whether mixed-type comparison should be supported. It may be reasonable to support operators (probably including operators other than operator==()) for container types that differ only in their allocator type.

Dietmar Kühl
  • "*If not, why would comparison between strings with different allocators be special?*" Because they're both *strings* that store the same type of character. They may come from different memory, but the *arrangement* of that memory is the same. They operate on that memory in the exact same way, and the meaning of it is identical. By all rights, the allocator shouldn't even be part of the container's signature; the user of a type should neither know nor care where that type's memory comes from unless they *have* to. – Nicol Bolas Oct 09 '12 at 18:35
  • @Nicol Bolas: Actually, the layout of `std::basic_string<char, std::char_traits<char>, MyCharAllocator>` can be **entirely** different! The moment a user-defined type is involved, the class template can be partially specialized with the only restriction being that it has to meet the standard specification. Maybe the difference can't be like between `std::vector` and `std::deque` (because the characters have to be in contiguous memory) but the arrangement of the memory can still be different. – Dietmar Kühl Oct 09 '12 at 18:44
  • "*can be entirely different*" That's my point: *it shouldn't be*. Just like the layout and implementation of `std::shared_ptr` doesn't change with allocators. Just like the layout and implementation of `std::function` doesn't change with allocators. This was a mistake by the C++ standards committee, one that they refuse to (or are unable to) fix. – Nicol Bolas Oct 09 '12 at 18:49
  • @Nicol: You're missing that both `shared_ptr` and `function` need to resort to *type erasure* for that (yes, I know they already need that anyways, however, a container does not), which kinda necessitates dynamic memory allocation for objects bigger than two pointers (stateful allocators, anyone?). How would you allocate that memory? With the allocator you gave to the container? Nah, that's not what it should be used for. With yet *another* allocator that's passed (kinda like what `shared_ptr`'s constructor can take)? – Xeo Oct 09 '12 at 18:55
  • Now, for stateless allocators, you might get away with a `char _alloc_buf;` in which you construct the allocator, but if you resort to type erasure like this, you need a way to access the allocator's interface in some way. How would you do that? Most likely with multiple pointers to function template instantiations, which would bloat the size of the class needlessly. – Xeo Oct 09 '12 at 18:58
  • `std::function` and `std::shared_ptr` both internally allocate some kind of object of some odd type. There is no extra overhead added by tying the allocation policy into this object. This isn't quite true for the various container types, although the preference would have been to remove allocators from the container interfaces. The proposal to remove allocators from the interface wasn't considered to be viable, however. – Dietmar Kühl Oct 09 '12 at 19:01
  • Btw, why do you use `std::mismatch` in the first implementation and `std::equal` in the second one? Also note the superfluous `return` in the second one. – Xeo Oct 09 '12 at 19:08
  • @DietmarKühl: in my particular case, the difference between `std::string` and `std::basic_string<char, std::char_traits<char>, MyCharAllocator>` is that `MyCharAllocator` is a _pool allocator_ that just increases a pointer inside a big chunk of memory when it allocates. The string interface is the same (what changes is the way I get the memory for the string); I should be able to use a simple `operator==()` to do string comparisons; `operator==()` shouldn't care _how_ I got the memory. – Mr.C64 Oct 09 '12 at 19:11
  • @Xeo: when I wrote the original I didn't think of `std::equal()` (I'm used to use `std::mismatch()`) and the `return` was a typo. I fixed both. Thanks! – Dietmar Kühl Oct 09 '12 at 19:12
  • @Mr.C64: Yes, I understand that and I agree with the equality operators being useful for mixed instantiations of the template. All I'm stating is why I think the standard doesn't support mixed type operations. The interface has to match, actually, because this is a requirement, even when specializing the standard template, although I guess you are just instantiating the template with a special allocator. I guess, my recommendation is to put the allocator into a separate namespace and define the required operators as described above. – Dietmar Kühl Oct 09 '12 at 19:19
  • I wondered about that partial specialization "problem" a bit, and I think it's a non-issue. The partial specialization needs to adhere to the requirements set by the standard anyway, so you can just use traits-based comparison. – Xeo Oct 09 '12 at 19:29
  • @NicolBolas, do you think that C++17's `std::pmr::vector` http://en.cppreference.com/w/cpp/container/vector solves what you describe as the mistake of the standard? – alfC Jan 02 '17 at 02:19
  • @alfC: `string_view` does a much better job of dealing with the issue of allocator differences for strings than polymorphic allocators. In a more general sense for containers... maybe. The main issue is the conversion of someone else's container into a polymorphic-allocator one; such a conversion should be able to happen with a minimum of fuss and preferably without any allocations if possible. At the very least, moving a non-pmr container into a pmr-one shouldn't reallocate the object's storage. – Nicol Bolas Jan 02 '17 at 02:52