20

The problem

Suppose we implement a string class which represents, uhm, strings. We then want to add an operator+ which concatenates two strings, and decide to implement that via expression templates to avoid multiple allocations when doing str1 + str2 + ... + strN.

The operator will look like this:

stringbuilder<string, string> operator+(const string &a, const string &b)

stringbuilder is a template class, which in turn overloads operator+ and has an implicit string conversion operator. Pretty much the standard textbook exercise:

template<class T, class U> class stringbuilder;

template<> class stringbuilder<string, string> {
    stringbuilder(const string &a, const string &b) : a(a), b(b) {};
    const string &a;
    const string &b;
    operator string() const;
    // ...
}

// recursive case similar,
// building a stringbuilder<stringbuilder<...>, string>

The above implementation works perfectly as long as someone does

string result = str1 + str2 + ... + strN;

However, it has a subtle bug. Assigning the result to a variable of the right type will make that variable hold references to all the strings that compose the expression. That means, for instance, that changing one of the strings will change the result:

void print(string);
string str1 = "foo";
string str2 = "bar";
right_type result = str1 + str2;
str1 = "fie";
print(result); 

This will print fiebar, because of the str1 reference stored inside the expression template. It gets worse:

string f();
right_type result = str1 + f();
print(result); // kaboom

Now the expression template will contain a reference to a destroyed value, crashing your program straight away.

Now what's that right_type? It is of course stringbuilder<stringbuilder<...>, string>, i.e. the type the expression template magic is generating for us.

Now why would one use a hidden type like that? In fact, one doesn't use it explicitely -- but C++11's auto does!

auto result = str1 + str2 + ... + strN; // guess what's going on here?

The question

The bottom line is: it seems that this way of implementing expression templates (by storing cheap references instead of copying values or using shared pointers) gets broken as soon as one tries to store the expression template itself.

Therefore, I'd pretty much like a way of detecting if I'm building a rvalue or a lvalue, and provide different implementations of the expression template depending on whether a rvalue is built (keep references) or a lvalue is built (make copies).

Is there an estabilished design pattern to handle this situation?

The only things I was able to figure out during my research were that

  1. One can overload member functions depending on this being an lvalue or rvalue, i.e.

    class C {
        void f() &; 
        void f() &&; // called on temporaries
    }
    

    however, it seems I can't do that on constructors as well.

  2. In C++ one cannot really do ``type overloads'', i.e. offer multiple implementations of the same type, depending on how the type is going to be used (instances created as lvalues or rvalues).

Cassio Neri
  • 19,583
  • 7
  • 46
  • 68
peppe
  • 21,934
  • 4
  • 55
  • 70
  • Rather than detecting rvalue/lvalue ref in the constructor why don't you detect this by overloading operator+ with rvalue/lvalue ref combinations. – a.lasram Jul 22 '13 at 20:30
  • @a.lasram because that does not solve the problem. – Yakk - Adam Nevraumont Jul 22 '13 at 20:32
  • Don't let them declare variables of the *right_type*. – n. m. could be an AI Jul 22 '13 at 20:32
  • @n.m. you also have to block them from declaring variables of `right_type&` and `right_type&&` and `right_type const&`. I can block `right_type`, how do we block the others? Secret types? Will that block `auto`? – Yakk - Adam Nevraumont Jul 22 '13 at 20:33
  • @peppe the rvalue case is easy -- when creating expression templates containing rvalues, you could create a local copy and move into it. Note that the proper `stringbuilder` signature does perfect forwarding, not take-by-`const&`. So your `stringbuilder` is distinct from `stringbuilder` -- one has two temporaries, the other has a pair of `const&`s. Even if you don't store copies (and instead store `string&&`) you should do this, because this lets you know if you can move out of those temporaries or not. – Yakk - Adam Nevraumont Jul 22 '13 at 20:35
  • Doesn't `operator string() &&;` cut it? Sure, someone could use `move`, but there's not anything more we can do. This problem has been unsolved for a very long time. – Mooing Duck Jul 22 '13 at 20:42
  • @Peppe: This is a known problem with `auto` that nobody who I asked (including real experts) could give an answer to. :-( – Cassio Neri Jul 22 '13 at 20:43
  • @Yakk `right_type&` and `right_type const&` are doable if you make everything in sight private, but `right_type&&` is indeed a problem. – n. m. could be an AI Jul 22 '13 at 20:49
  • 1
    C++chatroom suggested adding `operator auto` to the language :D – Mooing Duck Jul 22 '13 at 20:52
  • @Yakk if we disallow all this then we won't be able to pass the objects of `right_type` to functions, which kind of defeats the idea. Nope, won't work. – n. m. could be an AI Jul 22 '13 at 20:52
  • @MooingDuck: I'd also add `operator X&()` and `operator X&&()` ;) – n. m. could be an AI Jul 22 '13 at 20:53
  • @n.m.: don't those already work? I'm pretty sure the first does at least. and `operator X()` handles the second pretty closely. – Mooing Duck Jul 22 '13 at 20:58
  • @MooingDuck: you can declare them right now but they don't work as members of X. – n. m. could be an AI Jul 22 '13 at 21:15
  • You could specialize `std::move` to not compile for your `right_type`. Then, only allow/use rvalue member functions in your `right_type` (e.g. conversion to `string` via `operator string() const&&`). This should disallow any use of lvalue `right_type` objects and `right_type&&`. – dyp Jul 22 '13 at 21:28
  • @DyP except `std::forward(instance)` (which you could also block), or if someone reimplements `std::forward` or `std::move` from scratch for whatever insane reason. – Yakk - Adam Nevraumont Jul 22 '13 at 21:47

5 Answers5

13

I started this in a comment but it was a bit big for that. Then, let's make it an answer (even though it doens't really answer your question).

This is a known issue with auto. For instance, it has been discussed by Herb Sutter here and in more details by Motti Lanzkron here.

As they say, there were discussions in the committee to add operator auto to C++ to tackle this problem. The idea would be instead of (or in addition to) providing

operator string() const;

as you mentioned, one would provide

string operator auto() const;

to be used in type deduction contexts. In this case,

auto result = str1 + str2 + ... + strN;

would not deduce the type of result to be the "right type" but rather the type string because that's what operator auto() returns.

AFAICT this is not going to happen in C++14. C++17 pehaps...

Cassio Neri
  • 19,583
  • 7
  • 46
  • 68
  • Thanks, the links are indeed useful. However, if `operator auto` will catch the 99% of the usages, by detecting if one is using a lvalue or a rvalue will catch the 100% of the usages (including some crazy doing `stringbuilder,...> foo = a + b;`. That's why I put my question this way, and was looking for a design pattern, although at this point I think C++ is never going to provide the necessary language features. – peppe Jul 22 '13 at 21:16
  • 6
    @peppe if someone makes an explicit copy of your `stringbuilder` type, they should be allowed to shoot themselves in the foot, no? :) – Yakk - Adam Nevraumont Jul 22 '13 at 21:48
  • +1 for some way of forcing auto type deduction, although I'd prefer something like `using auto = string;` instead of `string operator auto();` because this would avoid code duplication (the implementation would be the same for `string operator auto();` and `operator string();`) and the return type in `string operator auto();` looks out of place when compared to the other conversion operators. – mitchnull Jul 23 '13 at 08:11
  • @mitchnull: I like your idea and I agree with what you said. – Cassio Neri Jul 23 '13 at 08:18
  • I'm accepting this one -- in the end it looks like what I'm asking for is impossible in C++ (C++11 and any foreseeable future revision). – peppe Jul 28 '13 at 17:02
2

Elaborating on a comment I made to the OP; example:

This only tackles the problem of assigning to either an object or binding to a reference and afterwards converting to a destination type. It is not a comprehensive fix the the problem (also see Yakk's response to my comment), but it prevents the scenario presented in the OP and makes it generally harder to write this kind of error-prone code.

Edit: It might not be possible to expand this approach for class templates (more specifically, the specialization of std::move). Macro'ing could work for this specific problem, but is obviously ugly. Overloading std::move would rely on UB.

#include <utility>
#include <cassert>

// your stringbuilder class
struct wup
{
    // only use member functions with rvalue-ref-qualifier
    // this way, no lvalues of this class can be used
    operator int() &&
    {
        return 42;
    }
};

// specialize `std::move` to "prevent" from converting lvalues to rvalue refs
// (make it much harder and more explicit)
namespace std
{
    template<> wup&& move(wup&) noexcept
    {
        assert(false && "Do not use `auto` with this expression!");
    }
    // alternatively: no function body -> linker error
}

int main()
{
    auto obj = wup{};
    auto& lref = obj;
    auto const& clref = wup{};
    auto&& rref = wup{};

    // fail because of conversion operator
      int iObj = obj;
      int iLref = lref;
      int iClref = clref;
      int iRref = rref;
      int iClref_mv = std::move(clref);

    // assert because of move specialization
      int iObj_mv = std::move(obj);
      int iLref_mv = std::move(lref);
      int iRref_mv = std::move(rref);

    // works
    int i = wup{};
}
Community
  • 1
  • 1
dyp
  • 38,334
  • 13
  • 112
  • 177
  • 1
    If `wup` is a `template` (as most expression template code is), then `move(wup)` will be a `template`, so you can stuff a `static_assert(false,"not allowed to move")` in there, and it will fail to compile when you try to call it... And do the same with `std::forward` maybe? – Yakk - Adam Nevraumont Jul 23 '13 at 00:03
  • @Yakk You raised an interesting point: There's no partial specialization for function templates, therefore this cannot work with templates at all!? – dyp Jul 23 '13 at 00:04
  • purely by accident. I forget the rules for injecting things into the `std` namespace, specialization only, no overrides? – Yakk - Adam Nevraumont Jul 23 '13 at 00:10
  • @Yakk Yes. I don't know how to solve this for templates; even defining `std::move` as a friend function wouldn't work. – dyp Jul 23 '13 at 00:12
0

Just a wild idea (haven't tried it):

template<class T, class U>
class stringbuilder
{
  stringbuilder(stringbuilder const &) = delete;
}

wouldn't force compilation error?

Tomek
  • 4,554
  • 1
  • 19
  • 19
  • I don't think this fixes the cases `auto const& r = ...` and `auto&& r = ...` – dyp Jul 22 '13 at 22:00
  • But using a reference to a temporary is not a good idea anyway, isn't it? – Tomek Jul 23 '13 at 20:33
  • Depends on what you mean with "not a good idea". Binding to a const lvalue ref or a rvalue ref extends the lifetime of the temporary, so it doesn't necessarily lead to UB. – dyp Jul 23 '13 at 22:15
0

A possible approach would be using the null object pattern. While it might make your string builder bigger, it will still avoid the memory allocations.

template <>
class stringbuilder<std::string,std::string> {
   std::string        lhs_value;
   std::string        rhs_value;
   const std::string& lhs;
   const std::string& rhs;

   stringbuilder(const std::string &lhs, const std::string &rhs) 
      : lhs(lhs), rhs(rhs) {}

   stringbuilder(std::string&& lhs, const std::string &rhs) 
      : lhs_value(std::move(lhs)), lhs(lhs_value), rhs(rhs) {}

   stringbuilder(const std::string& lhs, std::string&& rhs)
      : rhs_value(std::move(rhs)), lhs(lhs), rhs(rhs_value) {}

   stringbuilder(std::string&& lhs, std::string&& rhs)
      : lhs_value(std::move(lhs)), rhs_value(std::move(rhs)),
        lhs(lhs_value), rhs(rhs_value) {}
//...

If the argument to the constructor is an lvalue, then you store a reference to the real object. If the argument to the constructor is an rvalue, you can move that into an internal variable with almost no cost (move operations are cheap) and store a reference to that internal object. The rest of the code can access the reference knowing (well, at least hoping) that the string will still be alive.

The hoping part is because there is nothing blocking misuse if an lvalue is passed but the object is destroyed before the stringbuilder completes its job.

David Rodríguez - dribeas
  • 204,818
  • 23
  • 294
  • 489
  • @DyP: typo in the code... the members are `std::string` and `const std::string&` (they could very well be `const std::string` and `const std::string&` but I started with non-const in case you want to detect that one of them is an rvalue and start from there (rather than an empty string) in `operator std::string()`, although I later thought that it would not make much sense overall. – David Rodríguez - dribeas Jul 22 '13 at 22:04
  • Ok, I almost thought so :) This fixes the second problem `right_type result = str1 + f();` but not the first `right_type result = str1 + str2; str1 = "fie";`, right? – dyp Jul 22 '13 at 22:09
  • @DyP: Exactly. I would have to stop and think of a design that would solve also that part of the problem. – David Rodríguez - dribeas Jul 22 '13 at 22:11
  • `template struct stringbuilder { RHS rhs; LHS lhs; stringbuilder(LHS&& lhs, RHS&& rhs):std::forward(lhs), std::forward(rhs); };`, where you carefully set your `RHS` and `LHS` types before you pick your constructor, saves you a lot of boilerplate. If `LHS&&` is an lvalue, then `LHS` is a value, and if `LHS&&` is an rvalue, then `LHS` is a reference! – Yakk - Adam Nevraumont Jul 23 '13 at 00:06
  • @Yakk: Good idea (the syntax in the comment is a bit off, but I get the idea). Now, this changes the boilerplate in the constructors of `stringbuilder` but just moves it to `operator+` where it probably belongs. – David Rodríguez - dribeas Jul 23 '13 at 13:00
0

Here is another attempt at solving the issue of dangling references. It doesn't solve the issue of references to things that are modified though.

The idea is to store the temporaries into values, but to have references to lvalues (that we can expect to keep living after the ;).

// Temporary => store a copy
// Otherwise, store a reference
template <typename T>
using URefUnlessTemporary_t
= std::conditional_t<std::is_rvalue_reference<T&&>::value
,                    std::decay_t<T>
,                    T&&>
;

template <typename LHS, typename RHS>
struct StringExpression
{
    StringExpression(StringExpression const&) = delete;
    StringExpression(StringExpression     &&) = default;

    constexpr StringExpression(LHS && lhs_, RHS && rhs_)
        : lhs(std::forward<LHS>(lhs_))
        , rhs(std::forward<RHS>(rhs_))
        { }

    explicit operator std::string() const
    {
        auto const len = size(*this);
        std::string res;
        res.reserve(len);
        append(res, *this);
        return res;
    }

    friend constexpr std::size_t size(StringExpression const& se)
    {
        return size(se.lhs) + size(se.rhs);
    }


    friend void append(std::string & s, StringExpression const& se)
    {
        append(s, se.lhs);
        append(s, se.rhs);
    }

    friend std::ostream & operator<<(std::ostream & os, const StringExpression & se)
    { return os << se.lhs << se.rhs; }

private:
    URefUnlessTemporary_t<LHS> lhs;
    URefUnlessTemporary_t<RHS> rhs;
};

template <typename LHS, typename RHS>
StringExpression<LHS&&,RHS&&> operator+(LHS && lhs, RHS && rhs)
{
    return StringExpression<LHS&&,RHS&&>{std::forward<LHS>(lhs), std::forward<RHS>(rhs) };
}

I've no doubt this could be simplified.

int main ()
{
    constexpr static auto c = exp::concatenator{};
    {
        std::cout << "RVREF\n";
        auto r = c + f() + "toto";
        std::cout << r << "\n";
        std::string s (r);
        std::cout << s << "\n";
    }

    {
        std::cout << "\n\nLVREF\n";
        std::string str="lvref";
        auto r = c + str + "toto";
        std::cout << r << "\n";
        std::string s (r);
        std::cout << s << "\n";
    }

    {
        std::cout << "\n\nCLVREF\n";
        std::string const str="clvref";
        auto r = c + str + "toto";
        std::cout << r << "\n";
        std::string s (r);
        std::cout << s << "\n";
    }
}

NB: I don't provide size(), append() nor concatenator, they aren't the points where the difficulties lie.

PS: I've used C++14 only to simplify the type traits.

Luc Hermitte
  • 31,979
  • 7
  • 69
  • 83