Avoid code duplication when using C++11 copy & move

Question

C++11 "move" is a nice feature, but I found it difficult to avoid code duplication (we all hate this) when used with "copy" at the same time. The following code is my implementation of a simple circular queue (incomplete), the two push() methods are almost the same except one line.

I have run into many similar situations like this. Any ideas how to avoid this kind of code duplication without using macro ?

=== EDIT ===

In this particular example, the duplicated code can be refactored out and put into a separate function, but sometimes this kind of refactoring is unavailable or cannot be easily implemented.

#include <cstdlib>
#include <utility>

template<typename T>
class CircularQueue {
public:
    CircularQueue(long size = 32) : size{size} {
        buffer = std::malloc(sizeof(T) * size);
    }

    ~CircularQueue();

    bool full() const {
        return counter.in - counter.out >= size;
    }

    bool empty() const {
        return counter.in == counter.out;
    }

    void push(T&& data) {
        if (full()) {
            throw Invalid{};
        }
        long offset = counter.in % size;
        new (buffer + offset) T{std::forward<T>(data)};
        ++counter.in;
    }

    void push(const T& data) {
        if (full()) {
            throw Invalid{};
        }
        long offset = counter.in % size;
        new (buffer + offset) T{data};
        ++counter.in;
    }

private:
    T* buffer;
    long size;
    struct {
        long in, out;
    } counter;
};

Just as a note, std::forward can be replaced by std::move here. — MikeMB, Dec 25 '15 at 09:04
@Nawaz Could you please explain why std::move is better than std::forward in this situation? — user416983, Dec 25 '15 at 09:26
@user416983: Because the object/expression which the parameter `data` binds to is an *rvalue* even though the parameter itself an *lvalue*. Since the actual object is rvalue, `std::move` is appropriate, which converts *back* the lvalue to rvalue. Read more about `std::forward` and `std::move` and their differences, to know which one to use when. — Nawaz, Dec 25 '15 at 10:18
@user416983 here `T&&` is not a forwarding reference. It's an rvalue reference. That's because `T` is not a template parameter of the function. It's of the class so it's already deduces when the class is instantiated. — bolov, Dec 25 '15 at 10:20
"but sometimes this kind of refactoring is unavailable or cannot be easily implemented" – Such as when? — Emil Laine, Dec 25 '15 at 10:29
"Consider pass by value for copyable parameters that are cheap to move and always copied" from _Effective Modern C++_. If `T` moves cheaply, consider `void push(T data) {... new (buffer + offset) T{std::move(data)}; ... }` It is 1 copy + 1 move for `const T&` or 2 moves for `T&&`. — jingyu9575, Dec 25 '15 at 13:43
What I've never understood about the need to avoid code duplication is this: how many circular queue types do you need to create? The kinds of objects that need explicit copy/move operations are low-level type stuff. Once you're past the low-level of explicit owners of some kind of memory, you should just rely on the compiler-generated copy/move operations. Unless you're forced to use a compiler that doesn't do that right... — Nicol Bolas, Dec 25 '15 at 14:47
in C++ generally we would [use an allocator](http://en.cppreference.com/w/cpp/concept/Allocator) to manage memory rather than `std::malloc` — Mgetz, Dec 25 '15 at 16:29

bolov · Accepted Answer · 2015-12-25T10:31:45.280

The simplest solution here is to make the parameter a forwarding reference. This way you can get away with only one function:

template <class U>
void push(U&& data) {
    if (full()) {
        throw Invalid{};
    }
    long offset = counter.in % size;
    // please note here we construct a T object (the class template)
    // from an U object (the function template)
    new (buffer + offset) T{std::forward<U>(data)};
    ++counter.in;
}

There are disadvantages with method though:

it's not generic, that is it cannot always be done (in a trivial way). For instance when the parameter is not as simple as T (e.g.SomeType<T>).
You delay the type check of the parameter. Long and seemingly unrelated compiler error may follow when push is called with wrong parameter type.

By the way, in your example T&& is not a forwarding reference. It's an rvalue reference. That's because T is not a template parameter of the function. It's of the class so it's already deduced when the class is instantiated. So the correct way to write your code would have been:

void push(T&& data) {
    ...
    ... T{std::move(data)};
    ...
}

void push(const T& data) {
   ... T{data};
   ...
}

"You delay the type check of the parameter. Long and seemingly unrelated compiler error may follow when push is called with wrong parameter type." -- very true, a converting constructor may get called. — user416983, Dec 25 '15 at 10:46
To avoid the second disadvantage, you could make this a private function `push_helper` taking the forwarding reference, and then have `void push(T&& data) { push_helper(std::move(data)); }` `void push(T const &data) { push_helper(data); }` — M.M, Dec 25 '15 at 11:34
You don't need to create a second function to improve the error message, you could just static assert that T is constructible from U and give a good message. — Nir Friedman, Dec 25 '15 at 16:35

Yakk - Adam Nevraumont · Answer 2 · 2015-12-25T19:39:52.373

The solution of using a forwarding reference is a good one. In some cases it gets difficult or annoying. As a first step, wrap it with an interface that takes explicit types, then in the cpp file send them to a template implementation.

Now sometimes that first step fails as well: if there are N different arguments that all need to be forwarded into a container, this requires an interface of size 2^N, and possibly it has to cross multiple layers of interfaces to get to the implementation.

To that end, instead of carrying or taking specific types, we can carry the end action(s) around. At the outermost interface, we convert arbitrary types into that/those action(s).

template<class T>
struct construct {
  T*(*action)(void* state,void* target)=nullptr;
  void* state=nullptr;
  construct()=default;
  construct(T&& t):
    action(
      [](void*src,void*targ)->T*{
        return new(targ) T( std::move(*static_cast<T*>(src)) );
      }
    ),
    state(std::addressof(t))
  {}
  construct(T const& t):
    action(
      [](void*src,void*targ)->T*{
        return new(targ) T( *static_cast<T const*>(src) );
      }
    ),
    state(const_cast<void*>(std::addressof(t)))
  {}
  T*operator()(void* target)&&{
    T* r = action(state,target);
    *this = {};
    return r;
  }
  explicit operator bool()const{return action;}
  construct(construct&&o):
    construct(o)
  {
    action=nullptr;
  }
  construct& operator=(construct&&o){
    *this = o;
    o.action = nullptr;
    return *this;
  }
private:
  construct(construct const&)=default;
  construct& operator=(construct const&)=default;
};

Once you have a construct<T> ctor object, you can construct an instance of T via std::move(ctor)(location), where location is a pointer properly aligned to store a T with enough storage.

A constructor<T> can be implicitly converted from a rvalue or lvalue T. It can be enhanced with emplace support as well, but that requires a bunch more boilerplate to do correctly (or more overhead to do easily).

Live example. The pattern is relatively simple type erasure. We store the operation in a function pointer, and the data in a void pointer, and reconstruct the data from the void pointer in the stored action function pointer.

There is modest cost in the above type erasure/runtime concepts technique.

We can also implement it like this:

template<class T>
struct construct :
  private std::function< T*(void*) >
{
  using base = std::function< T*(void*) >;
  construct() = default;
  construct(T&& t):base(
    [&](void* target)mutable ->T* {
      return new(target) T(std::move(t));
    }
  ) {}
  construct(T const& t):base(
    [&](void* target)->T* {
      return new(target) T(t);
    }
  ) {}
  T* operator()(void* target)&&{
    T* r = base::operator()(target);
    (base&)(*this)={};
    return r;
  }
  explicit operator bool()const{
    return (bool)static_cast<base const&>(*this);
  }
};

which relies on std::function doing the type erasure for us.

As this is designed to only work once (we move from the source), I force an rvalue context and eliminate my state. I also hide the fact I'm a std::function, because it doesn't follow those rules.

The 2^N explosion occurs when you don't use a forwarding reference (and use lvalue/rvalue overloads instead); how does 2^ N explosion occur with forwarding references? — Nir Friedman, Dec 25 '15 at 16:38
@nir with forwarding references, you get different functions generated for each set of l/r value overloads actually passed, not for each possible set. You are also restricted to implementing in header (in practice), which gives you another compile-time performance hit. With the above trick, you can implement privately, and only expose what actions you intend to do with eachargument in the type erasure case. I am not claiming the above technique is best always, just useful sometimes. — Yakk - Adam Nevraumont, Dec 25 '15 at 16:51
@Yakk ah ok it wasn't clear to me what the 2^N referred to. Seems like a nice technique. — Nir Friedman, Dec 25 '15 at 16:56
@nir well you can reach 4^N different functions generated with forwarding references, but you need 4^N calls... so less of an issue. — Yakk - Adam Nevraumont, Dec 25 '15 at 16:58

score 0 · Answer 3 · edited May 23 '17 at 11:53

Foreword

Introducing code duplication when adding support of move semantics to your interface is very annoying. For each function you have to make two almost identical implementations: the one which copies from the argument, and the one which moves from the argument. If a function has two parameters, it's not even code duplication it's code quadruplication:

void Func(const TArg1  &arg1, const TArg2  &arg2); // copies from both arguments
void Func(const TArg1  &arg1,       TArg2 &&arg2); // copies from the first, moves from the second
void Func(      TArg1 &&arg1, const TArg2  &arg2); // moves from the first, copies from the second
void Func(      TArg1 &&arg1,       TArg2 &&arg2); // moves from both

In general case you have to make up to 2^N overloads for a function where N is the number of parameters. In my opinion this makes move semantics practically unusable. It is most disappointing feature of C++11.

The problem could have happened even earlier. Let's take a look at the following piece of code:

void Func1(const T &arg);
T Func2();

int main()
{
    Func1(Func2());
    return 0;
}

It's quite strange that a temporary object is passed into the function that takes a reference. The temporary object may even not have an address, it can be cached in a register for example. But C++ allows to pass temporaries where a const (and only const) reference is accepted. In that case the lifetime of temporary is prolonged till the end of lifetime of the reference. If there were no this rule, we would have to make two implementations even here:

void Func1(const T& arg);
void Func1(T arg);

I don't know why the rule that allows to pass temporaries where reference is accepted was created (well, if there were no this rule we would not be able to call copy constructor to make a copy of a temporary object, so Func1(Func2()) where Func1 is void Func1(T arg) would not work anyway :) ), but with this rule we don't have to make two overloads of the function.

Solution #1: Perfect forwarding

Unfortunately there is no such simple rule which would make it unnecessary to implement two overloads of the same function: the one which takes a const lvalue reference and the one which takes a rvalue reference. Instead perfect forwarding was devised

template <typename U>
void Func(U &&param) // despite the fact the parameter has "U&&" type at declaration,
                     // it actually can be just "U&" or even "const U&", it’s due to
                     // the template type deducing rules
{
    value = std::forward<U>(param); // use move or copy semantic depending on the 
                                    // real type of param
}

It may look as that simple rule which allows to avoid duplication. But it is not simple, it uses unobvious template "magic" to solve the problem, and also this solution has some disadvantages that follow from the fact that the function that uses perfect forwarding must be templated:

The implementation of the function must be located in a header.
It blows up the binary size because for each used combination of the parameters type (copy/move) it generates separate implementation (you have single implementation in the source code and at the same time you have up to 2^N implementations in the binary).
There is no type checking for the argument. You can pass value of any type into the function (since the function accepts template type). The actual checking will be done at the points where parameter is actually used. This may produce hard-to-understand error messages and lead to some unexpected consequences.

The last problem can be solved by creating non-template wrappers for perfect-forwarding functions:

public:
    void push(      T &&data) { push_fwd(data); }
    void push(const T  &data) { push_fwd(data); }

private:
    template <typename U>
    void push_fwd(U &&data)
    {
        // actual implementation
    }

Of course it can be used in practice only if the function has few parameters (one or two). Otherwise you have to make too many wrappers (up to 2^N, you know).

Solution #2: Runtime check for movability

Eventually I got to the idea that checking arguments for movablity should be done not at compile-time but at runtime. I created some reference-wrapper class with constructors that took both types of references (rvalue and const lvalue). The class stored the passed to constructor reference as const lvalue reference and additionally it stored the flag whether the passed reference was rvalue. Then you could check at runtime whether the original reference was rvalue and if so you just casted the stored reference to rvalue-reference.

Unsurprisingly someone else had got to this idea before me. He named this as "in idiom" (I called this "pmp" - possibly movable param). You can read about this idiom in details here and here (original page about "in" idiom, I recommend to read all 3 parts of article if you are really interested in problem, the article reviews the problem in depth).

In short the implementation of the idiom looks like this:

template <typename T> 
class in
{
public:
  in (const T& l): v_ (l), rv_ (false) {}
  in (T&& r): v_ (r), rv_ (true) {}

  bool rvalue () const {return rv_;}

  const T& get () const {return v_;}
  T&& rget () const {return std::move (const_cast<T&> (v_));}

private:
  const T& v_; // original reference
  bool rv_;    // whether it is rvalue-reference
};

(Full implementation also contains special case when some types can be implicitly converted into T)

Example of usage:

class A
{
public:
  void set_vec(in<std::vector<int>> param1, in<std::vector<int>> param2)
  {
      if (param1.rvalue()) vec1 = param1.rget(); // move if param1 is rvalue
      else                 vec1 = param1.get();  // just copy otherwise
      if (param2.rvalue()) vec2 = param2.rget(); // move if param2 is rvalue
      else                 vec2 = param2.get();  // just copy otherwise
  }
private:
  std::vector<int> vec1, vec2;
};

The implementation of "in" lacks copy and move constructors.

class in
{
  ...
  in(const in  &other): v_(other.v_), rv_(false)     {} // always makes parameter not movable
                                                        // even if the original reference
                                                        // is movable
  in(      in &&other): v_(other.v_), rv_(other.rv_) {} // makes parameter movable if the
                                                        // original reference was is movable
  ...
};

Now we can use it in this way:

void func1(in<std::vector<int>> param);
void func2(in<std::vector<int>> param);

void func3(in<std::vector<int>> param)
{
    func1(param); // don't move param into func1 even if original reference
                  // is rvalue. func1 will always use copy of param, since we
                  // still need param in this function

    // some usage of param

    // now we don’t need param
    func2(std::move(param)); // move param into func2 if original reference
                             // is rvalue, or copy param into func2 if original
                             // reference is const lvalue
}

We could also overload an assignment operator:

template<typename T>
T& operator=(T &lhs, in<T> rhs)
{
    if (rhs.rvalue()) lhs = rhs.rget();
    else              lhs = rhs.get();
    return lhs;
}

After that we would not need to check for ravlue each time, we could just use it in this way:

   vec1 = std::move(param1); // moves or copies depending on whether param1 is movable
   vec2 = std::move(param2); // moves or copies depending on whether param2 is movable

But unfortunately C++ doesn’t allow overload of operator= as global function (https://stackoverflow.com/a/871290/5447906). But we can rename this function into assign:

template<typename T>
void assign(T &lhs, in<T> rhs)
{
    if (rhs.rvalue()) lhs = rhs.rget();
    else              lhs = rhs.get();
}

and use it like this:

    assign(vec1, std::move(param1)); // moves or copies depending on whether param1 is movable
    assign(vec2, std::move(param2)); // moves or copies depending on whether param2 is movable

Also this won’t work with constructors. We can’t just write:

std::vector<int> vec(std::move(param));

This requires the standard library to support this feature:

class vector
{
    ...
public:
    vector(std::in<vector> other); // copy and move constructor
    ...
}

But standards doesn’t know anything about our "in" class. And here we can’t make workaround similar to assign, so the usage of the "in" class is limited.

Afterword

T, const T&, T&& for parameters it’s too many for me. Stop introducing things that do the same (well, almost the same). T is just enough!

I would prefer to write just like this:

// The function in ++++C language:
func(std::vector<int> param) // no need to specify const & or &&, param is just parameter.
                             // it is always reference for complex types (or for types with
                             // special qualifier that says that arguments of this type
                             // must be always passed by reference).
{
    another_vec = std::move(param); // move parameter if it's movable.
                                    // compiler hides actual rvalue-ness
                                    // of the arguments in its ABI
}

I don’t know if standard committee considered this sort of move semantics implementation but it’s probably too late to make such changes in C++ because they would make ABI of compilers incompatible with previous versions. Also it adds some runtime overhead, and there may be other problems that we don’t know.

Avoid code duplication when using C++11 copy & move

3 Answers3