eliminate unnecessary copies when calling C++/STL algorithms

Question

I've coded the following example in order to better illustrate my questions.
In the code below, I introduce a function object (i.e., funObj).
In funObj class's definition an integral member variable called id is defined to hold the ID of every funObj constructed and a static integral member variable n to count the funObj objects created.
Thus, every time an object funObj is constructed n is increased by one and its value is assigned to the id field of the newly created funObj.
Furthermore, I've defined a default constructor, a copy constructor and a destructor. All three are printing messages to the stdout in order to signify their invocation along with the ID of the funObj they are referring to.
I've also defined a function func that takes as inputs by value objects of type funObj.

Code:

#include <vector>
#include <iostream>
#include <algorithm>
#include <functional>

template<typename T>
class funObj {
  std::size_t id;
  static std::size_t n;
public:
  funObj() : id(++n) 
  { 
    std::cout << "    Constructed via the default constructor, object foo with ID(" << id << ")" << std::endl;
  }
  funObj(funObj const &other) : id(++n) 
  {
    std::cout << "    Constructed via the copy constructor, object foo with ID(" << id << ")" << std::endl;
  }
  ~funObj()
  { 
    std::cout << "    Destroyed object foo with ID(" << id << ")" << std::endl;
  }
  void operator()(T &elem)
  { 

  }
  T operator()()
  {
    return 1;
  }
};

template<typename T>
void func(funObj<T> obj) { obj();  }

template<typename T>
std::size_t funObj<T>::n = 0;

int main()
{
  std::vector<int> v{ 1, 2, 3, 4, 5, };
  std::cout << "> Calling `func`..." << std::endl;
  func(funObj<int>());
  std::cout << "> Calling `for_each`..." << std::endl;
  std::for_each(std::begin(v), std::end(v), funObj<int>());
  std::cout << "> Calling `generate`..." << std::endl;
  std::generate(std::begin(v), std::end(v), funObj<int>());

  // std::ref 
  std::cout << "> Using `std::ref`..." << std::endl;
  auto fobj1 = funObj<int>();
  std::cout << "> Calling `for_each` with `ref`..." << std::endl;
  std::for_each(std::begin(v), std::end(v), std::ref(fobj1));
  std::cout << "> Calling `generate` with `ref`..." << std::endl;
  std::for_each(std::begin(v), std::end(v), std::ref(fobj1));
  return 0;
}

Output:

Calling func...

Constructed via the default constructor, object foo with ID(1)

Destroyed object foo with ID(1)

Calling for_each...

Constructed via the default constructor, object foo with ID(2)

Constructed via the copy constructor, object foo with ID(3)

Destroyed object foo with ID(2)

Destroyed object foo with ID(3)

Calling generate...

Constructed via the default constructor, object foo with ID(4)

Constructed via the copy constructor, object foo with ID(5)

Destroyed object foo with ID(5)

Destroyed object foo with ID(4)

Using std::ref...

Constructed via the default constructor, object foo with ID(6)

Calling for_each with ref...

Calling generate with ref...

Destroyed object foo with ID(6)

Discussion:

As you can see from the output above, calling function func with a temporary object of type funObj results in the construction of a single funObj object (even though func passes its argument by value). However, this seems not to be the case when passing temporary objects of type funObj to STL algorithms std::for_each and std::generate. In the former cases the copy constructor is evoked and an extra funObj is constructed. In quite a few applications the creation of such "unnecessary" copies deteriorates the performance of the algorithm significantly. Based on this fact the following questions are being raised.

Questions:

I know that most STL algorithms pass their argument by value. However, compared to func, that also passes its input argument by value, the STL algorithms generate an extra copy. What's the reason for this "unnecessary" copy?
Is there a way to eliminate such "unnecessary" copies?
When calling std::for_each(std::begin(v), std::end(v), funObj<int>()) and func(funObj<int>()) in which scope does temporary object funObj<int> lives, for each case respectively?
I've tried to use std::ref in order to force pass by reference and as you can see the "unnecessary" copy was eliminated. However, when I try to pass a temporary object to std::ref (i.e., std::ref(funObj<int>())) I get a compiler error. Why such kind of statements are illegal?
The output was generated using VC++2013. As you can see there's an anomaly when calling std::for_each the destructors of the objects are being called in reversed order. Why is that so?
When I run the code on Coliru that runs GCC v4.8 the anomaly with destructors is fixed however std::generate doesn't generate an extra copy. Why is that so?

Details/Comments:

The output above was generated from VC++2013.

Update:

I've also added to funObj class a move constructor (see code below).

 funObj(funObj&& other) : id(other.id)
  {
    other.id = 0;
    std::cout << "    Constructed via the move constructor, object foo with ID(" << id << ")" << std::endl;
  }

I've also turned on full optimization in VC++2013 and compiled in release mode.

Output (VC++2013):

Calling func...

Constructed via the default constructor, object foo with ID(1)

Destroyed object foo with ID(1)

Calling for_each...

Constructed via the default constructor, object foo with ID(2)

Constructed via the move constructor, object foo with ID(2)

Destroyed object foo with ID(2)

Destroyed object foo with ID(0)

Calling generate...

Constructed via the default constructor, object foo with ID(3)

Constructed via the copy constructor, object foo with ID(4)

Destroyed object foo with ID(4)

Destroyed object foo with ID(3)

Using std::ref...

Constructed via the default constructor, object foo with ID(5)

Calling for_each with ref...

Calling generate with ref...

Destroyed object foo with ID(5)

Output GCC 4.8

Calling func...

Constructed via the default constructor, object foo with ID(1)

Destroyed object foo with ID(1)

Calling for_each...

Constructed via the default constructor, object foo with ID(2)

Constructed via the move constructor, object foo with ID(2)

Destroyed object foo with ID(2)

Destroyed object foo with ID(0)

Calling generate...

Constructed via the default constructor, object foo with ID(3)

Destroyed object foo with ID(3)

Constructed via the default constructor, object foo with ID(4)

Calling for_each with ref...

Calling generate with ref...

Destroyed object foo with ID(4)

It seems that VC++2013 std::generate generates an extra copy no-matter if optimization flags are on and compilation is in release mode and besides the fact that a move constructor is defined.

To sum up most of this I think you just want to ask "Why does the STL take predicates by value?" — David, May 12 '14 at 16:02
@40two - You didn't mention what compilation options were used, more precisely, what optimization setting was used for VC++ 2013 — PaulMcKenzie, May 12 '14 at 16:06
by defining a copy constructor you are preventing the compiler from generating an implicitly defined move constructor. You can prevent most of those copies by defining your own move constructor. — YoungJohn, May 12 '14 at 16:10
@PaulMcKenzie I'm a basic user of VC++2013, had to borrow a laptop from work that had windows because thieves broke into my house a stole my macbook (sorry irrelevant) :-). I did an empty console application and compiled on debug mode (i.e., no optimizations). — 101010, May 12 '14 at 16:10
FWIW VS2013 doesn't generate move constructors anyway. You always need to provide your own. Have fun. — R. Martinho Fernandes, May 12 '14 at 16:13
@40two - You need to compile with optimizations turned on. Otherwise, you will get the "checked, but slow" debug version running. Statistics from debug builds are not a realistic picture of what code the compiler can really produce. — PaulMcKenzie, May 12 '14 at 16:13
@Dave - I would say that follows C / C++ semantics. The default is value semantics, and with the proper syntax added you can use pointers or references instead. If you specifically decide to use pointers, it means you're ready to cope with data ownership and lifetime problems. — Loomchild, May 12 '14 at 16:14
What happens if you define a move-constructor for your function-object class? If the STL implementers did their job correctly, that should eliminate the extra copies within the STL algorithms. — Mikael Persson, May 12 '14 at 16:15
@MikaelPersson I'll try to do it and I'll post an update if necessary. — 101010, May 12 '14 at 16:17
@40two, I tried on my system (with GCC 4.8), and adding a move constructor does indeed eliminate the extra copy (replaced by a move, in which case, you don't need to increment "n", because you can just "steal" the id from the source object). — Mikael Persson, May 12 '14 at 16:23
@Dave I was wondering this some time ago [Why the sequence-operation algorithms predicates are passed by copy?](http://stackoverflow.com/questions/17234543/why-the-sequence-operation-algorithms-predicates-are-passed-by-copy). — PaperBirdMaster, May 12 '14 at 16:25
@MikaelPersson can you post on Coliru or elsewhere your move constructor? — 101010, May 12 '14 at 16:30
@MikaelPersson it seems that in VC++2013 the move constructor applies only on `for_each` and not `generate` :s... WTH is going on? — 101010, May 12 '14 at 16:37
@40two I don't know, ask the VC++ development team (or the Dinkumware ppl). VC++ is notorious for being a Swiss cheese with lots of pitfalls, bugs, shortcuts, and partially implemented features. VC++ has been in a very shabby state of "under construction" for about 2 decades. Keep your expectations low, very low, when using it. GCC, CLang, and ICC are all far superior options, IMHO (also, keep in mind, no compiler + std-lib implementation is perfect). — Mikael Persson, May 12 '14 at 17:08
If you look at the implementation of VC++'s / Dinkumware's `std::generate`, it becomes clear why the copy ctor is called twice: there's a dispatch to some `_Generate` function template that has to do with checked iterators. I don't see *why* it's necessary (it doesn't reduce code duplication as other applications of this technique). It also lacks some `std::move` for the passed function object. This means that the first copy from the temporary to `std::generate` can and will be elided, but there's a second copy to `_Generate` which cannot be elided since its source is a parameter. — dyp, May 12 '14 at 22:05
@dyp you think this is a bad design on behalf of VC++ STL developers? — 101010, May 12 '14 at 22:12
They should have used `std::move` at least to move the function object. Maybe that part is not "updated" yet to C++11, or maybe there were other reasons not to. I don't see any, though. — dyp, May 12 '14 at 22:28

score 4 · Accepted Answer · edited May 23 '17 at 12:05

1 - I know that most STL algorithms pass their argument by value. However, compared to func, that also passes its input argument by value, the STL algorithms generate an extra copy. What's the reason for this "unnecessary" copy?

STL algorithms return the function object. This happens so that the mutation on the object will be observable. Your func returns void so that's a copy less.

Well, to be precise, generate does not return a thing (see comment by dyp)

2 - Is there a way to eliminate such "unnecessary" copies?

Well unnecessary is a bit too strong. The whole point of functors is to be lightweight objects so that a copy wouldn't matter. As for a way, the one you provide (std::ref) will do the job, alas a copy of the std::ref will be generated (your object won't get copied though)

Another way would be to qualify the call of the algorithm

then the function object type will be a reference :

auto fobj1 = funObj<int>();

std::for_each<std::vector<int>::iterator, std::vector<int>::iterator, 
funObj<int>&> // this is where the magic happens !!
(std::begin(v), std::end(v), fobj1);

3 - When calling std::for_each(std::begin(v), std::end(v), funObj()) and func(funObj()) in which scope does temporary object funObj lives, for each case respectively?

The body of std_for_each is expanded as follows :

template<class InputIterator, class Function>
  Function for_each(InputIterator first, InputIterator last, Function fn)
{ // 1
  while (first!=last) {
    fn (*first);
    ++first;
  }
  return fn;      // or, since C++11: return move(fn);
// 2
}

your function reads

template<typename T>
void func(funObj<T> obj) 
{ // 1.
    obj();  
// 2.
}

The comments 1 and 2 mark the lifespan in each case. Note though that if a return value optimization applies (named or unnamed), then the compiler may generate code that places the return value (the function object in for_each) in the stack frame of the caller, so the life span is longer.

4 - I've tried to use std::ref in order to force pass-by-reference and as you can see the "unnecessary" copy was eliminated. However, when I try to pass a temporary object to std::ref (i.e., std::ref(funObj())) I get a compiler error. Why such kind of statements are illegal?

std::ref does not work with r-value references (STL code follows):

template<class _Ty>
void ref(const _Ty&&) = delete;

you need to pass an l-value

5 - The output was generated using VC++2013. As you can see there's an anomaly when calling std::for_each the destructors of the objects are being called in reversed order. Why is that so?

6 - When I run the code on Coliru that runs GCC v4.8 the anomaly with destructors is fixed however std::generate doesn't generate an extra copy. Why is that so?

Check the settings for each compilation. With optimizations ON (and in Release for VS) copy elision / elimination of extra copies / ignoring non observable behaviors, are possible.
Secondly (as far as I can see) in VS 2013 the functor in for_each and the generator in generate are both passed by value (there's no signature accepting an r-value reference) so it would be clearly a matter of copy elision to save the extra copy.

For what matters, the STL implementation in gcc also has no signatures that accept r-value references (please notify me if one having is spotted)

template<typename _InputIterator, typename _Function>
_Function
for_each(_InputIterator __first, _InputIterator __last, _Function __f)
{
  // concept requirements
  __glibcxx_function_requires(_InputIteratorConcept<_InputIterator>)
  __glibcxx_requires_valid_range(__first, __last);
  for (; __first != __last; ++__first)
__f(*__first);
  return _GLIBCXX_MOVE(__f);
}

so I may be going out on limb on this one and assume, that defining move semantics for your functor has no effect and only compiler optimizations apply to eliminate copies

`// or, since C++11: return move(fn);` you shouldn't explicitly move to a return value here, it's done automagically by the compiler. Only if it's not a local variable / parameter, or if the type of the return-expression is different from the return type. — dyp, May 12 '14 at 18:55
@dyp On `// or, since C++11: return move(fn);` it's a copy paste from [cplusplus.com](http://www.cplusplus.com/reference/algorithm/for_each/) — Nikos Athanasiou, May 12 '14 at 18:59
Meh... deleted that comment since I thought it was wrong. Turns out it was correct after all: *"Note though that if a return value optimization applies (named or unnamed), then the compiler may generate code that places the return value (the function object in for_each) in the stack frame of the caller"* RVO is not applied to function parameters. The lifetime of the temporary object is not dependent on the function to which it is passed, it just lives until the end of the full-expression in which it has been created, unless its lifetime is extended by binding it to a reference. — dyp, May 12 '14 at 19:02
@dyp I was just replying on that, you got me working overtime, 3 points in 30 secs ! I added the citation — Nikos Athanasiou, May 12 '14 at 19:07
@dyp OK found it : 12.8 _This elision of copy/move operations, called copy elision, is permitted in the following circumstances (...) -In a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object (other than a function or catch-clause parameter) with the same cv-unqualified type as the function return type, the copy/move operation can be omitted by constructing the automatic object directly into the function’s return value_ Should I call it copy elision then ? Isn't this NRVO here ? — Nikos Athanasiou, May 12 '14 at 19:34
I think it means "Other than a function" (other than returning a function) or "catch clause parameters" and not "other than function parameters or catch clause parameters" — Nikos Athanasiou, May 12 '14 at 19:35
Yeah it's ambiguous (the grammar of that sentence). But a function is not an object, so copy elision cannot apply to it. Furthermore you can look at the next numbered paragraph about moving to a return value, which explicitly allows *moving from a function parameter* as a relaxation of the copy-elision requirements. IIRC this also shows up in a DR. — dyp, May 12 '14 at 20:04

score 3 · Answer 2 · answered May 12 '14 at 16:20

3

The move semantics introduced in C++11 exist to largely alleviate this set of 'unnecessary' copies. If you define a move constructor for your function-objects the STL will move the function-object (even/especially if it is a temporary) which will prevent the copy from occurring. This will allow you to use the STL algorithms with value-semantics without sacrificing too much in the way of performance. It will also allow you to use temporary function-objects as desired.

answered May 12 '14 at 16:20

YoungJohn

946
12
18

It is expected that most compilers will generate an implicit move constructor if no user defined copy or move constructors or copy assignment operators are provided. This may not be the case on VC++ 2013. – YoungJohn May 12 '14 at 16:22
3

But the issue here is that there is a user-defined copy constructor, and therefore, the move constructor is implicitly removed. That's why a user-defined move constructor is needed here. – Mikael Persson May 12 '14 at 16:24