Auto in loop and optimizations

Question

Can you explain why there is such a difference in computation time between the following programs (compiled without optimization)? I suspect RVO vs move-construction, but I'm not really sure.

In general, what is the best practice in such a case? Is an auto declaration in a loop considered bad practice when initializing non-POD data?

Using auto inside the loop:

#include <cstddef>
#include <vector>

std::vector<int> foo()
{
    return {1,2,3,4,5};
}

int main()
{
    for (size_t i = 0; i < 1000000; ++i)
        auto f = foo();
    return 0;
}

Output:

./a.out 0.17s user 0.00s system 97% cpu 0.177 total

Vector instance outside the loop:

#include <cstddef>
#include <vector>

std::vector<int> foo()
{
    return {1,2,3,4,5};
}

int main()
{
    std::vector<int> f;

    for (size_t i = 0; i < 1000000; ++i)
        f = foo();
    return 0;
}

Output:

./a.out 0.32s user 0.00s system 99% cpu 0.325 total

In one loop there's a declaration and initialization, while in the other there's only an assignment. Is this intended? If you're comparing oranges with oranges, shouldn't you be moving `std::vector<int> f` inside the loop? — legends2k, Jun 12 '13 at 11:48
If you suspect move-construction is much faster, use your own vector class, forward the move constructor to the copy constructor, and measure again. Or look at the assembly and see what it's doing. Or add counters in your class's copy/move constructors and assignment operators and see what actually gets called. — Useless, Jun 12 '13 at 11:51
Also: bogus benchmarks alert. Any compiler with decent optimization 'skill' can optimize all of the loops away — sehe, Jun 12 '13 at 11:51
@BalogPal Unoptimized versions. But even in that case I don't get such differences. Using auto in C++11 is tempting; I just want to know whether it should be used here — 3XX0, Jun 12 '13 at 12:05
What about a manually ensured optimization: ``` void foo(std::vector<int>& output) { int values[] = {1,2,3,4,5}; output.assign(&values[0], &values[0] + 5); } int main() { std::vector<int> f; for (size_t i = 0; i < 1000000; ++i) foo(f); return 0; } ``` — Zhaolin Feng, Mar 14 '23 at 13:56

Mike Seymour · Accepted Answer · 2013-06-12T12:14:10.130

> I suspect RVO vs move-construction but I'm not really sure.

Yes, that is almost certainly what's happening. The first case move-initialises a variable from the function's return value: in this case, the move can be elided by making the function initialise it in place. The second case move-assigns from the return value; assignments can't be elided. I believe GCC performs elision even at optimisation level zero, unless you explicitly disable it.
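One way to check this, along the lines suggested in the comments, is to count which special member functions actually run. The following is only a minimal sketch, not code from the question: `Tracker`, `make_tracker`, and the counters are hypothetical names, and the zero move-construction count assumes the compiler elides the move (guaranteed since C++17 for a returned temporary, and done by default by GCC and Clang in earlier modes).

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical instrumented type: wraps a vector and counts special member calls.
struct Tracker {
    static int constructions;
    static int move_constructions;
    static int move_assignments;

    std::vector<int> data;

    explicit Tracker(std::vector<int> v) : data(std::move(v)) { ++constructions; }
    Tracker(Tracker&& other) noexcept : data(std::move(other.data)) { ++move_constructions; }
    Tracker& operator=(Tracker&& other) noexcept {
        data = std::move(other.data);
        ++move_assignments;
        return *this;
    }
};

int Tracker::constructions = 0;
int Tracker::move_constructions = 0;
int Tracker::move_assignments = 0;

void reset_counters() {
    Tracker::constructions = 0;
    Tracker::move_constructions = 0;
    Tracker::move_assignments = 0;
}

// Stand-in for foo(): returns a temporary by value.
Tracker make_tracker() { return Tracker(std::vector<int>{1, 2, 3, 4, 5}); }

// Case 1: a new object is initialised from the return value on each iteration.
void declare_inside_loop(std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        Tracker t = make_tracker();  // the move into t can be elided
        (void)t;
    }
}

// Case 2: one object lives outside the loop and is assigned to on each iteration.
void assign_in_loop(std::size_t n) {
    Tracker t{std::vector<int>{}};
    for (std::size_t i = 0; i < n; ++i)
        t = make_tracker();  // move-assignment from a temporary; cannot be elided
}
```

With elision, `declare_inside_loop(3)` should leave all three objects built in place with no moves, while `assign_in_loop(3)` builds three temporaries plus the initial object and performs three move-assignments.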

In the final case (with -O3, which has now been removed from the question) the compiler probably notices that the loop has no side effects, and removes it entirely.

You might (or might not) get a more useful benchmark by declaring the vector volatile and compiling with optimisation. This will force the compiler to actually create/assign it on each iteration, even if it thinks it knows better.
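A hedged sketch of such a benchmark follows. It departs slightly from the suggestion above: `std::vector` has no volatile-qualified member functions, so a volatile vector cannot be assigned to. Instead, each result is read into a volatile variable. The name `sink`, the two timing functions, and the use of `std::chrono::steady_clock` are assumptions made for illustration. The optimiser may still be able to simplify the allocations, so the numbers are only as trustworthy as the generated assembly.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

std::vector<int> foo() { return {1, 2, 3, 4, 5}; }

// Hypothetical sink: writes to a volatile object are observable side effects,
// so the compiler may not discard the loops that perform them.
volatile int sink = 0;

using Clock = std::chrono::steady_clock;

// Times n iterations that declare and initialise a fresh vector each time.
double seconds_declared_inside(std::size_t n) {
    const auto start = Clock::now();
    for (std::size_t i = 0; i < n; ++i) {
        auto f = foo();
        sink = f[0];
    }
    return std::chrono::duration<double>(Clock::now() - start).count();
}

// Times n iterations that assign to a vector declared outside the loop.
double seconds_assigned(std::size_t n) {
    std::vector<int> f;
    const auto start = Clock::now();
    for (std::size_t i = 0; i < n; ++i) {
        f = foo();
        sink = f[0];
    }
    return std::chrono::duration<double>(Clock::now() - start).count();
}
```

Calling both functions with the same iteration count, for example 1000000 as in the question, gives two durations that can be compared at any optimisation level.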

> Is an auto declaration in a loop considered bad practice when initializing non-POD data?

No; if anything, it's considered better practice to declare things in the narrowest scope that's needed. So if it's only needed in the loop, declare it in the loop. In some circumstances, you may get better performance by declaring a complicated object outside a loop to avoid recreating it on each iteration; but only do that when you're sure that the performance benefit (a) exists and (b) is worth the loss of locality.
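As an illustration of the "some circumstances" case, here is a minimal sketch of a buffer declared outside a loop so that its storage is reused. The function name and the capacity check are hypothetical, chosen only to make the reuse observable. Whether this is actually faster in a given program still has to be measured.

```cpp
#include <cstddef>
#include <vector>

// Refills one vector on every iteration and counts how often its capacity
// differs from the initially reserved one (a change would imply a reallocation).
std::size_t capacity_changes_when_reused(std::size_t iterations) {
    std::vector<int> buffer;
    buffer.reserve(5);  // allocate once, before the loop
    const std::size_t initial_capacity = buffer.capacity();
    std::size_t changes = 0;

    for (std::size_t i = 0; i < iterations; ++i) {
        buffer.clear();  // destroys the elements but keeps the storage
        for (int v = 1; v <= 5; ++v)
            buffer.push_back(v);  // fits in the reserved storage
        if (buffer.capacity() != initial_capacity)
            ++changes;
    }
    return changes;
}
```

The price is that `buffer` now outlives the loop body and carries state from one iteration to the next, which is the loss of locality mentioned above.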

Thanks! Very useful, especially the last sentence :) — 3XX0, Jun 12 '13 at 12:16

score 1 · Answer 2 · edited May 23 '17 at 11:56

I don't see your example having anything to do with auto. You wrote two different programs.

While

for (size_t i = 0; i < 1000000; ++i)
    auto f = foo();

is equivalent to

for (size_t i = 0; i < 1000000; ++i)
    std::vector<int> f = foo();

-- which means that on each iteration you create a new vector (and destroy the old one). And yes, your implementation of foo uses RVO, but that is not the point here: you still create a new vector in the place where the loop makes room for f.
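The equivalence of the two declarations can be checked at compile time. This is a small sketch, with `auto_deduces_vector` as a hypothetical helper name:

```cpp
#include <type_traits>
#include <vector>

std::vector<int> foo() { return {1, 2, 3, 4, 5}; }

// Returns true if the deduced variable holds the five elements from foo().
bool auto_deduces_vector() {
    auto f = foo();
    // Fails to compile if auto deduced anything other than std::vector<int>.
    static_assert(std::is_same<decltype(f), std::vector<int>>::value,
                  "auto deduces std::vector<int> here");
    return f.size() == 5;
}
```

Since `auto` only affects how the type is spelled, any timing difference has to come from initialisation versus assignment.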

The snippet

std::vector<int> f;
for (size_t i = 0; i < 1000000; ++i)
     f = foo();

assigns to an existing vector. And yes, because foo returns a temporary, this may become a move-assignment, depending on foo; it does in your case, so you can expect it to be fast. But it is still a different thing: it is always the same f that is in charge of managing the resources.

But what you do show very beautifully here is that it often makes sense to follow the general rule

> Declare variables as close to their use as possible.

See this Discussion

Zhaolin Feng · Answer 3 · 2023-03-15T04:33:28.957

I tested three versions on my PC. The manually optimized version is the fastest.

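#include <cstddef>
#include <string>
#include <vector>

constexpr size_t LOOP = 1000000000;

std::vector<int> foo() { return {1, 2, 3, 4, 5}; }

// Overwrites the caller's vector in place instead of returning a new one.
void foo_optimized(std::vector<int> &output) {
  constexpr static int values[] = {1, 2, 3, 4, 5};
  output.assign(&values[0], &values[0] + 5);
}

int main(int argc, char **argv) {
  if (argc < 2) return 1;  // expects "original", "RVO" or "optimized"
  std::string type = argv[1];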
  if (type == "original") {

    std::vector<int> f;
    for (size_t i = 0; i < LOOP; ++i)
      f = foo();

  } else if (type == "RVO") {

    for (size_t i = 0; i < LOOP; ++i)
      auto f = foo();

  } else if (type == "optimized") {

    std::vector<int> f;
    for (size_t i = 0; i < LOOP; ++i)
      foo_optimized(f);
  }

  return 0;
}

$ g++ a.cpp -O3
$ time ./a.out original && time ./a.out RVO && time ./a.out optimized

real    0m11.671s
user    0m11.662s
sys     0m0.000s

real    0m15.012s
user    0m15.011s
sys     0m0.000s

real    0m0.767s
user    0m0.759s
sys     0m0.000s

For -O2, the result is:

original - 14.221s
RVO - 14.993s
optimized - 4.483s
