Why doesn't the compiler optimize an empty ranged-for loop over the elements of a set?

Question

When testing my code I noticed a significant increase in execution time when the empty ranged-for loop was deleted or not. Normally I would think that the compiler would notice that the for loop serves no purpose and would therefor be ignored. As the compiler flags I am using -O3 (gcc 5.4). I also tested it with a vector instead of a set and that seems to work and give the same execution time in both cases. It seems that the incrementation of the iterator costs all the extra time.

First case with the ranged for loop still present (slow):

#include <iostream>
#include <set>
int main () {
    long result;
    std::set<double> results;
    for (int i = 2; i <= 10000; ++i) {
        results.insert(i);
        for (auto element : results) {
            // no operation
        }
    }
    std::cout << "Result: " << result << "\n";
}

Second case with the ranged for loop deleted (fast):

#include <iostream>
#include <set>
int main () {
    long result;
    std::set<double> results;
    for (int i = 2; i <= 10000; ++i) {
        results.insert(i);
    }
    std::cout << "Result: " << result << "\n";
}

In some cases `-O3` might actually be worse than `-O2`. Try using `-O2`, and also take a look at the generated code. — Some programmer dude, Jan 15 '18 at 09:04
Looking at this in [godbolt](https://godbolt.org/g/s14gDA), it seems the compiler fails to optimize away the std::set::iterator increments (i.e. this issue is also present when using a "regular" for-loop using iterators). No clue why the compiler doesn't optimize this loop away though. — AVH, Jan 15 '18 at 09:09
After looking at the disassembly, I confirm that neither gcc nor clang nor VS optimize this with O2 O3 or Os — Guillaume Gris, Jan 15 '18 at 09:09
@YSC an empty loop is a empty loop. That's not minimal whenever the real problem doesn't contain an empty loop. And if it does it is crap. — , Jan 15 '18 at 09:25
@manni66 Some real crap is that this situation can happen in production code and the compiler won't optimize it out. — E_net4, Jan 15 '18 at 09:36
It seems, the problem appears when looping over a list: `void f (const std::list& l) { for (auto e : l) {}}` — Guillaume Gris, Jan 15 '18 at 09:40
Do you see the same effect with other containers (including `std::array`)? If not, then it's likely that the set's iterators are too complex for the optimizer. — Toby Speight, Jan 15 '18 at 09:40
@manni66 and whoever thinks that all production code is handwritten should slow down their judgement... — Quentin, Jan 15 '18 at 09:50
It seems std::array is optimised, but std::vector is not; makes me suspect that @TobySpeight hit the mark, or there's a side effect of the iterators that the compiler doesn't feel safe to remove. — UKMonkey, Jan 15 '18 at 09:54
It seems that any kind of node-based traversal will prevent optimization, even on the most simple kind of list. Here's [an example with a minimal hand-rolled singly linked list](https://godbolt.org/g/2WJQKC). — ComicSansMS, Jan 15 '18 at 09:55
With `std::vector`, gcc-7.2 in `-O2` does optimize away iteration: https://godbolt.org/g/aKNbkx — YSC, Jan 15 '18 at 10:09
Compilers does not seem to optimize this: `struct S { S* next; };` `void f (S* s) { while (s) s = s->next; }` I don't know why though — Guillaume Gris, Jan 15 '18 at 10:12
@ComicSansMS Seems like this is only the case for gcc & clang. icc18 does optimize away the empty loop. — AVH, Jan 15 '18 at 10:34
I'd say, the reason why this is not optimized, is that the loop shouldn't be there in the first place. It's just not common for code to loop over anything without actually doing work. Thus you cannot expect the compiler writers to have included optimizations for just this code. I've long since come to the conclusion that this kind of problems will always crop up if you rely on your optimizer to clean up behind you. Write clean code that does sensible work sensibly, and your optimizer will be your friend; write bloated cruft, and your optimizer will fail you where you need it most. — cmaster - reinstate monica, Jan 15 '18 at 11:39
How do you expect the compiler to know that the collection's iterator doesn't have side effects that you intend to take advantage of? — Alnitak, Jan 15 '18 at 12:00

Guillaume Gris · Answer 1 · 2018-01-15T12:01:17.067

23

Internally std::set iterator uses some kind of pointer chain. This seems to be the issue.

Here is a minimal setup similar to your issue:

struct S
{
    S* next;
};

void f (S* s) {
    while (s)
        s = s->next;
}

It's not a problem with complex collection implementations or overhead of iterators but simply this pointer chain pattern that the optimizer can't optimize.

I don't know the precise reason why optimizers fail on this pattern though.

Also, note that this variant is optimized away:

void f (S* s) {
    // Copy paste as many times as you wish the following two lines
    if(s)
        s = s->next;
}

Edit

As suggested by @hvd this might have to do with the compiler not being able to prove the loop is not infinite. And if we write the OP loop like so:

void f(std::set<double>& s)
{
    auto it = s.begin();
    for (size_t i = 0; i < s.size() && it != s.end(); ++i, ++it)
    {
        // Do nothing
    }
}

The compiler optimizes everything away.

edited Jan 15 '18 at 12:01

answered Jan 15 '18 at 10:24

Guillaume Gris

2,135
17
34

1

If s->next points at garbage then the next iteration may potentially crash - so if that was optimized out, a bug in the code wouldn't manifest itself. That said, I don't know if that's a potential reason - just theoretizing. – CookiePLMonster Jan 15 '18 at 10:27
@CookiePLMonster If you default-initialize `S::next` to nullptr (which I think you should) then what you described won't happen. – Al.G. Jan 15 '18 at 10:29
@CookiePLMonster I added a variant to show that the problem lies in the loop – Guillaume Gris Jan 15 '18 at 10:30
21

@CookiePLMonster The compiler is allowed to assume that undefined behavior never occurs. So this is not a valid reason to prevent optimization here. – ComicSansMS Jan 15 '18 at 10:31
1

@CookiePLMonster, if `s->next` points at garbage, then you're into Undefined Behaviour, and the optimiser isn't required to make it crash. – Toby Speight Jan 15 '18 at 10:31
Fair point! Then it's the case on how eager compilers are to optimize on UB =) – CookiePLMonster Jan 15 '18 at 10:32
1

`std::set` is definitely _not_ required to use `std::list::iterator`. I guess some implementation might, but it's hardly universal. – Useless Jan 15 '18 at 10:34
15

If `s->next == s`, then the loop is infinite. Although compilers are nowadays allowed to optimise away infinite loops, they do generally still attempt to preserve intentionally potentially infinite loops, and it may well be that in this case, the compiler cannot detect that this is not intended to possibly be infinite. – Jan 15 '18 at 10:35
What do you mean when you say "Internally `std::set` uses `std::list` iterators"? Did you mean to say Bidirectional iterators? `std::set` certainly isn't implemented in terms of `std::list`, it's typically implemented as a self-balanced binary tree. – Blastfurnace Jan 15 '18 at 10:36
@Useless You are right, it's very common though and I think it boils down to something similar in different implementations – Guillaume Gris Jan 15 '18 at 10:36
1

Side-note: icc16/17/18 optimize the `while` loop away. – AVH Jan 15 '18 at 10:38
I was definitely mistaken of the `std::list` argument. I was thinking of `std::unordered_set` and even for these I'm not sure it's always true. Thanks @Useless and @blastfurnace – Guillaume Gris Jan 15 '18 at 10:44
"this might have to do with the compiler not being able to proove the loop is not infinite". Aren't infinite loops still undefined behavior in C++? – Voo Jan 15 '18 at 11:56
@Voo Yes, but see my earlier comment. – Jan 15 '18 at 12:08

score 9 · Answer 2 · answered Jan 15 '18 at 10:11

The range based for loop is not as trivial as it looks. It is translated to an iterator-based loop internally in the compiler and if the iterator is complex enough the compiler may not even be allowed by the standard to remove these iterator operations.

score 6 · Answer 3 · edited Jan 15 '18 at 17:11

Range-for is "syntactic sugar", meaning what it does is simply provide short-hand notation for something that can be expressed in more verbose manner. For example, range-for transforms into something like this.

for (Type obj : container) ->

auto endpos = container.end();
for ( auto iter=container.begin(); iter != endpos; ++iter)
{
     Type obj(*iter);
     // your code here
}

Now the problem is that begin/end/*iter/++iter/(obj = ) are function-calls. In order to optimize them out, the compiler needs to know that they have no side-effects, (changes to global state). Whether the compiler can do this or not is implementation defined, and will depend on the container type. What I can say though, in most case you do not need the (obj =) function, so prefer

for (const auto& X: cont)

or ...

for (auto& X: cont)

to ...

for (auto X : cont)

You might find that simplifies it enough for optimizations to kick in.

These changes unfortunately don't do anything to improve the generated assembly. — AVH, Jan 15 '18 at 14:07

ivaigult · Answer 4 · 2018-01-15T11:30:53.020

You could play around with clang optimization report. Compile your code with save-optimization-record enabled, so optimization report will be dumped to main.opt.yaml.

clang++ -std=c++11 main.cpp -O2 -fsave-optimization-record

You will see that there are several problems with the loop:

Clang thinks, that there is a value modified in this loop.

- String: value that could not be identified as reduction is used outside the loop

Also, the compiler can't compute the number of loop iterations.

- String: could not determine number of loop iterations

Note, that compiler successfully inlined begin, end, operator++ and operator=.

Why doesn't the compiler optimize an empty ranged-for loop over the elements of a set?

4 Answers4

Edit