0

Does gcc understand or lookahead branches and perform optimization. I wrote a code like this

for (int i = 10;i < 20;i++) {
    long long sum = 1;

    for (int j = 0;j < 10000;j++) {
        sum += pow(i,3);
    }
    if (i == 15) {
        cout<<sum<<" "<<endl;
    } else {
        cout << 0<<" "<<endl;
    } 
}
nanos = get_nanos();
printf("current nanos: %ld\n", nanos - last_nanos);
last_nanos = nanos;

If I move the condition if (i == 15) before the for loop, then Ithe execution time reduces by 10x. Why doesnt the compiler understand that the for loop need not be executed unless i == 15?. Is there an option I need to turn on?

I'm compiling with the options - g++ -O3 -lrt

Alan Birtles
  • 32,622
  • 4
  • 31
  • 60
  • 1
    are you compiling with `-O3` ? – M.M Feb 21 '22 at 20:43
  • Why have you tagged a question with C++ code with the C tag? Pick one or the other. – Eric Postpischil Feb 21 '22 at 20:45
  • 2
    Edit the question to provide a [mre]. Do not make other people add their own timing code and harness to call this code. – Eric Postpischil Feb 21 '22 at 21:18
  • Please clarify what do you mean by [*"If I move the condition if (i == 15) before the for loop"*](https://godbolt.org/z/rWMez6Y9d) and how are you compiling it. – Bob__ Feb 21 '22 at 21:23
  • @Bob__: They mean changing `for (int j = 0;j < 10000;j++) { sum += pow(i,3); } if (i == 15) { cout< – Eric Postpischil Feb 21 '22 at 21:27
  • 1
    FYI, neither GCC nor Clang perform this optimization with `-O3`, but, if we change `pow(i, 3)` to `i*i*i`, [Clang seems to perform the optimization at least partly](https://godbolt.org/z/rjKvrM5r4) but GCC does not. Clang is doing something weird in computing `i*i*i`, apparently involving the constants 331, 1000, and 66 in a way I have not figured out, but the code that replaces the loop, notably `imul rsi, rax, 10000` is followed unconditionally by the stream insertion, so the branch has been moved above that aspect of the loop. – Eric Postpischil Feb 21 '22 at 21:30
  • Thanks, Why doesnt the compiler automatically detect that I am not using the "sum" variable for any other case other than i == 15 and optimize the code block out for all the other values of "i" – Manoj Kumar Feb 21 '22 at 21:32
  • 1
    @ManojKumar: Because nobody wrote the software to make it do that. – Eric Postpischil Feb 21 '22 at 21:32
  • 1
    in general the compiler doesnt know that `pow` has no side effects – pm100 Feb 21 '22 at 21:36
  • 1
    @pm100: `pow` has side effects; it updates the floating-point flags. However, because the standard library identifiers are reserved, compilers may assume uses of those identifiers refer to the library routines and have their specified behaviors. – Eric Postpischil Feb 21 '22 at 22:00
  • @pm100: That would be true if you compiled with `-fno-builtin-pow`. But in fact, gcc and clang treat it as a builtin, and will inline `pow` for small constant powers, at least with `-ffast-math` enabled. https://godbolt.org/z/ce7n51h1h (They still convert to `double` and back, although in theory they don't need to because double->int conversion overflow is undefined behaviour, so they don't need produce the same results as actually doing that on the target machine. And IEEE binary64 `double` can exactly represent every `int32_t` without rounding error.) It's a low-value missed optimization – Peter Cordes Feb 21 '22 at 22:55
  • @ManojKumar: Compilers aren't AI, they're just complex pieces of machinery. And they have to compile fast enough, so they generally avoid algorithms with exponential complexity. They also have to be maintainable by humans, so there's a limit to how much complexity we want in the compiler itself. So there are lots of reasons that compilers don't find every possible optimization, and why writing efficient code in the first place is still useful. – Peter Cordes Feb 21 '22 at 23:06
  • (And even hand-holding the compiler towards certain micro-optimizations that compiler devs might not have thought to look for, as in [Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly?](https://stackoverflow.com/a/40355466) or [What is the efficient way to count set bits at a position or lower?](https://stackoverflow.com/q/34407437)) – Peter Cordes Feb 21 '22 at 23:13
  • `pow` also has the side effect that it may set `errno`, again unless you use `-ffast-math`. – Nate Eldredge Feb 22 '22 at 17:25
  • Please provide enough code so others can better understand or reproduce the problem. – Community Feb 23 '22 at 23:47

0 Answers0