
To better understand compilers, and assembly language in particular, I have been experimenting with a trivial piece of code that calculates the sum of the first N numbers, which should result in N(N+1)/2 or N(N-1)/2 depending on the loop bound.

As the code below shows, there are two functions:

#include <cstdint>


// Once compiled with optimization, the generated assembly has a loop

uint64_t sum1( uint64_t n ) {  
    uint64_t sum = 0;
    for ( uint64_t j=0; j<=n; ++j ) {
        sum += j;
    }
    return sum;
}

// Once compiled with optimization, the generated assembly of the following has no loop

uint64_t sum2( uint64_t n ) {  
    uint64_t sum = 0;
    for ( uint64_t j=0; j<n; ++j ) {
        sum += j;
    }
    return sum;
}

In the first function I loop from 0 to N (i.e. `j <= n`), and in the second function I go from 0 to N-1 (i.e. `j < n`).
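A small sketch of why the `<=` bound is special (the helper function and its step budget are my own illustration, not part of the original code): when `n == UINT64_MAX`, the condition `j <= n` is true for every possible `uint64_t` value of `j`, so the loop in sum1 never terminates.

```cpp
#include <cstdint>

// Runs the same loop shape as sum1 (j <= n), but bails out after a step
// budget so the boundary case is observable without actually hanging.
// When n == UINT64_MAX, j can never exceed n: after j == UINT64_MAX,
// ++j wraps to 0 and the loop keeps going forever.
bool terminates_within(uint64_t n, uint64_t max_steps) {
    uint64_t steps = 0;
    for (uint64_t j = 0; j <= n; ++j) {
        if (++steps > max_steps) return false;  // budget exhausted: treat as non-terminating
    }
    return true;
}
```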

My understanding/observation:

  • For the first function, sum1, the generated assembly has a loop, while for the second function, sum2, the assembly shows no loop. However, once I remove the compiler optimizations (i.e. -O3), the loop for the second function finally appears in the assembly.

  • To see the generated assembly with compiler optimizations, please see this: Optimized.

  • To see the generated assembly without compiler optimizations, please see this: non-optimized.

  • The compiler is x86-64 clang.

Question: Why does the generated assembly, with optimizations enabled, still contain a loop for the first function but not for the second?

Peter Cordes
BZKN
  • 10
    The compiler seems to be observing that `sum1` could loop forever, for a certain `n`. – Drew Dormann Jan 09 '22 at 00:39
  • @DrewDormann Which would however be UB and could be ignored by the compiler. – user17732522 Jan 09 '22 at 00:40
  • @DrewDormann and why is that so? – BZKN Jan 09 '22 at 00:40
  • 2
    The compiler’s (ideal) goal is to create the most efficient code, which has the same observable behavior as the source code (when optimizing). Add an output of the temp in the loop to force the compiler to keep the loop. – 500 - Internal Server Error Jan 09 '22 at 00:41
  • 2
    Consider `sum1(std::numeric_limits<uint64_t>::max())` – Igor Tandetnik Jan 09 '22 at 00:48
  • 1
    It is not UB, @user17732522, in that edge case the result is very well defined: an infinite loop. – Sam Varshavchik Jan 09 '22 at 00:48
  • @500 - Internal Server Error: But is there also another way to look at it? I mean in some cases you may need to see the complete assembly code and if it optimizes essential loops, then are optimization always worth it? – BZKN Jan 09 '22 at 00:50
  • @SamVarshavchik An infinite loop that doesn't have any side effects is UB per **[intro.progress]/1**. For rationale, see [N1528: Why undefined behavior for infinite loops?](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1528.htm) – Igor Tandetnik Jan 09 '22 at 00:55
  • 5
    @SamVarshavchik: As user17732522 notes, the compiler may assume the loop terminates; this is an axiom granted by the C or C++ standard. This is logically equivalent to assuming `n` != `UINT64_MAX`, as those are all the cases and the only cases for which the loop terminates. And so it is equivalent to saying that if the program executes with `n` equal to `UINT64_MAX`, the behavior is not defined by the C or C++ standard. It is not defined to be an infinite loop. – Eric Postpischil Jan 09 '22 at 01:01
  • 2
    @IgorTandetnik: That is only for C++. In C, it does not apply to all loops; an iteration statement with a controlling expression that is a constant expression (including an absent expression, which is implicitly non-zero) may continue infinitely with no side effects. And hence you could write OP’s loop and get defined infinite-loop behavior instead of undefined behavior by moving the `j <= n` from the controlling expression to a conditional `break;`. – Eric Postpischil Jan 09 '22 at 01:04
  • 1
  • @BZKN I would require some proof in form of an llvm opt tool or a link or something real that proves the claims. This entire conversation, including one response, is 100% speculative and can be easily disproven with a counter-example. –  Jan 09 '22 at 01:52
  • Occam's Razor: it seems reasonable that the implementation contains a pattern matcher. One pattern is sum(i, i=1..k) = k(k+1)/2. Because the loop with <= isn't exactly a 1..k loop (e.g. in a symbolic execution the domain of the iteration space includes ⊥), the rule doesn't match. – Gene Jan 09 '22 at 05:10
  • 2
    From looking over `clang++ -mllvm -O3 -print-after-all opt.cc`, it looks like the pass that replaces the loop with the closed form is Induction Variable Simplification. So one could look through its source, or even step through it, to try to determine where the two versions diverge, and exactly what is being tested. – Nate Eldredge Jan 09 '22 at 06:16
  • 2
    For an example of clang exploiting "infinite loop is UB", try taking the first version, but replacing `return sum;` with `return 5;`. On its face it appears that it should return 5 unless passed `UINT64_MAX`, in which case it should loop infinitely. But in fact it returns 5 unconditionally. https://godbolt.org/z/M6jhjq4en This apparently happens in the SROA pass. – Nate Eldredge Jan 09 '22 at 06:23

1 Answer


This is because your compiler is very, very smart, and it knows that the sum of all values from 0 to n can be calculated with a trivial mathematical formula, instead of a loop.

However, your C++ compiler also figured out that this mathematical formula cannot be used in the <= version: if n is UINT64_MAX, the condition j <= n is true for every possible value of j, and the loop never terminates. Since the formula does not cover that case, all bets are off, and the compiler compiles the code exactly as given.
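To make the transformation concrete, here is a sketch of what the optimizer effectively does for sum2 (the function names are mine; per the comments above, clang performs this internally in its Induction Variable Simplification pass, not as a source-level rewrite):

```cpp
#include <cstdint>

// Gauss's formula for 0 + 1 + ... + (n-1), i.e. n*(n-1)/2, computed so it
// stays exact in uint64_t arithmetic: one of n and n-1 is always even, so
// divide that factor by 2 *before* multiplying. The result then matches the
// loop version for every n, including inputs where n*(n-1) would wrap.
uint64_t sum2_closed_form(uint64_t n) {
    if (n == 0) return 0;
    return (n % 2 == 0) ? (n / 2) * (n - 1)
                        : n * ((n - 1) / 2);
}

// The original loop version of sum2, for comparison.
uint64_t sum2_loop(uint64_t n) {
    uint64_t sum = 0;
    for (uint64_t j = 0; j < n; ++j) sum += j;
    return sum;
}
```

Dividing the even factor by 2 before multiplying is one way to avoid needing a wider intermediate; the actual instruction sequence clang emits differs, but it computes the same closed form.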

Sam Varshavchik
  • 1
    It is UB because the compiler may assume forward-progress: https://timsong-cpp.github.io/cppwp/n4868/basic#intro.progress-1 in C++ and https://port70.net/~nsz/c/c11/n1570.html#6.8.5p6 in C – user17732522 Jan 09 '22 at 00:46
  • @Sam Varshavchik: thanks.. got it. But is there also another way to look at it? I mean in some cases you may need to see the complete assembly code and if it optimizes essential loops, then are optimization always worth it? – BZKN Jan 09 '22 at 00:48
  • 3
    It is absolutely true that when debugging code you do not want these kinds of optimizations taking place. Which is precisely why optimizations require a configuration setting to enable (or disable). – Sam Varshavchik Jan 09 '22 at 00:49
  • 2
    If you make the infinite loop more obvious, clang will recognize it and remove it, even if the value calculated in the sum is used. It even produces an empty function without instructions, which would result in non-sense programs: https://godbolt.org/z/cv6ndjYhT/z/Pff5rf1eh – user17732522 Jan 09 '22 at 00:57
  • @user17732522 - allowing the implementation to assume the loop will terminate just means the implementation is allowed to ignore figuring out if it ever will, doesn't it? – jxh Jan 09 '22 at 01:01
  • @user17732522 thanks for this infinite loop code example....helped me more to understand. – BZKN Jan 09 '22 at 01:02
  • @user17732522 The footnote also indicates this semantic is really only intended for empty loop removal. But in the OP's code, there is a loop side effect being returned. – jxh Jan 09 '22 at 01:07
  • Fixed link for my comment above: https://godbolt.org/z/8nKPjxeKT – user17732522 Jan 09 '22 at 01:07
  • 1
    @Jellyboy That compilers are allowed to replace loops with closed form formulas? That would be the "as if" rule. – jxh Jan 09 '22 at 01:09
  • @jxh: The note says it is intended to allow transformations **such as** removal of empty loops, not **only** for empty loops. (However, in OP’s code, the loop performs a computation which can be calculated algebraically without a loop, so that ordinary optimization can hoist all the code out of the loop, leaving an empty loop, which can then be removed using this special rule.) You can see an example of a transformation other than removal of an empty loop [here](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1528.htm). – Eric Postpischil Jan 09 '22 at 01:10
  • @Jellyboy I think Sam is just using hyperbole about the compiler intelligence. You could argue it was not smart enough to figure out how to O(1) optimize the first function. – jxh Jan 09 '22 at 01:12
  • @jxh Eric Postpischil has given an explanation in the question comments. I can't say anything about the intentions, but Igor Tandetnik linked the rationale in the question comments. The way it is worded in the standard (specifically excluding `volatile`, atomic operations, etc.) also seems clear to me that it shouldn't only apply to empty loops. They could have chosen a simpler wording for that. – user17732522 Jan 09 '22 at 01:12
  • @EricPostpischil Thanks. The closed form formula replacement can be considered a kind of loop removal too. – jxh Jan 09 '22 at 01:16
  • 1
    @jxh Yes, the point of "infinite loop is UB" observation is precisely that the compiler missed this optimization. It could have replaced the loop with a closed-form formula and ignored the corner case that results in the infinite loop, because that loop is UB anyhow. – Igor Tandetnik Jan 09 '22 at 01:22
  • 6
    @Jellyboy: It's likely that the pattern-recognizer for sum-of-linear-sequence loops is looking for specific things in the loop, which presumably includes the loop being provably finite. The fact that *this* part of clang/LLVM fails to use value-range information from `__builtin_assume` could just as easily be a missed optimization. Although interestingly, if you do `n &= 0xfff` instead of assume, the optimizer is able to use Gauss's formula: https://godbolt.org/z/Mn16PYvqo (With a little less effort to avoid overflow, but still more than needed for the tiny `n`.) – Peter Cordes Jan 09 '22 at 08:13
  • 3
    @Jellyboy: It's the obvious and by far the most likely explanation. Some experiments on both versions are consistent with that theory, e.g. `j += 3` makes the `j < n` version keep its loop too. – Peter Cordes Jan 09 '22 at 08:37
  • @EricPostpischil: Regarding infinite loops as UB is lousy for optimization. Far better would be to codify that if no action within a loop isn't sequenced before some later action, the loop as a whole isn't either. There are many situations where it would be acceptable to omit a loop, or to execute it until a task is externally forcibly terminated, but where arbitrary behavior would be unacceptable. Requiring programmers include dummy side effects would prevent a compiler from omitting the loop, which would be the useful purpose of the optimization. – supercat Jan 09 '22 at 15:48
  • @Jellyboy: What exactly are you hoping would be different in this answer? A weasel word like "almost certainly" when stating the cause? It's well known that potentially-infinite loops are a problem for some compiler optimization passes, so it's not bad advice in general, and does in fact work for all variations of clang's sum-of-sequence optimization that we've tested. (Except for ones where being non-infinite is only provable when considering `__builtin_assume`, which that pass doesn't seem to account for.) – Peter Cordes Jan 09 '22 at 17:06
  • 3
    Yes, this is SO, we should expect *good* speculation, ideally backed up by sufficient evidence to go with that theory. Also useful practical advice, which this is. It's unreasonable to expect an answerer to dig into the LLVM source code and find the exact pattern-recognizer algorithm to rule out any other possibility. If someone knows of a more accurate explanation (or worse, if an explanation is totally wrong, which is not rare on performance question based on how CPUs/caches work), they can comment. But lack of definitive proof isn't IMO a problem, at least here. – Peter Cordes Jan 09 '22 at 17:08
  • 1
    @Jellyboy: There's a difference between *wild* speculation and random guesses vs. well-informed guesses and very reasonable conclusions on a subject an expert is familiar with. (Not all speculation is good; I've seen wrong explanations upvoted. Also usually should be indicated as such.) But if you aren't satisfied with the evidence of my previous few comments to support this answer's conclusion, yes, you're in the wrong place. (The answer itself is pretty minimal in terms of supporting its conclusion; showing that `j < n ; j += 3` also doesn't optimize would make it much stronger.) – Peter Cordes Jan 09 '22 at 20:02