25

When I compile and run this code with Clang (-O3) or MSVC (/O2)...

    #include <stdio.h>
    #include <time.h>

    static int const N = 0x8000;

    int main()
    {
        clock_t const start = clock();
        for (int i = 0; i < N; ++i)
        {
            int a[N];    // Never used outside of this block, but not optimized away
            for (int j = 0; j < N; ++j)
            {
                ++a[j];  // This is undefined behavior (due to possible
                         // signed integer overflow), but Clang doesn't see it
            }
        }
        clock_t const finish = clock();
        fprintf(stderr, "%u ms\n",
            static_cast<unsigned int>((finish - start) * 1000 / CLOCKS_PER_SEC));
        return 0;
    }

... the loop doesn't get optimized away.

Furthermore, neither Clang 3.6 nor Visual C++ 2013 nor GCC 4.8.1 tells me that the variable is uninitialized!

Now I realize that the lack of an optimization isn't a bug per se, but I find this astonishing given how compilers are supposed to be pretty smart nowadays. This seems like such a simple piece of code that even liveness analysis techniques from a decade ago should be able to take care of optimizing away the variable a and therefore the whole loop -- never mind the fact that incrementing the variable is already undefined behavior.

Yet only GCC is able to figure out that it's a no-op, and none of the compilers tells me that this is an uninitialized variable.

Why is this? What's preventing simple liveness analysis from telling the compiler that a is unused? Moreover, why isn't the compiler detecting that a[j] is uninitialized in the first place? Why can't the existing uninitialized-variable-detectors in all of those compilers catch this obvious error?

einpoklum
user541686
  • There is a side effect. How can it be optimized away? – John Dibling Aug 06 '14 at 03:00
  • @JohnDibling: As-if rule? – user541686 Aug 06 '14 at 03:04
  • Maybe they recognize the calls to `clock` around the loop and realize that you really want that nonsense to execute in order to measure it. Too many times I see questions wondering why one's fancy "expensive" loop takes no time to run. – John Aug 06 '14 at 03:25
  • @John: Uh, no. The loop code still isn't optimized away even when I remove all the function calls (even including `fprintf`). – user541686 Aug 06 '14 at 03:31
  • 2
    There are no side effects in the loop except stack allocation and incrementation of pointless locations in memory, which the compiler is allowed to optimize away unless using the keyword `volatile`. It really is odd that `clang` doesn't see that there is no use in executing the loop... Does the compiler manage to see the pointlessness of doing nothing if you remove the inner loop? – Fors Aug 06 '14 at 03:34
  • 2
    @Fors: You don't even need to remove the loop, you can just change the array to a single variable and it gets optimized away fine. But in that case, Clang detects the variable is uninitialized, but GCC and MSVC just can't seem to figure that out. – user541686 Aug 06 '14 at 03:39
  • 2
    It looks like there is no warning at all whenever the type is an array, regardless of the presence/absence of loops. This code compiles without any warnings no matter the compiler flags: `int main(){ int x[1]; ++x[0];}`, on both Clang and g++ – vsoftco Aug 06 '14 at 04:20
  • @vsoftco: I've been playing around with it and it seems the problem isn't just arrays, but loops too. Even something as simple as `int a; ++a;` doesn't generate a warning with GCC (4.8.1) if it's inside a loop. – user541686 Aug 06 '14 at 04:21
  • 2
    @Mehrdad yes I saw this too, very strange indeed. On the other hand, I do not understand why uninitialized arrays do not emit warnings at all, even in the simplest possible code. Actually I now realized that a warning is emitted for arrays, but one has to use `-O` optimization greater than 0. – vsoftco Aug 06 '14 at 04:23
  • @vsoftco: Yeah, it baffles me too. But wait, but I've been using -O3 and not getting any warnings... what version of GCC are you on? – user541686 Aug 06 '14 at 04:26
  • 2
    I was talking about the `int main(){ int x[1]; ++x[0];}` code, which does not emit a warning on `-O0`, but emits for `-O1`, `-O2` and `-O3`. I use g++ (MacPorts gcc49 4.9-20140416_2) 4.9.0 20140416 (prerelease). For the loops, whenever the un-initialized is inside the loop, all bets are off, no more warnings on g++, only on clang. And if the type is an array, then even clang fails to emit a warning. – vsoftco Aug 06 '14 at 04:29
  • @vsoftco: Ohh I see, got it. – user541686 Aug 06 '14 at 04:30
  • I just compiled that code with GCC 4.8.1 (using `O2`) and it most certainly optimizes out that entire loop. Not sure what GCC you're using... – Damon Aug 06 '14 at 11:30
  • @Damon: *"Not sure what GCC you're using"*... uh, I'm *not* using GCC there. Did you read the question? – user541686 Aug 06 '14 at 11:31
  • 1
    @Mehrdad: Indeed I did, it says _"... nor GCC 4.8.1"_. I'm using GCC 4.8.1 here, and compiled the provided code, and the compiler works just as expected: the entire loop is optimized out. It admittedly doesn't warn about using array elements that are not initialized, when one could argue it maybe should (it probably does with `-Wuninitialized`; not tested), but seeing how it dead-strips the code altogether, that doesn't really matter. – Damon Aug 06 '14 at 12:12
  • @Damon: Indeed you didn't: *"...nor GCC 4.8.1" ... _what_?*... what's the rest of that sentence **following** the part you just quoted? It's **not** talking about GCC's optimization of the loop, if you actually read the sentence it's pretty obvious it's talking about something else. – user541686 Aug 06 '14 at 12:15
  • 1
    @Mehrdad: So, after reading it again very thoroughly, you're saying that GCC **does** optimize it out (only the other two don't), you merely don't like the fact that it doesn't warn about accessing the uninitialized array elements? My bad, sorry. The not warning part is, as I've pointed out, debatable. It probably should, but then again there is no real consequence. I'm more bothered that compilers don't tell you "stripped out a whole 8 lines of code that have no side effect, but you probably didn't mean it that way". :-) – Damon Aug 06 '14 at 12:25
  • @Damon: "Debatable?" The title clearly asks about optimizations by "Clang, MSVC" and the *very first sentence* also clearly says "Clang or MSVC" don't optimize the code out, so I'm not sure why you think the debate is centered around GCC or around the warning in the first place. I'm not complaining about the warning itself, I'm just making a note of the fact that the lack of a warning probably implies the compiler failed to detect the simple loop can be optimized out because of UB. – user541686 Aug 06 '14 at 12:33
  • @Mehrdad: Edited that into the question itself. You can remove your comment and I'll remove mine. – einpoklum Apr 18 '17 at 21:58

4 Answers

7

The undefined behavior is irrelevant here. Replacing the inner loop with:

    for (int j = 1; j < N; ++j)
    {
        a[j-1] = a[j];
        a[j] = j;
    }

... has the same effect, at least with Clang.

The issue is that the inner loop both loads from a[j] (for some j) and stores to a[j] (for some j). None of the stores can be removed, because the compiler believes they may be visible to later loads, and none of the loads can be removed, because their values are used (as input to the later stores). As a result, the loop still has side-effects on memory, so the compiler doesn't see that it can be deleted.

Contrary to n.m.'s answer, replacing int with unsigned does not make the problem go away. The code generated by Clang 3.4.1 using int and using unsigned int is identical.

Richard Smith
  • 1
    *"The undefined behavior is irrelevant here."*... that's not the impression I got from n.m.'s answer. If replacing signed with unsigned changes the behavior then UB has to be relevant doesn't it? – user541686 Aug 06 '14 at 22:46
  • 4
    Replacing all the `int`s with `unsigned`s in this example does not change the behavior. Compare the generated code [before](http://goo.gl/kL9jd4) and [after](http://goo.gl/A306w4). n.m.'s answer is wrong. – Richard Smith Aug 08 '14 at 18:17
  • 1
    +1 Wow, I should've tested it. I just took it for granted that he was right, and now I look stupid. Testing it on Clang 3.6.0 (Windows) now, I don't see it happening either... I wonder what flags he was using, or if he accidentally ran g++ instead of clang++... – user541686 Aug 08 '14 at 18:24
  • 1
    The behavior here is also undefined. – jthill Aug 08 '14 at 20:33
  • @jthill Why do you think that? – Richard Smith Aug 12 '14 at 19:13
  • `a[1]` is accessed before being initialized. As I understand it, while in C accessing that gets you an unspecified value, in C++ accessing that gets you undefined behavior. – jthill Aug 13 '14 at 00:05
  • @jthill In C++14 (and in C) it's UB. In C++11 I think it's not necessarily UB, though it's a bit ambiguous and it depends on how you interpret the spec. – bames53 Aug 13 '14 at 02:40
  • 2
    @jthill Good point. However, initializing the array doesn't make any difference here either; the code is still not removed. – Richard Smith Aug 21 '14 at 01:34
3

It's an interesting issue with regard to optimizing. I would expect that in most cases, the compiler would treat each element of the array as an individual variable when doing dead-code analysis, and 0x8000 makes far too many individual variables to track, so the compiler doesn't try. The fact that a[j] doesn't always access the same object could cause problems for the optimizer as well.

Obviously, different compilers use different heuristics; a compiler could treat the array as a single object, and detect that it never affected output (observable behavior). Some compilers may choose not to, however, on the grounds that typically, it's a lot of work for very little gain: how often would such optimizations be applicable in real code?

James Kanze
  • 1
    @Mehrdad His answer and mine are complementary. As I said in the second paragraph: different compilers use different heuristics in determining what and how deep to analyze. My answer is just to point out that having actually worked on compilers, what he's seeing doesn't really surprise me. – James Kanze Aug 06 '14 at 10:05
  • 1
    I understand, but the reason I mentioned that is that it seems to me (from the `unsigned` thing in n.m.'s answer) that Clang is *intentionally* not optimizing this -- i.e., the compiler writers had a reason to avoid doing this optimization. Compilers already do much more complicated dead-code elimination, so when they can't figure out something this simple, do you really just go ahead and call it an accident? I feel like there's a good reason behind not performing this optimization that I'm not aware of. – user541686 Aug 06 '14 at 10:21
  • 1
    @Mehrdad Agreed. It does seem strange that they would optimize `unsigned` and not `int`. (Changing the type to `unsigned` doesn't remove the undefined behavior; _all_ variables have to be initialized before being read.) It would be interesting to see what is going on internally in the compiler. – James Kanze Aug 06 '14 at 10:25
  • Hmm, are you sure it would be UB for unsigned? I'm asking because this answer claims it would be well-defined for unsigned, and it seems logical to me: http://stackoverflow.com/a/11965368/541686 (Edit: I just realized that answer is only about C, so I'm not sure if it applies to C++ too.) – user541686 Aug 06 '14 at 10:29
  • @Mehrdad In both C and C++, accessing an uninitialized value is undefined behavior, for all non-character types. The answer you cite is simply wrong. The C standard is most explicit (§6.2.6/5): "Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined." In C++, you have to deduce this from the fact that the value representation may not use all of the bits in the object representation. – James Kanze Aug 06 '14 at 10:44
  • @Mehrdad: It's reasonable to assume that the optimizer is shared between C and C++, and uses the weakest common assumption. I.e. if behavior X is undefined in one of the two languages, use the behavior specified by the other. – MSalters Aug 06 '14 at 10:44
  • @MSalters: Yes I'm aware, but I'm not sure where you're going with that statement. JamesKanze: As far as I can tell, the quote you cited doesn't logically imply what you wrote. The quote explicitly said "**if** the stored value has such a representation [...] **then** the behavior is undefined", but clearly my compiler's implementation of integers doesn't have any invalid representations -- it's a standard x86 compiler and all the values represented by an integer's bits represent a valid integer. So how does that imply what you just said (that it's UB)? – user541686 Aug 06 '14 at 10:48
  • @Mehrdad The standard (C at least) clearly allows trapping representations, and says that accessing them is undefined behavior; code which reads objects with unspecified bit patterns may encounter that undefined behavior. The fact that that undefined behavior is "getting some unspecified value" on a particular machine doesn't change the undefinedness of the operation. – James Kanze Aug 06 '14 at 10:51
  • @JamesKanze: But I don't understand -- there **aren't** any trap representations in this implementation to begin with! (This is exactly what the answer I pointed you to is saying.) So how is the behavior of accessing trap representations relevant when they don't exist in the first place? – user541686 Aug 06 '14 at 10:51
  • @Mehrdad From a standards point of view: undefined behavior is undefined behavior, and trap representations are potentially present. The fact that some implementations don't have trap representations doesn't impact the standard. An implementation may define undefined behavior. In this case, I don't think any actually do, although the actual behavior may be quite deterministic and well known. – James Kanze Aug 06 '14 at 10:57
  • @JamesKanze: But then why do you think the standard said *"If the stored value of an object has such a representation"*? Why do you think they didn't just make it a straightforward *"reading the value of an object by an lvalue expression without a character type is undefined behavior"*? What was the point of adding that "if" clause? – user541686 Aug 06 '14 at 11:04
  • Wow, n.m.'s answer was wrong. I can't believe I didn't check it, it undermined basically every other answer. +1 to you. – user541686 Aug 08 '14 at 19:22
  • @Mehrdad A detail in n.m's answer was wrong; changing to `unsigned` doesn't remove the undefined behavior. He's still given a good description of what the compiler is actually doing. (Why, we don't know.) – James Kanze Aug 11 '14 at 09:16
  • @Mehrdad "clearly my compiler's implementation of integers doesn't have any invalid representations -- it's a standard x86 compiler and all the values represented by an integer's bits represent a valid integer" Even if hardware does not have any trapping representations the compiler can still optimize as though there are trap representations. This is just like the case where the hardware implements wrapping signed integer overflow but the compiler optimizes based on overflow being UB. – bames53 Aug 11 '14 at 21:24
  • @bames53: If you think that's how it works then could you answer the question I posed 2 comments earlier? – user541686 Aug 11 '14 at 21:28
  • @Mehrdad In C++14 they have changed it so that expressions that evaluate to indeterminate values directly produce UB, and second, "If the stored value of an object has such a representation" refers to the whole implementation, not just the hardware. An implementation in C++11 could specify that a type has no trap representations and thereby bind itself not to act as if this had UB. – bames53 Aug 11 '14 at 21:36
  • @bames53: No I don't think you understood what I'm saying. The hardware is irrelevant. What I'm saying is that if the implementation has trap representations, then you *must* be able to detect this by comparing `numeric_limits::max/min` and `N = sizeof(T) * CHAR_BIT`. When all 2^N representations form valid integers, and the data is stored *in memory* (rather than a register), there *cannot* be any trap representations because all the possible bit patterns are non-trapping and the program can *prove* this. This is *not* possible in the signed overflow UB scenario. – user541686 Aug 11 '14 at 22:01
  • @Mehrdad To the best of my knowledge there's no requirement for implementations to behave that way. I don't see anything in the spec that prevents compilers from optimizing based on treating objects as trap representations even though there's no real bit pattern that could physically represent a trap. – bames53 Aug 11 '14 at 23:17
  • @bames53: You're thinking about it too hard -- the quote (and the requirement) is right there: "if [it] has such a representation" clearly implies that this does not apply when the implementation does not have such a representation. Therefore when the program can *prove* such a representation does not exist, the compiler cannot treat it as UB, because that would contradict the proof. It's as simple as that. – user541686 Aug 11 '14 at 23:18
  • @Mehrdad Whether it has such representations is up to the whole implementation, not just the hardware. The implementation can say '16-bit ints on this implementation have 2^16 valid values and one trap representation which does not correspond to any 16-bit bit pattern, because then I can optimize based on trap representations.' – bames53 Aug 11 '14 at 23:23
  • @bames53: Wouldn't that contradict [item #4 in §6.2.6.1](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf)? The trap representation must be memcpy-able into an array of unsigned chars which must uniquely represent that value, but that necessarily implies the trap representation *must* be one of the 2^N bit patterns -- otherwise it's masquerading as a non-trapping representation and therefore must be non-trapping. (Note that this is why the rule only applies when the data is in memory -- register-only values cannot be copied to an array of unsigned char without going through memory.) – user541686 Aug 11 '14 at 23:41
  • @Mehrdad I will admit that the wording of 6.2.6.1/4..5 is a little tricky and it depends on how you interpret some of it. Mainly I think there is no consistent interpretation of those paragraphs that can conclude that trap representations are values, so much of 6.2.6.1/4 does not apply to trap representations. I think a compiler that has 'virtual' trap representations could avoid difficulty here by, for example, never using them for variables that might be memcpy()ed into unsigned char arrays. – bames53 Aug 12 '14 at 00:13
  • @bames53: *"Mainly I think there is no consistent interpretation of those paragraphs that can conclude that trap representations are values"*... uh... huh? The standard clearly says (in 3.17.2) that an "indeterminate value" is either an "unspecified value" or a "trap representation". So there is no question as to whether trap representations are values; they *definitely* are. They may not be values that the implementation treats as integers, but they are still values and must therefore have a clear and visible bitwise representation in memory (because memcpy'ing them must reproduce them). – user541686 Aug 12 '14 at 00:57
  • @Mehrdad I'm not convinced. A normal interpretation of 'indeterminate value' would conclude that they're all values, but I don't think that works here. 'Value' is defined as "precise meaning of the contents of an object when interpreted as having a specific type" and 'trap representations' do "not represent a value of the object type." If a trap representation is copied into an unsigned char array then clearly the unsigned char array has a value, but if an implementation never allows 'virtual' trap representations to be copied and accessed that way then I don't see where that violates the spec – bames53 Aug 12 '14 at 02:02
  • @Mehrdad "because memcpy'ing them must reproduce them" If the spec allowed you to memcpy a 'virtual' trap representation but did not reproduce the trap value when memcpying the value back then I agree that would violate the spec. I don't see any violation if the compiler never treats variables as trap representations if they happen to later get memcpyed. I.e. the 'virtual' trap representations are purely compile-time entities and variables never have a 'virtual' trap representation unless the compiler can prove it's safe. – bames53 Aug 12 '14 at 02:06
  • @bames53: I think at some point you have to stop, take a deep breath, and start using common sense instead of making up more and more notions ("virtual" representations?!) for the sake of argument. Go back to the original argument that caused all of this; you said an implementation *"could specify that a type has no trap representations and thereby bind itself not to act is if this had UB"*. Well, it could **already** do that *whether or not* the quote started with *"if the stored value has such a representation"*. Just think for a moment what the *point* of the quote was in the first place. – user541686 Aug 12 '14 at 02:28
  • @Mehrdad That's not the quote that started this. I think what started this is my claim that a C++11 implementation could also specify that a type does have trap representations and optimize accordingly regardless of the actual bit-patterns used at runtime. And given such an implementation then "if the stored value has such a representation" evaluates to 'true'. – bames53 Aug 12 '14 at 02:57
  • @Mehrdad "I think at some point you have to stop, take a deep breath, and start using common sense" Well now, that wouldn't be language lawyering. My interest in analyzing the standards is often finding things for a so-called 'evil' compiler to exploit while still technically conforming. Exploring the limits is where the fun is. – bames53 Aug 12 '14 at 02:58
  • @Mehrdad Also, to clarify what I meant by "could specify that a type has no trap representations and thereby bind itself not to act as if this had UB," I meant that an implementation could do this when the underlying hardware _does_ have a trap representation. The implementation would have to insert code to ensure that the program never revealed that fact by its behavior. This would again foil your "`numeric_limits::max/min` and `N = sizeof(T) * CHAR_BIT`" check for trap representations, by giving false positives. – bames53 Aug 12 '14 at 03:08
  • @bames53: You have to realize, language specifications aren't mathematical axioms or proofs that are written once and used forever; they can (and do) have defects and shortcomings. That's one reason why they specify "rationales" in the documents -- to help get their points across even when they're not mathematically bulletproof. To me in this case, the point they were trying to make is pretty crystal clear -- that trap representations, if they can exist in memory, must be one of the visible bitwise representations of the type. I don't see any point in language-lawyering farther than that. – user541686 Aug 12 '14 at 03:09
  • @bames53: As to your second comment, I thought the entire question was about what the implementation could do when the underlying hardware *doesn't* have a trap representation. I don't understand your new reasoning but in any case I just don't have the energy to discuss it anymore; unless you have something radically new to bring to the discussion (e.g. an entirely new quote) I don't really have anything new to present to the discussion. – user541686 Aug 12 '14 at 03:12
  • @Mehrdad "they can (and do) have defects and shortcomings" Absolutely, and I enjoy finding and reporting them to make specs better if I can. "to help get their points across even when they're not mathematically bulletproof" I think the same common sense reasoning would lead many people to think many of the optimizations modern compilers actually do based on UB are illegitimate. A lot of people actually do think that. I don't think it's practical to act that way or to leave one's interpretation of the spec at that. – bames53 Aug 12 '14 at 03:17
  • @Mehrdad "I thought the entire question was about what the implementation could do when the underlying hardware doesn't have a trap representation." My intent with that comment was to point out both sides: an implementation can behave as if there are trap representations even if the underlying hardware does not support any, and an implementation can behave as if there are no trap representations even if the underlying hardware does have them. – bames53 Aug 12 '14 at 03:19
  • @bames53: *"I think the same common sense reasoning would lead many people to to think many of the optimizations modern compilers actually do based on UB are illegitimate."* Absolutely not. I'm referring the common sense of people who are already familiar with this stuff and can understand the point that the standard is trying to make, not the common sense of random people who haven't seen compilers or language specifications. – user541686 Aug 12 '14 at 03:19
3
    ++a[j];  // This is undefined behavior too, but Clang doesn't see it

Are you saying this is undefined behavior because the array elements are uninitialized?

If so, although this is a common interpretation of clause 4.1/1 in the standard, I believe it is incorrect. The elements are 'uninitialized' in the sense that programmers usually use this term, but I do not believe this corresponds exactly to the C++ specification's use of the term.

In particular, C++11 8.5/11 states that these objects are in fact default-initialized, and this seems to me to be mutually exclusive with being uninitialized. The standard also states that for some objects, being default-initialized means that 'no initialization is performed'. Some might assume this means that they are uninitialized, but this is not specified, and I simply take it to mean that no such initialization is required.

The spec does make clear that the array elements will have indeterminate values. C++ specifies, by reference to the C standard, that indeterminate values can be either valid representations, legal to access normally, or trap representations. If the particular indeterminate values of the array elements all happen to be valid representations (and none is INT_MAX, avoiding overflow), then the above line does not trigger any undefined behavior in C++11.

Since these array elements could be trap representations it would be perfectly conformant for clang to act as though they are guaranteed to be trap representations, effectively choosing to make the code UB in order to create an optimization opportunity.

Even if clang doesn't do that it could still choose to optimize based on the dataflow. Clang does know how to do that, as demonstrated by the fact that if the inner loop is changed slightly then the loops do get removed.

So then why does the (optional) presence of UB seem to stymie optimization, when UB is usually taken as an opportunity for more optimization?

What may be going on is that clang has decided that users want `int` to trap according to the hardware's behavior. And so rather than taking traps as an optimization opportunity, clang has to generate code which faithfully reproduces the program's behavior on the hardware. This means that the loops cannot be eliminated based on dataflow, because doing so might eliminate hardware traps.


C++14 updates the behavior such that accessing indeterminate values itself produces undefined behavior, independent of whether one considers the variable uninitialized or not: https://stackoverflow.com/a/23415662/365496

bames53
  • 1
    The reason I said it's UB is that there is no guarantee they are not `INT_MAX` whatsoever, and thus they can overflow. Therefore the compiler can assume they *are* `INT_MAX` and they *will* overflow, which is undefined. – user541686 Aug 11 '14 at 20:46
  • 1
    @Mehrdad If you add `a[j] = INT_MAX;` then the loop is eliminated just as with `a[j] = 0;`, so clang obviously is not assuming `INT_MAX` on its own. I think the best explanation is that the compiler is trying to preserve the hardware trapping behavior. – bames53 Aug 11 '14 at 21:44
2

That is indeed very interesting. I tried your example with MSVC 2013. My first idea was that the fact that ++a[j] is undefined is the reason the loop is not removed, since removing it would definitely change the meaning of the program from undefined/incorrect semantics to something meaningful, so I tried initializing the values beforehand, but the loops still did not disappear.

Afterwards I replaced ++a[j]; with a[j] = 0;, which produced output without any loop: everything between the two calls to clock() was removed. I can only guess at the reason. Perhaps the optimizer is unable, for whatever reason, to prove that operator++ has no side effects.

nh_