Why doesn't my C++ compiler optimize these memory writes away?

Question

I created this program. It does nothing of interest but use processing power.

Looking at the output with objdump -d, I can see the three rand calls and corresponding mov instructions near the end even when compiling with O3 .

Why doesn't the compiler realize that memory isn't going to be used and just replace the bottom half with while(1){}? I'm using gcc, but I'm mostly interested in what is required by the standard.

/*
 * Create a program that does nothing except slow down the computer.
 */
#include <cstdlib>
#include <unistd.h>

int getRand(int max) {
  return rand() % max;
}

int main() {
  for (int thread = 0; thread < 5; thread++) {
    fork();
  }
  int len = 1000;
  int *garbage = (int*)malloc(sizeof(int)*len);
  for (int x = 0; x < len; x++) {
    garbage[x] = x;
  }
  while (true) {
    garbage[getRand(len)] = garbage[getRand(len)] - garbage[getRand(len)];
  }
}

I guess you could argue that yielding a different pseudo-random sequence changes the observable behaviour of the program -- if the program actually output the result of a later `rand()`, which this one doesn't. — M.M, Sep 29 '15 at 22:02
`fork()` doesn't create multiple threads. It creates a new separate full-fledged *process*. Their memory is shared copy-on-write, not read-write shared like threads share memory. On Linux, glibc's `fork()` implementation uses `clone()`, but with different flags than for thread creation. So the multiple processes each have their own RNG state for `rand()`. Even if they were shared-memory threads, they each do their own malloc. — Peter Cordes, Sep 30 '15 at 04:16
Also note that the `fork()` loop doesn't check that it's the parent before going on to the next iteration. So you're actually spawning 2^5 (32) infinite-loop processes. Each trip through the loop is a doubling in the number of processes. — Peter Cordes, Sep 30 '15 at 04:26
@PeterCordes: The exponential part was intended, since my computer's pretty powerful. — Joshua Snider, Sep 30 '15 at 04:28

interjay · Accepted Answer · 2015-09-29T22:00:08.830

10

Because GCC isn't smart enough to perform this optimization on dynamically allocated memory. However, if you change garbageto be a local array instead, GCC compiles the loop to this:

.L4:
    call    rand
    call    rand
    call    rand
    jmp .L4

This just calls rand repeatedly (which is needed because the call has side effects), but optimizes out the reads and writes.

If GCC was even smarter, it could also optimize out the randcalls, because its side effects only affect any later randcalls, and in this case there aren't any. However, this sort of optimization would probably be a waste of compiler writers' time.

edited Sep 29 '15 at 22:00

answered Sep 29 '15 at 21:52

interjay

107,303
21
270
254

Nice observation. I tried it on [godbolt](https://goo.gl/bnzOg3), and found that only gcc, not clang or icc, is able to optimize away the array accesses (and the modulo 1000 on the output of `rand()`). Also that gcc can only spot it with `int garbage[1000]`, not `int garbage[len]`, even though `len` is known to be a compile-time constant. Even with `-fwhole-program`, all that gcc gains is not emitting a non-inline definition for `getRand()`. i.e. it can treat it as if it had been declared `static` or `inline`. – Peter Cordes Sep 30 '15 at 04:46

Alan Stokes · Answer 2 · 2015-09-29T22:06:03.813

5

It can't, in general, tell that rand() doesn't have observable side-effects here, and it isn't required to remove those calls.

It could remove the writes, but it may be the use of arrays is enough to suppress that.

The standard neither requires nor prohibits what it is doing. As long as the program has the correct observable behaviour any optimisation is purely a quality of implementation matter.

edited Sep 29 '15 at 22:06

answered Sep 29 '15 at 21:49

Alan Stokes

18,815
3
45
64

So, it could replace the final loop with `while(true) { rand();}`. but doesn't? – Joshua Snider Sep 29 '15 at 21:51
Yes, that's what I'm suggesting. – Alan Stokes Sep 29 '15 at 21:52
1

`rand()` is a standard library function, so the compiler knows exactly what it does (the standard defines it) – M.M Sep 29 '15 at 21:54
@M.M It could know, but is not required to make use of that knowledge while optimising. Corrected my overly constraining statement. – Alan Stokes Sep 29 '15 at 21:56
The standard doesn't require any optimizations to be made, but it permits them. – M.M Sep 29 '15 at 21:57
`rand` does have side effects. Although in this case they don't matter, as I point out in my answer. Theoretically the compiler could have optimized the calls out. – interjay Sep 29 '15 at 22:04
@interjay I suspect we're getting into tree falling in the forest with nobody to hear it territory. But I have hedged my statement slightly. – Alan Stokes Sep 29 '15 at 22:07

score 3 · Answer 3 · edited May 23 '17 at 11:44

3

This code causes undefined behaviour because it has an infinite loop with no observable behaviour. Therefore any result is permissible.

In C++14 the text is 1.10/27:

The implementation may assume that any thread will eventually do one of the following:

terminate,

make a call to a library I/O function,

access or modify a volatile object, or

perform a synchronization operation or an atomic operation.

[Note: This is intended to allow compiler transformations such as removal of empty loops, even when termination cannot be proven. —end note ]

I wouldn't say that rand() counts as an I/O function.

Related question

edited May 23 '17 at 11:44

Community

1
1

answered Sep 29 '15 at 21:53

M.M

138,810
21
208
365

Theoretically that could have been true (although very unlikely to be the reason here). But here you can easily change the loop to be finite and see that the result is the same. – interjay Sep 29 '15 at 21:54
@interjay if OP updates code to not have infinite loop and still show the problem, I'll remove this answer – M.M Sep 29 '15 at 21:55
I don't think a pedantic answer like this is any use here. Besides, I think realistically any compiler will allow infinite loops as many programs have them. – interjay Sep 29 '15 at 21:57
@interjay I linked an example where the compiler removes infinite loops. – M.M Sep 29 '15 at 21:59
One can be both pedantic and correct. And that rule really was added for the benefit of compiler writers because it is a useful optimisation. – Alan Stokes Sep 29 '15 at 22:01
This is really not a pedantry answer- this rule was literally added to change the behaviour of compiler optimisers, which is what's under question. – Puppy Sep 29 '15 at 22:11

score 0 · Answer 4 · answered Sep 30 '15 at 07:25

0

Leave it a chance to crash by array overflow ! The compiler won't speculate on the range of outputs of getRand.

answered Sep 30 '15 at 07:25

Why not? That modulo means we know its 0 <= getRand < max. – Joshua Snider Sep 30 '15 at 14:38
`rand` returns an `int`, so the modulo could be negative and you have to dig deeper ! I don't believe that any optimizer would play this game. – Sep 30 '15 at 14:43

Why doesn't my C++ compiler optimize these memory writes away?

4 Answers4