
I've seen a lot of people complaining about the -O3 option:

I checked the GCC manual:

   -O3    Optimize yet more.  -O3 turns on all optimizations
          specified   by   -O2   and   also   turns  on  the
          -finline-functions and -frename-registers options.

And I've also checked the GCC source code to confirm that these two options are the only optimizations turned on by -O3:

if (optimize >= 3){
    flag_inline_functions = 1;
    flag_rename_registers = 1;
}

For those two optimizations:

  • -finline-functions is useful in some cases (mainly with C++) because it lets us define the size limit for inlined functions (600 by default) with -finline-limit. The compiler may report an error complaining about lack of memory when the inline limit is set too high.
  • -frename-registers attempts to avoid false dependencies in scheduled code by making use of registers left over after register allocation. This optimization will most benefit processors with lots of registers.

For inline-functions, although it can reduce the number of function calls, it may lead to larger binaries, so -finline-functions may introduce severe cache penalties and become even slower than -O2. I think the cache penalties depend not only on the program itself but also on the underlying platform (see the sketch below).
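
To make the tradeoff concrete, here is a minimal sketch (the file name, the function, and the inline limit of 1200 are my own illustrations, not taken from any particular program). The commands in the comments show how one might compare -O2, -O3, and -O3 with a raised inline limit:

// inline_demo.cpp -- hypothetical example. Build and compare, e.g.:
//   g++ -O2 inline_demo.cpp -o demo_o2
//   g++ -O3 inline_demo.cpp -o demo_o3
//   g++ -O3 -finline-limit=1200 inline_demo.cpp -o demo_o3_big
// Then time each binary on the same input to see which is faster.
#include <cstdio>

// A small helper that -finline-functions (enabled by -O3) is likely to
// inline. Inlining removes call overhead, but inlining many such
// helpers grows the binary and can hurt instruction-cache locality.
static int scale(int x) { return 3 * x + 1; }

int main() {
    long long sum = 0;
    for (int i = 0; i < 100000000; ++i)
        sum += scale(i);
    std::printf("%lld\n", sum);
    return 0;
}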

For rename-registers, I don't think it will have any positive impact on a CISC architecture like x86.

My question has 2.5 parts:

  1. Am I right to claim that whether a program can run faster with the -O3 option depends on the underlying platform/architecture? [Answered]

    EDIT:

    The 1st part has been confirmed as true. David Hammen also points out that we should be very careful about how optimization and floating point operations interact on machines with extended precision floating point registers, like Intel and AMD.

  2. When can I confidently use the -O3 option? I suppose these two optimizations, especially rename-registers, may lead to behavior different from -O0/-O2. I saw some programs compiled with -O3 crash during execution; is that deterministic? If I run an executable once without any crash, does that mean it is safe to use -O3?

    EDIT: The determinism has nothing to do with the optimization; it is a multithreading problem. However, for a multithreaded program, it is not safe to conclude that -O3 is fine just because the executable ran once without errors. David Hammen shows that -O3 optimization of floating point operations may violate the strict weak ordering criterion for a comparison. Are there any other concerns we need to take care of when we want to use the -O3 option?

  3. If the answer to the 1st question is "yes", then when I change the target platform, or in a distributed system with different machines, I may need to switch between -O3 and -O2. Are there any general ways to decide whether I can get a performance improvement with -O3? For example, more registers, short inline functions, etc. [Answered]

    EDIT: The 3rd part has been answered by Louen: "the variety of platforms make general reasoning about this problem impossible". When evaluating the performance gain from -O3, we have to try both and benchmark our code to see which is faster, as in the sketch below.
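
For reference, a minimal benchmarking harness might look like this (the workload here is a hypothetical stand-in for the real hot path):

#include <chrono>
#include <cstdio>

// Hypothetical stand-in for the program's hot path; replace with real
// work. Build the same file twice (g++ -O2 bench.cpp, g++ -O3
// bench.cpp) and compare the printed timings on the target machine.
static long long workload() {
    long long sum = 0;
    for (int i = 0; i < 200000000; ++i)
        sum += i % 7;
    return sum;
}

int main() {
    auto start = std::chrono::steady_clock::now();
    long long result = workload();   // result is printed below, so the
                                     // optimizer cannot delete the loop
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double> elapsed = stop - start;
    std::printf("result=%lld time=%.3fs\n", result, elapsed.count());
    return 0;
}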

  • *Bug occurs only when compiling with -O3*. The bug is there all the time, it just happens to be hidden sometimes. – Bo Persson Feb 13 '13 at 09:57
  • See [What is the difference between an error, a fault, and a failure](http://www.quora.com/In-Software-Testing-what-is-the-difference-between-an-error-a-fault-and-a-failure). When you have a fault in your code, compiling with differing optimisations can lead to an error appearing or not at runtime. – Peter Wood Feb 13 '13 at 10:09
  • _Bug occurs only with -O3_ is very similar to _bug only occurs in debugger_. It is foolish to blame `-O3` for the bugs. The compiler is still required to work in a conforming way regardless of optimizations (exception to that being options like `-ffast-math`, which however explicitly say that they will generate non-conforming code). So, if someone's code doesn't work, it's simply buggy code, no more and no less. There is nothing wrong with using `-O3`. – Damon Feb 13 '13 at 11:06
  • Hmm, failing with `-O3` while working with `-O2` rather sounds like you are invoking UB and the compiler is completely free to do whatever it pleases. It's waaay easier to invoke UB compared to such a major feature of such a major and tested compiler to be broken. – Christian Rau Feb 13 '13 at 11:36
  • in fact register renaming is most effective on a register-starved architecture like x86. Modern x86 CPUs are all RISC inside, with hundreds of physical registers. CISC instructions are converted to RISC micro-ops underneath. With the large number of renamed registers, instruction-level parallelism can be exploited effectively, maximizing execution speed. – phuclv Aug 17 '13 at 14:25
  • "For these "decoupled" superscalar x86 processors, register renaming is absolutely critical due to the meager 8 registers of the x86 architecture in 32-bit mode (64-bit mode added another 8 registers). This differs strongly from the RISC architectures, where providing more registers via renaming only has a minor effect." http://www.lighterra.com/papers/modernmicroprocessors/ – phuclv Aug 17 '13 at 15:20

2 Answers

  1. I saw some programs compiled with -O3 crash during execution; is that deterministic?

If the program is single threaded, all algorithms used by the program are deterministic, and the inputs from run to run are identical, then yes. The answer is "not necessarily" if any of those conditions does not hold.

The same applies if you compile without using -O3.

If I run an executable once without any crash, does it mean it is safe to use -O3?

Of course not. Once again, the same applies if you compile without using -O3. Just because your application runs once does not mean it will run successfully in all cases. That's part of what makes testing a hard problem.
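
To illustrate the multithreaded case (a hypothetical sketch, not code from the question): a data race can make a crash or a wrong result appear on some runs and not others, at any optimization level:

#include <cstdio>
#include <thread>
#include <vector>

// Two threads grow the same vector with no synchronization: a data
// race, i.e. undefined behavior. It may run "fine" many times and then
// crash, at -O0 and -O3 alike, so one clean run proves nothing.
// Build with: g++ -O3 -pthread race_demo.cpp
int main() {
    std::vector<int> v;
    auto push = [&v] { for (int i = 0; i < 100000; ++i) v.push_back(i); };
    std::thread t1(push);
    std::thread t2(push);
    t1.join();
    t2.join();
    std::printf("%zu\n", v.size());
    return 0;
}

Tools like ThreadSanitizer (g++ -fsanitize=thread) are far more reliable than repeated runs for flushing out such races.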


Floating point operations can result in weird behaviors on machines in which the floating point registers have greater precision than do doubles. For example,

void add (double a, double b, double & result) {
   double temp = a + b;    // may stay in an extended precision register
   result = temp;          // stored through the reference as a 64-bit double
   if (result != temp) {   // unequal if temp kept extra precision
      throw FunkyAdditionError (temp);
   }
}

Compile a program that uses this add function unoptimized and you probably will never see any FunkyAdditionError exceptions. Compile it optimized and certain inputs will suddenly start resulting in these exceptions. The problem is that with optimization, the compiler keeps temp in a register, while result, being a reference, won't be compiled away into a register. Add an inline qualifier and those exceptions may disappear when your program is compiled with -O3, because now result can also be a register. Optimization with regard to floating point operations can be a tricky subject.

Finally, let's look at one of those cases where things did go bump in the night when a program was compiled with -O3, GCC: program doesn't work with compilation option -O3. The problem only occurred with -O3 because the compiler probably inlined the distance function but kept one (but not both) of the results in an extended precision floating point register. With this optimization, certain points p1 and p2 can result in both p1<p2 and p2<p1 evaluating to true. This violates the strict weak ordering criterion for a comparison function.
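
A hedged sketch of that failure mode (the names here are illustrative, not the original poster's code): if a comparator recomputes a floating point key, one evaluation may stay in an 80-bit x87 register while another is rounded to a 64-bit double, and the comparison becomes inconsistent between calls:

#include <algorithm>
#include <cmath>
#include <vector>

struct Point { double x, y; };

// Distance from the origin. Under -O3 on x87 hardware, the result may
// live in an 80-bit register in one call and be rounded to a 64-bit
// double in another, so the "same" distance can compare differently.
static double dist(const Point& p) {
    return std::sqrt(p.x * p.x + p.y * p.y);
}

// Comparator for sorting: if dist() rounds differently on different
// calls, less(a, b) and less(b, a) can both return true, violating
// strict weak ordering and corrupting e.g. std::sort.
static bool less(const Point& a, const Point& b) {
    return dist(a) < dist(b);
}

int main() {
    std::vector<Point> pts = {{3, 4}, {1, 1}, {5, 12}, {3, 4}};
    std::sort(pts.begin(), pts.end(), less);  // may misbehave if less is
                                              // not a strict weak order
    return 0;
}

A common workaround is to compute the key once and store it as a double, or to force consistent rounding with options such as -ffloat-store or -mfpmath=sse.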

You need to be very careful with regard to how optimization and floating point operations interact on machines with extended precision floating point registers (e.g., Intel and AMD).

David Hammen

1) and 3) You are right. Some programs can benefit from the optimizations enabled by -O3 and some won't. For example, inlining more functions is usually better (because it bypasses the function call mechanism overhead) but sometimes it can make things slower (by impairing cache locality for example). That and the variety of platforms make general reasoning about this problem impossible.

So to make things short, the only valid answer is: try it with both and benchmark your code to see which is faster.

2) Under the hypothesis that you are not hitting any compiler/optimizer bug (they are rare, but they exist), it is reasonable to assume that an error in your program which reveals itself only at -O3 has probably been there all the time; the -O3 option merely uncovered it.
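
A classic example of such a latent bug (an illustrative sketch, not from the question): signed integer overflow is undefined behavior, and the optimizer is allowed to assume it never happens:

#include <cstdio>

// Doubling past INT_MAX is signed overflow: undefined behavior. At -O0
// the value typically wraps negative and the loop stops after about 31
// iterations; at -O2/-O3 GCC may assume overflow cannot happen, deduce
// that i > 0 is always true, and emit an infinite loop. The bug was in
// the code all along; the optimizer merely exposed it.
int main() {
    int count = 0;
    for (int i = 1; i > 0; i *= 2)
        ++count;
    std::printf("%d doublings\n", count);
    return 0;
}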

Louen
  • So for the 2nd one, if I run an executable compiled with -O3 once without any crash, does it mean it is safe to use -O3? – StarPinkER Feb 13 '13 at 12:03
  • @JermaineXu: No. Not at all. The most obvious example would be a division by zero, depending on user input. – MSalters Feb 13 '13 at 12:06