
I am focusing on the CPU/memory consumption of programs compiled by GCC.

When executing code compiled with -O3, is it always so greedy in terms of resources?

Is there any scientific reference or specification that shows the difference in memory/CPU consumption between the different optimization levels?

People working on this problem usually focus on the impact of these optimizations on execution time, compiled code size, and energy. However, I can't find much work on the resource consumption of the compiled program when optimizations are enabled.

Thanks in advance.

staticx
  • People working (and publishing research papers) on optimization don't focus on compilation time. They focus on the execution time of the compiled & optimized code. – Basile Starynkevitch Oct 05 '15 at 04:40
  • What kind of code do you have? Scientific HPC numerical code? – Basile Starynkevitch Oct 05 '15 at 04:45
  • "Is there any scientific reference or specification that shows the difference in memory/CPU consumption between the different optimization levels?" My advice is to test it on your own code and see what happens. As @BasileStarynkevitch states, -O3 optimization doesn't necessarily mean that performance will be better than at -O2. -O3 is known to have issues with concurrency (if done incorrectly) due to the `-fpredictive-commoning` flag. Also, the higher the optimization level, the more strictly GCC follows the rules, and it's easier to expose critical bugs if the code relies in some way on undefined behaviour. – rbaleksandar Oct 05 '15 at 04:50
  • The problem I see with GCC and some other compilers is that they are strictly dictated by the source code. A TI compiler that I know of can generate human-readable feedback messages on the factors limiting performance, like register or ALU usage or ill-structured loop control, and assist the programmer in adjusting the source accordingly. Tuning optimization settings is not true optimization. Optimization is all about finding what takes too much time and doing something about it. – user3528438 Oct 05 '15 at 04:55
  • @user3528438: GCC does also *sometimes* generate feedback messages on optimization limitations. – Basile Starynkevitch Oct 05 '15 at 05:14
  • There are no guarantees that -O3 produces a faster program than any other optimization level. One would often hope so, but it depends heavily on your code and your hardware in combination with the compiler and optimizer. Even simple things like adding a nop or two in the bootstrap, causing the code that follows to change alignment within cache lines, can produce double-digit percentage performance changes. I guess what we can say is that it is definitely possible to create cases where -O3 is slower than other levels, cases where -O3 is faster, and cases where it is the same as at least one other. – old_timer Oct 05 '15 at 20:59
  • @BasileStarynkevitch I am using code generated by Csmith. – staticx Oct 09 '15 at 12:29
  • @dwelch I am focusing on CPU/memory consumption across the different optimization levels. There is no quantitative study on that hot topic. – staticx Oct 09 '15 at 12:31
  • 1
    The MILEPOST project did something very related... – Basile Starynkevitch Oct 09 '15 at 12:33
  • 1
    I still don't see your point. In what country do you live? Do you have resources (e.g. half a million euros or dollars) for your goals? If you live in Europe, perhaps your ideas or wishes could could be part of some collaborative R&D project (e.g. H2020, ITEA, ....). If so, please contact me by email... – Basile Starynkevitch Oct 09 '15 at 12:48
  • 1
    Usually, memory consumption does not depend much on optimization level, and cpu use is almost the same as running time. What makes you say -O3 causes applications to become more greedy? – Marc Glisse Oct 09 '15 at 12:52
  • @BasileStarynkevitch I just want to study the impact of GCC optimization options on resource consumption. – staticx Oct 09 '15 at 12:57
  • 1
    Then MILEPOST did a lot of very related work... – Basile Starynkevitch Oct 09 '15 at 12:58
  • @MarcGlisse I don't agree with you. According to my initial experiments there is a difference between -O0 and -O3 in terms of memory consumption, and the same for CPU: -O3 is generally greedier than the other levels. But I have considered only 2 or 3 programs; I have to run broader benchmarks to validate these assumptions. – staticx Oct 09 '15 at 13:00
  • @BasileStarynkevitch Thank you, but they worked on improving the execution time, code size, or compilation time of specific programs on different architectures, not on resource consumption as far as I can see. – staticx Oct 09 '15 at 13:03
  • 1
    You don't define what resources you want to optimize. – Basile Starynkevitch Oct 09 '15 at 13:04
  • Memory and CPU consumption. – staticx Oct 09 '15 at 13:06
  • 2
    But you don't define what is memory consumption: dynamic stack size, heap requirement, cache misses, code size? – Basile Starynkevitch Oct 09 '15 at 13:08
  • My answer applies to memory consumption and CPU utilization as well. It is easy to demonstrate both directions: high optimization using more cpu/mem than low optimization, and low optimization using more cpu/mem than high optimization. Optimization is only one factor; the code and the hardware are other factors, and they act together. You cannot separate one (optimization) and find any interesting results from it. It is the classic benchmark problem: you can make a benchmark show whatever you want it to show. – old_timer Oct 09 '15 at 13:26
  • 1
    Maybe if you showed those examples where O3 is "more greedy" and how you measured it, people would understand what you are talking about... – Marc Glisse Oct 09 '15 at 16:26

1 Answer


No, there is no absolute answer, because compiler optimization is an art (it is not even well defined, and many of the underlying problems are undecidable or intractable).

But some guidelines first:

  • be sure that your program is correct and has no bugs before optimizing anything, so debug and test your program first

  • have well-designed test cases and representative benchmarks (see this).

  • be sure that your program has no undefined behavior (and this is tricky, see this), since GCC will optimize strangely (but very often correctly, according to the C99 or C11 standards) if you have UB in your code; use the -fsanitize= family of options (and gdb and valgrind ....) during the debugging phase (a small UB sketch follows this list).

  • profile your code (on various benchmarks), in particular to find out which parts are worth the optimization effort; often (but not always) most of the CPU time is spent in a small fraction of the code (rule of thumb: 80% of the time is spent in 20% of the code; on some applications, like the gcc compiler itself, this is not true; check with gcc -ftime-report, which asks gcc to show the time spent in its various compiler modules).... Most of the time, "premature optimization is the root of all evil" (but there are exceptions to this aphorism). A profiling sketch follows this list.

  • improve your source code (e.g. use restrict and const carefully and correctly, add some pragmas or function or variable attributes, perhaps wisely use some GCC builtins such as __builtin_expect, __builtin_prefetch -see this-, __builtin_unreachable...); a sketch of these hints follows this list.

  • use a recent compiler. The current version of GCC (October 2015) is 5.2 (and GCC 8 as of June 2018), and continuous progress is being made on optimization; you might consider compiling GCC from its source code to get a recent version.

  • enable all warnings (gcc -Wall -Wextra) in the compiler, and try hard to fix all of them; some warnings may appear only when you ask for optimization (e.g. with -O2)

  • Usually, compile with -O2 -march=native (or perhaps -mtune=native; I assume that you are not cross-compiling; if you are, add the appropriate -march option ...) and benchmark your program with that

  • Consider link-time optimization by compiling and linking with -flto and the same optimization flags. E.g., put CC= gcc -flto -O2 -march=native in your Makefile (then remove -O2 -march=native or -mtune=native from your CFLAGS there)... A two-file LTO sketch follows this list.

  • Try also -O3 -march=native; usually (but not always: you might sometimes get slightly faster code with -O2 than with -O3, but this is uncommon) you get a tiny improvement over -O2

  • If you want to optimize the generated program's size, use -Os instead of -O2 or -O3; more generally, don't forget to read the section Options That Control Optimization of the documentation. I guess that both -O2 and -Os would optimize the stack usage (which is closely related to memory consumption). And some GCC optimizations are able to avoid calls to malloc (which is related to heap memory consumption).

  • you might consider profile-guided optimization: the -fprofile-generate, -fprofile-use, and -fauto-profile options (a PGO sketch follows this list)

  • dive into the documentation of GCC; it has numerous optimization & code generation arguments (e.g. -ffast-math, -Ofast ...) and parameters, and you could spend months trying more of them; beware that some of them are not strictly conforming to the C standard!

  • recent GCC and Clang can emit DWARF debug information (somewhat "approximate" if strong optimizations have been applied) even when optimizing, so passing both -O2 and -g could be worthwhile (you would still be able, with some pain, to use the gdb debugger on an optimized executable)

  • if you have a lot of time to spend (weeks or months), you might customize GCC using MELT (or some other plugin) to add your own new (application-specific) optimization passes; but this is difficult (you'll need to understand GCC's internal representations and organization) and probably rarely worthwhile, except in very specific cases (those where you can justify spending months of your time on improving optimization)

  • you might want to understand the stack usage of your program, so use -fstack-usage (see the inspection sketch after this list)

  • you might want to understand the emitted assembler code, so use -S -fverbose-asm in addition to the optimization flags (and look into the produced .s assembler file; the same inspection sketch shows this)

  • you might want to understand the internal workings of GCC; use the various -fdump-* flags (you'll get hundreds of dump files!).
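
A minimal UB sketch (file name illustrative): signed overflow is undefined in C, so the optimizer is allowed to assume it never happens, and `-fsanitize=undefined` makes GCC report it at run time.

```c
/* ub_demo.c -- signed overflow is undefined behavior, so the optimizer
 * may assume it never happens.
 * Build & run:  gcc -O2 -fsanitize=undefined ub_demo.c && ./a.out
 * UBSan then prints a runtime error pinpointing the overflow. */
#include <limits.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    (void)argv;
    int x = INT_MAX + argc;   /* argc >= 1 in practice, so this overflows */
    printf("%d\n", x);
    return 0;
}
```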
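
A profiling sketch (names illustrative), using the classic gcc -pg / gprof workflow; `perf record` / `perf report` also works, without rebuilding:

```c
/* profile_demo.c -- a deliberately hot function to illustrate profiling.
 * Build with instrumentation, run, then inspect the profile:
 *   gcc -O2 -pg profile_demo.c -o profile_demo
 *   ./profile_demo
 *   gprof profile_demo gmon.out | head
 */
#include <stdio.h>

__attribute__((noinline))      /* keep it out of main so it shows up
                                  separately in the profile */
static double hot(long n)      /* expect nearly all CPU time here */
{
    double s = 0.0;
    for (long i = 1; i <= n; i++)
        s += 1.0 / (double)i;
    return s;
}

int main(void)
{
    printf("%f\n", hot(200000000L));
    return 0;
}
```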
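
A sketch of the source-level hints above (function names illustrative, not from any particular benchmark): `restrict` promises that the pointers don't alias, which enables vectorization, and `__builtin_expect` biases the branch layout.

```c
/* hints_demo.c -- compile with, e.g., gcc -O3 -march=native -c hints_demo.c */
#include <stddef.h>

/* restrict: dst and src never overlap, so GCC may vectorize the loop. */
void scale(double *restrict dst, const double *restrict src,
           double k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = k * src[i];
}

/* __builtin_expect: mark the error path as unlikely, so the hot path
 * stays straight-line code. */
int checked_div(int a, int b, int *out)
{
    if (__builtin_expect(b == 0, 0))
        return -1;             /* cold path */
    *out = a / b;
    return 0;
}
```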
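
A two-file LTO sketch (file names illustrative); with -flto, the inlining of square() can happen at link time, across translation units:

```c
/* Build:
 *   gcc -O2 -flto -march=native -c helper.c main.c
 *   gcc -O2 -flto -march=native helper.o main.o -o lto_demo
 */

/* ---- helper.c ---- */
int square(int x)
{
    return x * x;
}

/* ---- main.c ---- */
#include <stdio.h>

int square(int x);             /* defined in helper.c */

int main(void)
{
    printf("%d\n", square(21));
    return 0;
}
```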
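
A PGO sketch (file name illustrative); the instrumented run writes a .gcda profile that the second build consumes:

```c
/* pgo_demo.c -- profile-guided optimization workflow:
 *   gcc -O2 -fprofile-generate pgo_demo.c -o pgo_demo   # instrumented build
 *   ./pgo_demo                                          # writes *.gcda profile
 *   gcc -O2 -fprofile-use pgo_demo.c -o pgo_demo        # rebuild using profile
 * GCC then knows which branches are hot and lays out code accordingly. */
#include <stdio.h>

int main(void)
{
    long even = 0, odd = 0;
    for (long i = 0; i < 100000000L; i++) {
        if (i % 2 == 0)        /* the profile records how often this is taken */
            even++;
        else
            odd++;
    }
    printf("%ld %ld\n", even, odd);
    return 0;
}
```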
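
Finally, a sketch of inspecting what GCC emits (file name illustrative; the .su line shown is approximate); comparing the .su files produced at -O0 and at -O2 speaks directly to the stack part of your memory-consumption question:

```c
/* inspect_demo.c -- looking at what GCC emits:
 *   gcc -O0 -fstack-usage -c inspect_demo.c      # then retry with -O2
 *     writes inspect_demo.su, one line per function, roughly
 *     "inspect_demo.c:<line>:<col>:sum_buf <bytes> static"
 *   gcc -O2 -S -fverbose-asm inspect_demo.c
 *     writes inspect_demo.s, assembler annotated with comments. */

int sum_buf(int n)
{
    int buf[256];              /* about 1 KiB on the stack */
    for (int i = 0; i < 256; i++)
        buf[i] = i * n;
    int s = 0;
    for (int i = 0; i < 256; i++)
        s += buf[i];
    return s;
}
```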

Of course the above todo list should be used in an iterative and agile fashion.

For memory leak bugs, consider valgrind and the several -fsanitize= debugging options. Read also about garbage collection (and the GC handbook), notably Boehm's conservative garbage collector, and about compile-time garbage collection techniques.
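
A minimal sketch of Boehm's conservative collector, assuming libgc is installed (file name illustrative; the header location varies between distributions):

```c
/* gc_demo.c -- build & run:  gcc -O2 gc_demo.c -lgc -o gc_demo && ./gc_demo
 * Memory from GC_MALLOC is reclaimed automatically once unreachable. */
#include <stdio.h>
#include <gc.h>

int main(void)
{
    GC_INIT();
    for (int i = 0; i < 1000000; i++) {
        int *p = GC_MALLOC(64 * sizeof *p);   /* never free()d explicitly */
        p[0] = i;
        if (i % 200000 == 0)
            printf("heap size: %lu\n", (unsigned long)GC_get_heap_size());
    }
    return 0;
}
```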

Read about the MILEPOST project in GCC.

Consider also OpenMP, OpenCL, MPI, multi-threading, etc... Notice that parallelization is a difficult art.
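
A first OpenMP sketch (file name illustrative); note that parallel code often consumes more total CPU time (and more memory, for the thread stacks) even when its wall-clock time drops, which matters for your resource question:

```c
/* omp_demo.c -- build & run:  gcc -O2 -fopenmp omp_demo.c -o omp_demo && ./omp_demo */
#include <stdio.h>

int main(void)
{
    double sum = 0.0;
    /* split the iterations across threads; each keeps a private partial
     * sum which OpenMP combines at the end */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 1; i <= 100000000L; i++)
        sum += 1.0 / (double)i;
    printf("%f\n", sum);
    return 0;
}
```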

Notice that even GCC developers are often unable to predict the effect (on the CPU time of the produced binary) of such-and-such an optimization. Somehow optimization is a black art.

Perhaps gcc-help@gcc.gnu.org might be a good place to ask more specific, precise, and focused questions about optimization in GCC.

You could also contact me on basileatstarynkevitchdotnet with a more focused question... (and mention the URL of your original question)

For scientific papers on optimization, you'll find lots of them. Start with ACM TOPLAS, ACM TACO, etc... Search for iterative compiler optimization, etc.... And define better what resources you want to optimize for (on its own, "memory consumption" means next to nothing....). A measurement sketch follows.
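
For instance, one concrete and measurable definition is "user CPU time plus peak resident set size of the process". A sketch using getrusage (program names illustrative; on Linux, /usr/bin/time -v reports the same numbers without writing any code):

```c
/* measure.c -- run a child program and report its CPU time and peak RSS:
 *   gcc -O2 measure.c -o measure
 *   ./measure ./prog_O0        # compare against ./measure ./prog_O3 */
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/resource.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s program [args...]\n", argv[0]);
        return 1;
    }
    pid_t pid = fork();
    if (pid == 0) {                       /* child: run the program under test */
        execvp(argv[1], &argv[1]);
        perror("execvp");
        _exit(127);
    }
    int status;
    waitpid(pid, &status, 0);             /* reap the child... */
    struct rusage ru;
    getrusage(RUSAGE_CHILDREN, &ru);      /* ...then read its resource usage */
    fprintf(stderr, "user CPU: %ld.%06ld s, peak RSS: %ld KiB\n",
            (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec,
            ru.ru_maxrss);                /* ru_maxrss is in KiB on Linux */
    return 0;
}
```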

Basile Starynkevitch
  • Isn't `-mtune=native` similar to `-march=native`? – Basile Starynkevitch Oct 05 '15 at 08:06
  • 1
    (on x86) -mtune affects the time gcc thinks each instruction takes. -march affects the list of instructions available (like avx2). Unless you override it, -march implies the equivalent -mtune. – Marc Glisse Oct 05 '15 at 08:08
  • @BasileStarynkevitch How about resource consumption? I mean CPU and memory consumption. Are there any preliminary studies on this topic? If, for example, I would like to test -O0 to -O3 in terms of resource consumption, is there any trade-off there? Also, has there been any attempt to cluster optimization options with regard to resource usage? – staticx Oct 09 '15 at 12:38
  • @staticx: you should edit your question to improve it. Are you talking of CPU & memory consumption inside the [GCC](http://gcc.gnu.org/) compiler during the compilation, or of CPU & memory consumption inside the compiled application? – Basile Starynkevitch Oct 09 '15 at 12:40
  • @BasileStarynkevitch Sorry. I updated the question. You can check it. I am checking the CPU/mem consumption of compiled applications – staticx Oct 09 '15 at 12:46