Which gcc optimization flags should I use?

Question

If I want to minimize the run time of my C programs, which optimization flags should I use? (I want to stay standard-conforming too.)

Currently I'm using:

 -Wall -Wextra -pedantic -ansi -O3

Should I also use

-std=c99

for example?

And is there a specific order in which I should put those flags in my makefile? Does it make any difference?

Also, is there any reason not to use every optimization flag I can find? Do they ever counteract each other or something like that?

`-ansi` specifies the C89/C90 standard; `-std=c99` specifies the 1999 standard (and `-std=c11` specifies the current 2011 standard). It doesn't make sense to use them together. — Keith Thompson, Apr 13 '16 at 17:45
Optimization is compiler-specific. You appear to be using GCC-style compiler options, but other compilers accept similar flags. Tag this question with the C compiler you are using if you want meaningful answers about flags influencing optimization for your C implementation. — John Bollinger, Apr 13 '16 at 17:45
In case you use GCC compiler, [this answer](http://stackoverflow.com/a/1778700/6166067) might be useful. — aprelev, Apr 13 '16 at 18:43

score 4 · Answer 1 · answered Apr 13 '16 at 17:41

The flag -std=c99 does not change the optimization level. It only selects the language standard you want the compiler to conform to.

You use -std=c99 when you want your program to be treated as a C99 program by the compiler.
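To see which standard a given flag actually selects, you can look at the predefined macro `__STDC_VERSION__`. Here is a minimal sketch; the function names `c_standard_name` and `report_c_standard` are made up for this example. The macro is not defined at all in C89/C90 mode (which is what -ansi asks for), so that case needs its own check.

```c
#include <stdio.h>

/* Returns a readable name for the C standard the compiler was asked to
 * follow. __STDC_VERSION__ is undefined in C89/C90 mode (-ansi or
 * -std=c89), so that case is tested first. */
const char *c_standard_name(void)
{
#if !defined(__STDC_VERSION__)
    return "C89/C90";
#elif __STDC_VERSION__ >= 201112L
    return "C11 or later";
#elif __STDC_VERSION__ >= 199901L
    return "C99";
#else
    return "C90 with Amendment 1";
#endif
}

/* Prints the result; call this from main() to check a set of flags. */
void report_c_standard(void)
{
    printf("compiled as: %s\n", c_standard_name());
}
```

Building the same file once with -ansi and once with -std=c99 and calling `report_c_standard()` from `main` should show the difference. The optimization level has no effect on the result.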

R Sahu


Peter Cordes · Accepted Answer · 2016-04-13T22:30:48.707

I'd recommend compiling new code with -std=gnu11, or -std=c11 if needed. Fixing your code until it compiles without any -Wall warnings is usually a good idea, IIRC. -Wextra also warns about some things you might not want to change.

A good way to check how something compiles is to look at the compiler asm output. http://gcc.godbolt.org/ formats the asm output nicely (stripping out the noise). Putting some key functions up there and looking at what different compiler versions do is useful if you understand asm at all.

Use a new compiler version. gcc and clang have both improved significantly in newer versions. gcc 5.3 and clang 3.8 are the current releases at the time of writing. gcc5 makes noticeably better code than gcc 4.9.3 in some cases.

If you only need the binary to run on your own machine, you should use -O3 -march=native.

If you need the binary to run on other machines, choose the baseline for instruction-set extensions with stuff like -mssse3 -mpopcnt. You can use -mtune=haswell to optimize for Haswell even while making code that still runs on older CPUs (as determined by -march).
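As a rough illustration of what a baseline flag like -mpopcnt changes, here is a small sketch. The function names `count_bits` and `built_with_popcnt` are invented for this example; `__builtin_popcountll` is a GCC/clang builtin and `__POPCNT__` is the macro those compilers define when the extension is enabled.

```c
#include <stdint.h>

/* Counts the set bits in x. With -mpopcnt (or a -march that implies it)
 * the builtin typically becomes a single popcnt instruction; without it
 * the compiler falls back to a slower generic sequence or library call. */
unsigned count_bits(uint64_t x)
{
    return (unsigned)__builtin_popcountll((unsigned long long)x);
}

/* Reports whether POPCNT was part of the baseline at compile time. */
int built_with_popcnt(void)
{
#ifdef __POPCNT__
    return 1;
#else
    return 0;
#endif
}
```

The result of `count_bits` is the same either way; only the generated code differs. A binary built with -mpopcnt will fault on a CPU that lacks the instruction, which is why the baseline has to match the oldest machine you care about.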

If your program doesn't depend on strict FP rounding behaviour, use -ffast-math. If it does, you can usually still use -fno-math-errno and stuff like that, without enabling -funsafe-math-optimizations. Some FP code can get big speedups from fast-math, for example because it lets the compiler auto-vectorize loops it otherwise couldn't.
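To make "depends on strict FP rounding" concrete, here is a hedged sketch; the function names `sum_floats` and `addition_is_associative` are made up for illustration. Floating-point addition is not associative, so the compiler may only reorder the additions in a sum loop (which it needs to do to vectorize it) when -ffast-math gives it permission.

```c
#include <stddef.h>

/* Sums n floats strictly left to right. Without -ffast-math the compiler
 * has to keep this order, because a different order can round
 * differently; that usually blocks SIMD vectorization of the reduction. */
float sum_floats(const float *a, size_t n)
{
    float total = 0.0f;
    size_t i;
    for (i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Returns 1 if (a + b) + c equals a + (b + c), 0 otherwise.
 * volatile forces each result to be rounded to double and stored. */
int addition_is_associative(double a, double b, double c)
{
    volatile double left = (a + b) + c;
    volatile double right = a + (b + c);
    return left == right;
}
```

For example, with a = 1e20, b = -1e20 and c = 1.0 the two groupings give different results when compiled without fast-math, because 1.0 is lost when added to -1e20 first. If your program can tolerate that kind of difference, -ffast-math is probably safe for it.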

If you can usefully do a test-run of your program that exercises most of the code paths that need to be optimized for a real run, then use profile-directed optimization:

gcc  -fprofile-generate -Wall -Wextra -std=gnu11 -O3 -ffast-math -march=native -fwhole-program *.c -o my_program
./my_program -option1 < test_input1
./my_program -option2 < test_input2
gcc  -fprofile-use      -Wall -Wextra -std=gnu11 -O3 -ffast-math -march=native -fwhole-program *.c -o my_program

-fprofile-use enables -funroll-loops, since it has enough information to decide when to actually unroll. Unrolling loops all over the place can make things worse. However, it's worth trying -funroll-loops to see if it helps.
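For readers who haven't seen it, this is roughly what unrolling means, written out by hand. It is only a sketch of the idea, not what gcc actually emits, and the function names `sum_rolled` and `sum_unrolled4` are invented for the example.

```c
#include <stddef.h>

/* Plain loop: one add plus one counter update and branch per element. */
long sum_rolled(const int *a, size_t n)
{
    long total = 0;
    size_t i;
    for (i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Same loop unrolled by 4: less loop overhead per element, but more
 * code, which costs instruction-cache space if done everywhere. */
long sum_unrolled4(const int *a, size_t n)
{
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        total += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; i++)  /* leftover 0 to 3 elements */
        total += a[i];
    return total;
}
```

Both versions return the same sum. The trade-off is code size against loop overhead, which is why having profile data to pick the loops that are actually hot helps.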

If your test runs don't cover all the code paths, then some important ones will be marked as "cold" and optimized less.

-O3 enables auto-vectorization, which -O2 doesn't. This can give big speedups.
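A loop like the following is the kind of thing the auto-vectorizer handles well. This is a minimal sketch with a made-up function name (`add_arrays`); the `restrict` qualifiers (C99) are an assumption that the arrays never overlap, which saves the compiler from having to emit a runtime overlap check.

```c
#include <stddef.h>

/* dst[i] = x[i] + y[i] for i in [0, n). Each element is independent,
 * so the compiler can process several per SIMD instruction without
 * changing any result (no -ffast-math needed for this one). */
void add_arrays(float *restrict dst, const float *restrict x,
                const float *restrict y, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = x[i] + y[i];
}
```

With gcc you can add -fopt-info-vec to have it report which loops were vectorized, or just look at the asm for SIMD instructions.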

-fwhole-program allows cross-file inlining, but only works when you put all the source files on one gcc command line. -flto (Link-Time Optimization) is another way to get the same effect. clang supports -flto but not -fwhole-program.

-fomit-frame-pointer has been the default for a while now for x86-64, and more recently for x86 (32bit).

As well as gcc, try compiling your program with clang. Clang sometimes makes better code than gcc, sometimes worse. Try both and benchmark.

Thank you, your answer is very helpful and just what I needed. Can you please explain, or direct me to, what you mean by "FP rounding" or "FP code"? — sharp_c-tudent, Apr 14 '16 at 00:03

score 2 · Answer 3 · answered Apr 13 '16 at 17:43

Of the flags you listed, only -O3 has to do with optimization. The others serve other purposes.

You may want to add -funroll-loops and -fomit-frame-pointer, but they should already be included in -O3.

ForceBru


Just a note: neither `-O3` nor any other optimisation level option unrolls loops the way `-funroll-loops` does. That flag is only enabled implicitly by `-fprofile-use`, because then the compiler has enough information to decide which loops to unroll. Partial loop unrolling depends on the compiler though. – aprelev Apr 13 '16 at 18:40
