
How many GCC optimization levels are there?

I tried gcc -O1, gcc -O2, gcc -O3, and gcc -O4.

If I use a really large number, it won't work.

However, I have tried

gcc -O100

and it compiled.

How many optimization levels are there?

Peter Mortensen
neuromancer
  • @minitech Which FM are you looking at? Even with `man gcc` on Cygwin (12,000-odd lines) you can search for `-O` and find everything the answers below state, and then some. – Jens Jul 25 '12 at 13:32
  • @minmaxavg after reading the source, I disagree with you: anything larger than `3` is the same as `3` (as long as it does not `int` overflow). See [my answer](http://stackoverflow.com/a/30308151/895245). – Ciro Santilli OurBigBook.com May 18 '15 at 16:17
  • Actually, GCC has many other flags to fine-tune optimizations. `-fomit-frame-pointer` will change the generated code. – Basile Starynkevitch Jun 25 '15 at 16:27

4 Answers

To be pedantic, there are 8 different valid -O options you can give to gcc, though some of them mean the same thing.

The original version of this answer stated there were 7 options. GCC has since added -Og to bring the total to 8.

From the man page:

  • -O (Same as -O1)
  • -O0 (do no optimization, the default if no optimization level is specified)
  • -O1 (optimize minimally)
  • -O2 (optimize more)
  • -O3 (optimize even more)
  • -Ofast (optimize very aggressively to the point of breaking standard compliance)
  • -Og (Optimize debugging experience. -Og enables optimizations that do not interfere with debugging. It should be the optimization level of choice for the standard edit-compile-debug cycle, offering a reasonable level of optimization while maintaining fast compilation and a good debugging experience.)
  • -Os (Optimize for size. -Os enables all -O2 optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size. -Os disables the following optimization flags: -falign-functions -falign-jumps -falign-loops -falign-labels -freorder-blocks -freorder-blocks-and-partition -fprefetch-loop-arrays -ftree-vect-loop-version)

There may also be platform-specific optimization levels: as @pauldoo notes, OS X has -Oz.
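Because the numbered levels only differ in which internal flags they switch on, a program cannot tell -O1, -O2 and -O3 apart, but GCC does document a few predefined macros that reveal the broad family: __OPTIMIZE__ (defined when optimizing at all), __OPTIMIZE_SIZE__ (defined for -Os) and __FAST_MATH__ (defined by -ffast-math, which -Ofast turns on). A minimal sketch, where the function name opt_family is hypothetical:

```c
/* Hypothetical helper: report which optimization "family" this file was
   compiled with, using GCC's documented predefined macros.  The macros
   cannot distinguish -O1, -O2 and -O3 from each other. */
const char *opt_family(void)
{
#if defined(__FAST_MATH__)
    return "fast-math (e.g. -Ofast)";
#elif defined(__OPTIMIZE_SIZE__)
    return "size (e.g. -Os)";
#elif defined(__OPTIMIZE__)
    return "optimizing (-O1, -O2, -O3, -Og, ...)";
#else
    return "not optimizing (-O0)";
#endif
}
```

Calling opt_family() from a small test program compiled once with -O3 and once with -O100 should return the same string, since both end up optimizing; only -O0 (or no -O at all) should report the last case.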

SuperStormer
Glen
  • If you're developing on Mac OS X, there's an additional `-Oz` setting which is "optimize for size more aggressively than `-Os`": http://developer.apple.com/mac/library/DOCUMENTATION/DeveloperTools/gcc-4.0.1/gcc/Optimize-Options.html – pauldoo May 05 '10 at 10:54
  • Note: -O3 is not necessarily better than -O2, even if the name suggests so. Try both. – johan d Sep 19 '13 at 14:04
  • @pauldoo 404 page, replace with archive.org – noɥʇʎԀʎzɐɹƆ Jul 06 '16 at 16:58
  • @pauldoo working link https://gcc.gnu.org/onlinedocs/gcc-4.1.0/gcc/Optimize-Options.html – Max MacLeod Jul 09 '21 at 12:53
  • Calling "Os" "optimize for size" is IMO misleading, since it still optimises primarily for speed; it just skips or alters certain optimisations that would otherwise increase code size. You did explain this well enough in your text; I'm just pointing out a general peeve of mine with saying it means "optimize for size", implying that it is the opposite of optimizing for speed. "O0" should never be used, as it generates ridiculous code like something from a 1970s compiler, and pretty much any remaining reason to use it is gone now that "Og" exists – thomasrutter Aug 11 '21 at 07:08
  • Also lending my support to the comment that said O3 is not necessarily always better than O2, or they wouldn't have separated the two. O3 contains some optimizations that not everyone would want, which could increase memory use of a program and in some narrow cases can slow things down. It also makes compilation slower and use more memory. – thomasrutter Aug 11 '21 at 07:14
  • @thomasrutter: At least in clang, -Og performs optimizations which would likely be regarded as "astonishing", such as arbitrarily corrupting memory if code receives input that would cause execution to get stuck in an endless loop. *Cleanly* skipping a loop which can be shown not to affect the behavior of any code that could be reached afterward is a useful optimization, but clang will combine optimizations that rely upon conditions that would need to exist for a loop to exit with optimizations that eliminate the loop, without recognizing that an omitted loop won't establish post-conditions. – supercat Feb 28 '22 at 18:57
  • please add the (version independent) link https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html – MattTT Apr 14 '23 at 09:22

Let's interpret the source code of GCC 5.1

We'll try to understand what happens with -O100, since the man page does not make it clear.

We shall conclude that:

  • anything above -O3 up to INT_MAX is the same as -O3, but that could easily change in the future, so don't rely on it.
  • GCC 5.1 invokes undefined behavior if you pass integers larger than INT_MAX.
  • the argument may only contain digits, otherwise it fails gracefully. In particular, this excludes negative integers like -O-1.

Focus on subprograms

First, remember that gcc is just a front-end for cpp, as, cc1 and collect2. A quick ./XXX --help on each of them shows that only collect2 and cc1 take -O, so let's focus on them.

And:

gcc -v -O100 main.c |& grep 100

gives:

COLLECT_GCC_OPTIONS='-O100' '-v' '-mtune=generic' '-march=x86-64'
/usr/local/libexec/gcc/x86_64-unknown-linux-gnu/5.1.0/cc1 [[noise]] hello_world.c -O100 -o /tmp/ccetECB5.

so -O was forwarded to both cc1 and collect2.

O in common.opt

common.opt is written in a GCC-specific CLI option description format, described in the internals documentation and translated to C by opth-gen.awk and optc-gen.awk.

It contains the following interesting lines:

O
Common JoinedOrMissing Optimization
-O<number>  Set optimization level to <number>

Os
Common Optimization
Optimize for space rather than speed

Ofast
Common Optimization
Optimize for speed disregarding exact standards compliance

Og
Common Optimization
Optimize for debugging experience rather than speed or size

which specify all the O options. Note how -O<n> is in a separate family from the other options Os, Ofast and Og.

When we build, this generates an options.h file that contains:

OPT_O = 139,                               /* -O */
OPT_Ofast = 140,                           /* -Ofast */
OPT_Og = 141,                              /* -Og */
OPT_Os = 142,                              /* -Os */

As a bonus, while we are grepping for \bO\n inside common.opt we notice the lines:

-optimize
Common Alias(O)

which teaches us that --optimize (double dash, because the .opt entry itself starts with a dash: -optimize) is an undocumented alias for -O, which can be used as --optimize=3!

Where OPT_O is used

Now we grep:

git grep -E '\bOPT_O\b'

which points us to two files: opts.c and lto-wrapper.c.

Let's first track down opts.c

opts.c:default_options_optimization

All opts.c usages happen inside default_options_optimization.

We grep backwards to see who calls this function, and find that the only code path is:

  • main.c:main
  • toplev.c:toplev::main
  • opts-global.c:decode_opts
  • opts.c:default_options_optimization

and main.c is the entry point of cc1. Good!

The first part of this function:

  • calls integral_argument, which calls atoi on the string corresponding to OPT_O to parse the input argument
  • stores the value inside opts->x_optimize, where opts is a struct gcc_options.

struct gcc_options

After grepping in vain, we notice that this struct is also generated, in options.h:

struct gcc_options {
    int x_optimize;
    [...]
};

where x_optimize comes from the lines:

Variable
int optimize

present in common.opt, and that options.c contains:

struct gcc_options global_options;

so we guess that this holds the entire global configuration state, and that int x_optimize is the optimization level.

255 is an internal maximum

In opts.c:integral_argument, atoi is applied to the input argument, so INT_MAX is an upper bound. If you pass anything larger, it seems that GCC invokes C undefined behaviour. Ouch?

integral_argument is a thin wrapper around atoi that rejects the argument if any character is not a digit, so negative values fail gracefully.
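The digit check plus atoi described above can be sketched as follows. This is a hypothetical re-creation written from the description in this answer, not GCC's actual code; parse_level is an illustrative name.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical re-creation of the behaviour described for
   opts.c:integral_argument (not the GCC source): accept the argument
   only if every character is a decimal digit, then convert with atoi. */
int parse_level(const char *arg)
{
    const char *p = arg;

    while (*p != '\0' && isdigit((unsigned char) *p))
        p++;

    if (*p != '\0')
        return -1;      /* a non-digit was found, e.g. "-1" or "3x": reject */

    /* atoi has undefined behaviour if the value does not fit in an int,
       which is the INT_MAX problem mentioned above. */
    return atoi(arg);
}
```

A more defensive version would use strtol, check errno for ERANGE and compare the result against INT_MAX, turning the out-of-range case into a detectable error instead of undefined behaviour.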

Back to opts.c:default_options_optimization, we see the line:

if ((unsigned int) opts->x_optimize > 255)
  opts->x_optimize = 255;

so the optimization level is clamped to 255. While reading opth-gen.awk I had come across:

# All of the optimization switches gathered together so they can be saved and restored.
# This will allow attribute((cold)) to turn on space optimization.

and in the generated options.h:

struct GTY(()) cl_optimization
{
  unsigned char x_optimize;

which explains the truncation: the options must also be forwarded to cl_optimization, which uses an unsigned char to save space. So 255 is actually an internal maximum.
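To see why the clamp is needed before the value reaches an unsigned char field, compare a clamped and an unclamped store. Both function names are hypothetical, and the wrap-around values assume the usual 8-bit char.

```c
/* Hypothetical sketch (not GCC source): the level is parsed as an int but
   later saved in an unsigned char field such as cl_optimization's
   x_optimize, which can only hold 0..255. */
unsigned char save_level_clamped(int level)
{
    if ((unsigned int) level > 255)   /* same test as in default_options_optimization */
        level = 255;
    return (unsigned char) level;
}

unsigned char save_level_unclamped(int level)
{
    /* Conversion to unsigned char is done modulo 256, so large levels wrap. */
    return (unsigned char) level;
}
```

Without the clamp, -O256 would wrap to level 0 and -O259 to level 3 once saved; with it, everything above 255 saturates at 255, which still passes every level >= 3 check.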

opts.c:maybe_default_options

Back in opts.c:default_options_optimization, we come across maybe_default_options, which sounds interesting. We step into it, and then into maybe_default_option, where we reach a big switch:

switch (default_opt->levels)
  {

  [...]

  case OPT_LEVELS_1_PLUS:
    enabled = (level >= 1);
    break;

  [...]

  case OPT_LEVELS_3_PLUS:
    enabled = (level >= 3);
    break;

There are no >= 4 checks, which indicates that 3 is the largest level that makes a difference.

Then we search for the definition of OPT_LEVELS_3_PLUS in common-target.h:

enum opt_levels
{
  OPT_LEVELS_NONE, /* No levels (mark end of array).  */
  OPT_LEVELS_ALL, /* All levels (used by targets to disable options
                     enabled in target-independent code).  */
  OPT_LEVELS_0_ONLY, /* -O0 only.  */
  OPT_LEVELS_1_PLUS, /* -O1 and above, including -Os and -Og.  */
  OPT_LEVELS_1_PLUS_SPEED_ONLY, /* -O1 and above, but not -Os or -Og.  */
  OPT_LEVELS_1_PLUS_NOT_DEBUG, /* -O1 and above, but not -Og.  */
  OPT_LEVELS_2_PLUS, /* -O2 and above, including -Os.  */
  OPT_LEVELS_2_PLUS_SPEED_ONLY, /* -O2 and above, but not -Os or -Og.  */
  OPT_LEVELS_3_PLUS, /* -O3 and above.  */
  OPT_LEVELS_3_PLUS_AND_SIZE, /* -O3 and above and -Os.  */
  OPT_LEVELS_SIZE, /* -Os only.  */
  OPT_LEVELS_FAST /* -Ofast only.  */
};

Ha! This is a strong indicator that the numeric levels stop at 3.
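The switch and the enum together mean that each flag class only ever compares the level with 0, 1, 2 or 3. The following sketch models a simplified subset of that logic (the enum names and enabled_for are hypothetical, and -Os and -Og are reduced to two booleans), which makes it easy to check that level 100 and level 3 enable exactly the same classes.

```c
#include <stdbool.h>

/* Simplified, hypothetical subset of enum opt_levels from common-target.h. */
enum opt_levels_subset {
    LEVELS_0_ONLY,
    LEVELS_1_PLUS,
    LEVELS_1_PLUS_SPEED_ONLY,
    LEVELS_2_PLUS,
    LEVELS_3_PLUS
};

/* Mirrors the shape of the switch in opts.c:maybe_default_option:
   no class compares the level against anything larger than 3. */
bool enabled_for(enum opt_levels_subset levels, int level,
                 bool optimize_size, bool optimize_debug)
{
    switch (levels) {
    case LEVELS_0_ONLY:
        return level == 0;
    case LEVELS_1_PLUS:
        return level >= 1;
    case LEVELS_1_PLUS_SPEED_ONLY:
        return level >= 1 && !optimize_size && !optimize_debug;
    case LEVELS_2_PLUS:
        return level >= 2;
    case LEVELS_3_PLUS:
        return level >= 3;
    }
    return false;
}
```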

opts.c:default_options_table

opt_levels is so interesting that we grep for OPT_LEVELS_3_PLUS, and come across opts.c:default_options_table:

static const struct default_options default_options_table[] = {
    /* -O1 optimizations.  */
    { OPT_LEVELS_1_PLUS, OPT_fdefer_pop, NULL, 1 },
    [...]

    /* -O3 optimizations.  */
    { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
    [...]
};

so this is where the mapping from -On to the specific optimizations mentioned in the docs is encoded. Nice!
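A heavily reduced, hypothetical model of this table-driven approach, containing only the two rows quoted above, shows how a level maps to a set of enabled flags. The struct, table and function names are made up for illustration, and each OPT_LEVELS_N_PLUS class is approximated by a minimum level.

```c
#include <stddef.h>

/* Hypothetical, reduced model of default_options_table: each row means
   "enable this flag at this level or above".  The real table is much larger. */
struct default_option_row {
    int min_level;
    const char *flag;
};

static const struct default_option_row demo_table[] = {
    { 1, "-fdefer-pop" },                      /* OPT_LEVELS_1_PLUS */
    { 3, "-ftree-loop-distribute-patterns" },  /* OPT_LEVELS_3_PLUS */
};

/* Count how many rows of the table a given -O level switches on. */
size_t count_enabled(int level)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof demo_table / sizeof demo_table[0]; i++)
        if (level >= demo_table[i].min_level)
            n++;
    return n;
}
```

count_enabled(100) equals count_enabled(3), which is the table-level reason why -O100 behaves like -O3.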

Ensure that there are no other uses of x_optimize

The main use of x_optimize is to set other specific optimization options like -fdefer-pop, as documented on the man page. Are there any others?

We grep and find a few more. There are only a handful, and manual inspection shows that none of them checks anything beyond x_optimize >= 3, so our conclusion holds.

lto-wrapper.c

Now we go for the second occurrence of OPT_O, which was in lto-wrapper.c.

LTO stands for Link Time Optimization, which, as the name suggests, needs an -O option and is tied to collect2 (which is basically a linker).

In fact, the first line of lto-wrapper.c says:

/* Wrapper to call lto.  Used by collect2 and the linker plugin.

In this file, the OPT_O occurrences seem to only normalize the value of -O to pass it forward, so we should be fine.

Ciro Santilli OurBigBook.com

Seven distinct levels:

  • -O0 (default): No optimization.

  • -O or -O1 (same thing): Optimize, but do not spend too much time.

  • -O2: Optimize more aggressively

  • -O3: Optimize most aggressively

  • -Ofast: Equivalent to -O3 -ffast-math. -ffast-math triggers non-standards-compliant floating point optimizations. This allows the compiler to pretend that floating point numbers are infinitely precise, and that algebra on them follows the standard rules of real number algebra. It also tells the compiler to tell the hardware to flush denormals to zero and treat denormals as zero, at least on some processors, including x86 and x86-64. Denormals trigger a slow path on many FPUs, and so treating them as zero (which does not trigger the slow path) can be a big performance win.

  • -Os: Optimize for code size. This can actually improve speed in some cases, due to better I-cache behavior.

  • -Og: Optimize, but do not interfere with debugging. This enables non-embarrassing performance for debug builds and is intended to replace -O0 for debug builds.
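To illustrate the -Ofast point about treating floating point as real-number algebra, consider (a + b) - a. Under strict IEEE 754 double arithmetic (assuming SSE2-style doubles as on x86-64, not x87 extended precision), 1e16 + 1.0 rounds back to 1e16, so the expression yields 0.0 rather than b. With -ffast-math GCC is allowed to reassociate and may fold the whole expression to b; whether it actually does depends on the GCC version and target. cancel is a hypothetical name.

```c
/* Hypothetical example: under strict IEEE 754 doubles, (a + b) - a is not
   always b.  With a = 1e16 and b = 1.0, a + b rounds back to 1e16, so the
   result is 0.0.  -ffast-math (and so -Ofast) permits GCC to apply
   real-number algebra here and possibly return b instead. */
double cancel(double a, double b)
{
    return (a + b) - a;
}
```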

There are also other options that are not enabled by any of these levels and must be enabled separately. It is also possible to use an optimization level but disable specific flags that it enables, for example -O2 -fno-strict-aliasing.

For more information, see the GCC website.

einpoklum
Demi

Four (0-3): See the GCC 4.4.2 manual. Anything higher is just -O3, but at some point you will overflow the variable size limit.

Tom
  • I have explored the source code [in my answer](http://stackoverflow.com/a/30308151/895245) and agree with you. More pedantically, GCC seems to rely on `atoi` undefined behavior, followed by a `255` internal limit. – Ciro Santilli OurBigBook.com May 18 '15 at 16:16
  • Please consider removing your answer, as it is (at least these days) incorrect. – einpoklum Feb 27 '16 at 09:25