7

Is there a method to automatically find the best compiler options (on a given machine) that result in the fastest possible executable?

Naturally, I use g++ -O3, but there are additional flags that may make the code run faster, e.g. -ffast-math and others, some of which are hardware-dependent.

Does anyone know some code I can put in my configure.ac file (GNU autotools), so that the flags will be added to the Makefile automatically by the ./configure command?
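For concreteness, here is the sort of check I have in mind. This is only a sketch, and it assumes the Autoconf Archive's AX_APPEND_COMPILE_FLAGS macro is installed; it merely keeps whichever candidate flags the compiler accepts, without measuring whether they actually help:

    dnl Sketch only: append each candidate flag the C++ compiler accepts
    dnl to CXXFLAGS (requires AX_APPEND_COMPILE_FLAGS from the Autoconf Archive).
    AC_LANG_PUSH([C++])
    AX_APPEND_COMPILE_FLAGS([-O3 -ffast-math -march=native], [CXXFLAGS], [-Werror])
    AC_LANG_POP([C++])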

In addition to automatically determining the best flags, I would be interested in some useful compiler flags that are good to use as a default for most optimized executables.

Update: Most people suggest just trying different flags and selecting the best ones empirically. For that method, I'd have a follow-up question: is there a utility that lists all the compiler flags applicable to the machine I'm running on (e.g., one that tests whether SSE instructions are available, etc.)?
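Two partial approaches I know of on Linux (the second needs a reasonably recent GCC, since --help=target was added in 4.3):

    grep -m1 flags /proc/cpuinfo          # the kernel's view: sse, sse2, ssse3, ...
    gcc -Q --help=target -march=native    # GCC's view: which -m options get enabled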

Frank
    The "best" optimisation options depend on what your code actually does. Only you know that. –  Mar 14 '10 at 18:37
    And to make it 'go to eleven' you should profile it. There are few free lunches left in terms of compiler toggles. – Dirk Eddelbuettel Mar 14 '10 at 18:39
    Ok, to really tailor it to my code I should hand-select options and profile them. But it can't hurt to add the appropriate `-march=cpu-type` on that machine? And there should be certain categories of programs that profit from certain other (hardware-dependent) compilation flags? For example, my program falls into the category "uses lots of floating-point operations". – Frank Mar 14 '10 at 18:48
    Also, it seems useful to use `-mfpmath=sse` wherever possible. GCC documentation says: "The resulting code should be considerably faster in the majority of cases" (http://gcc.gnu.org/onlinedocs/gcc-4.0.0/gcc/i386-and-x86_002d64-Options.html). – Frank Mar 14 '10 at 18:55
    Why not use -march=native? That should enable SSEn floating-point, among other things. – Ben Voigt Mar 14 '10 at 20:53
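(For reference, the flags suggested in this comment thread combine into an invocation like the following; illustrative only, since -march=native requires GCC 4.2 or newer, and the program names are placeholders:)

    g++ -O3 -march=native -mfpmath=sse -o myprog main.cpp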

8 Answers

4

I don't think you can do this at configure time, but there is at least one program that attempts to optimize gcc option flags for a particular executable and machine. See http://www.coyotegulch.com/products/acovea/ for example.

You might be able to use this with some knowledge of your target machine(s) to find a good set of options for your code.

ergosys
  • Ditto for ATLAS (Automatically Tuned Linear Algebra Software), an implementation of BLAS/LAPACK. See http://math-atlas.sourceforge.net/ – celion Mar 15 '10 at 07:08
  • The link to acovea is broken. Here is an alternative: http://stderr.org/doc/acovea/html/acoveaga.html – OutputLogic Jun 16 '11 at 06:18
4

Um - yes. This is possible. Look into profile-guided optimization.
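A minimal sketch of the GCC workflow (the -fprofile-* flags are standard GCC options; the program and input names are placeholders):

    g++ -O3 -fprofile-generate -o myprog main.cpp   # 1. instrumented build
    ./myprog representative_input.dat               # 2. run: writes .gcda profile data
    g++ -O3 -fprofile-use -o myprog main.cpp        # 3. rebuild using the collected profile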

2

Some compilers provide a "-fast" option that automatically selects the most aggressive optimizations for the compilation host; the Intel C++ compiler is one example: http://en.wikipedia.org/wiki/Intel_C%2B%2B_Compiler

Unfortunately, g++ does not provide similar flags.

As a follow-up to your next question: for g++ you can use the -mtune option together with -O3, which will give you reasonably fast defaults. The challenge then is to find the processor type of your compilation host. You may want to look at the autoconf macro archive to see whether somebody has written the necessary tests; otherwise, assuming Linux, you have to parse /proc/cpuinfo to get the processor type.
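A rough sketch of that last approach; Linux-only, and the model-name patterns here are illustrative rather than exhaustive:

    # Pick an -mtune value by inspecting /proc/cpuinfo.
    model=$(grep -m1 'model name' /proc/cpuinfo)
    case "$model" in
      *Core*2*)    tune=core2 ;;
      *Pentium*4*) tune=pentium4 ;;
      *)           tune=generic ;;  # safe fallback on modern GCC
    esac
    g++ -O3 -mtune="$tune" -o myprog main.cpp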

Anycorn
2

After some googling, I found this script: gcccpuopt.

On one of my machines (32-bit), it outputs:

-march=pentium4 -mfpmath=sse

On another machine (64-bit), it outputs:

$ ./gcccpuopt 
Warning: The optimum *32 bit* architecture is reported
-m32 -march=core2 -mfpmath=sse

So, it's not perfect, but might be helpful.
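If the output looks right for a given machine, it can be spliced straight into the build; a sketch, assuming gcccpuopt is on the PATH and with placeholder program names:

    CXXFLAGS="-O3 $(gcccpuopt)"
    g++ $CXXFLAGS -o myprog main.cpp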

Frank
2

See also the -march=native and -mtune=native gcc options (on x86, -mcpu is a deprecated synonym for -mtune).
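To see what -march=native actually resolves to on a given machine, one long-standing trick is to inspect the cc1 command line:

    gcc -march=native -E -v - </dev/null 2>&1 | grep cc1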

wRAR
1

Is there a method to automatically find the best compiler options (on a given machine), which result in the fastest possible executable?

No.

You could compile your program with a large assortment of compiler options, then benchmark each and every version, then select the one that is "fastest," but that's hardly reliable and probably not useful for your program.
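If you do want to try it, a naive sketch looks like this (the flag sets, program, and input names are placeholders):

    for flags in "-O2" "-O3" "-O3 -ffast-math" "-O3 -march=native"; do
      g++ $flags -o myprog main.cpp || continue   # skip flag sets g++ rejects
      printf '== %s ==\n' "$flags"
      time ./myprog benchmark_input.dat
    done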

greyfade
    Which, BTW, is precisely what Acovea (mentioned by @ergosys) does: compile and benchmark the program hundreds, even thousands of times (which is why the program has to be simple and the benchmarks short) with different combinations of GCC optimization flags and "evolve" a good set of flags using a genetic algorithm. – Jörg W Mittag Mar 14 '10 at 19:53
0

This is a solution that works for me, but it does take a little while to set up. In "Python Scripting for Computational Science" by Hans Petter Langtangen (an excellent book, in my opinion), an example is given of using a short Python script to run numerical experiments that determine the best compiler options for your C/Fortran/... program. This is described in Chapter 1.1.11, "Nested Heterogeneous Data Structures".

Source code for the examples from the book is freely available at http://folk.uio.no/hpl/scripting/index.html (I'm not sure of the license, so I will not reproduce any code here). In particular, you can find code for a similar numerical test in TCSE3-3rd-examples.tar.gz, in the file src/app/wavesim2D/F77/compile.py, which you could use as a base for writing a script appropriate to your particular system/language (C++ in your case).

Nathan
-2

Optimizing your app is mainly your job, not the compiler's.

Here's an example of what I'm talking about.

Once you've done that, IF your app is compute-bound, with hotspots in your own code (not in library code), THEN compiler optimizations for speed will make some difference, so you can try different flag combinations.
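For instance, one common way to locate those hotspots with the GNU toolchain (gprof is just one of several profiling approaches; the program and input names are placeholders):

    g++ -O2 -pg -o myprog main.cpp   # build with profiling instrumentation
    ./myprog typical_input.dat       # run: writes gmon.out
    gprof ./myprog gmon.out | head   # see where the time actually goes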

Mike Dunlavey