
I'm developing a controller program that runs a kid-size humanoid robot. The OS is Debian 6 and all of the code is written in C++11. The CPU is a 1 GHz Vortex86SD, and its architecture is Intel i486.

I need to compile my code with the maximum possible optimization. Currently I'm using GCC with the level-3 optimization flag and i486 tuning:

g++ -std=c++0x -O3 -march=i486 -mtune=i486

I'm wondering whether it's possible to get more optimized code. I searched for optimization flags and compiler benchmarks, but didn't find any...

My question is: which C++ compiler generates the fastest code, especially for the i486 architecture?

Current candidates are: ICC XE, GCC 4.6, EkoPath

Xeo
sorush-r
  • Why do you need faster code? Are you trying to implement real-time control? Normally a microcontroller with deterministic behavior is used for something like that, but if you want to use Linux, [RTwiki](https://rt.wiki.kernel.org/articles/f/r/e/Frequently_Asked_Questions_7407.html) might be a good source. – Justin Mar 22 '12 at 11:51
  • Yes. I'm using Xenomai rt kernel, though image processing algorithms and decision mechanism should be really fast. – sorush-r Mar 22 '12 at 11:59
  • Have you tried any? Given such a specific set of requirements, your best bet is probably to set up a suitable test harness and benchmark each of the options. – Jon Cage Mar 22 '12 at 11:49
  • Before paying for ICC, I would like to see some benchmarks or success stories... – sorush-r Mar 22 '12 at 12:00
  • The Vortex86 SD is apparently an i586 without an FPU. Try compiling for i586 instead; that may help quite a bit. Perhaps GCC has a specific setting for it; if so, use that instead. You don't want it to optimize for U/V pipelines after all :-) – dascandy Mar 22 '12 at 12:21
  • No, it's really i486! The datasheet says i586, but after weeks of trying we concluded that there is no support for i586. – sorush-r Mar 22 '12 at 12:33
  • `-march=native` is what I'd prefer. – Sebastian Mach Mar 22 '12 at 13:02
  • You can trial ICC without buying: http://www.softpedia.com/get/Programming/Coding-languages-Compilers/Intel-C-Compiler-for-Windows.shtml – Jon Cage Mar 22 '12 at 13:34
  • Check out some of these optimization links: [Optimization discussion](http://stackoverflow.com/questions/2932515/effective-optimization-strategies-on-modern-c-compilers/2932815#2932815), [Another optimization discussion](http://stackoverflow.com/questions/2074099/coding-practices-which-enable-the-compiler-optimizer-to-make-a-faster-program/2075264#2075264), [Optimizing with processor cache](http://stackoverflow.com/questions/3029738/algorithms-for-modern-hardware/3029763#3029763) – Thomas Matthews Mar 22 '12 at 13:50
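
The test harness suggested in the comments could look roughly like the sketch below. It is only an illustration: `sum_pixels` is a hypothetical stand-in for a real hot routine, and `best_time_us`, `benchmark_sum_pixels`, and `g_sink` are names invented here. The idea is to build the same file with each candidate compiler or flag set and compare the reported times on the robot itself.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a hot routine (e.g. an image-processing kernel):
// sums the pixel values of an 8-bit grayscale buffer.
unsigned long sum_pixels(const std::vector<unsigned char>& image) {
    unsigned long total = 0;
    for (std::size_t i = 0; i < image.size(); ++i) {
        total += image[i];
    }
    return total;
}

// Runs fn() `repeats` times and returns the fastest run in microseconds.
// Taking the minimum reduces noise from scheduling and cache warm-up.
// Returns -1 if repeats is not positive.
template <typename Fn>
long long best_time_us(Fn fn, int repeats) {
    typedef std::chrono::high_resolution_clock clock;
    long long best = -1;
    for (int r = 0; r < repeats; ++r) {
        clock::time_point start = clock::now();
        fn();
        clock::time_point stop = clock::now();
        long long us =
            std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
        if (best < 0 || us < best) {
            best = us;
        }
    }
    return best;
}

// The result is stored here so the compiler cannot discard the measured work.
volatile unsigned long g_sink = 0;

// Times sum_pixels on a synthetic image in which every pixel has the value 3.
long long benchmark_sum_pixels(std::size_t pixel_count, int repeats) {
    std::vector<unsigned char> image(pixel_count, 3);
    return best_time_us([&image]() { g_sink = sum_pixels(image); }, repeats);
}
```

Calling `benchmark_sum_pixels(320 * 240, 20)` from `main` and printing the result gives one number per build to compare. Real kernels and realistic input sizes should replace the synthetic ones before drawing conclusions.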

2 Answers


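An option that often makes loop-heavy code faster is -funroll-loops. It also makes the code larger, so it may or may not help on a given CPU; benchmark it before keeping it.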
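
To make the effect concrete, here is a sketch of the kind of transformation the flag asks the compiler to perform, written out by hand. The function names are hypothetical, and the unroll factor of 4 is only an assumption for illustration; GCC chooses its own factor.

```cpp
#include <cstddef>

// Straightforward loop: one add plus one compare-and-branch per element.
int sum_simple(const int* data, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}

// The same loop unrolled by hand by a factor of 4: one compare-and-branch
// per four elements, at the cost of a larger loop body.
int sum_unrolled4(const int* data, std::size_t n) {
    int total = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += data[i];
        total += data[i + 1];
        total += data[i + 2];
        total += data[i + 3];
    }
    // Remainder loop for the last n % 4 elements.
    for (; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

Both functions return the same result. Whether the unrolled form is actually faster depends on the CPU and on instruction cache pressure, which is why measuring is the only reliable answer.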

linello

See the GCC documentation on optimization options. There are too many flag combinations to test them all by hand; Acovea may be worth a try, since it searches for the best combination with a genetic algorithm.

If your code does a lot of floating-point work, you can try -ffast-math, or -Ofast, which includes -ffast-math. However, you lose IEEE floating-point compliance.

Sebastian Mach
  • There is no hardware FPU in this CPU, only the soft-FPU from the Linux kernel. – osgx Mar 22 '12 at 14:07
  • @osgx: But the soft-FPU can be used for IEEE-correct FPU calculations too, can't it? fast-math is about applying optimizations that break IEEE compatibility, e.g. operand reordering or strength reduction (e.g. replacing `x * 3` with `x + x + x`). – Sebastian Mach Mar 22 '12 at 21:08
  • Yes, -ffast-math will emit more optimal code, but the soft-FPU is too slow in any case. – osgx Mar 23 '12 at 12:37
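
The operand reordering mentioned in the comments matters because floating-point addition is not associative, and -ffast-math allows the compiler to reassociate such expressions. The sketch below shows the two evaluation orders giving different answers. The function names are hypothetical, and the `volatile` temporaries are there only to pin the evaluation order so the example behaves the same whatever flags it is built with.

```cpp
// Evaluates (a + b) + c, the order written in the source.
double sum_left(double a, double b, double c) {
    volatile double t = a + b;  // force the intermediate to be rounded and kept
    return t + c;
}

// Evaluates a + (b + c), an order that a reassociating compiler may choose.
double sum_right(double a, double b, double c) {
    volatile double t = b + c;
    return a + t;
}
```

With `a = 1e30`, `b = -1e30`, and `c = 1.0`, `sum_left` returns 1.0 because the large values cancel first, while `sum_right` returns 0.0 because the 1.0 is lost when it is added to -1e30. Code that depends on one particular order should be checked carefully before enabling -ffast-math.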