
OK, here's my issue:

  • I'm working on a super-complex project where speed and performance are crucial, with lots of bit twiddling and low-level stuff (feel free to ask if you need to know anything specific)
  • I'm using a Mac
  • I'm compiling and linking it using clang++

Apple clang version 4.0 (tags/Apple/clang-421.0.60) (based on LLVM 3.1svn)

The only optimization flag I'm currently using is -O3. Honestly, it gave me an unexpected boost, so I didn't look any further. However, I've noticed other programs using a variety of flags, so I feel a bit lost.

Is there anything else I should consider? And if so, what?


EDIT: I tried using -O4 and I'm getting errors. Any ideas why this might be happening?

0  0x10be24280  __assert_rtn + 144
1  0x10be89659  ld::tool::HeaderAndLoadCommandsAtom<x86_64>::copyEntryPointLoadCommand(unsigned char*) const + 169
2  0x10be8853c  ld::tool::HeaderAndLoadCommandsAtom<x86_64>::copyRawContent(unsigned char*) const + 1084
3  0x10be7da56  ld::tool::OutputFile::writeAtoms(ld::Internal&, unsigned char*) + 598
4  0x10be79c14  ld::tool::OutputFile::writeOutputFile(ld::Internal&) + 564
5  0x10be74963  ld::tool::OutputFile::write(ld::Internal&) + 147
6  0x10be248ef  main + 1263
7  0x10be13234  start + 52
A linker snapshot was created at:
    /tmp/myapp-2013-00-31-150316.ld-snapshot
ld: Assertion failed: (_mode == modeFinalAddress), function finalAddress, file /SourceCache/ld64/ld64-133.3/src/ld/ld.hpp, line 657.
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Dr.Kameleon
  • possible duplicate of [Performance optimization strategies of last resort](http://stackoverflow.com/questions/926266/performance-optimization-strategies-of-last-resort) – Benjamin Bannier Jan 31 '13 at 12:48
  • What are your current compiler flags? What does your "super complex" code do? Floating point arithmetic? String manipulation? Bit twiddling suggests integer arithmetic? – CadentOrange Jan 31 '13 at 12:50
  • @honk Not actually a duplicate. What I'm interested in is compiler optimization flags, etc., specifically for Clang/LLVM. – Dr.Kameleon Jan 31 '13 at 12:51
  • I voted to close this because dramatic improvements to arbitrary code rarely come from obscure compiler flags (it's a different story for very specific problem sets, but you didn't go into details). Another thing you should consider, though, is enabling link-time optimization. Clang does this via LLVM bitcode representations and might be able to optimize your code noticeably. – Benjamin Bannier Jan 31 '13 at 12:53
  • @CadentOrange My current compiler flags are just `-O3` (as I mentioned above). As for the actual "complexity", it mostly revolves around bitboards (64-bit integers), so intensive integer arithmetic, vectors and linked lists make up the core of it from a programming point of view. – Dr.Kameleon Jan 31 '13 at 12:53
  • In a recent experiment with some bit-twiddling, we found gcc/g++ gave a notably better result for the same code. For most things, g++ should be interchangeable with clang. Of course, no guarantees, but I'm sure it's worth at least a couple of hours. – Mats Petersson Jan 31 '13 at 12:54
  • @honk Trust me, even with my only flag (`-O3`) the speed boost **was** dramatic (about 60% faster, with the exact same codebase). As for link-time optimization, what do you mean? Can you give me an example or point me to some reference? – Dr.Kameleon Jan 31 '13 at 12:55
  • @Dr.Kameleon: Good that `-O3` worked for you, but I suspect it was able to optimize certain patterns in *your specific code*. As for enabling link-time optimizations see e.g. http://wiki.gentoo.org/wiki/Clang#Enabling_link-time_optimizations or your favorite search engine. – Benjamin Bannier Jan 31 '13 at 12:57
  • @MatsPetersson Funny thing is, I've run several similar tests myself and found clang++/LLVM to be a bit faster than g++ (not a big difference, perhaps around 5%, but consistent rather than circumstantial). So I decided to stick with Clang. Perhaps a Mac-specific thing? I don't know... – Dr.Kameleon Jan 31 '13 at 13:00
  • @honk Thanks a lot for your help. The weird thing (actually the first thing I noticed) is that when I try using the `-O4` flag, it throws several errors. Any idea why this might be happening? (Please have a look at the edit to my original post.) – Dr.Kameleon Jan 31 '13 at 13:03
  • No, it's probably depending on exactly what your code does. Unfortunately, it's often the case that one compiler will give good performance for one function, and another not so good. I would still try both - you never know until you try. Also try "whole program optimization" if you have a gcc 4.7 or later. – Mats Petersson Jan 31 '13 at 13:05
  • 1
    @Dr.Kameleon -- clang is usually faster than gcc because, you have ancient gcc on mac. – Leonid Volnitsky Jan 31 '13 at 13:27
  • The error in the update looks like a LLVM/linker crash. Any updates available? – vonbrand Jan 31 '13 at 20:11
  • @LeonidVolnitsky, clang might be faster when compiling, but g++ could produce faster executables. No way to know without checking... – vonbrand Jan 31 '13 at 20:12
  • At least for g++, any optimization above -O2 creates _slower_ code on current machines: -O3 and up include code changes that increase code size, and since current machines are memory-bound (not CPU-bound), this makes things slower. The kernel used to use -Os (optimize for size) because that was faster. Dunno about clang. – vonbrand Jan 31 '13 at 20:15
  • @vonbrand Thanks for the suggestions; I'll play around a bit and will let you know! ;-) – Dr.Kameleon Jan 31 '13 at 20:33

1 Answer


After doing the obvious things (compile options: -O3, LTO, and no debug flags), here are my steps for speeding up a program:

1) Profile. After profiling you will know what section/module needs to be looked at.

2) Instrument/benchmark. Put some timing code around the critical sections.

3) Actually try changing your code and see whether it gets slower or faster. The biggest culprits: a bad algorithm or data structure, and excessive use of malloc/new.
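
Steps 2 and 3 can be sketched roughly as follows: a small `std::chrono` timer wrapped around two variants of the same work, one that lets a `std::vector` grow (and possibly reallocate several times) and one that reserves its capacity up front. The helper names `time_us`, `fill_growing` and `fill_reserved` are hypothetical, made up for this illustration; this is not a rigorous benchmark harness.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Run f once and return the elapsed wall-clock time in microseconds.
// steady_clock is used because it is monotonic (unaffected by clock changes).
template <typename F>
long long time_us(F&& f) {
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// Variant A (hypothetical): the vector grows on its own, which may
// reallocate and copy several times as it expands.
std::vector<std::size_t> fill_growing(std::size_t n) {
    std::vector<std::size_t> v;
    for (std::size_t i = 0; i < n; ++i) v.push_back(i * i);
    return v;
}

// Variant B (hypothetical): capacity is reserved up front,
// so push_back never needs to reallocate.
std::vector<std::size_t> fill_reserved(std::size_t n) {
    std::vector<std::size_t> v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i) v.push_back(i * i);
    return v;
}
```

When using it, keep each result (for example by assigning it to a variable captured by reference in the lambda) so the optimizer cannot discard the work being timed, check that both variants produce identical output, and repeat each measurement several times, since a single run is noisy.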

Leonid Volnitsky