
Following on from a previous question, I've been playing around with optimizer settings in my release build to see what benefits are to be gleaned from using compiler optimization. Up until now, I've been using /Ob1 (only inline where inline is explicitly given) and /Oi (enable intrinsic functions). I tried changing this to include /Ot (favour fast code), /Oy (omit frame pointers) and /Ob2 (inline any suitable), and to my surprise a regression suite that was taking 2h58m now took 3h16m. My first assumption was that my own inlining was more aggressive than the compiler's, but moving back from /Ob2 to /Ob1 only improved things to 3h12m. I'm still running more tests, but it would appear that in some cases /Ot (favour fast code) is actually slowing things down.

The software is multi-threaded and computation-intensive (surface modelling, manipulation and visualisation), and has already been heavily manually optimized based on profiler results. The program also deals with large amounts of data, and uses #pragma pack(4) pretty regularly.
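
For context, here is a minimal sketch (hypothetical struct, not from the actual codebase) of what `#pragma pack(4)` does to layout: it trims alignment padding, so structs get smaller at the cost of potentially misaligned 8-byte members.

```cpp
#include <cstdio>

// Default alignment: 'x' wants 8-byte alignment, so 4 bytes of padding
// follow 'id' and sizeof(Unpacked) is 16 with MSVC.
struct Unpacked {
    int    id;
    double x;
};

#pragma pack(push, 4)
// With 4-byte packing the padding disappears and sizeof(Packed) is 12,
// but 'x' may now sit on a 4-byte boundary, which can make loads/stores
// slightly more expensive on some hardware.
struct Packed {
    int    id;
    double x;
};
#pragma pack(pop)

int main()
{
    std::printf("Unpacked: %u bytes, Packed: %u bytes\n",
                static_cast<unsigned>(sizeof(Unpacked)),
                static_cast<unsigned>(sizeof(Packed)));
}
```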

So the question is this: for a manually optimized program, is compiler optimization in VS2008 liable to do more harm than good? Put another way, are there known, documented scenarios where compiler optimization reduces performance? (N.B. profiling compiler-optimized code is painful, hence profiling to date has been done on unoptimized code.)

Edit: As per Cody Gray's and others' suggestions, I have added /O2 to the optimization settings and re-executed my test suite. This resulted in a run time of 3h01m, which was comparable to the minimally optimized run. Given the (slightly dated) MSDN guidelines on optimization and the post from Goz, I'm going to check /O1 to see whether smaller is actually faster in my case. Note the current EXE file is around 11 MB. I'll also try to get a VS2010 build together and see how that fares.

Edit 2: With /O1, the run time was 3h00m, and the 11 MB EXE was 62 KB smaller. Note that the reason behind this post, and the previously linked one, was to check whether the benefits of turning on compiler optimizations outweighed the drawbacks in terms of profiling and debugging. In this specific instance, they appear not to, although I admit to being surprised that none of the combinations tried added any benefit and some visibly reduced performance. FWIW, as per this previous thread, I tend to do most of my optimization at design time and use the profiler primarily to check design assumptions, so I reckon I'll be sticking with that approach. I'll have one final go on VS2010 with whole program optimization enabled and leave it at that.

Thanks for all the feedback!

  • Haven't used VS in a while so perhaps this should be obvious: what optimization level are you compiling with, apart from the explicit optimizations you mentioned? /O0? That's certainly a no-go … – Konrad Rudolph Mar 03 '11 at 11:44
  • While it may be painful to profile optimized code, I am not sure that you want to disable optimizations so that you can profile and optimize manually... It might be late for that though. In general, optimizers are designed to work best on common idioms, and that is where you will get the greatest improvements from the optimizer. Then you can focus your efforts on profiling/optimizing the hot spots where the compiler's optimizations can be helped along manually. – David Rodríguez - dribeas Mar 03 '11 at 11:45
  • @Konrad, in the IDE the optimization level is set to custom. Looking at the command-line options, I don't have an overall optimization level set; /Ob1 /Oi /Ot are the only optimization switches listed, certainly no /O0 or /Od. – SmacL Mar 03 '11 at 11:58
  • The default settings for Release builds already crank optimizations up to the maximum. There's very little you can do that's going to squeeze better performance out of your app than flipping `/O2`. I'm interested in seeing benchmarks on *that*, rather than your custom optimization settings. – Cody Gray - on strike Mar 03 '11 at 12:03
  • @Cody, trying it now. Be right back to you in somewhere around 3 hours ;) – SmacL Mar 03 '11 at 12:09

3 Answers


The documentation for /Ot states:

If you use /Os or /Ot, then you must also specify /Og to optimize the code.

So you might want to always pass /Og with /Ot in your tests.

That said, /Ot favors fast code at the expense of program size and can produce very large binaries, especially with heavy inlining. Large binaries have difficulty taking advantage of the processor's instruction cache.
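
For reference, and going from memory of the MSDN documentation for this compiler generation (worth double-checking against the VS2008 docs), the combined optimization switches expand roughly as follows:

```
/O1  (minimize size)  ~  /Og /Os /Oy /Ob2 /Gs /GF /Gy
/O2  (maximize speed) ~  /Og /Oi /Ot /Oy /Ob2 /Gs /GF /Gy
```

So /O2 already bundles /Og and /Ot (plus /Ob2), which is why it's usually simpler to pick one of the combined switches than to assemble the individual flags by hand.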

  • This is correct. But instead of adding `/Og`, I'd suggest simply specifying `/O2` and leaving it at that. I have a *really* hard time believing that you're going to outsmart VS's compiler. – Cody Gray - on strike Mar 03 '11 at 12:01
  • Thanks Frédéric, I'll try this. I'm surprised that the IDE doesn't do this by default, though the online help says that from VS2005 onwards /Og is a deprecated option. – SmacL Mar 03 '11 at 12:01
  • @Shane: It doesn't do it by default because the default is even better: `/O2`. They attempted to simplify the compiler options *drastically* in recent versions because it was determined that no one understood them or used them correctly. `/Og` is deprecated because there's no real benefit to be found in setting these flags piecemeal. – Cody Gray - on strike Mar 03 '11 at 12:04
  • @Shane, `/Og` is deprecated in favor of `/O1` (which doesn't imply `/Ot`) and `/O2` (which does). So, I'd second @Cody and suggest you use one of them :) – Frédéric Hamidi Mar 03 '11 at 12:05

It's quite possible that it is trying to inline large functions or, at least, a large number of functions in a loop. At that point you run the risk of causing instruction cache reloads, which can cause a big slowdown. Inlining is not always the best thing to do (though more often than not it is helpful). If you have any large loops with lots of function calls in them, it may be better to break each loop into several loops. That way each loop body can stay inside the instruction cache and you get significantly better performance.
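
To make that concrete, here's a rough sketch (hypothetical types and function names, not from the asker's codebase) of splitting one loop, whose combined inlined calls overflow the instruction cache, into several smaller passes:

```cpp
#include <cstddef>
#include <vector>

struct Point { double x, y, z; };

// Stand-ins for heavyweight operations; imagine each of these inlining
// to a large amount of code under /Ob2.
inline void smooth(Point& p)         { p.x *= 0.5; }
inline void reproject(Point& p)      { p.y += p.x; }
inline void computeNormals(Point& p) { p.z = p.x * p.y; }

// One big loop: the combined inlined bodies may no longer fit in the
// instruction cache, so each iteration can trigger i-cache reloads.
void processAllInOne(std::vector<Point>& pts)
{
    for (std::size_t i = 0; i < pts.size(); ++i) {
        smooth(pts[i]);
        reproject(pts[i]);
        computeNormals(pts[i]);
    }
}

// Split into separate passes: each loop body is small enough to stay
// resident in the instruction cache, at the cost of walking the data
// more than once (so the data access pattern still needs to be cheap).
void processInPasses(std::vector<Point>& pts)
{
    for (std::size_t i = 0; i < pts.size(); ++i) smooth(pts[i]);
    for (std::size_t i = 0; i < pts.size(); ++i) reproject(pts[i]);
    for (std::size_t i = 0; i < pts.size(); ++i) computeNormals(pts[i]);
}
```

Whether the split actually wins depends on how expensive the extra data passes are relative to the saved instruction-cache misses; profiling an optimized build is the only way to know.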

  • While your point is well-taken from a general perspective, it's *very* unlikely that the compiler is inlining such large functions. I've never seen it do anything that stupid. – Cody Gray - on strike Mar 03 '11 at 12:09
  • Did you see the "heavily manually optimized based on profiler results" quote? There might be a bunch of forced inlines that are causing havoc in combination with the built-in heuristics. – MSalters Mar 03 '11 at 12:12
  • @Cody: Then you are a very lucky person. The compiler often does all sorts of silly things. Nonetheless, if turning on "try to inline everything" produces slower code, I fail to see how this CAN'T be the issue!! It's also worth noting that unless you have a heavy loop that is called a lot, you may not even notice a problem like that occurring. – Goz Mar 03 '11 at 12:14
  • @MSalters, that was my initial conclusion as well, but the difference between forced inlines and letting the compiler control the inlining was not that significant. – SmacL Mar 03 '11 at 15:19
  • @Cody, the following MSDN article tends to support Goz's point that /O2 may not be suitable for large apps. http://msdn.microsoft.com/en-us/library/aa290055(v=vs.71).aspx – SmacL Mar 03 '11 at 15:19
  • @Goz: If you tell your compiler to generate sub-optimal code by inlining everything, you can't blame the compiler for generating sub-optimal code. This is why surgeons ask *me* what decision to take, right? So they are no longer responsible :) – tenfour Mar 03 '11 at 16:58
  • @Shane: That may well be the case. However, I suspect that the optimizations have improved *substantially* since VS 2002 (a lot of the switches discussed in that article have been deprecated or removed entirely in later versions), and I would *start* with `/O2`, only moving to `/O1` instead if there is an obvious performance difference. Really, what that article says to me is that you should always profile your code looking for hotspots. You've started trying to do that, but you're doing it without optimizations turned on, which seems to be missing the point. – Cody Gray - on strike Mar 03 '11 at 23:39
  • @Cody, the types of optimizations I make as a result of profiling tend to be algorithmic in nature, rather than the micro-optimizations the compiler is carrying out. I'm typically looking for savings in terms of orders of magnitude in speed, or multiples in space. Where possible, I try to do the bulk of my optimization prior to writing code and use the profiler as a tool to verify my design assumptions. I was interested to see if the compiler optimizer would make any substantial differences to performance, and it appears not to in my case. YMMV. – SmacL Mar 04 '11 at 07:41

It's well known that "favour fast code" is not always faster than "favour small code". The compiler's heuristics are not omniscient and can make mistakes. In some cases, smaller code ends up faster than "fast" code, as it were.

Use /O2 for the fastest code; the compiler knows better than you how the various settings interact.

Wait. You profiled unoptimized code? That's insanity. Compiler optimizations are not like manual optimizations: they're always applied, and there's no reason to profile for them; by profiling without them you could identify bottlenecks that don't exist, and so on. If you want accurate profiling data, you get the compiler to do its absolute best first, and then you profile.

You could also look at using Profile-Guided Optimization, which can guide the compiler's optimizer in some impressive ways.
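
For reference, the MSVC PGO workflow at the VS2008 command line looks roughly like this (switch names from memory; check the MSDN documentation for the exact spelling in your toolset):

```
:: 1. Compile with whole-program optimization enabled
cl /O2 /GL /c myapp.cpp

:: 2. Link an instrumented build
link /LTCG:PGINSTRUMENT /OUT:myapp.exe myapp.obj

:: 3. Run representative workloads; this writes .pgc profile files
myapp.exe

:: 4. Relink, letting the linker use the collected profile data
link /LTCG:PGOPTIMIZE /OUT:myapp.exe myapp.obj
```

The win comes from the profile telling the optimizer which paths are actually hot, so inlining and code-layout decisions are made from real run data rather than static heuristics.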

  • When you say "it's well known", could you quote your references, as I couldn't find any after a search. Your statements regarding the compiler heuristic not being omniscient seem to directly contradict your following statements. As for profiling unoptimized code, I was profiling the release version, which wasn't using the optimizer, for two reasons. Firstly, it makes profiling much more difficult. Secondly, the compiler optimizations tried out previously weren't giving significant enough gains to warrant their inclusion. I profile primarily to check design assumptions. – SmacL Mar 03 '11 at 14:29
  • "Well known" means that several people have seen that if you have already tuned your code yourself, asking the compiler to tune it for speed might not help. Asking it to produce small code sometimes helps improve the cache footprint and gains a few percent extra speed. Haven't seen anybody write an article about it though. – Bo Persson Mar 03 '11 at 16:21
  • @Bo, sounds like the makings of a good article for someone out there. I had a bit of a search and much of the material seems out of date. Your definition of 'well known', i.e. that compiler optimizations might not always help performance, tends to agree with what I'm finding in this specific case, but it seems that it's 'well known' to others that the compiler always knows best, hence the request for references. I suspect there's more speculation than up-to-date experience on the subject, and would love to be proven wrong ;) – SmacL Mar 03 '11 at 17:05