
I would like to know how to avoid wasting my time and risking typos by re-hashing source code when I'm integrating legacy code, library code or sample code into my own codebase.

If I give a simple example, based on an image processing scenario, you might see what I mean.

It's actually not unusual to find I'm integrating a code snippet like this:

for (unsigned int y = 0; y < uHeight; y++)
{
    for (unsigned int x = 0; x < uWidth; x++)
    {
        // do something with this pixel ....
        uPixel = pPixels[y * uStride + x];
    }
}

Over time, I've become accustomed to doing things like moving unnecessary calculations out of the inner loop and maybe changing the postfix increments to prefix ...

for (unsigned int y = 0; y < uHeight; ++y)
{
    unsigned int uRowOffset = y * uStride;
    for (unsigned int x = 0; x < uWidth; ++x)
    {
        // do something with this pixel ....
        uPixel = pPixels[uRowOffset + x];
    }
}

Or, I might use pointer arithmetic, either by row ...

for (unsigned int y = 0; y < uHeight; ++y)
{
    unsigned char *pRow = pPixels + (y * uStride);
    for (unsigned int x = 0; x < uWidth; ++x)
    {
        // do something with this pixel ....
        uPixel = pRow[x];
    }
}

... or by row and column ... so I end up with something like this:

unsigned char *pRow = pPixels;
for (unsigned int y = 0; y < uHeight; ++y)
{
    unsigned char *pPixel = pRow;
    for (unsigned int x = 0; x < uWidth; ++x)
    {
        // do something with this pixel ....
        uPixel = *pPixel++;
    }

    // next row
    pRow += uStride;
}

Now, when I write from scratch, I'll habitually apply my own "optimisations" but I'm aware that the compiler will also be doing things like:

  • Moving code from inside loops to outside loops
  • Changing postfix increments to prefix
  • Lots of other stuff that I have no idea about

Bear in mind that every time I mess with a piece of working, tested code in this way, I not only cost myself some time but I also run the risk of introducing bugs through finger trouble or whatever (the above examples are simplified). I'm aware of "premature optimisation" and of other ways to improve performance, such as designing better algorithms, but for the situations above I'm creating building blocks that will be used in larger, pipelined applications, where I can't predict what the non-functional requirements might be. So I just want the code as fast and tight as is reasonable within time limits (I mean the time I spend tweaking the code).

So, my question is: where can I find out what compiler optimisations are commonly supported by "modern" compilers? I'm using a mixture of Visual Studio 2008 and 2012, but would be interested to know if there are differences with alternatives, e.g. Intel's C/C++ Compiler. Can anyone shed some light on this and/or point me at a useful web link, book or other reference?

EDIT
Just to clarify my question

  • The optimisations I showed above were simple examples, not a complete list. I know that it's pointless (from a performance point of view) to make those specific changes because the compiler will do it anyway.
  • I'm specifically looking for information about what optimisations are provided by the compilers I'm using.
Roger Rowland
  • I wouldn't do any of the optimizations you suggest (unless the code was really poor), but I would fix other 'style' issues. An important consideration which you don't mention is that by going through a process like this you really do get to know the code you are tweaking very well. – john Mar 23 '13 at 08:33
  • @john - yes, getting to know the code is a good point and so is style, which I didn't mention. My main problem is that I can't find any documentation of exactly what e.g. VS2012 does in the way of optimising. – Roger Rowland Mar 23 '13 at 08:41
  • Google for free PDF book - "Optimizing software in C++ An optimization guide for Windows, Linux and Mac platforms" By Agner Fog. Promise, this will be your favorite reading for a weekend. – SChepurin Mar 23 '13 at 09:41

3 Answers


I would expect most of the optimizations that you include as examples to be a waste of time. A good optimizing compiler should be able to do all of this for you.

I can offer three suggestions by way of practical advice:

  1. Profile your code in the context of a real application processing real data. If you can't, come up with some synthetic tests that you think would closely mimic the final system.
  2. Only optimize code that you have demonstrated through profiling to be a bottleneck.
  3. If you are convinced that a piece of code needs optimization, don't just assume that factoring an invariant expression out of a loop will improve performance. Always benchmark, optionally looking at the generated assembly to gain further insight.

The above advice applies to any optimizations. However, the last point is particularly relevant to low-level optimizations. They are a bit of a black art since there are a lot of relevant architectural details involved: memory hierarchy and bandwidth, instruction pipelining, branch prediction, the use of SIMD instructions etc.
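To make the benchmarking point concrete, a minimal timing harness along the following lines can be used to compare two variants of a loop. The function names, image dimensions and iteration count here are invented for the example; a serious benchmark would need more care (warm-up runs, repeated measurements, making sure the calls aren't hoisted), but this is enough to see whether a hand transformation makes any measurable difference in an optimised (/O2) build.

#include <cstdio>
#include <ctime>
#include <vector>

// Invented dimensions for the example -- assume a tightly packed 8-bit image.
static const unsigned int uWidth  = 1024;
static const unsigned int uHeight = 1024;
static const unsigned int uStride = 1024;

// Variant A: index arithmetic left inside the inner loop.
unsigned int SumIndexed(const unsigned char *pPixels)
{
    unsigned int uSum = 0;
    for (unsigned int y = 0; y < uHeight; y++)
        for (unsigned int x = 0; x < uWidth; x++)
            uSum += pPixels[y * uStride + x];
    return uSum;
}

// Variant B: row offset hoisted out of the inner loop by hand.
unsigned int SumPointer(const unsigned char *pPixels)
{
    unsigned int uSum = 0;
    for (unsigned int y = 0; y < uHeight; ++y)
    {
        const unsigned char *pRow = pPixels + y * uStride;
        for (unsigned int x = 0; x < uWidth; ++x)
            uSum += pRow[x];
    }
    return uSum;
}

int main()
{
    std::vector<unsigned char> image(uHeight * uStride, 1);
    const int iterations = 200;

    std::clock_t t0 = std::clock();
    unsigned int uCheckA = 0;
    for (int i = 0; i < iterations; ++i)
        uCheckA += SumIndexed(&image[0]);

    std::clock_t t1 = std::clock();
    unsigned int uCheckB = 0;
    for (int i = 0; i < iterations; ++i)
        uCheckB += SumPointer(&image[0]);

    std::clock_t t2 = std::clock();

    // Print the checksums so the compiler cannot discard the loops entirely.
    std::printf("indexed: %ld ticks (check %u)\n", (long)(t1 - t0), uCheckA);
    std::printf("pointer: %ld ticks (check %u)\n", (long)(t2 - t1), uCheckB);
    return 0;
}

In a release build the two variants will typically come out the same, which is exactly the point: the compiler has already hoisted the y * uStride calculation. If you want to confirm that directly, compile with /FA (MSVC) or -S (GCC/Clang) and compare the assembly listings of the two functions.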

I think it's better to rely on the compiler writer having a good knowledge of the target architecture than to try and outsmart them.

From time to time you will find through profiling that you need to optimize things by hand. However, these instances will be fairly rare, which will allow you to spend a good deal of energy on things that will actually make a difference.

In the meantime, focus on writing correct and maintainable code.

NPE
  • Thanks - I thought I was clear about the lack of specific non-functional requirements but maybe not clear enough. In my situation I could be putting together a lot of these "building blocks" before I can profile and find bottlenecks so I'm looking for "rule of thumb" lists of things - like the above - that I shouldn't bother doing. – Roger Rowland Mar 23 '13 at 08:44
  • Going to have to agree, my understanding too is that all of the optimizations listed at the time of this posting would be recognized by any decent, popular optimizing compiler. May help in debug builds, but otherwise, no difference in final output. – Sion Sheevok Mar 23 '13 at 08:44
  • @Sion - yes, I used those just as examples, I don't have an exhaustive list, which was really the point of my question. – Roger Rowland Mar 23 '13 at 08:45
  • @roger_rowland: As to your first comment, don't do *any* of them until you can profile. In the first instance, focus on correctness and clarity. – NPE Mar 23 '13 at 08:46
  • I'm not sure if there's much of an exhaustive list, but if in doubt, write it both ways and look at the assembly output - if they're identical in release, the compiler is already applying the optimization. You don't necessarily have to be able to comprehend the assembly output to see that it's identical. – Sion Sheevok Mar 23 '13 at 08:47
  • @NPE - yes I understand, I did mention in the question that I'm aware of premature optimisation. I was really hoping to find some links to information about the compilers I use - sorry for confusion. – Roger Rowland Mar 23 '13 at 08:47
  • @roger_rowland you may be aware of the existence of premature optimisation, but you seem to be unaware you are asking how to perform premature optimisation... – rubenvb Mar 23 '13 at 09:55
  • @rubenvb - I thought my language was clear but I have edited my post to specifically clarify what I am asking. I apologise if I confused you. – Roger Rowland Mar 23 '13 at 11:36
  • Even though I generally agree that compilers optimize many things, there is a specific call to be made regarding *aliasing*. The presence of possible aliases might hamper compiler optimizations, because for some optimizations to be correct the compiler has to *prove* two (or more) variables cannot possibly alias. – Matthieu M. Mar 23 '13 at 13:55

What can I assume about C/C++ compiler optimisations?

Pretty much as much as you can imagine. The exception is when you run into problems with either the functionality or the performance of the optimized code; in that case, turn optimization off and debug.

Modern compilers have various strategies to optimize your code, especially when you are doing concurrent programming and using libraries such as OpenMP, Boost or TBB.

If you DO care exactly what machine code your source is turned into, there is nothing better than disassembling it and inspecting the assembly.

The main thing left for manual optimization is probably reducing unpredictable branches, which is something the compiler finds much harder to do for you.
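As a rough sketch of the kind of branch-related rewrite meant here (the function names, the 128 threshold and the mask trick are just one made-up illustration):

// Branchy version: if the pixel data is essentially random, the "if" is
// unpredictable and the branch predictor will mispredict about half the time.
unsigned int SumBright(const unsigned char *pPixels, unsigned int uCount)
{
    unsigned int uTotal = 0;
    for (unsigned int i = 0; i < uCount; ++i)
    {
        if (pPixels[i] > 128)
            uTotal += pPixels[i];
    }
    return uTotal;
}

// Branch-free version: the comparison result (0 or 1) is turned into an
// all-zeros or all-ones mask, so the inner loop has no data-dependent branch.
unsigned int SumBrightNoBranch(const unsigned char *pPixels, unsigned int uCount)
{
    unsigned int uTotal = 0;
    for (unsigned int i = 0; i < uCount; ++i)
    {
        unsigned int uMask = 0u - (unsigned int)(pPixels[i] > 128); // 0 or 0xFFFFFFFF
        uTotal += pPixels[i] & uMask;
    }
    return uTotal;
}

Whether a rewrite like this actually pays off depends on the data and on whether the compiler already emits a conditional move for the branchy version, so as with everything else it needs to be measured.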

If you're looking for information about optimization, there's already a question on SO.

The documentation for the optimization options explains what each option optimizes for:

And there's more material on optimization strategies and techniques.

Ken Kin
  • I beg to differ. The compiler isn't magic. There are limits to what automatic program transformation can do, both practical and theoretical. Your examples are relatively simple transformations that indeed happen (and have happened for several decades). However, with exceptions for very particular languages (read: not C++ or anything like it) and trivial cases, no compiler will fix a [terrible algorithm](http://www.joelonsoftware.com/articles/fog0000000319.html) or misused data structure. –  Mar 23 '13 at 11:07
  • Ok many thanks - this is much more what I was looking for, I seem to have led people down the wrong path by using a specific example to illustrate my question. But you have indeed provided links to what I was wanting, I didn't find the previous SO question and I guess mine is a duplicate of that. – Roger Rowland Mar 23 '13 at 11:38
  • @roger_rowland: You're welcome. But it seems more people agreed with NPE's answer, so maybe accept that one instead. – Ken Kin Mar 23 '13 at 11:49
  • @delnan: Actually, loop invariant code motion can turn some O(N*M) algorithms into O(N)+O(M) algorithms. Similarly, the terrible `strcat()` algorithm you linked to is optimizable too, simply by noticing that it never shortens a string. That still would re-scan the newly appended part, but O(2N) == O(N). (You'd want C99 for this, with `restrict`, though, C++ syntax can't express the restriction that src and dest do not overlap) – MSalters Mar 23 '13 at 12:16
  • @MSalters Hence "with exceptions for [...] trivial cases". Also, the optimization for `strcat` would require the compiler seeing the definition (incompatible with dynamic linking of the C stdlib, which is very common) or special knowledge about built-in functions. The latter does not help with most cases of bad algorithms, because usually the problem isn't just in how standard library functions are called. –  Mar 23 '13 at 12:28

I think it would probably be more useful for you to reconsider the premise of your question, rather than to get a direct answer.

Why do you want to perform these optimizations? Judging by your question, I assume it is to make a concrete program faster. If that is the case, you need to start with the question: How do I make this program faster?

That question has a very different answer. First, you need to consider Amdahl's law. That usually means that it only makes sense to optimize one or two important parts of the program. Everything else pretty much doesn't matter. You should use a profiler to locate these parts of the program. At this point you might argue that you already know that you should use a profiler. However, almost all the programmers I know don't profile their code, even if they know that they should. Knowing about vegetables isn't the same as eating them. ;-)
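To put a number on Amdahl's law (the figures here are invented purely for illustration): if the profiler shows that 80% of the run time is spent in one loop and you manage to make that loop 4 times faster, the overall speedup is only 1 / (0.2 + 0.8 / 4) = 2.5, because the 20% you didn't touch now dominates. Optimising code outside that hot loop would have bought you almost nothing.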

Once you have located the hot-spots, the solution will probably involve:

  1. Improving the algorithm, so the code does less work.
  2. Improving the memory access patterns, to improve cache performance.
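As a small illustration of the second point, reusing the image layout from the question (the function signatures are invented for the example): traversing an image row by row touches memory sequentially, whereas traversing it column by column jumps uStride bytes on every access, which defeats the cache on a large image even though the arithmetic is identical.

// Row-major traversal: consecutive accesses are adjacent in memory, so each
// cache line fetched from the image is fully used before it is evicted.
unsigned int SumRowMajor(const unsigned char *pPixels, unsigned int uWidth,
                         unsigned int uHeight, unsigned int uStride)
{
    unsigned int uSum = 0;
    for (unsigned int y = 0; y < uHeight; ++y)
        for (unsigned int x = 0; x < uWidth; ++x)
            uSum += pPixels[y * uStride + x];
    return uSum;
}

// Column-major traversal of the same data: every access lands uStride bytes
// away from the previous one, so on a large image most accesses miss the cache.
unsigned int SumColumnMajor(const unsigned char *pPixels, unsigned int uWidth,
                            unsigned int uHeight, unsigned int uStride)
{
    unsigned int uSum = 0;
    for (unsigned int x = 0; x < uWidth; ++x)
        for (unsigned int y = 0; y < uHeight; ++y)
            uSum += pPixels[y * uStride + x];
    return uSum;
}

A compiler will sometimes interchange loops like these, but it cannot do so in general, which is why access patterns are worth checking by hand once the profiler points at a particular loop.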

Again, you should use the profiler to see if your changes have improved the run-time.

For more details, you can Google code optimization and similar terms.

If you want to get really serious, you should also take a look at Agner Fog's optimization manuals and Computer Architecture: A Quantitative Approach. Make sure to get the newest edition.

You might also want to read The Sad Tragedy of Micro-Optimization Theater.

Jørgen Fogh
  • Well, I've tried my best to explain what the question is but that seems harder than I'd expected ;-) I don't want to know *what* to optimise, I want to know what to *not bother* to optimise as a normal practice. – Roger Rowland Mar 23 '13 at 14:00
  • Then the answer is pretty much everything. But I see how that is a valid question. – Jørgen Fogh Mar 23 '13 at 14:53
  • There is a lot of stuff available about the rich list of optimisations that GCC offers. You could read up on this by looking up the individual per-optimisation switches of GCC and then googling the topics. – Cecil Ward Aug 29 '16 at 00:54
  • I have learnt a lot about the impressive state of modern compilers’ optimisation capabilities by writing small C and D programs, compiling them, and then looking at the generated code with GCC’s `-O2` and `-O3` switches. I realise that you are using MSVC, but there will be an option to view generated code in that compiler too, and the code MSVC generates with full optimisation turned on may be just as good as GCC's. See https://gcc.godbolt.org. (You might also take a look at http://d.godbolt.org if you want to try something new.) – Cecil Ward Aug 29 '16 at 00:54