6

I would like to understand good code optimization methods and methodology.

  1. How do I keep from doing premature optimization if I am thinking about performance already?
  2. How do I find the bottlenecks in my code?
  3. How do I make sure that over time my program does not become any slower?
  4. What are some common performance errors to avoid? (E.g., I know it is bad in some languages to return while inside the catch portion of a try{} catch{} block.)
  • 24,099
  • 38
  • 122
  • 179
  • Dupe of http://stackoverflow.com/questions/895574/what-are-some-good-code-optimization-methods among many others –  Aug 30 '09 at 18:15
  • 3
    The link I posted isn't about methodologies, and it seems that you don't understand the meaning of the term "API". –  Aug 30 '09 at 18:26
  • 2
    Dividing an int by 2 is about 4 times faster than multiplying by 0.5 in C/C++ – Gerald Aug 30 '09 at 18:28
  • 1
    First, this should be a wiki. Second I don't think it is a particularly good question for SO since there will be quite a few differing opinions about even the same proposed optimisations. – EBGreen Aug 30 '09 at 18:29
  • @Neil Butterworth: You can have your opinion, no need to insult my intelligence though. – Brock Woolf Aug 30 '09 at 18:30
  • Dividing by 2 is the same as shifting bits to the right by one. The shift is faster. ( At least in year 2000... >__> ) – Aaron Qian Aug 30 '09 at 18:32
  • 3
    @Brock So telling me to "read the question" doesn't insult my intelligence? –  Aug 30 '09 at 18:33
  • @Neil Butterworth: Apologies, it wasn't my intention. I just meant the content of the question doesn't appear to say the same thing as the question's title. – Brock Woolf Aug 30 '09 at 18:35
  • 6
    Exactly what has *any* of this question to do with an API? – oxbow_lakes Aug 30 '09 at 18:41
  • Well I suppose if you were writing an API you would want to optimize it. – EBGreen Aug 30 '09 at 18:59
  • +1 for being educational, not for the examples. – H H Aug 30 '09 at 22:10
  • @Brock: to Neil's link I would add http://stackoverflow.com/questions/653980/c-optimization-techniques http://stackoverflow.com/questions/763656/should-we-still-be-optimizing-in-the-small http://stackoverflow.com/questions/486021/optimization-techniques-in-c and http://stackoverflow.com/questions/593343/what-are-you-favorite-low-level-code-optimization-tricks all taken from http://stackoverflow.com/search?q=micro+optimization – dmckee --- ex-moderator kitten Aug 31 '09 at 03:42
  • Please do **not** delete this question. Just because it’s a duplicate doesn’t mean it’s worthless! – Konrad Rudolph May 21 '10 at 09:20

21 Answers

40

For the kinds of optimizations you are suggesting, you should write your code for clarity and not optimize it until you have proof that it is a bottleneck.

One danger of attempting micro-optimizations like this is that you will likely make things slower, because the compiler is smarter than you are a lot of the time.

Take your "optimization":

const int windowPosX = (screenWidth * 0.5) - (windowWidth * 0.5);

There is no serious compiler in the world that doesn't know that the fastest way to divide by two is to shift right by one. Multiplying by floating-point 0.5 is actually more expensive, because it requires converting to floating-point and back, and doing two multiplies (which are more expensive than shifts).

But don't take my word for it. Look at what the compiler actually does. gcc 4.3.3 on 32-bit Ubuntu (-O3, -msse3, -fomit-frame-pointer) compiles this:

int posx(unsigned int screen_width, unsigned int window_width) {
  return (screen_width / 2) - (window_width / 2);
}

to this:

00000040 <posx>:
  40:   8b 44 24 04             mov    eax,DWORD PTR [esp+0x4]
  44:   8b 54 24 08             mov    edx,DWORD PTR [esp+0x8]
  48:   d1 e8                   shr    eax,1
  4a:   d1 ea                   shr    edx,1
  4c:   29 d0                   sub    eax,edx
  4e:   c3                      ret

Two shifts (using an immediate operand) and a subtract. Very cheap. On the other hand, it compiles this:

int posx(unsigned int screen_width, unsigned int window_width) {
  return (screen_width * 0.5) - (window_width * 0.5);
}

to this:

00000000 <posx>:
   0:   83 ec 04                sub    esp,0x4
   3:   31 d2                   xor    edx,edx
   5:   8b 44 24 08             mov    eax,DWORD PTR [esp+0x8]
   9:   52                      push   edx
   a:   31 d2                   xor    edx,edx
   c:   50                      push   eax
   d:   df 2c 24                fild   QWORD PTR [esp]
  10:   83 c4 08                add    esp,0x8
  13:   d8 0d 00 00 00 00       fmul   DWORD PTR ds:0x0
            15: R_386_32    .rodata.cst4
  19:   8b 44 24 0c             mov    eax,DWORD PTR [esp+0xc]
  1d:   52                      push   edx
  1e:   50                      push   eax
  1f:   df 2c 24                fild   QWORD PTR [esp]
  22:   d8 0d 04 00 00 00       fmul   DWORD PTR ds:0x4
            24: R_386_32    .rodata.cst4
  28:   de c1                   faddp  st(1),st
  2a:   db 4c 24 08             fisttp DWORD PTR [esp+0x8]
  2e:   8b 44 24 08             mov    eax,DWORD PTR [esp+0x8]
  32:   83 c4 0c                add    esp,0xc
  35:   c3                      ret

What you're seeing is conversion to floating-point, multiplication by a value from the data segment (which may or may not be in cache), and conversion back to integer.

Please think of this example when you're tempted to perform micro-optimizations like this. Not only is it premature, but it might not help at all (in this case, it significantly hurt!).

Seriously: don't do it. I think a golden rule is never to do optimizations like this unless you routinely inspect your compiler's output as I have done here.

Josh Haberman
  • 4,170
  • 1
  • 22
  • 43
  • 3
    Thank you so much for this. I don't know why it never occurred to me to disassemble myself. Truly eye opening. It also highlights that you shouldn't believe everything you hear from other programmers (which is where I 'learned' the '*0.5' instead of '/2'). Thanks champ – Brock Woolf Sep 06 '09 at 19:55
  • The example is interesting because it shows the disassembly, but it kills your point that the compiler is "smart". It also does not fit your point about premature optimization: I don't think many people would consider *0.5 an optimization. Premature optimization comes from using C to avoid STL containers like vector and string, omitting virtual, or making any other compromise on code clarity and correctness when not necessary. Mainly it is about optimizing code while in design, and probably by the time the software is released the design has changed and the optimisation was time/work down the drain –  Oct 31 '10 at 11:16
  • I see simpler output for `x*0.5` case on x86-64 https://gist.github.com/856669#file_posx.dis – jfs Mar 05 '11 at 20:03
  • 1
    @ufotds *…it kills your point that the compiler is "smart"…* - for multiple reasons, the results produced by the two bodies may be different. the compiler is doing the right thing. an engineer and a compiler should understand the difference. – justin Feb 13 '12 at 07:26
  • @ufotds, on the contrary, this example is great, because a lot of noob programmers think `*0.5` is faster than `/2` because it is a multiplication instead of a division. Also, the compiler cannot optimize `*0.5`, because although mathematically equivalent to `/2`, that floating point operation may in reality have a different result on some machines. See C11 standard, section F.9.2.1 – Shahbaz Jun 16 '12 at 22:22
28
  1. Don't think about performance, think about clarity and correctness.
  2. Use a profiler.
  3. Keep using a profiler.
  4. See 1.
17

EDIT This answer originally appeared in another question (which has since been merged) where the OP listed some possible optimization techniques that he just assumed must work. All of them relied heavily on assumptions (such as x << 1 always being faster than x * 2). Below I am trying to point out the danger of such assumptions.


Since all of your points are probably wrong, this shows the danger of such premature and trivial optimizations. Leave such decisions to the compiler, unless you know very well what you are doing and that it matters.

Otherwise it – just – doesn’t – matter.

Much more important (and not at all premature) are optimizations in the general program structure. For example, it is probably very bad to re-generate the same large batch of data over and over because it’s needed in many places. Instead, some thought has to be put into the design to allow sharing this data and thus only calculating it once.
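The "calculate it once and share it" idea can be sketched as follows; this is a minimal illustration with a made-up `sharedSquares` table, not code from any particular project:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical expensive computation that many parts of a program need.
// Instead of regenerating the table at every call site, build it once and
// hand out a reference to the single shared copy.
const std::vector<long>& sharedSquares(std::size_t n) {
    static std::vector<long> table;  // built on first use, reused afterwards
    if (table.size() < n) {
        table.clear();
        for (std::size_t i = 0; i < n; ++i)
            table.push_back(static_cast<long>(i) * static_cast<long>(i));
    }
    return table;
}
```

Call sites that previously each rebuilt the data now all see the same copy; the design decision to share it is exactly the kind of structural optimization that is never premature.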

It’s also very important to know the domain you’re working in. I come from a bioinformatics background and do a lot of hardcore algorithm work in C++. I often deal with huge amounts of data. At the moment, though, I’m creating an application in Java, and I cringe every time I create a copy of a container because I’m conditioned to avoid such operations like hell. But for my fancy Java GUI those operations are completely trivial and not one bit noticeable to the user. Oh well, I’ve just got to get over myself.

By the way:

Initialise constants when you declare them (if you can) …

Well, in many languages (e.g. C++) constants (i.e. identifiers marked as const) have to be initialized upon definition so you don’t actually have a choice in the matter. However, it is a good idea to follow that rule, not only for constants but in general. The reason isn’t necessarily performance, though. It’s just much cleaner code because it clearly binds each identifier to a purpose instead of letting it fly around.

Konrad Rudolph
  • 530,221
  • 131
  • 937
  • 1,214
15

The rules of Optimization Club:

  1. The first rule of Optimization Club is, you do not Optimize.
  2. The second rule of Optimization Club is, you do not Optimize without measuring.
  3. If your app is running faster than the underlying transport protocol, the optimization is over.
  4. One factor at a time.
  5. No marketroids, no marketroid schedules.
  6. Testing will go on as long as it has to.
  7. If this is your first night at Optimization Club, you have to write a test case.

http://xoa.petdance.com/Rules_of_Optimization_Club

Rule #3 is the one that trips most people up. It doesn't matter how fast your calculations are if your program sits waiting for disk writes or network transfer.

Rules #6 and #7: Always have tests. If you're optimizing, you're refactoring, and you don't want to be refactoring without having a solid test suite.

Andy Lester
  • 91,102
  • 13
  • 100
  • 152
8

It's always good to remember what things "cost". Some C# examples:

  • String concatenation always creates a new string as strings are immutable. Therefore, for repeated concatenations, a StringBuilder is more efficient.

  • Repeated or large memory allocations are generally something you should watch for.

  • Exceptions are very expensive to throw. This is one of the reasons why exceptions should only be used for exceptional situations.

Most things beyond this are premature optimization. Use a profiler if speed matters.


Regarding your "optimizations":

  • I doubt that floating point arithmetic (* 0.5) is faster than an integer division (/ 2).

  • If you need an array of size 300 you should initialize an array of size 300. There's nothing "magic" about powers of 2 that make arrays of size 256 more efficient.

  • "requires 2 calls in the code" is wrong.

dtb
  • 213,145
  • 36
  • 401
  • 431
  • 3
    On Core2, floating point multiplication (fmul, 5 cycles latency) actually is significantly cheaper than integer division (idiv, 18-42 cyles latency, depending on how big the numbers are). See Agner Fog's instruction tables for more details. But that's not relevant here, because this example won't actually compile to idiv. – Josh Haberman Aug 30 '09 at 19:55
  • 3
    @Josh: `fmul` may be faster than `idiv` – but what about the conversions to and from `double`? – Konrad Rudolph Aug 31 '09 at 10:13
  • Actually, exceptions are expensive to *create* not as much so to *throw*. – Lawrence Dol May 20 '10 at 19:04
5

Make sure you have clearly defined performance goals and tests that measure against those goals so you can quickly find out if you even have a problem.

Think about performance more from a design perspective than from a coding perspective - optimizing a poor-performing design just results in faster slow code.

When you do have a performance problem, use a tool such as a profiler to identify the problem - if you just guess where your bottlenecks are, you will usually guess wrong.

Fix performance problems early in development rather than putting them off - as time goes on and features make it into the product fixing perf issues will only become more and more difficult.

Michael
  • 54,279
  • 5
  • 125
  • 144
  • "Fix performance problems early in development rather than putting them off - as time goes on and features make it into the product fixing perf issues will only become more and more difficult." Quoted-for-truth. Sometimes I feel like everyone else on Stack Overflow has forgotten or chosen to ignore this principle. – Crashworks May 21 '09 at 22:41
  • Yes, I recall one project where they were not even close to meeting their performance goals and had scheduled "performance" for a much later milestone. I don't know how they expected to meet their perf goals later with a load of features piled on top of their already unperforming code base. And their perf milestone was a huge failure. – Michael May 21 '09 at 22:51
5

Early optimization isn't always premature - it's bad only if you hurt other interests (readability, maintenance, time to implement, code size, ...) without justification.

On Stack Overflow, early optimization is the new goto; don't get discouraged by that. Any decision that goes wrong early is hard to fix later. Optimization is special only because experience shows it can often be fixed locally, whereas sucky code requires large-scale changes.


Sorry for the rant, now for your actual question:

Know your environment!
This includes all the low-level details - e.g. nonlinear memory access costs, things the compiler can optimize, etc. The trick is not to fret: there's a lot you shouldn't worry too much about, you just need to be aware of it.
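Memory access cost is a concrete instance of those low-level details. Here is a sketch (mine, not the answerer's) of summing a matrix two ways: both compute the same total, but the row-major loop walks memory sequentially while the column-major loop strides across rows, which on large matrices is typically much slower because of cache misses:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-by-row: touches memory contiguously within each row (cache-friendly).
long sumRowMajor(const std::vector<std::vector<int>>& m) {
    long total = 0;
    for (std::size_t r = 0; r < m.size(); ++r)
        for (std::size_t c = 0; c < m[r].size(); ++c)
            total += m[r][c];
    return total;
}

// Column-by-column: same arithmetic, but each step jumps to a different
// row's storage, giving poor locality on large matrices.
long sumColMajor(const std::vector<std::vector<int>>& m) {
    long total = 0;
    if (m.empty()) return 0;
    for (std::size_t c = 0; c < m[0].size(); ++c)
        for (std::size_t r = 0; r < m.size(); ++r)
            total += m[r][c];
    return total;
}
```

The point is awareness, not fretting: write whichever loop is clearer, but know which one a profiler is likely to flag.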

Measure measure measure!
The results of actual optimization attempts are often surprising, especially if you vary seemingly unrelated factors. It is also the best way to develop a relaxed attitude towards performance - most of the time it really doesn't matter.

Think about algorithms before you think about implementation details.
Most low level optimizations give you a factor of 1.1, a different algorithm can give you a factor of 10. A good (!) caching strategy can give you a factor of 100. Figuring out that you really don't need to make the call gives you Warp 10.

This usually leads me to thinking about how to organize the data: what are frequent operations that are potential bottlenecks, or scalability issues?
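As a toy illustration of the factor-of-10 point (my sketch, with made-up function names): no amount of micro-tuning the quadratic version below will beat simply switching to the O(n log n) one once the input grows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// O(n^2): compare every pair.
bool hasDuplicateQuadratic(const std::vector<int>& v) {
    for (std::size_t i = 0; i < v.size(); ++i)
        for (std::size_t j = i + 1; j < v.size(); ++j)
            if (v[i] == v[j]) return true;
    return false;
}

// O(n log n): sort a copy, then look for equal neighbours.
bool hasDuplicateSorted(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return std::adjacent_find(v.begin(), v.end()) != v.end();
}
```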

peterchen
  • 40,917
  • 20
  • 104
  • 186
4
  1. Write readable code showing intent. Do not microoptimize - you cannot outsmart the JIT.
  2. Learn to use a profiler, e.g. jvisualvm in the Sun JDK, and use it.
Thorbjørn Ravn Andersen
  • 73,784
  • 33
  • 194
  • 347
4

We had a subcontractor write a non-trivial amount of code for a non-trivial program. Apparently they always tested with a trivial amount of data. So..

  • Use a profiler.
  • Use a non-trivial amount of data when testing. If possible, make it a humongous amount of data.
  • Use sane algorithms when things become tight.
  • Use the profiler to check that whatever "optimization" you've done actually is one - for example, the recent "java jar" fiasco where an O(1) operation was done as O(n).
Pasi Savolainen
  • 2,460
  • 1
  • 22
  • 35
4

The number one rule I use is, DRY (Don't Repeat Yourself). I find that this rule does a good job of highlighting problem areas that can be fixed without hurting the clarity of the program. It also makes it easier to fix bottlenecks once you discover them.
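A small sketch of how DRY supports later optimization (the `normalizeKey` helper is hypothetical): if this lowercasing logic were copy-pasted at every call site, speeding it up would mean hunting down every copy; as a single function, a future optimization is one local change.

```cpp
#include <cassert>
#include <string>

// One shared helper instead of repeated inline loops. If profiling later
// shows this is hot, it can be optimized (e.g. with a lookup table) in
// exactly one place, without touching any caller.
std::string normalizeKey(std::string s) {
    for (char& c : s)
        if (c >= 'A' && c <= 'Z')
            c = static_cast<char>(c - 'A' + 'a');
    return s;
}
```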

Adam Ruth
  • 3,575
  • 1
  • 21
  • 20
3

I would say that little optimisations you can make on the go are exactly those that do not make much sense. If you want to optimise, write the code first, then profile and only then optimise the parts that take too much time. And even then it’s usually algorithms that need optimization, not the actual code.

zoul
  • 102,279
  • 44
  • 260
  • 354
3

I'd even say that for integral types, instead of multiplying by 0.5 you could just shift them one bit to the right (keeping signed/unsigned shifting in mind).

I am quite sure that, at least in the case of C#, the compiler will optimize a lot.

for example:

Guid x = Guid.NewGuid(); 

as well as

Guid x; x = Guid.NewGuid();

both translate to the following CIL:

call        System.Guid.NewGuid
stloc.0   

Constant expressions such as (4 + 5) are precalculated as well as string concatenations like "Hello " + "World".

I would primarily concentrate on readable and maintainable code though. It's highly unlikely that such microoptimizations make a big difference except for special edge cases.

In my case, the biggest gains in (sometimes merely perceived) performance came from things like the following:

  • If you fetch data from a database or the internet, use a separate thread.
  • If you load large XML files, do it on a separate thread.

Of course, that's only my experience from C#, but input/output operations are common bottlenecks.

Botz3000
  • 39,020
  • 8
  • 103
  • 127
2

I don't know if you will buy it, but early optimization is not the root of all evil.

Lawrence Dol
  • 63,018
  • 25
  • 139
  • 189
Aaron Qian
  • 4,477
  • 2
  • 24
  • 27
  • +1: Note - I almost down-voted this based on the (originally) inverted assertion of the link text (which I have just corrected), until I actually read the article. – Lawrence Dol May 20 '10 at 19:22
2

There are certainly optimizations you can do as you go, such as passing large objects by const reference in C++ (C++ Optimization Strategies and Techniques). The first suggestion ("don't divide by 2") might fall under a "strategy that bombs" - assuming certain operations are faster than others.

One premature (?) optimization I'm guilty of is moving declarations of expensive objects out of loops. E.g., if a loop would require a fresh vector for each iteration, I often do:

std::vector<T> vec;
while (x) {
    vec.clear(); //instead of declaring the vector here
    ...
}

I once made a vector used locally in a member function static as an optimization (to reduce memory allocations - this function was called very often), but that bombed when I decided I'd gain real performance from using more than one of these objects in multiple threads.
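One way to keep the allocation-saving idea while avoiding that multi-threading problem, not part of the original answer, is a `thread_local` buffer (C++11); each thread gets its own copy, sketched here with a hypothetical `countPositive`:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A per-thread scratch buffer: it retains its capacity across calls (so
// repeated calls avoid reallocating), but is never shared between threads.
std::size_t countPositive(const std::vector<int>& input) {
    thread_local std::vector<int> scratch;
    scratch.clear();  // drops contents, keeps capacity
    for (int x : input)
        if (x > 0)
            scratch.push_back(x);
    return scratch.size();
}
```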

UncleBens
  • 40,819
  • 6
  • 57
  • 90
2

This takes experience, I'm afraid. When you conceive of solutions to a problem, you may think in terms of class hierarchies, or you may think in terms of what information goes in, what comes out, how long does it need to be persistent in between. I recommend the latter.

In any case, what people have said is mostly good advice - keep it clean and simple, and get rid of performance problems as they come in, because they will come in.

Where I part company is I don't find measurement very helpful for locating performance problems compared to this method.

But whatever method you use, hopefully experience will teach you what NOT to do in developing software. I've been solving performance problems for a long time, and these days, the single most popular performance killer is galloping generality. Nobody likes to hear their favorite beliefs questioned, but time after time, especially in big software, what is killing performance is using bazookas to swat flies.

Oddly enough, the reason often given for this over-design is guess what? Performance.

In whatever venue you may have learned to program, chances are you've learned all about academic things like sophisticated data structures, abstract class hierarchies, tricky compiler optimization techniques - all the latest stuff that's fun and interesting to know, and that I like as much as anybody. What they didn't teach you is when to use it, which is almost never.

So what I recommend you do is: Get the experience. It is the best teacher.

Community
  • 1
  • 1
Mike Dunlavey
  • 40,059
  • 14
  • 91
  • 135
1

You may want to look into what compilers DO optimize away - many compilers can optimize away things like tail recursion, and most other minor optimizations are trivial by comparison. My methodology is to write things so that they're as readable/manageable as possible and then, if I need to, look to see if the generated assembly code needs optimization of some sort. This way no time is spent optimizing things that don't need to be optimized.

T.R.
  • 7,442
  • 6
  • 30
  • 34
1

Profile, profile, profile. Use valgrind if you can (along with the kcachegrind visualizer), otherwise use the venerable gprof.

My top performance hits:

  1. Allocating memory without freeing it. Only possible in C and C++.
  2. Allocating memory.
  3. Calls to really small procedures, functions, or methods that your compiler somehow fails to inline.
  4. Memory traffic.

Everything else is in the noise.
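On the allocation point, a small C++ sketch (mine, not the answerer's): reserving capacity up front replaces a whole series of grow-and-copy reallocations with a single allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Without reserve(), push_back may reallocate and copy several times as
// the vector grows; with it, there is one allocation for the whole run.
std::vector<long> firstSquares(std::size_t n) {
    std::vector<long> out;
    out.reserve(n);  // single up-front allocation
    for (std::size_t i = 0; i < n; ++i)
        out.push_back(static_cast<long>(i) * static_cast<long>(i));
    return out;
}
```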

Norman Ramsey
  • 198,648
  • 61
  • 360
  • 533
0

A summary of code optimization methods (language-independent) is available on GitHub (PS: I'm the author).

Outline:

  1. General principles (i.e. caching, contiguous blocks, lazy loading, etc.)
  2. Low-level (binary formats, arithmetic operations, data allocations, etc.)
  3. Language-independent optimization (re-arranging expressions, code simplifications, loop optimisations, etc.)
  4. Databases (lazy loading, efficient queries, redundancy, etc.)
  5. Web (minimal transactions, compactification, etc.)
  6. References
Nikos M.
  • 8,033
  • 4
  • 36
  • 43
0

1, 2, and 3 have the same answer: Profile. Get a good profiler and run it on your app, both in intrusive and in sampling modes. This will show you where your bottlenecks are, how severe they are, and doing it on a regular basis will show you where perf has gotten worse from week to week.

You can also simply put stopwatches into your app so that it tells you, say, exactly how many seconds it takes to load a file; you'll notice if that number gets bigger, especially if you log it.
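Such a stopwatch takes only a few lines with `std::chrono` (a sketch; the class name is mine): wrap a phase like file loading, read the elapsed time at the end, and log it so you notice when the number creeps up.

```cpp
#include <cassert>
#include <chrono>

// Minimal monotonic stopwatch: construct it at the start of a phase, read
// elapsedSeconds() at the end, and log the value alongside a phase name.
class Stopwatch {
public:
    Stopwatch() : start_(std::chrono::steady_clock::now()) {}

    double elapsedSeconds() const {
        return std::chrono::duration<double>(
                   std::chrono::steady_clock::now() - start_).count();
    }

private:
    std::chrono::steady_clock::time_point start_;
};
```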

4 is a big, big question that ranges all the way from high-level algorithm design down to tiny details of a particular CPU's pipeline. There's lots of resources out there, but always start at the high level -- take your outer loop from O(N^2) to O(N log N) before you start to worry about integer opcode latency and the like.

Crashworks
  • 40,496
  • 12
  • 101
  • 170
0
  • Only optimize when you have performance issues.
  • Only optimize the slow parts, as measured!
  • Finding a better algorithm can save you orders of magnitude, rather than a few percent.

It's mentioned above, but it's worth talking about more: measure! You must measure to make sure you're optimizing the right thing. You must measure to know if you've improved, or improved enough, and by how much. Record your measurements!

Also, often you will identify a routine as taking, say, >75% of the total time. It's worth taking the time to profile at a finer grain... often you will find most of the time within that routine is spent in a very small part of the code.

dwc
  • 24,196
  • 7
  • 44
  • 55
0

How do I keep from doing premature optimization if I am thinking about performance already.

How important is the optimization, and how will it affect readability and maintainability?

How do I find the bottlenecks in my code?

Walk through its execution flow in your mind. Understand the costs of the steps taken. As you go, evaluate the existing implementation in this context. You can also walk through with a debugger for another perspective. Consider and actually try alternative solutions.

To contradict popular approaches: profiling after the program is written, or once there is a problem, is naive - it's like adding sauce to a poorly prepared meal to mask unpleasantness. It can also be likened to the difference between the person who always asks for solutions and the one who actually determines the reason (and learns why in the process). If you have implemented a program, then spent time profiling, then made the easy fixes, and made it 20% faster in the process… that's not usually a "good" implementation if performance and resource utilization are important, because all the small issues that have accumulated will be high noise in the profiler's output. It's not unusual for a good implementation to be 5, 10, or even 25 times better than the casual constructor's implementation.

How do I make sure that over time my program does not become any slower?

That depends on many things. One approach would involve continuous integration and actually executing the program. However, that may be a moving target even in strictly controlled environments. Minimize changes by focusing on creating a good implementation the first time… :)

What are some common performance errors to avoid (e.g.; I know it is bad in some languages to return while inside the catch portion of a try{} catch{} block

I'll add: Multithreading is often used for the wrong reasons. It's used too commonly to exacerbate poor implementations rather than locating existing problems/weaknesses in the existing design.

justin
  • 104,054
  • 14
  • 179
  • 226