2

I often have to write code that I would like to optimize for performance, and I often have several solutions to a particular problem.

Is there a simple way to determine the number of CPU cycles a particular statement/function would take? I'm not talking about complex code that accesses the file system, Windows APIs or the network; I'm talking about comparing half a dozen lines of C++ code to determine which would be more efficient.

The classic example would be comparing ++i with i++. The former is faster, but without knowing that, how would I be able to determine this myself?

I'd rather not install costly performance tools (e.g. Intel's tools), but find a simple way to get to the bottom of it. Is there a way to see the assembly code that is generated from C++ code - without debugging?

Any other suggestions and/or approaches are of course welcome.

Lucky Luke
  • No, with today's CPUs you can't look at the assembly code and tell how long it takes. Think caches, branch prediction, pipelines, etc. If you have to ask questions like this one, you likely don't stand a chance of getting any significant insights this way. For most cases, the only reliable and practical way to know what runs fast is to, you know, actually run it and see how long it takes (i.e. profiling and benchmarking). And that's not even touching on whether you should even try. –  Aug 02 '11 at 19:41
  • @delnan I couldn't agree more. – Rafael Colucci Aug 02 '11 at 19:53
  • 1
    If I'm concerned about that (almost never) I wrap it in a (somewhat unrolled) loop and stopwatch it. If you execute it 10^9 times, then seconds translate to nanoseconds. – Mike Dunlavey Aug 02 '11 at 20:06

3 Answers

4

Your "classic example" of ++i vs i++ is usually irrelevant. Optimizing compilers are good enough to prevent that from being an issue. In fact, they're really good at making slow-looking code run fast.

Look at algorithmic complexity: often if code is unexpectedly slow, there's a hidden O(n) in an inner loop somewhere.

It's been said before: profile, profile, profile. Counting cycles is much less relevant now, because of the importance of the cache. Microbenchmarks are sometimes OK for small chunks of code, but often aren't representative of how that code performs inside an application.
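For the small cases where a microbenchmark is acceptable, the loop-and-stopwatch idea from the comments can be sketched roughly as follows. This is only a minimal sketch: `nanoseconds_per_call`, `remainder_by_modulo` and `remainder_by_mask` are hypothetical names invented for illustration, and the measured time includes the loop overhead itself.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical candidate A: remainder via the modulo operator.
std::uint64_t remainder_by_modulo(std::uint64_t x) { return x % 8; }

// Hypothetical candidate B: the same remainder via a bit mask.
std::uint64_t remainder_by_mask(std::uint64_t x) { return x & 7; }

// Runs `candidate` `iterations` times and returns the average wall-clock
// time per call in nanoseconds (loop overhead included).
template <typename Candidate>
double nanoseconds_per_call(Candidate candidate, std::uint64_t iterations) {
    assert(iterations > 0);
    // Writing results to a volatile keeps the optimizer from
    // discarding the loop as dead code.
    volatile std::uint64_t sink = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        sink = sink + candidate(i);
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Calling `nanoseconds_per_call(remainder_by_mask, 100000000)` and the same for `remainder_by_modulo` in an optimized build gives two numbers to compare. Expect noise between runs, and treat the result as a hint only: as noted above, it says little about cache behaviour in the real application.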

Visual Studio has a built in profiler, described here: http://msdn.microsoft.com/en-us/magazine/cc337887.aspx which is really what you need.

Don't choose the code that's more efficient. Choose the code that's more readable.

Dave
  • 2
    There are some cases where ++i vs. i++ is not irrelevant. For primitive types an optimizer will catch it. For more complicated iterators, it might not be able to discard the instruction that stores off the original value before incrementing. Otherwise good answer. – Nathan Monteleone Aug 02 '11 at 19:43
  • Thanks, I will have a look at the profiler - wasn't actually aware of that for some reason, I thought it only worked with C#. I don't totally agree with not writing code that is efficient - if I write a network server that processes 10,000 packets per second then I'd like to make it efficient :-) – Lucky Luke Aug 02 '11 at 20:03
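The point in the comment above about more complicated iterators can be illustrated with a small sketch. `CountingIterator` is a hypothetical type made up for this example; its copy constructor has a visible side effect (a counter), which is exactly the kind of thing an optimizer may not be able to discard.

```cpp
#include <cassert>

// Hypothetical iterator-like type that counts how often it is copied.
struct CountingIterator {
    static int copies;
    int position = 0;

    CountingIterator() = default;
    CountingIterator(const CountingIterator& other) : position(other.position) {
        ++copies;  // side effect: record every copy
    }
    CountingIterator& operator=(const CountingIterator&) = default;

    // Pre-increment: advance and return a reference to itself, no copy.
    CountingIterator& operator++() {
        ++position;
        return *this;
    }

    // Post-increment: has to save the old value first, which costs a copy.
    CountingIterator operator++(int) {
        CountingIterator old(*this);
        ++position;
        return old;
    }
};

int CountingIterator::copies = 0;

// Returns how many copies a single ++it performs.
int copies_made_by_pre_increment() {
    CountingIterator::copies = 0;
    CountingIterator it;
    ++it;
    assert(it.position == 1);
    return CountingIterator::copies;
}

// Returns how many copies a single it++ performs.
int copies_made_by_post_increment() {
    CountingIterator::copies = 0;
    CountingIterator it;
    it++;
    assert(it.position == 1);
    return CountingIterator::copies;
}
```

With this type, pre-increment makes no copies while post-increment makes at least one (the exact number depends on whether the compiler elides the copy on return). For a plain `int`, or an iterator whose copy is trivial and inlined, the optimizer will normally remove the difference.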
2

I know you said you do not want to pay for performance tools, but I highly suggest you take a look at AQTime.

I know it can be expensive, but it is worth every penny invested. It can do a very good analysis of your code, covering allocation, performance and many other aspects.

I cannot imagine myself working without this tool, really. And I do not work for SmartBear; I am just a big fan.

What I think is: why should anyone bother reading and debugging assembly when we have great tools to do that? Your time can be more productive if you have the right tools and focus on the business.

Just my 2 cents.

Rafael Colucci
  • Yes, you are on to something. I've always had mediocre experiences with development add-ons, but their price is not completely unreasonable. I will definitely take a look. – Lucky Luke Aug 02 '11 at 20:03
  • It is not an add-on. It is an application. You can download a 30 days trial and see for yourself. – Rafael Colucci Aug 02 '11 at 20:27
  • I used the wrong wording. By "add-on" I meant a generic "add-on" to my development environment. I realize it's a stand-alone app. I will check it out, thank you. – Lucky Luke Aug 02 '11 at 21:15
  • I decided to give this product a try, but I'm far from impressed. The account manager can't reply to my emails, and the product is pretty cumbersome to use in my opinion. – Lucky Luke Dec 09 '11 at 19:36
1

Using the Visual Studio command prompt you can invoke cl.exe (the VC++ compiler) and produce assembly listings with the option /FA[c|s|u].

cl.exe /FA mycode.c

This generates a file named mycode.asm containing the listing, which looks something like:

; Line 16
    push    ebp
    mov     ebp, esp
; Line 17
    cmp     DWORD PTR _argc$[ebp], 2
    jl      SHORT $LN2@main
    cmp     DWORD PTR _argc$[ebp], 2
    jle     SHORT $LN3@main
$LN2@main:
; Line 19
    push    OFFSET $SG2660
    call    _puts
    add     esp, 4

... and so forth.
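To compare two candidate snippets this way, one option is to put each in its own function in a small source file and read the two function bodies in the listing side by side. The file below is only a sketch: the function names are hypothetical, and it could be saved as, say, compare.cpp and compiled with `cl.exe /c /O2 /FA compare.cpp` (`/c` compiles without linking, so no `main` is needed; `/O2` turns on the optimizer, so the listing reflects what a release build would run).

```cpp
#include <cassert>
#include <vector>

// Candidate A: advances the iterator with pre-increment.
int sum_with_pre_increment(const std::vector<int>& values) {
    int total = 0;
    for (std::vector<int>::const_iterator it = values.begin(); it != values.end(); ++it) {
        total += *it;
    }
    return total;
}

// Candidate B: advances the iterator with post-increment.
int sum_with_post_increment(const std::vector<int>& values) {
    int total = 0;
    for (std::vector<int>::const_iterator it = values.begin(); it != values.end(); it++) {
        total += *it;
    }
    return total;
}

// Sanity check worth running before comparing listings: both
// candidates must compute the same result.
void check_candidates_agree(const std::vector<int>& values) {
    assert(sum_with_pre_increment(values) == sum_with_post_increment(values));
}
```

If the optimizer treats both loops the same, the two function bodies in the generated .asm file should look essentially alike. As the other answers and comments point out, even differing instruction sequences do not translate directly into run time.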

Similarly, if you set a breakpoint inside VS and open the disassembly window, you will see the assembly listing (provided the circumstances are right; debug mode should probably be on).

This is probably of interest as well: How many CPU cycles are needed for each assembly instruction?

Skurmedel
  • Thanks, I accepted your answer since it essentially, well, answers my question. I realize though that this is a potentially tedious and inaccurate way of going about it, thanks to the other comments. Thanks everybody for their input. – Lucky Luke Aug 03 '11 at 02:20