
Say we have two C++ code segments that do the same task. How can we determine which one will run faster?

As an example, let's say there is a global array "some_struct_type numbers[]". Inside a function, I can read an element of this array in two ways (I do not want to alter the content of the array):

  1. some_struct_type val = numbers[i];
  2. some_struct_type* val = &numbers[i];

I assume the second one is faster, but I can't measure the time to make sure because the difference will be negligible.

So in this kind of situation, how do I figure out which code segment runs faster? Is there a way to compile a single line of code, or a set of lines, and see how many assembly instructions it produces?

I would appreciate your thoughts on this matter.

user2389323
  • These two are *not* doing the same thing. – Eugene Sh. Nov 13 '17 at 22:00
  • @EugeneSh. Yes, I know. But in the end I can use val.variable or val->variable to read the values in the structure. This was just an obvious example I used to explain my question. – user2389323 Nov 13 '17 at 22:55
    "I can't measure the time to make sure because it will be a negligible difference" -> Then you shouldn't optimize that. Who cares, if the difference is not measurable? – geza Nov 13 '17 at 22:55
  • @geza True, but if I add 1000 such lines, then there will be a notable difference. But then I would have to change 1000 lines. – user2389323 Nov 13 '17 at 22:58
  • @user2389323: you can easily switch between versions with search and replace. And, if you have 1000 such lines, and then there is a measurable difference, then *boom*, you know the answer, which is faster :) – geza Nov 13 '17 at 23:02
    @user2389323: "Is there a way to compile a single line of code or set of lines and view how many lines of assembly instructions are there?" No. The basic compile unit is usually a function. So you can compile a function, and then you can identify which asm instructions is for a specific part of code (asm instructions can be reordered by the compiler, calculations can have interleaved instructions, etc.) But don't judge its speed by just looking the asm instruction count. It is more complicated than that. – geza Nov 13 '17 at 23:05
  • You are doing this at the wrong level. Accessing a single variable is in the range of nanoseconds. That's 0.000000001 seconds, or less. Even if you do this 1000 times, or a million times, the difference is still so small that it is difficult to measure. So why bother? – Bo Persson Nov 13 '17 at 23:36
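Picking up on geza's comment above: one practical way to look at the generated code is to put each variant into its own small function in a separate source file and ask the compiler for an assembly listing, for example with GCC or Clang via "g++ -O2 -S lookup.cpp", which writes "lookup.s". Below is a minimal sketch; the member name, function names, and file name are placeholders, while the struct and array names are taken from the question.

// lookup.cpp -- compile with "g++ -O2 -S lookup.cpp" and inspect lookup.s
struct some_struct_type { int variable; };   // placeholder definition

some_struct_type numbers[100];               // placeholder global array

int read_by_copy(int i) {
    some_struct_type val = numbers[i];       // copies the whole element
    return val.variable;
}

int read_by_pointer(int i) {
    const some_struct_type* val = &numbers[i];  // only takes the address
    return val->variable;
}

As the comments above warn, the resulting instruction count alone does not tell you which version is faster.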

2 Answers


The basic approach is to run the piece of code enough times that it takes at least a few seconds to complete, and to measure that time.
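A minimal sketch of such a timing loop, using <chrono>; the iteration count and the loop body are placeholders to be replaced with your own code:

#include <chrono>
#include <cstdio>

int main() {
    const long iterations = 100000000;   // enough repetitions to take on the order of seconds
    long sum = 0;

    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) {
        sum += i % 7;                     // stand-in for the code under test
    }
    auto end = std::chrono::steady_clock::now();

    double seconds = std::chrono::duration<double>(end - start).count();
    std::printf("%.3f s (sum = %ld)\n", seconds, sum);  // print sum so the loop isn't optimized away
}

Note that the loop's result has to be used somehow (here it is printed); otherwise the optimizer may remove the work entirely.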

But it's hard, very hard, to get any meaningful figures this way, for many reasons:

  • Today's compilers are very good at optimizing code, but the optimizations depend on the context. It often does not make sense to look at a single line and try to optimize it. When the same line appears in a different context, the optimizations applied may be different.
  • A short piece of code can be much faster than the looping code you wrap around it to measure it, so the loop overhead distorts the measurement.
  • It is not only the compiler that optimizes: the processor has a cache and an instruction pipeline, and it tries to predict branches. A value that has been read before will, for example, be read much faster the next time.
  • ...

Because of this, it's usually better to leave the code in its place in your program, and use a profiling tool to see which parts of your code use the most processing resources. Then, you can change these parts and profile again.

While writing new code, prefer readable code over seemingly optimal code. Choose the right algorithm; this also depends on your input sizes. For example, insertion sort can be faster than quicksort if the input is very small. But don't write your own sorting code if your input is not special; in general, use the available libraries. And don't optimize prematurely.
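As one small illustration of "use the libraries": reaching for std::sort instead of hand-rolling a quicksort is usually the better default, since the standard library version is already well tuned. A minimal sketch:

#include <algorithm>
#include <vector>

void sort_values(std::vector<int>& v) {
    std::sort(v.begin(), v.end());   // guaranteed O(n log n) since C++11, heavily optimized in practice
}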

alain
  • There's a two-step process. First, when you write the code, use good engineering judgment to pick efficient algorithms and write clean, maintainable code. Don't do stupid things, but don't make the code ugly with micro-optimizations that might not even turn out to improve performance. Second, if you do have performance issues, profile the code to figure out where the obstacles are and use the information gained from the profile to optimize them. (But don't try to benchmark lines of code that you haven't even put into functions yet.) – David Schwartz Nov 13 '17 at 23:08
  • Thanks for the comment, @DavidSchwartz. I tried to add a bit of it to my answer. – alain Nov 13 '17 at 23:21

Eugene Sh. is correct that these two lines aren't doing the same thing - the first one copies the value of numbers[i] into a local variable, whereas the second one stores the address of numbers[i] into a pointer local variable. If you can do what you need using just the address of numbers[i] and referring back to numbers[i], it's likely that will be faster than doing a wholesale copy of the value, although it depends on a lot of factors like the size of the struct, etc.
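As a small, hedged sketch of a third option that also avoids the copy: you can bind a const reference to the element, which keeps the dot syntax of an ordinary object. The struct definition and member name below are placeholders:

struct some_struct_type { int variable; };   // placeholder definition
some_struct_type numbers[10];

int read_via_reference(int i) {
    const some_struct_type& val = numbers[i];   // no copy; refers directly to the array element
    return val.variable;                        // members are read with '.', as with a plain object
}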

Regarding the general optimization question, here are some things to consider...

Use a Profiler

The best way to measure the speed of your code is to use a profiling tool. There are a number of different tools available, depending on your target platform - see (for example) How can I profile C++ code running in Linux? and What's the best free C++ profiler for Windows?.

You really want to use a profiler for this because it's notoriously difficult to tell just from looking what the costliest parts of a program will be, for a number of reasons...

# of Instructions != # of Processor Cycles

One reason to use a profiler is that it's often difficult to tell from looking at two pieces of code which one will run faster. Even in assembly code, you can't simply count the number of instructions, because many instructions take multiple processor cycles to complete. This varies considerably by target platform. For example, on some platforms the fastest way to load the value 1 into a CPU register is something straightforward like this:

MOV r0, #1

Whereas on other platforms the fastest approach is actually to clear the register and then increment it, like this:

CLR r0
INC r0

The second case has more instruction lines, but that doesn't necessarily mean that it's slower.

Other Complications

Another reason that it's difficult to tell which pieces of code will most need optimizing is that most modern computers employ fairly sophisticated caches that can dramatically improve performance. Executing a cached loop several times is often less expensive than loading a single piece of data from a location that isn't cached. It can be very difficult to predict exactly what will cause a cache miss, but when using a profiler you don't have to predict - it makes the measurements for you.
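A small, hedged illustration of such cache effects (the array shape and sizes are arbitrary): summing a 2D array row by row touches memory roughly in the order it is laid out, while summing column by column does not, and the difference in running time comes almost entirely from cache misses rather than from the instruction count.

#include <cstddef>
#include <vector>

// Row-by-row traversal: each inner row is contiguous, so accesses are cache-friendly.
long sum_by_rows(const std::vector<std::vector<int>>& m) {
    long sum = 0;
    for (std::size_t r = 0; r < m.size(); ++r)
        for (std::size_t c = 0; c < m[r].size(); ++c)
            sum += m[r][c];
    return sum;
}

// Column-by-column traversal (assumes a rectangular matrix): each access jumps
// to a different row, causing many more cache misses.
long sum_by_columns(const std::vector<std::vector<int>>& m) {
    long sum = 0;
    if (m.empty()) return sum;
    for (std::size_t c = 0; c < m[0].size(); ++c)
        for (std::size_t r = 0; r < m.size(); ++r)
            sum += m[r][c];
    return sum;
}

A profiler (or a simple timing run) on a sufficiently large array will show the gap without you having to predict it.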

Avoid Premature Optimization

For most projects, optimizing your code is best left until relatively late in the process. If you start optimizing too early, you may find that you spend a lot of time optimizing a feature that turns out to be relatively inexpensive compared to your program's other features. That said, there are some notable counterexamples - if you're building a large-scale database tool you might reasonably expect that performance is going to be an important selling point.

Ben S.