What is faster: compare then change, or change immediately?

Question

Say I'm running very fast loops and I have to be sure that at the end of each iteration the variable a is SOMEVALUE. Which will be faster?

if (a != SOMEVALUE) a = SOMEVALUE;

or just instantly do

a = SOMEVALUE;

Is it float/int/bool/language specific?

Update: a is a primitive type, not a class, and the probability that the comparison is true is 50%. I know that the algorithm is what makes a loop fast, so my question is also about coding style.

Update2: thanks everyone for quick answers!

In C++, if `a`'s type is a class with cheap test and expensive `operator=()`, testing first is better. Most of the time, however, just assigning is better or at least not worse. — Daniel Fischer, Jul 19 '12 at 14:43
Edge case: the two don't necessarily have the same meaning. Consider if `a` has type `float` and holds a negative zero, and `SOMEVALUE` is `0`. Then `a = SOMEVALUE;` sets `a` to `0`, and `if (a != SOMEVALUE) a = SOMEVALUE` doesn't. The same is (more obviously) true of user-defined types, where `operator!=` could do anything. — Steve Jessop, Jul 19 '12 at 14:45
This kind of thing is unanswerable at the C++ level. It will depend entirely on the underlying hardware and the instruction set it supports (assuming the compiler uses the hardware optimally). The only way to know is to actually time and see. — Martin York, Jul 19 '12 at 15:03
@XaFromRussion - Which machine will your code run on? If you *know* this, and really just care about getting the best performance on *that* hardware, add the tag, and mention it in your question. — ArjunShankar, Jul 19 '12 at 15:05
Before you care about this kind of micro-optimization you should know that it will have an effect on your program. Write the code in the cleanest (most readable) way possible. If it is slow, then find out where it is slow and optimize that section. IMO (and I would test to verify), unless this code is at the bottom of a very tight loop that runs trillions of times, I would not even look at it for optimization. Otherwise any gains will be so small that they will be lost among other effects. — Martin York, Jul 19 '12 at 15:18

score 2 · Answer 1 · answered Jul 19 '12 at 14:46

In almost all cases just setting the value will be faster.

It might not be faster when the cache line holding 'a' is shared with other CPUs (an unconditional write invalidates their copies of the line) or if 'a' is in some special type of memory, but it's safe to assume that a branch misprediction is probably a more common problem than cache sharing.

Also - smaller code is better, not just for the cache but also for making the code comprehensible.

If in doubt - profile.

score 2 · Answer 2 · edited May 23 '17 at 12:12

The general answer to this kind of question is to profile. However, in this case a simple analysis is available:

Each test is a branch. Each branch incurs a slight performance penalty. However, we have branch prediction, and this penalty is somewhat amortized over time, depending on how many iterations your loop has and how often the prediction is correct.

Translated into your case, if a changes many times during the loop it is very likely that the code using if will perform worse. On the other hand, if the value is updated very rarely there will be a negligible difference between the two cases.

Still, change immediately is better and should be used, as long as you don't care about the previous value, as your snippets show.

Other reasons for an immediate change: it leads to smaller code, thus better cache locality, thus better performance. It is a very rare situation in which updating a will invalidate a cache line and incur a performance hit. Still, if I remember correctly, this will bite you only on multiprocessor systems, and very rarely.

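Keep in mind that there are cases where the two are not equivalent. With floating-point values, a NaN compares unequal to everything, so the test always passes and the assignment happens anyway, while a negative zero compares equal to 0, so the conditional form leaves the negative zero in place.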

Also, this answer treats only the case of C. In C++ you can have classes where the assignment operator / copy constructor takes longer than testing for equality. In that case, you might want to test first.

Taking into account your update, it's better to simply use assignment, as long as you're sure you are not relying on floating-point edge cases such as negative zero. Coding-style wise it is also better: it is easier to read.

score 1 · Answer 3 · MByD · 2012-07-19T15:54:24.720

~~Change immediately is usually faster, as it involves no branch in the code.~~

As commented below and answered by others, it really depends on many variables, but IMHO the real question is: do you care what the previous value was? If you do, you should check; otherwise, you shouldn't.

edited Jul 19 '12 at 15:54

answered Jul 19 '12 at 14:43


You are making assumptions with `as it involves no branch`. Some CPUs support a single-instruction test and set. You also forget predictive branching, etc. There is no way to gut-feel these things with modern computers; the only way is to run it and time it. – Martin York Jul 19 '12 at 15:14

score 1 · Answer 4 · answered Jul 19 '12 at 14:44

You should profile it.

My guess would be that there is little difference, depending on how often the test is true (this is due to branch-prediction).

Of course, just setting it has the smallest absolute code size, which frees up instruction cache for more interesting code.

But, again, you should profile it.

unwind

This. Modern processors and compilers are too unpredictable to rely on gut feelings or even common sense. – Mark Ransom Jul 19 '12 at 14:50
+1 Some CPUs have a single instruction for test and set. So instruction count may not be an issue. – Martin York Jul 19 '12 at 15:00

score 1 · Answer 5 · answered Jul 19 '12 at 14:46

I would be surprised if the answer wasn't a = somevalue, but there is no generic answer to this question. Firstly, it depends on the speed of the copy versus the speed of the equality comparison. If the equality comparison is much faster than the copy, then your first option may be better. Secondly, as always, it depends on your compiler/platform. The only way to answer such questions is to try both methods and time them.

score 1 · Answer 6 · answered Jul 19 '12 at 14:47

As others have said, profiling it is going to be the easiest way to tell as it depends a lot on what kind of input you're throwing at it. However, if you think about the computational complexity of the two algorithms, the more input you throw at it, the smaller any possible difference of them becomes.

score 1 · Answer 7 · answered Jul 19 '12 at 14:50

As you are asking this for a C++ program, I assume that you are compiling the code into native machine instructions.

Assigning the value directly without any comparison should be faster in most cases. To compare the values, both a and SOMEVALUE have to be loaded into registers and a compare instruction (cmp on x86) has to be executed.

But in the latter case, where you assign directly, you just move one value from one memory location to another.

The only way the assignment can be slower is when memory writes are significantly costlier than memory reads. I don't see that happening.

score 1 · Answer 8 · answered Jul 19 '12 at 14:51

Profile the code. Change accordingly.

For basic types, the no branch option should be faster. MSVS for example doesn't optimize the branch out.

That being said, here's an example of where the comparison version is faster:

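#include <iostream>

struct X
{
    bool comparisonDone;
    X() : comparisonDone(false) {}
    // Contrived: the comparison records that it ran and always reports "different".
    bool operator != (const X& other) { comparisonDone = true; return true; }
    X& operator = (const X& other)
    {
       // Contrived: assignment is expensive unless a comparison was done first.
       if ( !comparisonDone )
       {
           for ( int i = 0 ; i < 1000000 ; i++ )
               std::cout << i;
       }
       return *this;
    }
};

int main()
{
    X a;
    X b;
    X SOMEVALUE;
    if (a != SOMEVALUE) a = SOMEVALUE; // cheap: the comparison ran first
    b = SOMEVALUE;                     // expensive: prints a million numbers
}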

score 0 · Answer 9 · ArjunShankar · 2012-07-19T17:02:44.690

That if can actually be 'optimized away' by some compilers, basically turning the if into code noise (for the programmer who's reading it).

When I compile the following function with GCC for x86 (with -O1, which is a pretty reasonable optimization level):

int foo (int a)
{
  int b;  /* never initialized: reading it below is undefined behaviour */

  if (b != a)
    b = a;

  b += 5;

  return b;
}

GCC just 'optimizes' the if and the assignment away, and simply uses the argument to do the addition:

foo:
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %eax
        popl    %ebp
        addl    $5, %eax
        ret
        .ident  "GCC: (GNU) 4.4.3"

Having or not having the if generates exactly the same code.

edited Jul 19 '12 at 17:02

answered Jul 19 '12 at 14:53

Branches are cheap if you go the correct way (most modern CPUs do branch prediction and execute code before they know the result of the expression). Assignment is really expensive (if it needs to be written to memory). So your initial assumptions may be true or false depending on the situation. You also assume a branch is needed based on your own hardware, without knowing the OP's hardware. Some CPU instruction sets support a test and set in a single instruction. – Martin York Jul 19 '12 at 15:07
While what you say is correct, the point of my answer was to show what usually happens on a rather widely used desktop CPU, as an example. Mispredictions by the branch predictor are costly too, but we're trying to answer a question that doesn't even show the loop involved. – ArjunShankar Jul 19 '12 at 16:48
The reason I wrote the answer is that nobody else showed that the compiler may even optimize the `if` away (which basically makes both versions identically fast). – ArjunShankar Jul 19 '12 at 16:53
After thinking again, I decided that I really want this answer to concentrate on the fact that the `if` can get optimized away. I will make a corresponding edit. @LokiAstari - thanks for the poke. – ArjunShankar Jul 19 '12 at 16:58
