
Variable x is int with possible values: -1, 0, 1, 2, 3. Which expression will be faster (in CPU ticks):

1. (x < 0)
2. (x == -1)

Language: C/C++, but I suppose all other languages will be the same.

P.S. I personally think that answer is (x < 0).

More widely, for gurus: what if x ranges from -1 to 2^30?

Nikolay Vyahhi
    To answer in such low level considerations, the CPU architecture would be a minimal piece of info, don't you think? But even then, a CPU which needs different amount of cycles for those conditions would be pretty primitive. – joelr May 26 '09 at 19:32
    On any reasonably modern CPU (within the last decade or so), I would be surprised if any 32-bit integer compare took more than one cycle. – Eddie May 26 '09 at 19:36
    Why is this a bad question? A thorough answer of that leaves all associated with a much better understanding of how processors work and stuff like that. Isn't that a good thing? – Alex Gartrell May 26 '09 at 21:17
    If you are optimizing to such low level, you are doing it wrong. The only language that make sense to optimize such operation it is in assembler and you should have a good reason. – Ismael May 26 '09 at 21:29
    With today's processor architectures and compilers, assembler might not be any faster. The input to the compiler defines the constraints it applies to the resulting instructions, and different constraints may lead to different performing code. – Mark Ransom May 26 '09 at 21:45
    One last point: there's no way to generalize an answer to a question like this. The best approach is to try it both ways, with your production compiler and a representative test system, and compare the results. I'm surprised at how often this sort of question comes up, when a couple of minutes of benchmarking could provide the answer. – Mark Ransom May 26 '09 at 21:48
    I believe he is asking just to know better. Optimizing this is stupid. I'm actually quite intrigued myself since I have no idea. +1 from me :) – the_drow May 27 '09 at 05:25
    @Ismael: Sounds like you have never worked on embedded code. – Ed S. May 27 '09 at 07:48
  • Are you also comparing to 0, 1, 2, 3 in the same place? Because switch/case would then be an option. – Jem May 28 '09 at 00:54
  • Can you give us any of the surrounding code? There may be non-trivial ways to optimize it. – jabbie May 28 '09 at 00:58
    Yeah, show the code! There are other ways to check as well. You can add one and check against zero. You can do a logical and with 4. And if you go to assembly, there are all kinds of possible shortcuts. But no way to tell what will work best with what you've posted. – Nosredna May 28 '09 at 01:13
    This is a superficially stupid optimization. You claim to understand your algorithm and to have profiled it (making this a valid optimization), yet you never thought to look at the assembly code? – Michael Carman May 28 '09 at 14:11
  • possible duplicate of [Is < faster than <=?](http://stackoverflow.com/questions/12135518/is-faster-than) – nawfal Jul 17 '14 at 11:05
  • or vice versa, check the dates – Nikolay Vyahhi Jul 18 '14 at 11:31

12 Answers


That depends entirely on the ISA you're compiling for, and the quality of your compiler's optimizer. Don't optimize prematurely: profile first to find your bottlenecks.

That said, on x86 you'll find that both are equally fast in most cases. In both cases, you'll have a comparison (cmp) and a conditional jump (jCC) instruction. However, for (x < 0), there may be some instances where the compiler can elide the cmp instruction, speeding up your code by one whole cycle.

Specifically, if the value x is stored in a register and was recently the result of an arithmetic operation (such as add or sub, but there are many more possibilities) that sets the sign flag SF in the EFLAGS register, then there's no need for the cmp instruction, and the compiler can emit just a js instruction. There's no jCC instruction that jumps based on whether an earlier result was -1, so (x == -1) always needs the cmp.

Adam Rosenfield
    I don't believe that this is or was the "bottleneck" in any program. If you saw a difference in time, it's more likely that your code "jumped" over the == -1 condition, e.g. by setting it to -2, and thus did not terminate the loop (assuming that expression was part of a loop). – lothar May 26 '09 at 20:00
    Don't forget that the cmp instruction might be replaced by an or instruction, which wouldn't reduce the number of cycles but might change the memory alignment. This might be helpful, or it might be counterproductive, which is why profiling is so important. – Mark Ransom May 26 '09 at 21:35
    P.S. Don't look down on this question - I've had loops that were so tight that this kind of optimization would make a difference. Usually only a couple of percent, but every little bit helps sometimes! – Mark Ransom May 26 '09 at 21:38
  • On x86, TEST may be used to test reg == 0, and it's expected to be faster than CMP. – Bastien Léonard May 28 '09 at 19:55
    And not even depends in the ISA alone, but in the actual implementation of the architecture too... – fortran Jun 02 '09 at 20:14
  • @BastienLéonard *Citation needed*. – Jonathon Reinhart Aug 14 '14 at 17:32
  • No citation needed: pipelining differs across CPU vendors, so the same x86 instructions can take a different number of cycles on two different CPUs. – CoffeDeveloper Aug 20 '15 at 12:07

Try it and see! Do a million, or better, a billion of each and time them. I bet there is no statistical significance in your results, but who knows -- maybe on your platform and compiler, you might find a result.

This is a great experiment to convince yourself that premature optimization is probably not worth your time -- and may well be "the root of all evil", at least in programming.

Alex Feinman

Both operations can be done in a single CPU step, so they should be the same, performance-wise.

Gavin H
    Arrrghh! While this is true on the vast majority of chips, you simply *can't* make a definitive statement without knowing the platform he's working on. All the world is not an x86. – dmckee --- ex-moderator kitten May 26 '09 at 20:01
  • 4
    Well I would assume if he were asking this question for a specific, non-normal architecture he would specify as such. If he is asking in general, I was trying to give a simple answer for most modern architectures. – Gavin H May 26 '09 at 20:34
  • Sure, I didn't think about any specific architecture. Usual x86. – Nikolay Vyahhi May 26 '09 at 21:51

x < 0 will be faster. If nothing else, it prevents fetching the constant -1 as an operand. Most architectures have special instructions for comparing against zero, so that will help too.

Phone Guy
    How can you tell this, without knowing architecture and/or compiler? – Johan Kotlinski May 27 '09 at 07:46
    Which architecture are you talking about? I believe most x86 instruction sets can do a compare against an immediate value. No need to fetch an operand. Here is a link to an intel instruction set reference: http://www.intel.com/Assets/PDF/manual/253666.pdf – jabbie May 28 '09 at 01:40
  • Sure, almost any architecture can do a compare against an immediate value. But even there the instruction is bigger (and therefore requires another fetch from memory). Not a huge deal, unless every ounce of performance is critical, which seemed to be the context here. I assume the questioner is writing a device driver or something. – Phone Guy May 29 '09 at 03:46
  • As to the first question - I've looked at architectures for a long time. After the first half dozen or so, patterns start to emerge. I also happen to know more than is healthy about the semantics of the x86 instruction set, which most folks tend to focus on these days. For instance, any time you do pretty much anything with a value on x86, the condition bits are set. So you can test for negative with a JB instruction after doing a computation, loading a value into a register, etc. Compilers generally try to take advantage of this, though some dumb ones don't. – Phone Guy May 29 '09 at 03:52

It could be dependent on what operations precede or succeed the comparison. For example, if you assign a value to x just before doing the comparison, then it might be faster to check the sign flag than to compare to a specific value. Or the CPU's branch-prediction performance could be affected by which comparison you choose.

But, as others have said, this is dependent upon CPU architecture, memory architecture, compiler, and a lot of other things, so there is no general answer.

Kristopher Johnson

The important consideration, anyway, is which comparison actually expresses your intended program flow, and which just happens to produce the same result?

If x is actually an index or a value in an enum, then will -1 always be what you want, or will any negative value work? Right now, -1 is the only negative value, but that could change.

CodexArcanum

You can't even answer this question out of context. If you try for a trivial microbenchmark, it's entirely possible that the optimizer will waft your code into the ether:

// Get time
int x = -1;
for (int i = 0; i < ONE_JILLION; i++) {
    int dummy = (x < 0); // Poof!  Dummy is ignored.
}
// Compute time difference - in the presence of good optimization
// expect this time difference to be close to useless.
Bob Cross
  • It will be optimized by the compiler into zero instructions. But I understood your idea, thanks. – Nikolay Vyahhi May 26 '09 at 19:47
  • Yes - that's what I was trying to say in a jolly way. If it wasn't clear on the first try, my fault. – Bob Cross May 26 '09 at 23:36
  • You can avoid this to an extent by allowing x and dummy to escape (ie, pass their pointers to a function in another translation unit) and introducing a compiler-specific memory barrier instruction such as gcc's __sync_synchronize(). This will force the compiler to emit code to evaluate (x<0) and set dummy - but it will also force memory accesses. – bdonlan May 27 '09 at 04:17
    In the end, you're going to end up creating an elaborate construction to try to measure a difference that isn't there or isn't measurable without 100% context. For example, the OP tagged this question with "C++" and "C" - there's a dramatic difference between the two, much less between the various compilers on all the different platforms. – Bob Cross May 27 '09 at 04:53
  • In such a small piece of code adding measurement code could change the result because of caching, optimization and such. – Makis May 28 '09 at 14:06
  • @Makis, yes, that's exactly what I'm saying. – Bob Cross May 28 '09 at 16:12

Nikolay, you write:

It's actually a bottleneck operation in a high-load program. Performance in these 1-2 lines is much more valuable than readability...

All bottlenecks are usually this small, even in a perfect design with perfect algorithms (though there is no such thing). I do high-load DNA processing and know my field and my algorithms quite well.

If so, why not do the following:

  1. get a timer, set it to 0;
  2. compile your high-load program with (x < 0);
  3. start your program and the timer;
  4. when the program ends, look at the timer and record result1;
  5. same as 1;
  6. compile your high-load program with (x == -1);
  7. same as 3;
  8. when the program ends, look at the timer and record result2;
  9. compare result1 and result2.

You'll get the Answer.

zxcat

Same here: both operations are usually done in one clock cycle.

Artyom

It depends on the architecture, but x == -1 is more error-prone. x < 0 is the way to go.

    No this is not the way to go. To detect errors, use unit tests, not fancy code. To be less error prone: give a name to constants. It's usually better to go straight to the point. If the goal is to compare to -1, just write (x == -1), otherwise the next developer maintaining this code will have to figure out why we are comparing to 0 ("oh, ok, it's in fact to test against -1"), and then figure out what (the f...) is -1. – Jem May 28 '09 at 00:52
  • Well, we are talking about an ideal case. As you say, nobody should use "magic numbers", but constants. You can compare with ( x <= VALUE ) this way. Usually you do this with counter variables, so it's a good way to be less error-prone. In the real world, unit tests can't always be done (time or other constraints). Obviously, if it's a special case where you ONLY want to check the '-1' value, ( x == VALUE ) is the way to go. –  Jun 23 '09 at 08:36

As others have said, there probably isn't any difference. Comparisons are such fundamental operations in a CPU that chip designers want to make them as fast as possible.

But there is something else you could consider. Analyze the frequencies of each value and have the comparisons in that order. This could save you quite a few cycles. Of course you still need to compile your code to asm to verify this.

Makis

I'm sure you're confident this is a real time-taker.

I would suppose asking the machine would give a more reliable answer than any of us could give.

I've found, even in code like you're talking about, my supposition that I knew where the time was going was not quite correct. For example, if this is in an inner loop, if there is any sort of function call, even an invisible one inserted by the compiler, the cost of that call will dominate by far.

Mike Dunlavey