
Is

if(!test)

faster than

if(test==-1)

I can produce assembly, but there is too much of it and I can never locate the particulars I'm after. I was hoping someone just knows the answer. I would guess they are the same unless most CPU architectures have some sort of "compare to zero" shortcut.

Thanks for any help.
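One way to keep the generated assembly manageable (a sketch, with hypothetical function names) is to isolate the two tests into one-line functions in their own file and compile only that file, e.g. with gcc -O2 -S -fverbose-asm, or by disassembling with objdump -dS after building with -g:

int is_zero(int test)      { return !test; }       // the if(!test) form
int is_minus_one(int test) { return test == -1; }  // the if(test==-1) form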

deanresin
  • Generally it will be impossible to measure, but yes, comparing to zero is the most basic comparison. A magnitude comparison generally involves a subtraction, then checking the result. A zero comparison avoids the subtraction part. – Cheers and hth. - Alf Mar 17 '14 at 21:49
  • This level of optimization is not worthwhile. Presently, development time costs more than execution time. A user will appreciate a slower program that works correctly and robustly more than a fast, buggy program. Focus on design, correctness and robustness. Optimize only those sections that profiling has shown to warrant optimization. – Thomas Matthews Mar 17 '14 at 22:06
  • "I can never locate the particulars I'm after". If the toolchain you're using doesn't provide a means to show (dis-)assembly with source, then for the purposes of investigating this kind of question you could switch to one that does. For one example, https://stackoverflow.com/questions/1289881/using-gcc-to-produce-readable-assembly – Steve Jessop Mar 17 '14 at 22:18
  • @ThomasMatthews It is worse than that: spending time optimizing this outside of narrow situations will make your program slower, because you could spend that time making performance improvements that **matter** somewhere else. The above level of care **can** matter in the innermost portion of a loop that executes on the order of a billion times per second, but odds are you are not there. And even if you are, concurrency, exploiting coprocessors, memory access, and branches are probably better spots to put your effort than this. – Yakk - Adam Nevraumont Mar 17 '14 at 23:13
  • @ThomasMatthews "...a slower program that works correctly and robust than a fast buggy program." This is a false dilemma. Maybe if more developers considered performance as they wrote code instead of kicking the can down the road indefinitely, we'd have more software that was fast _and_ worked. There's no downside whatsoever to understanding performance optimization, and I hate that so many people here write comments discouraging this understanding. What's "worthwhile" is not an absolute. It depends on the project. – svadhisthana Aug 31 '20 at 03:31

3 Answers


Typically, yes. In typical processors, testing against zero or testing the sign (negative/positive) is a simple condition-code check. This means that instructions can be scheduled so that a separate test instruction is omitted entirely. In pseudo-assembly, consider this:

Loop:
  LOADCC r1, test // load test into register 1, and set condition codes
  BCZS   Loop     // If zero was set, go to Loop

Now consider testing against 1:

Loop:
  LOAD   r1, test // load test into register 1
  SUBT   r1, 1    // Subtract Test instruction, with destination suppressed
  BCNE   Loop     // If not equal to 1, go to Loop
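For reference, and purely as a sketch (the variable name and the volatile qualifier are assumptions, used here only to force a reload on each iteration), the two fragments above roughly correspond to C loops along these lines:

extern volatile int test;   // volatile only so each iteration actually reloads test

void wait_while_zero(void) {
    do {
        /* ... work ... */
    } while (test == 0);    // the load itself sets the flags; branch directly on "zero"
}

void wait_while_not_one(void) {
    do {
        /* ... work ... */
    } while (test != 1);    // load, then an explicit compare against 1, then the branch
}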

Now for the usual pre-optimization disclaimer: Is your program too slow? Don't optimize, profile it.

Sam Cristall
  • Unfortunately, in many processors, loading a register with a value does not, by default, change the condition codes. So the first example will be based upon the prior value of the condition codes. – Thomas Matthews Mar 17 '14 at 21:57
  • @ThomasMatthews You're correct; I will change it to a LOADCC to show my intent. (My work's DSP platform sets condition codes when you so much as look at it wrong.) – Sam Cristall Mar 17 '14 at 21:57
  • I'm aware of the premature-optimization rule and profiling. But with this new information I think aiming for a zero test would be good coding practice. I don't think a compiler could optimize a nonzero check into a zero check. – deanresin Mar 17 '14 at 21:58
  • @deanresin: Let's assume 1 instruction == 100ns. You are saving 100ns each time through the loop, so in a 10-iteration loop you save 1µs (microsecond). You can lose that time in I/O or some other unoptimized important section. This is why it is a micro-optimization. – Thomas Matthews Mar 17 '14 at 22:02
  • @ThomasMatthews At the moment I'm coding a JPG decoder. In the context of my bottleneck I think it could possibly make a difference, especially compounded with other good coding practices. – deanresin Mar 17 '14 at 22:12
  • @deanresin I highly doubt that this micro-optimization will make a difference, even if you are writing a JPG decoder. Did you profile your application? Or what makes you think that this change will have an impact on performance? – Ali Mar 17 '14 at 22:16
  • @deanresin: You may get better performance by cranking up the optimization levels and letting the compiler worry about things like this. – Thomas Matthews Mar 17 '14 at 22:17
  • @ThomasMatthews I did the math, and assuming ~100ns is a correct estimate for an instruction, then I am defs micro-optimizing. Thanks. – deanresin Mar 17 '14 at 22:21
  • 100ns is probably an enormous overestimate of how long a (simple) instruction takes. Unless you're running this code on a toaster, it's probably closer to 1ns (and that's conservative). – harold Mar 17 '14 at 22:27
  • ~100ns is for a processor that runs at 10MHz. Anything produced in the last dozen years that isn't meant to run a watch or a toaster will run a fair bit faster than that. Modern x86 processors will run one instruction in between 0.5ns (2GHz) and 0.25ns (4GHz) - and three or four instructions in parallel on a single core, at that, if the compiler does a decent job. So a 100ns gain per loop iteration is VERY unreasonable. – Mats Petersson Mar 17 '14 at 22:27
  • @harold: Funny, I also mentioned a toaster... I wrote mine while you were writing yours, honest. – Mats Petersson Mar 17 '14 at 22:28
  • @Mats Petersson I'm curious why CPU architectures would then bother to shortcut a zero compare if it never makes any meaningful performance gains. – deanresin Mar 18 '14 at 03:36
  • Did I say that? First of all, all of the current crop of commercially available CPU architectures have a heritage from the time when CPUs were running at 5-50MHz and one instruction would take more than one clock cycle. Second, saving 0.25ns may still make sense in some cases, especially if you can save it in many places in a big loop. But generally, optimising code is not about saving one instruction at the end of a loop - at least if the loop contains more than a single, simple expression. – Mats Petersson Mar 18 '14 at 06:59
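For what it's worth, a rough timing sketch along the lines suggested in this thread could look like the following (C++11 <chrono>; the iteration count and names are purely illustrative, and micro-benchmarks like this are easily distorted by the optimiser, caching and branch prediction, so treat any numbers as orders of magnitude at best):

#include <chrono>
#include <cstdio>

int main() {
    volatile int test = -1;            // volatile so the load is repeated every iteration
    const long N = 100000000L;
    long hits = 0;

    auto t0 = std::chrono::steady_clock::now();
    for (long i = 0; i < N; ++i)
        if (test == -1)                // swap in "if (!test)" to time the other form
            ++hits;
    auto t1 = std::chrono::steady_clock::now();

    std::chrono::duration<double, std::nano> elapsed = t1 - t0;
    std::printf("hits=%ld, ~%.3f ns per iteration\n", hits, elapsed.count() / N);
    return 0;
}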

It depends.

Of course it's going to depend: not all architectures are equal, not all µarchs are equal, and even compilers aren't equal, but I'll assume they compile this in a reasonable way.

Let's say the platform is 32-bit x86; the assembly might look something like:

test eax, eax
jnz skip

Vs:

cmp eax, -1
jnz skip

So what's the difference? Not much. The first snippet takes a byte less. The second snippet might be implemented with an inc to make it shorter, but that would make it destructive so it doesn't always apply, and anyway, it's probably slower (but again it depends).

Take any modern Intel CPU. They do "macro fusion", which means they take a comparison and a branch (subject to some limitations), and fuse them. The comparison becomes essentially free in most cases. The same goes for test. Not inc though, but the inc trick only really applied in the first place because we just happened to compare to -1.

Apart from any "weird effects" (due to changed alignment and whatnot), there should be absolutely no difference on that platform. Not even a small difference.

Even if you got lucky and got the test for free as a result of a previous arithmetic instruction, it still wouldn't be any better.

It'll be different on other platforms, of course.
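As a rough illustration only (the function names are hypothetical, and the exact registers and output depend on the compiler, flags and surrounding code), source along these lines is the kind of thing that produces the two snippets above:

void do_something(int);

void check_zero(int test) {
    if (!test)          // typically compiled to a test reg, reg plus a conditional jump
        do_something(0);
}

void check_minus_one(int test) {
    if (test == -1)     // typically compiled to a cmp reg, -1 plus a conditional jump
        do_something(-1);
}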

harold

On x86 there won't be any noticeable difference, unless you are doing some math at the same time (e.g. in while(--x) the result of --x will automatically set the condition codes, whereas while(x) ... will necessitate some sort of test on the value in x before we know if it's zero or not).

Many other processors do have automatic updates of the condition codes on LOAD or MOVE instructions, which means that checking for "positive", "negative" and "zero" is "free" with every movement of data. Of course, you pay for that by not being able to separate the compare from the branch: since every data movement updates the flags, the very next instruction after a comparison MUST be the conditional branch, whereas being able to slip another instruction in between would help hide any delay in the "result" of the comparison.

In general, this sort of micro-optimisation is best left to the compiler rather than the user - the compiler will quite often convert for(i = 0; i < 1000; i++) into for(i = 1000-1; i >= 0; i--) if it thinks that makes sense [and the order of the loop isn't important in the compiler's view]. Trying to be clever with this sort of thing tends to make the code unreadable, and performance can suffer badly on other systems (because when you start tweaking "natural" code into "unnatural" code, the compiler tends to assume that you really meant what you wrote, and may not optimise it the same way as the "natural" version).
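As a small, hypothetical illustration of that point: because the order of the additions does not matter here, an optimiser is typically free to count down, unroll or vectorise either version, so writing the second form by hand buys nothing:

int sum_first_1000(const int *a) {
    int s = 0;
    for (int i = 0; i < 1000; i++)        // the "natural" counting-up form
        s += a[i];
    return s;
}

int sum_first_1000_reversed(const int *a) {
    int s = 0;
    for (int i = 1000 - 1; i >= 0; i--)   // the hand-reversed form from the answer
        s += a[i];
    return s;
}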

Mats Petersson
  • Here is an example where being clever was better than the compiler [produce-loops-without-cmp-instruction-in-gcc](http://stackoverflow.com/questions/25921612/produce-loops-without-cmp-instruction-in-gcc). – Z boson Oct 09 '14 at 06:06
  • @Zboson Actually, that looks like something the compiler should be able to figure out for itself. – Mats Petersson Oct 09 '14 at 06:27
  • I agree the compiler _should_ figure it out but in this case it did not. – Z boson Oct 09 '14 at 07:09
  • I'd say report that as a bug - gcc guys typically fix those things sooner or later, but as a developer I know how hard it is to find EVERY case of "Oh, look, it doesn't do the right thing in this case". – Mats Petersson Oct 09 '14 at 07:20