Faster comparing signed than unsigned ints

Question

Possible Duplicate:
performance of unsigned vs signed integers

I have read somewhere that it's a tiny bit faster on x86_64 to compare signed ints in C/C++ compared to unsigned ints, e.g. for (int i...) is "faster" than for (uint i...).

Is that true? Why is that true? I know the difference is super small, but anyway.

@Chris: Only partially, I am interested not only in division. — Cartesius00, Jan 09 '12 at 17:03
But the discussion in many of the other answers and comments covers ground other than division. Worth a read of all answers to get a full picture as I think this is already covered there. In particular see http://stackoverflow.com/a/4712784/130352 for why unsigned in a for loop may be faster (for a probably very small value of faster). — Chris J, Jan 09 '12 at 17:06
That previous comment of mine should say "for why *signed* in a for loop may be faster.". (SO's not letting me edit it as the grace period for editing comments has elapsed). — Chris J, Jan 09 '12 at 17:12
There is no such thing as a small difference in such microoptizations. Either it takes one instruction and one clock cycle, or it takes two or more. The key to optimization is measurement (profiling), to ensure that every change has a significant effect. — Potatoswatter, Jan 09 '12 at 17:22

Ben Voigt · Accepted Answer · 2012-01-09T17:16:25.487

You'd better cite a source for such a claim, which is ridiculous on the face of it.

We're talking about x86_64, which means modern processors. These ALUs will complete integer addition, subtraction, and/or comparison in a single clock cycle (cache misses will of course take longer, but depend only on the size and memory layout of the data, not signed-ness). Or even less, with the SIMD coprocessor.

I'd more likely to believe that there's a slight power difference between the two types of comparison, but not a speed difference.

Now, it is possible that for a particular compiler, code generation is worse for one data type vs the other when targeting the x86_64 platform. But that would be a very specialized case, and not likely to apply to all x86_64 compilers. And still, I'd suspect cache effects or background processes were affecting the performance measurement (even performance counters that measure time spent per-process will be affected by a context switch invalidating caches).

I can see at least three kinds of optimizations which may be influenced (one way or the other) by the signed/unsigned difference: taking into account additional result of previous computation, strength reduction (replacing a division by a shift for instance, but I don't see such effect with comparisons), value range propagation (determining the possible value range and using it to make the code simpler). Every one of them is quite context dependant. — AProgrammer, Jan 09 '12 at 17:18

Dietrich Epp · Answer 2 · 2012-01-09T17:16:47.343

Well, there's no good answer here. The speed differences are usually going to so be marginal that you get better performance by spending your time thinking about other things. But there are a few weird cases, since signed overflow is undefined. For example, compare these two:

for (int i = 0; condition(); ++i) {
    if (i == 0) {
        computation();
    }
}

for (unsigned i = 0; condition(); ++i) {
    if (i == 0) {
        computation();
    }
}

A conformant compiler can move computation outside the loop that uses a signed index because it can assume that i == 0 once and only once -- because signed overflow is undefined behavior (overflow could terminate the program, or wrap around, or make demons fly out your nose). However, the compiler cannot move computation outside the second loop without significantly more work (unsigned integers always wrap around when they overflow).

However, on x86_64 you often have to sign-extend an int before working with it. This takes an additional instruction.

Conclusion: It's not really important. Whoever told you that one is faster than the other is distracting you from being productive.

While this is an interesting corner case, I still don't think it's the *comparison* that is faster. — Ben Voigt, Jan 09 '12 at 17:19

score 1 · Answer 3 · answered Jan 09 '12 at 17:09

1

Maybe in some cases, signed integers force the compiler to sign-extend them.
If you move an integer to eax, then the high bits of rax are set to zeros (the low bits are the bits of eax). If it's an integer, you need to set the high 32 bits to the sign of the low 32 bits. It's one extra instruction.

I'm not sure this sign extension is needed in a simple for (i=0; i<MAX; i++).

answered Jan 09 '12 at 17:09

ugoren

16,023
3
35
65

Not true. There's a single instruction for all: zero-extension (`movzx`), sign-extension (`movsx`) and simple `mov`. – Yakov Galka Jan 09 '12 at 17:15
This is an argument for `int_fast32_t` vs `int`, if anything. – Ben Voigt Jan 09 '12 at 17:15
It depends where the data came from. If you read 32 bits from memory, you can indeed use `movsx`. But if you have the value in a 32bit register, you need to sign extend first. Bottom line, it depends. – ugoren Jan 09 '12 at 18:46

AProgrammer · Answer 4 · 2012-01-09T17:29:42.363

Measure

Measurement on the platform has always been the only method of answering those kind of questions (and x86_64 is just a starting point for determining the platform, the precise model may be needed). Nowadays, the optimizations made on chip (look out what "superscalar out of order processor with speculative execution" means) and by the compilers make those kind of differences (assuming they exist, which I don't really for signed/unsigned comparisons, but optimizations like value range propagation and strength reduction may have an influence) so context dependant that a measurement in the desired context is needed as it will have an influence on the result (I've had code where putting some integer variables as double make it performs better on some machines, worse on others).

score 0 · Answer 5 · answered Jan 09 '12 at 17:18

There should be no difference in speed. The comparison instruction (CMP) on x86 32/64 is the same on both signed and unsigned integer data types. All of the branching instructions (jxx) and conditional moves (cmovxx) work by testing the CPU flags changed by CMP. Increments, decrements, addictions, subtractions on integer data are also not aware of signed or unsigned data types (due to the 2 complement).

Faster comparing signed than unsigned ints

5 Answers5

Linked