90

Is there any performance gain/loss by using unsigned integers over signed integers?

If so, does this go for short and long as well?

Flexo
  • 2,506
  • 3
  • 21
  • 32
  • 18
    @JeremyP, might I suggest that you spoke the truth only for the majority of developers and applications.... – Brett Nov 11 '12 at 21:31
  • 1
    @Brett: The difference between signed and unsigned arithmetic on most CPUs is zero. The difference for various sizes is marginal unless you are doing a lot of arithmetic. – JeremyP Nov 12 '12 at 08:03

12 Answers

118

Division by powers of 2 is faster with unsigned int, because it can be optimized into a single shift instruction. With signed int, it usually requires more machine instructions, because division rounds towards zero, but shifting to the right rounds down. Example:

int foo(int x, unsigned y)
{
    x /= 8;
    y /= 8;
    return x + y;
}

Here is the relevant x part (signed division):

movl 8(%ebp), %eax
leal 7(%eax), %edx
testl %eax, %eax
cmovs %edx, %eax
sarl $3, %eax

And here is the relevant y part (unsigned division):

movl 12(%ebp), %edx
shrl $3, %edx
fredoverflow
  • 256,549
  • 94
  • 388
  • 662
  • 12
    This will only work in the case when the divisor is a compile-time constant that is a power of two, won't it? – sharptooth Jan 17 '11 at 11:31
  • 1
    @sharptooth, for division, yes. There are probably other bit manipulations tricks which are valid only for unsigned. Or signed. I don't think the positive effect is only in one direction. – AProgrammer Jan 17 '11 at 12:06
  • Why can't the trick be done for non-constant divisors? Does the shift count of x86 `shrl` have to be a literal? – Manu343726 Dec 30 '13 at 11:57
  • @Manu343726 What if the divisor is not a power of 2? (And even if it was, you would first have to calculate the binary logarithm of the number before shifting.) – fredoverflow Oct 04 '14 at 16:08
  • 2
    On this scale, *more instructions* doesn't always mean *slower in runtime* for modern pipelined CPU architectures. I.e. I'd still do a measurement before jumping at far-reaching conclusions. – ulidtko Jun 13 '16 at 07:22
55

In C++ (and C), signed integer overflow is undefined, whereas unsigned integer overflow is defined to wrap around. Notice that e.g. in gcc, you can use the -fwrapv flag to make signed overflow defined (to wrap around).

Undefined signed integer overflow allows the compiler to assume that overflows don't happen, which may introduce optimization opportunities. See e.g. this blog post for discussion.
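
A minimal sketch (the function names are mine, not from the post) of the kind of optimization this enables:

// Because signed overflow is undefined, an optimizer may fold the signed
// version to "return true". The unsigned version must keep a real check,
// because UINT_MAX + 1 is defined to wrap to 0.
bool succ_greater(int x)        { return x + 1 > x; }  // may compile to: return true
bool succ_greater_u(unsigned x) { return x + 1 > x; }  // stays: x != UINT_MAX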

kbjorklu
  • 1,338
  • 8
  • 10
  • 3
    Note that this is not the case since C++ 20. – Seideun Mar 16 '22 at 07:32
  • 1
    @Seideun: It's more complicated than that. Signed integer overflow on arithmetic is still undefined, so `INT_MAX + 1` is undefined behavior, as is `INT_MIN % -1`. What has changed is that `signed_integer_type(any_integer_value)` is now defined for any non-indeterminate value, and guaranteed to give you the two's complement result. – David Stone Jul 16 '22 at 20:04
25

unsigned leads to the same or better performance than signed. Some examples:

  • Division by a constant which is a power of 2 (see also the answer from FredOverflow)
  • Division by a constant number (for example, my compiler implements division by 13 using 2 asm instructions for unsigned, and 6 instructions for signed)
  • Checking whether a number is even (I have no idea why my MS Visual Studio compiler implements it with 4 instructions for signed numbers; gcc does it with 1 instruction, just like in the unsigned case); see the sketch after this list
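
A source-level sketch of the last two bullets (function names are mine); you can compare the code your own compiler generates for each pair:

// Division by the constant 13: compilers typically turn the unsigned case
// into a multiply by a precomputed reciprocal plus shifts; the signed case
// needs extra fix-up instructions to round toward zero.
int      div13_s(int x)      { return x / 13; }
unsigned div13_u(unsigned x) { return x / 13; }

// Parity check: the unsigned case is a single AND-and-compare; some
// compilers emit more instructions for the signed case.
bool even_s(int x)      { return x % 2 == 0; }
bool even_u(unsigned x) { return x % 2 == 0; }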

short usually leads to the same or worse performance than int (assuming sizeof(short) < sizeof(int)). Performance degradation happens when you assign the result of an arithmetic operation (which is usually int, never short) to a variable of type short, which is held in a processor register (which is also int-sized). All the conversions between short and int take time and are annoying.

Note: some DSPs have fast multiplication instructions for the signed short type; in this specific case short is faster than int.

As for the difference between int and long, I can only guess (I am not familiar with 64-bit architectures). Of course, if int and long have the same size (on 32-bit platforms), their performance is also the same.


A very important addition, pointed out by several people:

What really matters for most applications is the memory footprint and utilized bandwidth. You should use the smallest necessary integers (short, maybe even signed/unsigned char) for large arrays.

This will give better performance, but the gain is nonlinear (i.e. not by a factor of 2 or 4) and somewhat unpredictable - it depends on cache size and the relationship between calculations and memory transfers in your application.
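
As a hypothetical illustration (not a benchmark from this answer): the same reduction over 16-bit and 32-bit elements. For arrays much larger than the cache, the 16-bit version moves half as many bytes from memory, so it is often faster, though not by a clean factor of 2.

#include <cstdint>
#include <vector>

// When memory-bound, the smaller element type wins mainly through
// reduced bandwidth, not through faster arithmetic.
long long sum16(const std::vector<int16_t>& v)
{
    long long s = 0;
    for (int16_t x : v) s += x;   // each element is widened before the add
    return s;
}

long long sum32(const std::vector<int32_t>& v)
{
    long long s = 0;
    for (int32_t x : v) s += x;
    return s;
}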

anatolyg
  • 26,506
  • 9
  • 60
  • 134
  • 10
    I would be careful with the statement about the performance of short compared to int. While arithmetic "might" be faster using int, one should remember that integer arithmetic is rarely a bottleneck (on modern desktop CPUs at least); memory bandwidth, on the other hand, often is, so for large datasets short might actually give considerably better performance than int. Furthermore, for autovectorized code, using smaller datatypes often means that more data elements can be processed at once, so even arithmetic performance might increase (though unlikely given the current state of autovectorizers). – Grizzly Jan 18 '11 at 21:26
  • 1
    @Grizzly I agree (my app is actually computation-heavy, so my experience with `short` is different than yours/anyone else's) – anatolyg Jan 19 '11 at 16:59
  • @Grizzly Does this also mean that with short, CPU cache can be used more optimally because it will be able to hold more data? – martinkunev Jun 30 '14 at 13:38
  • 2
    @martinkunev Absolutely! This may be the only reason to use `short` today (with non-cache RAM being effectively infinite), and a very good reason. – anatolyg Jun 30 '14 at 18:55
  • 1
    @anatolyg RAM may be effectively infinite, but don't forget that 32-bit programs still outnumber 64-bit ones by a large margin, which means regardless of how much RAM is available, you're still often limited to 2GB of usable address-space. – bcrist Sep 06 '14 at 23:25
  • 3
    @JoshParnell I guess you mean `short` is faster than `int` when *memory bound*. In my experience is they have the same performance on x86, and `short` is slower on ARM. – anatolyg May 10 '17 at 08:51
  • @JoshParnell Regarding MMX - I guess you really mean SSE. MMX is a very old technology. In my experience, MS Visual Studio does vectorization of integer types with SSE, and it's usable (maybe newer MSVS versions use AVX in addition to SSE). – anatolyg May 10 '17 at 08:54
  • @anatolyg Yes, sorry, I did of course mean when memory bound haha. I'm not familiar with ARM assembly but that's interesting. But as I said, on x86 we'd be talking about undetectable perf differences. Those differences would come primarily from the usual expense of manipulating a type whose width is smaller than stack alignment. – Josh Parnell May 10 '17 at 10:22
  • 1
    @anatolyg Re: vectorization; I did actually mean MMX...I was under the (incorrect) impression that it was the only extension set to support ops on packed word-sized integers; I thought SSE and beyond operated only on DW and QW with respect to int data. Just pulled out the i-set reference to check and realized I'm quite wrong. Each of the SSEs do have new W and B integer ops to augment the original MMX. I really should have read this section more carefully but in my defense the sheer number of new vector instructions is enough to make any man's brain hurt. My mistake! Comment retracted. – Josh Parnell May 10 '17 at 10:34
  • many unsigned instructions are faster than signed ones as per [this table](http://www.agner.org/optimize/instruction_tables.pdf). And many `short` instructions are slower than their `int` versions; a few may be faster (like mul/div). Even if they have the same performance, you may still be faster using int in higher-level code, because using `short` may result in some zero/sign extension and/or truncation – phuclv Jun 09 '17 at 01:01
18

This will depend on the exact implementation. In most cases, however, there will be no difference. If you really care, you have to try all the variants you are considering and measure performance.

sharptooth
  • 167,383
  • 100
  • 513
  • 979
  • 24
    `+1` for "if you want to know, you need to measure". It's very annoying that this needs to be answered almost weekly. – sbi Jan 17 '11 at 10:55
  • @sbi If you're trying to write portable code, measurement will not answer your question. – martinkunev Sep 02 '21 at 12:14
  • @martinkunev If you need portable code, then you need to measure on all platforms you want to port to – but the result is very likely that one platform's optimization is another platform's pessimization. IME it mostly boils down to this: either you tailor optimizations to platforms, or you don't optimize. – sbi Sep 14 '21 at 10:20
9

The performance difference between signed and unsigned integers is actually more general than the accepted answer suggests. Division of an unsigned integer by any constant can be made faster than division of a signed integer by the same constant, regardless of whether the constant is a power of two. See http://ridiculousfish.com/blog/posts/labor-of-division-episode-iii.html

At the end of his post, he includes the following section:

A natural question is whether the same optimization could improve signed division; unfortunately it appears that it does not, for two reasons:

The increment of the dividend must become an increase in the magnitude, i.e. increment if n > 0, decrement if n < 0. This introduces an additional expense.

The penalty for an uncooperative divisor is only about half as much in signed division, leaving a smaller window for improvements.

Thus it appears that the round-down algorithm could be made to work in signed division, but will underperform the standard round-up algorithm.
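
As a minimal sketch (mine, not from the post) of the standard round-up reciprocal trick that compilers already use for unsigned division by a constant, here is division by 5 done with one multiply and one shift; 0xCCCCCCCD is ceil(2^34 / 5), and the identity holds for every 32-bit x:

#include <cstdint>

uint32_t div5(uint32_t x)
{
    // (x * ceil(2^34 / 5)) >> 34 == x / 5 for all uint32_t values of x
    return static_cast<uint32_t>((static_cast<uint64_t>(x) * 0xCCCCCCCDu) >> 34);
}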

David Stone
  • 26,872
  • 14
  • 68
  • 84
9

This is pretty much dependent on the specific processor.

On most processors, there are instructions for both signed and unsigned arithmetic, so the difference between using signed and unsigned integers comes down to which one the compiler uses.

If either of the two is faster, it's completely processor-specific, and most likely the difference is minuscule, if it exists at all.

Sebastian Paaske Tørholm
  • 49,493
  • 11
  • 100
  • 118
4

Not only is division by powers of 2 faster with unsigned types; division by other values is also faster with unsigned types. If you look at Agner Fog's Instruction tables, you'll see that unsigned divisions have similar or better performance than their signed versions.

For example with the AMD K7

Instruction  Operands  Ops  Latency  Reciprocal throughput
DIV          r8/m8      32       24  23
DIV          r16/m16    47       24  23
DIV          r32/m32    79       40  40
IDIV         r8         41       17  17
IDIV         r16        56       25  25
IDIV         r32        88       41  41
IDIV         m8         42       17  17
IDIV         m16        57       25  25
IDIV         m32        89       41  41

The same thing applies to Intel Pentium

Instruction  Operands  Clock cycles
DIV          r8/m8     17
DIV          r16/m16   25
DIV          r32/m32   41
IDIV         r8/m8     22
IDIV         r16/m16   30
IDIV         r32/m32   46

Of course those are quite ancient. Newer architectures, with more transistors, might close the gap, but the basic point still applies: generally you need more micro-ops, more logic, and more latency to do a signed division.
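
A minimal source-level sketch (function names are mine): with a divisor known only at run time, the compiler has to emit a real divide instruction, so the DIV/IDIV numbers above apply directly.

unsigned udiv(unsigned a, unsigned b) { return a / b; }  // x86: zero EDX, then DIV
int      sdiv(int a, int b)           { return a / b; }  // x86: CDQ, then IDIV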

phuclv
  • 37,963
  • 15
  • 156
  • 475
3

In short, don't bother before the fact. But do bother after.

If you want performance, you have to use the compiler's performance optimizations, which may work against common sense. One thing to remember is that different compilers compile code differently, and they each have different sorts of optimizations. If we're talking about g++ and maxing out its optimization level with -Ofast, or at least -O3, in my experience it can compile the long type into code with even better performance than any unsigned type, or even plain int.

This is from my own experience, and I recommend that you first write your full program and worry about such things only afterwards, when you have the actual code in hand and can compile it with optimizations to pick the types that actually perform best. This is also good general advice about optimizing for performance: write it first, then try compiling with optimizations and tweak things to see what works best. You should also try compiling your program with different compilers and choose the one that outputs the most performant machine code.

An optimized multi-threaded linear algebra calculation program can easily have a >10x performance difference finely optimized vs unoptimized. So this does matter.

Optimizer output contradicts logic in plenty of cases. For example, I had a case when a difference between a[x]+=b and a[x]=b changed program execution time almost 2x. And no, a[x]=b wasn't the faster one.

Here, for example, is NVidia's advice for programming their GPUs:

Note: As was already the recommended best practice, signed arithmetic should be preferred over unsigned arithmetic wherever possible for best throughput on SMM. The C language standard places more restrictions on overflow behavior for unsigned math, limiting compiler optimization opportunities.

Íhor Mé
  • 896
  • 9
  • 13
1

Traditionally int is the native integer format of the target hardware platform. Any other integer type may incur performance penalties.

EDIT:

Things are slightly different on modern systems:

  • int may in fact be 32-bit on 64-bit systems for compatibility reasons. I believe this happens on Windows systems.

  • Modern compilers may implicitly use int when performing computations for shorter types in some cases.
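
A minimal sketch of the second point (the integer promotion rules of C and C++; the function name is mine):

void promotion_demo()
{
    short a = 1000, b = 2000;
    int   wide   = a + b;                      // both operands are promoted to int before the add
    short narrow = static_cast<short>(a + b);  // same promotion, then a truncation back to short
    (void)wide; (void)narrow;                  // silence unused-variable warnings
}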

thkala
  • 84,049
  • 23
  • 157
  • 201
  • yes, traditionally ;-) on current 64-bit systems, `int` is still 32 bits wide, but 64 bit types (`long` or `long long`, depending on the OS) should be at least as fast. – Philipp Jan 17 '11 at 10:59
  • 2
    `int` is always 32 bits wide on all systems I know (Windows, Linux, Mac OS X, regardless whether the processor is 64-bit or not). It's the `long` type that is different: 32 bits on Windows, but one word on Linux and OS X. – Philipp Jan 17 '11 at 11:04
  • @Philipp but `int` does not have to be **always** 32 bits wide. – mercury0114 Nov 15 '20 at 11:37
  • @mercury0114: For performance tuning assumptions, it's safe to assume that `int` is a type that the CPU can operate on efficiently. e.g. 32-bit on most 64-bit platforms, because they all have efficient 32-bit integer ops, especially x86-64 where `int` *is* the native integer format, the default operand-size in machine code. (The full register width is 64-bit, but using that takes an extra byte of machine-code per instruction). And usually `int` is not too huge; it's very rarely 64-bit. But of course it actually is on some systems, IIRC some older Cray. – Peter Cordes Dec 27 '22 at 04:21
  • TL:DR: comments made it sound like `int` being 32-bit "for compatibility" had a performance downside. It doesn't. 32-bit is also the best choice for performance, especially on x86-64 (fewer advantages than on AArch64, where `uint64_t` doesn't cost extra code-size). See [The advantages of using 32bit registers/instructions in x86-64](https://stackoverflow.com/q/38303333) – Peter Cordes Dec 27 '22 at 04:23
1

IIRC, on x86 signed/unsigned shouldn't make any difference. Short/long, on the other hand, is a different story, since the amount of data that has to be moved to/from RAM is bigger for longs (other reasons may include cast operations like extending a short to long).
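
A minimal sketch of those cast operations (function names and asm mnemonics are mine, shown for x86-64): widening a short costs an explicit sign- or zero-extension instruction.

long widen_short(short s)           { return s; }  // typically movswq (sign-extend)
long widen_ushort(unsigned short s) { return s; }  // typically movzwl (zero-extend)
long widen_int(int i)               { return i; }  // typically movslq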

CAFxX
  • 28,060
  • 6
  • 41
  • 66
  • 1
    Also bear in mind that certain compilers may have optimizations that don't apply to all integer types. E.g. at least old Intel compilers could not apply autovectorization if the for-loop counter was anything else than a signed int. – CAFxX Jan 17 '11 at 10:57
  • it doesn't matter at the instruction level, but at the C++ level it does matter – phuclv Jun 09 '17 at 00:51
  • @LưuVĩnhPhúc are you talking about signed overflow being UB? if so, the only case I'm aware of in which it matters is that it's harder for optimizing compilers to reason about unsigned integers used as loop counters/induction variables (and this was covered by my comment immediately above yours) – CAFxX Jun 10 '17 at 22:25
  • No, there are various other cases where the signedness matters. Did you read the other answers? – phuclv Jun 11 '17 at 01:08
  • I did. Did you? Most of them say that there are not big differences, unless for compile-time-constant divisions and for loop induction variables (that I mentioned in my comment). Even in yours you *kinda* point out that in newer processors the difference is not very big (check e.g. the Sandy Bridge tables) – CAFxX Jun 13 '17 at 05:34
1

Signed and unsigned integers both operate as single-clock instructions and have the same read/write performance, but according to Dr. Andrei Alexandrescu, unsigned is preferred over signed. The reason is that you can fit twice as many positive numbers in the same number of bits, because you're not wasting the sign bit, and you will use fewer instructions checking for negative numbers, yielding performance increases from the decreased ROM footprint. In my experience with the Kabuki VM, which features an ultra-high-performance Script Implementation, it is rare that you actually require a signed number when working with memory. I've spent many years doing pointer arithmetic with signed and unsigned numbers, and I've found no benefit to signed when no sign bit is needed.

Where signed may be preferred is when using bit shifting to perform multiplication and division by powers of 2, because with signed two's-complement integers you can also shift negative values. Please see some more YouTube videos from Andrei for more optimization techniques. You can also find some good info in my article about the world's fastest integer-to-string conversion algorithm.
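
A minimal sketch (mine, not from the answer) of how a signed right shift and signed division differ for negative values, which is why the two are not always interchangeable:

#include <cstdio>

int main()
{
    int q_div   = -7 / 8;    // 0: integer division rounds toward zero
    int q_shift = -7 >> 3;   // -1 on typical two's-complement targets
                             // (arithmetic shift rounds toward -infinity)
    std::printf("%d %d\n", q_div, q_shift);
}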

0

Unsigned integers are advantageous in that you can store and treat them as a plain bit pattern (just data, with no sign), so multiplication and division by powers of 2 become easier (faster) with bit-shift operations.

Dr. Debasish Jana
  • 6,980
  • 4
  • 30
  • 69