Does unsigned math require more CPU instructions?

Question

Take an C++ integral variable i, and suppose that you're multiplying its value by 2.

If i has signedness, I believe that the operation is somewhat equivalent, at least mathematically, to:

i = i << 1;

But if i's type is unsigned, then since unsigned values do not overflow but are performed modulo their range, presumably the operation is something like this:

i = (i << 1) & (decltype(i))-1;

Now, I figure that the actual machine instructions will probably be more concise than a sequence of shifts for multiplication. But would a modern, say x86, CPU have a specific instruction for unsigned/modulo math? Or will performing math with unsigned values tend to cost an additional instruction, when compared to math with signed values?

(Yes, it would be ridiculous to care about this whilst programming; I'm interested out of pure curiosity.)

Since `(decltype(i))-1` (better written as `static_cast(-1)` or `~static_cast(0)`) is all 1's, using it with `&` is a no-op and can be removed by the optimizer. — Mike DeSimone, Jul 10 '11 at 12:29
Why would left-shift cause underflow? Also, what's the meaning of the sentence, "unsigned values [...] are performed modulo their range"? — Kerrek SB, Jul 10 '11 at 12:36
@Mike: not necessarily. Consider if `i` has type `unsigned short` (16 bits), but `int` is 32 bits. Then `i << 1` is a 32 bit value, whereas `(decltype(i))-1` is `0xFFFF` even after being promoted to `int` for the `&`. In that case, Tomalak is wrong to say that the standard requires a mask, but the mask would have an effect if you did it. *In combination with assigning it back to i*, though, it does have no effect, since it only affects bits that will be discarded by the conversion. — Steve Jessop, Jul 10 '11 at 12:38
@Steve: Wait a minute. You're saying the type of the result of `<<` depends on the type of the *right* hand side parameter, and is not simply that of the *left* hand side parameter? Please link to the relevant part of the standard, as this is behavior that differs from pretty much every architecture out there. — Mike DeSimone, Jul 10 '11 at 12:44
@Mike: 5.8/1, "the operands shall be of integral or enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand". Integral promotions are defined in 4.5, `unsigned short` is promoted to `int` if `int` can represent every value of `unsigned short` (which in my example it can), or `unsigned int` otherwise. The type of the RHS has nothing to do with it. — Steve Jessop, Jul 10 '11 at 12:48
Sorry, this wasn't clear. The question is effectively about the _implementation_ of C++'s unsigned integer wrap-around. — Lightness Races in Orbit, Jul 10 '11 at 13:00
@Tomalak: `>>1` and `/2` are always equivalent for unsigned types (5.8/3), both are a logical right shift. So any optimizing implementation should use the same instructions for both, no extra masks there either. And my answer is about implementation of C++'s unsigned integer wraparound - perhaps I should have stated explicitly that most/all CPUs, certainly x86, provide instructions for wraparound arithmetic, at least for their register size. So the only types you have to think about are non-register-size types. — Steve Jessop, Jul 10 '11 at 13:04
@Steve: I wasn't suggesting that it was actually a mask that was performed, but was asking what CPU mechanism performs the _equivalent_ of that extra step (the extra step being "implementing wrap-around"), and whether this mechanism leads to unsigned math being slower. — Lightness Races in Orbit, Jul 10 '11 at 13:06
@Tomalak: I don't really understand why you consider it an "extra step". After integral promotions, the result of a multiplication has the same number of bits as the input, whether the input is signed or unsigned. Since the mathematical value of the multiplication might require more bits, the hardware/implementation has to get rid of a bit somehow in both cases. So why is one of them an "extra step" and the other one isn't? — Steve Jessop, Jul 10 '11 at 13:10
@Steve: The implementation for a signed input doesn't _have_ to get rid of the bit, surely? That's your over-/under-flow? — Lightness Races in Orbit, Jul 10 '11 at 13:13
Sorry, I don't understand what you're talking about. The result produced by the CPU has one less bit than the true mathematical result of the multiplication. That's what I mean by "gets rid of a bit". On CPUs with an overflow bit then yes, that bit might appear there, but it can appear there for unsigned ops too if the hardware wants it to, since C++ implementations can more or less ignore the overflow bit for register-sized arithmetic. — Steve Jessop, Jul 10 '11 at 13:15
Anyway, I suggest you look at some disassembly, since there's no question of misunderstanding between you and `objdump`. — Steve Jessop, Jul 10 '11 at 13:19
@Steve: Who needs `objdump` when you've got Stack Overflow? :) — Lightness Races in Orbit, Jul 10 '11 at 13:20
@Steve: Thanks. I get to add this to my list of "things C does that no assembly architecture does," which already includes things like "C does 32 x 32 = 32 bit multiply, while a hardware multiply is 32 x 32 = 64 bits." @Tomalak: `objdump` doesn't downvote you for running it the same way twice. — Mike DeSimone, Jul 11 '11 at 12:28

score 13 · Answer 1 · answered Jul 10 '11 at 12:33

As the others already wrote: It does not matter for the CPU. Signed and unsigned instructions take the same time, Some operations in unsigned arithmetic are even easier to do and may require a cycle less than the signed variant (multi precision division is one example).

However, this is only half of the story.

C++ defines signed integer overflows as undefined behavior and unsigned integers as modulo2. This offers completely different optimization opportunities which lead to different code.

One example:

int foo (int a)
{
  return a * 1000 / 500;
}

unsigned bar (unsigned a)
{
  return a * 1000 / 500;
}

Here foo can be optimied to:

int foo (int a)
{
  return a * 2;
}

And bar will stay the same it is.

Note that mathematically these two functions are the same, but they start to give different results if the argument exceeds INT_MAX/1000.

Since the effect of signed overflows is undefined the compiler has the option to just pretend there is no INT_MAX when it comes to simplify expressions. For unsigned arithmetic this is not the case and the compiler has to emit code that does the multiplication and division. This is of course slower than the optimized variant.

Note: Most compilers are conservative when it comes to such optimizations and only enable them if you ask for them because they tend to break code and overflow checks. Other compilers, especially in the embedded and DSP world otoh always do these kind of optimizations even at low optimization levels. The programmers who write for these kind of machines are aware of the subtle details, so this is rarely a problem.

OTOH we've discussed stories where C/C++ programmers fall into this trap more than once on stackoverflow.

"For unsigned arithmetic this is not the case and the compiler has to emit code that does the multiplication and division. This is of course slower than the optimized variant." That's what I wanted to know, thanks. — Lightness Races in Orbit, Jul 10 '11 at 13:00

interjay · Accepted Answer · 2011-07-10T12:21:15.183

No, it doesn't take more instructions, at least on x86.

For some operations (e.g. addition and subtraction), the same instruction is used for both signed and unsigned types. This is possible since they work the same when using 2's complement representation of signed values.

There is also no difference for left-shift: The leftmost bit is simply discarded by the hardware, and there is no need to perform a bitwise-and like in your example.

For other operations (e.g. right-shift), there are separate instructions: SHR (shift right) for unsigned values and SAR (shift arithmetic right) for signed values, which preserves the sign bit.

There are also separate instructions for signed/unsigned multiplication and division: MUL/IMUL and DIV/IDIV, where IMUL and IDIV are used for signed values.

it's generally not correct on x86 [Why A / is faster when A is unsigned vs signed](https://stackoverflow.com/q/49477554/995714) — phuclv, Apr 24 '18 at 01:49

Steve Jessop · Answer 3 · 2011-07-10T13:07:27.113

Assuming wraparound overflow, which is what most (all?) CPU arithmetic instructions on two's complement hardware do anyway, << for unsigned types is equivalent to the multiplication. So the only problem you have is when you're doing arithmetic with a type that's smaller than the register used to contain it.

The rules about promotion to at least int (or unsigned int) in arithmetic expressions are pretty much designed to avoid this happening: when you multiply, say, an unsigned short by 2 the result is an int (or unsigned int if short and int are the same size). Whichever it is, there's no need to take any modulus when the register size matches the type. And with two's complement, you don't need different instructions for signed and unsigned C++ multiplication anyway: wraparound overflow covers both unless the hardware offers an overflow bit and you care about its value (-1 * 2 would be an unsigned overflow, but not a signed overflow, even though the resulting bit-pattern is the same).

The only time a mask might be required is if/when you convert the result back to unsigned short. Even then, I think that with care an implementation can sometimes leave additional "irrelevant" bits in the top of an int-sized register used to hold an intermediate unsigned short value. You know that those extra top bits don't affect the modulo results of addition, multiplication or subtraction, and that they would be masked off if the value is stored to memory (assuming an instruction that stores the bottom 2 bytes of an int-sized register to 2 bytes of memory, the modulus basically is free). So the implementation just has to be careful to mask before division, right-shift, comparison, and whatever else I've forgotten, or else use appropriate instructions if available.

score 2 · Answer 4 · answered Jul 10 '11 at 12:44

I think you have it the wrong way round: On unsigned data types, bit shifting does exactly what it says on the tin, and unoccupied bits are filled with zeros. This causes the correct modular arithmetic operations on the types's values for left-shift, which is multiplication. Right-shift has no arithmetic analogue because Z/nZ is not a division ring in general and there is no notion of division; right-shift is just truncating division.

On the other hand, signed types suffer from an ambiguity, because there are different ways of interpreting a bit pattern as a signed integer. With left-shift on 2's-complement you will get the expected "wrap-around" for multiplication, but there's no canonical choice of right-shift behaviour. In the old C standard I believe that was fully implementation defined, but I think C99 made that behaviour specific.

Yea, I got my bits backwards. – Lightness Races in Orbit Jul 10 '11 at 13:04 — Lightness Races in Orbit, Jul 10 '11 at 13:04

score 1 · Answer 5 · answered Jul 10 '11 at 12:10

1

As far as I know, then most CPUs possess unsigned operations in hardware, and I'm quite certain that x86 does.

answered Jul 10 '11 at 12:10

Puppy

144,682
38
256
465

score 1 · Answer 6 · answered Jul 10 '11 at 12:11

1

Unsigned math does overflow and hence is implicitly modulo their respective range.

answered Jul 10 '11 at 12:11

edgar.holleis

4,803
2
23
27

They do _not_ overflow, because they are modulo their range. And I'm asking whether the implementation of this fact leads to more CPU instructions, in general. – Lightness Races in Orbit Jul 10 '11 at 13:03

score 1 · Answer 7 · answered Jul 10 '11 at 12:37

interjay's answer covers the basics. A couple more details:

Now, I figure that the actual machine instructions will probably be more concise than a sequence of shifts for multiplication.

That depends on the processor. In the old days, when transistors were expensive, processors like the 6500 and 6800 had instructions that only shifted one bit left or right at a time.

Later, when chips got bigger and had more bits in an opcode for parameters, "barrel shifters" were implemented which could shift an arbitrary number of bits in a single cycle. This is what modern CPUs use.

Or will performing math with unsigned values tend to cost an additional instruction, when compared to math with signed values?

Never. When an operation would differ between unsigned and signed, there will be separate instructions for each.

"When an operation would differ between unsigned and signed, there will be separate instructions for each." This somewhat contrasts other answers, I think? — Lightness Races in Orbit, Jul 10 '11 at 13:02
Really? It's pretty much what interjay said in his last two paragraphs. (e.g. LSR/ASR on 6800) The fun part is that, when building up operations from smaller ones, such as doing 64-bit math on a processor that only has a 32-bit ALU, you use a mix of signed and unsigned instructions to implement a signed operation. — Mike DeSimone, Jul 11 '11 at 12:24

Does unsigned math require more CPU instructions?

7 Answers7