Why do some arithmetic instructions have a signed/unsigned variant and some don't

Question

Assume we have:

a = 0b11111001;
b = 0b11110011;

If we do Addition and Multiplication on paper with hand we get this result, we don't care if its signed or not:

a + b = 111101100
a * b = 1110110001011011

I know that Multiplication doubles the width and addition could overflow:

Why is imul used for multiplying unsigned numbers?

Why do some CPUs have different instructions to do signed and unsigned operations?

My question is, why instructions like Add don't usually have a signed/unsigned version, but Multiply and Divide do?

Why can't we have a generic unsigned multiply, do the math like I did above and truncate the result if its singed, same way Add does.

Or the other, why can't Add have a signed/unsigned version. I have checked a few architectures and this seems to be the case.

Because for 2's complement machines, signed add/sub/left-shift are the same binary operations as unsigned. Semi-related: [Which 2's complement integer operations can be used without zeroing high bits in the inputs, if only the low part of the result is wanted?](https://stackoverflow.com/q/34377711) — Peter Cordes, Jun 18 '21 at 20:06
You already linked [Why do some CPUs have different instructions to do signed and unsigned operations?](https://stackoverflow.com/q/60255445) which explains that widening multiply is different for signed vs. unsigned, unlike add. — Peter Cordes, Jun 18 '21 at 20:08
like I said it doesn't seem to be different, my example above. the `signed` version just truncates the result, nothing more, i.e `a*b` interpreted as signed would be `1011011`, which is the right side of my `unsigned` result above. — Dan, Jun 18 '21 at 20:09
(sorry, didn't see your comment before posting my last). Widening signed-multiply doesn't involve any truncation. The *low half* of signed and unsigned multiply is the same, that's why x86 for example only has immediate and 2-operand forms of `imul`, not also `mul`. https://www.felixcloutier.com/x86/imul / https://www.felixcloutier.com/x86/mul. Only separate widening forms. — Peter Cordes, Jun 18 '21 at 20:11
Your calculation of signed `a * b` is incorrect. The signed product is `0b0000000001011011`. This is different from the unsigned product. — Raymond Chen, Jun 18 '21 at 20:14
@RaymondChen I removed the `char` type. Assume these are bits written on paper, signed or unsigned doesn't matter. — Dan, Jun 18 '21 at 20:16
Which of your "bits written on paper" is the sign bit? if there is no sign bit, then your "bits written on paper" are **unsigned**. It does matter — Raymond Chen, Jun 18 '21 at 20:17
@RaymondChen I don't understand. Can't you just multiply the bits together and interpret the result as signed or unsigned? like this YouTube video explains: https://www.youtube.com/watch?v=-Ypf2s9_gG0 , same way you do for `add`, you just add the bits, doesn't matter if it's signed or not? — Dan, Jun 18 '21 at 20:20
The meaning of the sign bit is special. It doesn't represent itself. It represents an infinite number of copies of itself. So when you multiply bit 7 against bit 0, are you going to take just that one bit of the result? Or are you going to replicate that one bit of result into all the higher bits? That's what makes the multiplication signed or unsigned. Unlike addition (where those extra bits get truncated away so it doesn't matter), with multiplication those extra bits are part of the double-precision result so they aren't just truncated away. — Raymond Chen, Jun 18 '21 at 21:08
@RaymondChen so essentially, if it's signed you extend it double the size. else you zero extend it double the size, and then multiply them? — Dan, Jun 18 '21 at 21:14
@Dan: yes, I just [replied](https://stackoverflow.com/questions/68040883/why-do-some-arithmetic-instructions-have-a-signed-unsigned-variant-and-some-don/68041066?noredirect=1#comment120261244_68041066) to answer the same thing in response to your comment on my answer. You can look at `imul` vs. `mul` as doing that internally, that's why (like movzx / movsx) you want different multiply instructions. — Peter Cordes, Jun 18 '21 at 21:16
Thanks, I tried to make a conclusive comment under your post. — Dan, Jun 18 '21 at 22:15

score 1 · Accepted Answer · answered Jun 18 '21 at 20:23

1

I think your choice of example misled you into thinking the signed product could be obtained by truncating the 8x8 => 16-bit unsigned product down to 8 bits. That is not the case.

(249-256) * (243-256) = 0x005b, a small positive result that happens to fit in the low half of the full result. But the full signed result is not always the operand-size truncation of the unsigned product.

For example, -128 * 127 is -16256, or as 16-bit 2's complement, 0xc080.

But 0x80 * 0x7f is + 16256, i.e. 0x3f80. Same low half, different upper half.

Or for another example, see Why are signed and unsigned multiplication different instructions on x86(-64)?

Widening signed-multiply doesn't involve any truncation. The low half of signed and unsigned multiply is the same, that's why x86 for example only has immediate and 2-operand forms of imul, not also mul. Only widening multiply needs a separate form. (Or if you want FLAGS set according to unsigned overflow of the low half, instead of signed overflow. - so you can't easily use non-widening imul if you want to detect when the full unsigned result didn't fit.)

answered Jun 18 '21 at 20:23

Peter Cordes

328,167
45
605
847

Ok here is the question, can't I just sign extend the inputs and do an `unsigned multiplication` on them? – Dan Jun 18 '21 at 20:56
@Dan: In theory yes, but that would require a lot more multiply hardware to handle the general case of (on x86-64 for example) a non-widening 128x128 => 128-bit multiply (which doesn't care about signedness), instead of a widening 64x64 => 128-bit signed multiply. Even if you require the high halves of the inputs to be either all-0 or all-1, you'd still have to invent an instruction with 4 input registers, and a bunch of inconvenient work for software to do to set up those inputs. The kind of repetitive work you might want a special instruction for... perhaps an instruction like imul. :P – Peter Cordes Jun 18 '21 at 21:12
I guess it's safe to say, when doing integer addition with same length you don't need to sign/zero extend them, but you have to extend them when multiplying, even if their length is the same, so that's why we have sign/unsigned multiplication but not addition. – Dan Jun 18 '21 at 22:15
@Dan: Yes, for widening multiply only. Some (versions of) some ISAs only have non-widening multiply, e.g. ARM Cortex-M0 only has `mul` (https://godbolt.org/z/jGfGj8Pqo). As for add, the full 33-bit unsigned result of 32-bit addition is in the combination of carry flag and the integer register. But often we work with numbers that can be added without overflow / wrapping. (You don't really have the option of sign-extending into carry first, so there's no 33-bit signed result on a 32-bit CPU.) – Peter Cordes Jun 18 '21 at 22:22
that makes sense, like its explained here: https://stackoverflow.com/questions/42587607/why-is-imul-used-for-multiplying-unsigned-numbers , if adding two 64 bit numbers but only 32 bit of the result is needed then there is no need to extend. – Dan Jun 18 '21 at 22:27
Can I confirm `div` as well? i.e the reason we have signed/unsigned version is because `div` needs to take the absolute value of the inputs first if signed? https://electronics.stackexchange.com/a/467245 – Dan Jun 18 '21 at 22:27
@Dan: x86 `div` does unsigned division so the inputs are already known to be non-negative. If you mean division in general, then yes, hardware implementations of signed division (x86 `idiv` or ARM `sdiv` for example) may work by taking the absolute value. Of course the mathematical difference in interpretation of the values (the place value of the MSB being negative or not) matters regardless of how it's implemented internally. `255` has the same bit-pattern as `-1` in 8-bit 2's complement, but those two numbers are completely different as divisors, for example. (but yes, +1 is similar) – Peter Cordes Jun 19 '21 at 00:24
@Dan: Fun fact: signed division can overflow even without a double-width dividend. e.g. `INT_MIN / -1` is C undefined behaviour for that reason. But unsigned division can't overflow unless the dividend is wider than the quotient register. (Except for division by 0 of course). So you could actually see that as a consequence of the abs operation overflowing, I guess. (perhaps related: ARM doesn't have a narrowing-division instruction, and it doesn't fault on sdiv INT_MIN/-1. [Why does integer division by -1 (negative one) result in FPE?](https://stackoverflow.com/q/46378104)) – Peter Cordes Jun 19 '21 at 00:33
Thank you for all this information. So for `signed div` in hardware, `idiv` will operate on the 2s complement value of inputs (if negative), else it behaves like `div` and uses the input as is? – Dan Jun 19 '21 at 01:15
1

@Dan: Yeah, if both inputs are non-negative, you'll get the same result from idiv or div. Except that (for x86 where the dividend is double-width), idiv can #DE overflow where `div` wouldn't have, if the quotient is greater than INT_MAX, not just UINT_MAX. That's only possible with the high half being not just the sign-extension of the low half (or the special case of INT_MIN / -1) – Peter Cordes Jun 19 '21 at 01:26
Could I also ask about the floating point representation and their arithmetic ops. i'm assuming they are completely different from integer arithmetic. FP are all signed however, `division` as an example isn't done by taking absolute values right? – Dan Jul 14 '21 at 03:17
1

@Dan: FP uses a sign/magnitude representation, so it's free to get the absolute value. FP division *is* in fact just separate operations on the sign bits (result_sign = a_sign xor b_sign) and on the magnitudes. There could be some corner cases with negative 0 propagation, but a couple test cases do follow the xor rule, like `1.0 / -0.0 = -Inf` and `-0.0 / -1.0 = +0.0`. https://godbolt.org/z/KqjWrcoEc – Peter Cordes Jul 14 '21 at 03:30
I see, but operations and the ALU design should be fully different I assume? i.e FP division may not even need to get absolute values of the inputs initially and it just operates on the sign bits alone. noticeable differences should be in addition / subtraction and multiplication as well? – Dan Jul 14 '21 at 03:43
1

Until Ice Lake, the hard part of division was done on a common execution unit in Intel CPUs. (The mantissas for FP, vs. just the integers for integer.) Obviously there's some extra logic in FP division to subtract the exponents and normalize the mantissa result. [Do FP and integer division compete for the same throughput resources on x86 CPUs?](https://stackoverflow.com/q/58422171) – Peter Cordes Jul 14 '21 at 03:48
Thanks again, so same probably goes for all other arithmetic. i.e a separate FPU unit is needed as Add/Sub/Mul/Div on FP can't be done using regular integer arithmetic as their format is different, integers being in 2s complement and FP being in IEEE 754 format? – Dan Jul 14 '21 at 03:52
1

@Dan: Yes, of course other operations are separate, unlike div not sharing any transistors between integer and FP. Integer add/sub are really simple, but FP needs to renormalize the mantissa and basically needs a barrel shifter and a bit-scan. The mantissa multiplier part of an FP mul/FMA unit is the same operation as integer multiply, so for *SIMD* (not scalar), modern CPUs do share that logic, e.g. that's why SSE4.1 `pmulld` is 2 uops on Haswell and later - the FP mantissa multipliers are only 23 bits wide 32-bit SIMD element, so it has to separate things. – Peter Cordes Jul 14 '21 at 04:02
I see, for the mantissa part, I assume its looked at as an unsigned integer when being operated by the integer ALU and then FP sign bit is handled afterwards? so it's only shared for `fp mul` in modern CPUs now and not `fp div` anymore? – Dan Jul 14 '21 at 04:08
1

@Dan: Note that it's not shared at all for *scalar* integer `imul`, only for SIMD integer stuff like `pmulld` / `pmuludq` / etc. But yes, *part* of an FP multiply is unsigned integer multiplication of the mantissas, and you take the high half. See also [Why does Intel's Haswell chip allow floating point multiplication to be twice as fast as addition?](https://electronics.stackexchange.com/a/452255) and other answers on the question which for some reason decided to describe what steps are entailed by FP add and mul. (Which as my answer explains is basically irrelevant to the specific answer) – Peter Cordes Jul 14 '21 at 04:12
Hi Peter, I have asked another question which I think you might be able to help with: https://stackoverflow.com/questions/68459242/what-are-widening-integer-operations – Dan Jul 20 '21 at 17:43

Why do some arithmetic instructions have a signed/unsigned variant and some don't

1 Answers1

Linked