How many least-significant bits are the same for both an unsigned and a signed multiplication?

Question

ARM processors, for example, have a mul instruction, which performs a 32-bit x 32-bit multiplication and returns the least-significant 32-bits of the result. They also have umull and smull instructions, which again do a 32-bit x 32-bit multiplication, but return a full 64-bit result. umull does an unsigned multiplication, and smull does a signed multiplication.

Why is it not necessary to have separate unsigned and signed versions of mul? In the case of a 32-bit x 32-bit multiplication, presumably the least-significant 32-bits of the result are the same in both cases? Are exactly 32 bits the same, or more than 32?

More generally, in the case of an m-bit x n-bit multiplication (producing an (m+n)-bit result), how many least-significant bits are the same for both an unsigned and a signed multiplication?

Try working out 1 * 0xffffffff first in 32-bit two's complement, then again in 64-bit two's complement ;) — Notlikethat, Aug 25 '16 at 14:12
Related: [Which 2's complement integer operations can be used without zeroing high bits in the inputs, if only the low part of the result is wanted?](http://stackoverflow.com/questions/34377711/which-2s-complement-integer-operations-can-be-used-without-zeroing-high-bits-in) — Peter Cordes, Aug 25 '16 at 19:59
I think `min(m,n)` - any higher bit will be at a position where sign-extension or zero-extension could have made a difference to partial products that add into it. — Peter Cordes, May 19 '22 at 17:45

old_timer · Answer 1 · 2016-08-25T14:19:15.813

You can do this with pencil and paper...Elementary school style

-1 * 3 = -3, 7 * 3 = 21 smull vs umull

0b111 * 0b011

   111111
*  000011
==========
   111111
  111111
 000000
 ...
+
==========
 11111101

(technically the signs are extended as wide as the internal alu inputs)

which is -3

take the same 3 bit numbers but use umull

0b111 * 0b011

      000111
*     000011
=============
      000111
     000111
    000000
+  ...
==============
     0010101

the result is 21.

The beauty of twos complement is that add and subtract use the same logic, but multiply and divide you have to sign extend to get the right answer and there in lies the rub. Unsigned sign extends zeros signed sign extends the sign. Multiplies at worse double the number of bits required to store the result so a 16 bit by 16 multiply needs to be done with 32 bit operands and what you fill into those upper 16 bits varies between a signed multiply and unsigned. Once you sign extend then sure you can feed the same multiply logic because there is no difference there. I guess one could argue that is how add and subtract works too, what you feed the same adder logic varies depending add vs subtract likewise what you pull out at the end may vary (if you invert the carry bit to call it a borrow)

Now per your question, yes if you take a 3 bit by 3 bit in and only look at the 3 lower bits of the output which is almost always the wrong answer, but anyway same elementary school math

   AAAabc
   DDDdef
=========
   AAAabc
  AAAabc
 AAAabc
AAAabc
...
===========

The lower 3 bits for a 3x3 bit input are strictly determined by the original 3 bit inputs, the sign extension which is what varies between umull and smull do not come into play. It is an interesting exercise but doesnt have too much real world use, most operand combinations will overflow, not all but a high percentage.

If you repeat this exercise for M * N bits then it should be the smaller of M or N bits that are unaffected by the sign extension. That simple exercise is left to the reader.

To clarify then, `umull` and `smull` do the same multiplication, the only difference is zero- vs sign-extension? In that case, for an m-bit x n-bit multiplication, extending to (m+n)-bits would affect all but the least-significant `min(m, n)` bits of at least one operand. So only the least-significant `min(m, n)` bits of the result would be identical for both an unsigned and a signed multiplication, because the higher bits would be affected by the type of extension. — user200783, Aug 25 '16 at 14:29
Two of the variants multiply the values of two registers together and store the 64-bit result in third and fourth registers. There are signed (SMULL) and unsigned (UMULL) variants. The signed variants produce a different result in the most significant 32 bits if either or both of the source operands is negative. as written in some arm documentation. — old_timer, Aug 25 '16 at 14:50
Unfortunately the pseudo code in that reference didnt explain the difference, I guess they assume everyone knows... — old_timer, Aug 25 '16 at 14:53
Do some experiments to see what you see. I expect the lower 32 bits of the result will be the same. — old_timer, Aug 25 '16 at 14:54
@dwelch the difference in the pseudocode (which is defined in terms of arbitrary-precision integer arithmetic) is `operand1 = SInt(R[n]);` vs. `operand1 = UInt(R[n]);` (and equivalent for the rest). i.e. effectively whether the 32-bit input operands are sign-extended or zero-extended to 64 bits _before_ feeding them into the multiplier. — Notlikethat, Aug 25 '16 at 15:15
Depends on which manual the first one I looked at only had this RdHi = (Rm * Rs)[63:32] /* Signed multiplication */ the second one had this result = SInt(R[n]) * SInt(R[m]); R[dHi] = result<63:32>; and that is all I looked at. So you found a better one. — old_timer, Aug 25 '16 at 17:31

Peter Cordes · Answer 2 · 2022-05-21T14:17:13.103

The low half of a multiply doesn't care about signedness, only the high half above the input widths.

In the general case with different input widths, min(m,n) bits will be guaranteed the same for all inputs whether you treated inputs as signed or unsigned.

Yes, non-widening mul is directly usable for i32 = i32 * i32 or u32 = u32 * u32. You can see compilers use it that way on the Godbolt compiler explorer, even with clang -fwrapv to make signed overflow well-defined behaviour in C (wrapping around).

Non-widening multiply is a binary operation that, like add/sub, is the same for unsigned or 2's complement integers. It's equivalent to conditionally adding left-shifted copies of one input (partial products), according to bits of the other.

Addition propagates carry from low to high only, and left shift only shifts in zeros, and multiplication is built from those. And the "condition" for whether to add a shifted copy or not doesn't depend on whether we interpret the MSB as -2^{n-1} or +2^{n-1}. (Related: Which 2's complement integer operations can be used without zeroing high bits in the inputs, if only the low part of the result is wanted?)

Widening multiply is equivalent to sign-extending or zero-extending the inputs to the desired output width, and then doing a non-widening multiply. CPUs internally do it more efficiently, but the final integer result must be bitwise identical. 2's complement sign-extension duplicates the sign-bit to higher positions, zero-extension makes them zero.

A signed x unsigned multiply can be done by sign-extending one input, zero-extending the other, then doing a non-widening multiply. (And software may have to do this in practice without fun RISC-V instructions like mulhsu: high-half of signed x unsigned multiply.)

Any bit-position above min(m,n) will be at a position where sign-extension or zero-extension of an input could have put a 0 vs. a 1 at this position. Thus affecting the output at this bit-position. (At least for some possible inputs, like where one has its highest bit set so is negative if treated as signed.)

For smull and umull, min(32,32) = 32, so the low half is the same between those instructions, differing only in high half.

See also Why is imul used for multiplying unsigned numbers? for another take on the math, including a proof that of the statement: When multiplying two numbers a and b of length n, the result is of length 2 n and, most importantly, the k-th digit only depends on the lowest k digits (of the inputs).

And discussion of x86's later introduction of a non-widening form of imul reg, reg/mem, as opposed to one-operand mul/imul where the other source, and the destination, are implicit. The reason for this x86 design decision is the same as with ARM and RISC-V and others only providing one non-widening multiply instruction, instead of signed and unsigned versions.

How many least-significant bits are the same for both an unsigned and a signed multiplication?

2 Answers2

Linked

Related