3

I am learning x86-64 assembly with an online course and the course is horribly unclear in detail. I've searched online and read several SO questions but couldn't get an answer.

I tried to figure out how to calculate binary multiplication by hand, but I am stuck with imul .

Given this example of binary multiplication, 11111111 * 00000011, it can be viewed as 255 * 3 unsigned or -1 * 3 signed.

255*3

   mov al, 255
   mov bl, 3
   mul bl

this is easy and here's how I calculate by hand, just like decimal multiplication:

     11111111
x    00000011 
--------------
     11111111       
    11111111
--------------
   1011111101  

The result overflows, the upper half is 10 in ah and the lower half is 11111101 in al. My manual calculation matches the program result.

-1*3

when it comes to signed,

   mov al, -1
   mov bl, 3
   imul bl

the program result is 11111111 in ah and 11111101 in al.

How can I calculate this result by hand? I was told that sign extension is involved in imul, but I really don't know how it works here.

I am using SASM IDE and NASM Assembler.

Rick
  • 7,007
  • 2
  • 49
  • 79
  • 1
    To manually compute there are several approaches: you can sign extend both inputs to 16-bits and do 16 × 16 keeping only the low 16-bits — even though mathematically the result of 16 × 16 would naturally be 32 bits, we know here this cannot overflow into those upper 16 bits because the original inputs were only 8 bits. – Erik Eidt Jun 01 '23 at 15:33
  • Alternately, You can calculate it by doing n × n long arithmetic, keeping all the bits, then finish by expanding to 16-bits by replicating the sign bit, the top bit of the answer from the long arithmetic multiply. If your n bits is 8 there will be no need to sign extend b/c the answer will already be 16-bits wide, but if you use a n of 3 bits wide for your inputs and long multiplication, then the result will be only 6 bits wide and you'll need to replicate the top bit up to 16 bits wide to get a 16-bit answer like `imul` of 8-bit inputs gives here. – Erik Eidt Jun 01 '23 at 15:35
  • For example, in 3 bits you'll multiply 111 (-1) by 011 (3), to get 111101 (-3) in 6 bits, then replicate the top bit ten more times to get 16 bits. – Erik Eidt Jun 01 '23 at 15:40

3 Answers3

2

Technically, you took a short cut with the long multiplication.  Though the short cut is quite reasonable for obtaining the unsigned bit pattern, it glosses over something conceptually.

     11111111
x    00000011 
--------------
     11111111
    11111111
--------------
   1011111101  

Done out fully we have for unsigned:

         11111111
    x    00000011
    --------------
 0000000011111111
+000000011111111 
+00000000000000 
+0000000000000 
+000000000000 
+00000000000 
+0000000000 
+000000000 
------------------   
00000001011111101  = 255 x 3 = 765, 

The summation of those being 17 bits, though you would only get 16 from mul, and carry/unsigned-overflow cannot happen because the inputs are only 8 bits wide.

In the above, have added leading zeros though left the trailing zeros as blanks the way you have it.

Whereas in signed arithmetic:

         11111111
    x    00000011
    --------------
 1111111111111111
+111111111111111 
+00000000000000 
+0000000000000 
+000000000000 
+00000000000 
+0000000000 
-000000000         // note, subtracting because MSB has place value -2^7
------------------   
11111111111111101  = -1 x 3 = -3, the summation being 17 bits

The summation of those being 17 bits, though you would only get 16 from imul, and signed overflow cannot happen because the inputs are only 8 bits wide.

So we account for the sign of the multiplicand (first operand) by sign-extending it into 16-bit partial products.

We account for the sign of the multiplier (second operand) by subtracting instead of adding the most-significant partial product. Because that's from multiplying by 0 or -2^7, the place value of the sign bit in that input. For unsigned, all the place values are positive, but n-bit 2's complement works by giving the MSB a place value of -2^(n-1) instead of +2^(n-1). e.g. 0x80 is the bit-pattern for -128 in 8-bit 2's complement (int8_t), but +128 as unsigned (uint8_t).

Alternatively, we could sign-extend that input and do 16 partial products instead of 8, but that's more work.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
Erik Eidt
  • 23,049
  • 2
  • 29
  • 53
  • If the 2nd operand is negative, would you write out 16 partial products? I feel like there should be a way to write it down as a sum of just eight 16-bit partial products, since these are 8-bit inputs, but doing that naively would be like zero-extending the second input. But imul is commutative. I think it works to *subtract* the final 16-bit partial-product, since the place-value of the MSB is `-2^7` instead of `+2^n` for other bit-positions. – Peter Cordes Jun 01 '23 at 18:19
  • @PeterCordes, yes, this is illustrative of one approach; I think there are several possible. – Erik Eidt Jun 01 '23 at 18:24
  • My point is that your answer *isn't* illustrating how to treat the 2nd operand as signed, only the 1st. (Hopefully you're already working on an update, in which case I'll delete this comment when you edit. e.g. you could put a `+` in the left column for the first 7 partial products, and a `-` in the last. And mention it in the text.) – Peter Cordes Jun 01 '23 at 18:26
  • @PeterCordes, I sort of see your point, but I don't see how that materially changes anything. We can do -1 × -3 the same way and all the carries cancel out to yield +3. Feel free to edit, of course. – Erik Eidt Jun 01 '23 at 18:55
  • If you plugged in `-3` (`11111101`) into your long-hand version, the sum of partial products would compute `-1 * (uint8_t)-3` which is `-253` as an int16_t, not the actual `-1 * (int8_t)-3` = (int16_t)+3. You'd have to either sign-extend the multiplier (2nd input) to `1111111111111101`, or subtract instead of add for the place value of the 8th partial product. I'll edit. – Peter Cordes Jun 02 '23 at 00:49
2

The low n bits of an n x n => 2n product don't depend on signed vs. unsigned. But the high half does. The result is equivalent to sign-extending (imul) or zero-extending (mul) both inputs to the destination width and then doing a non-widening multiply, although of course the actual implementation would do something more efficient. See Erik's answer for what that looks like in terms of bits.

If I just wanted to manually work out what imul would produce, I'd do -1 * 3 = -3, and then work out the bit-pattern for -3 as a 2's complement 16-bit integer. Since it's imul, I know I have to interpret the 8-bit bit-patterns as signed -1 not unsigned 0xff like mul would.

I wouldn't use binary by hand unless I was checking some bithack that was using multiplies to shift around and add bit-fields (like Popcount assembly / sum indexes of set bits but even then I was actually thinking in hex or bytes.) It is of course good to understand how you could write it down that way if you wanted to, though.


In C terms, imul r/m8 is int16_t ax = (int16_t)(int8_t)al * (int16_t)(int8_t)src;. (C is weird because operators like * already promote narrow inputs to int, and int is at least as wide as int16_t.)

In asm terms, movsx ax, al (or cbw); movsx cx, byte src ; imul ax, cx, but without destroying CX. (Imagine a hidden temporary register if you want.)

You rarely actually want widening mul or imul since they run as more than 1 uop for 16-bit or wider inputs (since they have to write halves of the product to multiple registers like EDX:EAX, not just the low 16 bits of RAX). Interestingly, 8-bit imul bl runs as a single uop on modern CPUs. See https://uops.info/ and https://agner.org/optimize/

But imul bl only widens its result to 16-bit, not all the way to a convenient 32-bit, so you'd more often want to have your inputs widened to 32-bit for imul ecx, ebx or something, since 32-bit is the most efficient operand-size most of the time on x86-64. The imul r, r/m and imul r, r/m, imm forms are single-uop and fast on modern CPUs. Some older or low-power CPUs are slower for 64-bit imul.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
0

Honestly, I can't fully understand the other two answers. It's over complicated for me. I just need a dumb, simple and universal rule.

I'd like to just pick up what works for me.

The result is equivalent to sign-extending (imul) or zero-extending (mul) both inputs to the destination width and then doing a non-widening multiply

@Peter Cordes

To manually compute there are several approaches: you can sign extend both inputs to 16-bits and do 16 × 16 keeping only the low 16-bits

@Erik Eidt

I tried and verified and this rule works for me.

-1*3,

sign extended

  11111111 11111111
 x00000000 00000011
-------------------
  11111111 11111111
 111111111 1111111
-------------------
1011111111 11111101

keep the low 16 bits, I get the correct result 11111111(ah) 11111101(al).

Try a 4 bit example:

-2 * 3, 1110 imul 0011

sign extended

  1111 1110
 x0000 0011
-----------
  1111 1110
 11111 110
-----------
101111 1010

keep the lower 8 bits, the result is 1111 1010, -6.

Before I wasn't sure how sign extended works in imul, now I get it and it's easy to verify. Btw, if you find manual calculation tedious for some examples (e.g. 4 bit -7 * -1, 1001 x 1111, sign extended 1111 1001 x 1111 1111, many lines to add), you can use the Windows Calculator (programmer mode) and verify the result very quickly.

Rick
  • 7,007
  • 2
  • 49
  • 79