implement 64-bit arithmetic on a 32-bit machine

Question

The following code computes the product of x and y and stores the result in memory. Data type ll_t is defined to be equivalent to long long.

typedef long long ll_t;
void store_prod(ll_t *dest, int x, ll_t y) {
    *dest = x*y;
}

gcc generates the following assembly code implementing the computation: dest at %ebp+8, x at %ebp+12, y at %ebp+16

1 movl 16(%ebp), %esi
2 movl 12(%ebp), %eax
3 movl %eax, %edx
4 sarl $31, %edx
5 movl 20(%ebp), %ecx
6 imull %eax, %ecx
7 movl %edx, %ebx
8 imull %esi, %ebx
9 addl %ebx, %ecx
10 mull %esi
11 leal (%ecx,%edx), %edx
12 movl 8(%ebp), %ecx
13 movl %eax, (%ecx)
14 movl %edx, 4(%ecx)

This code uses three multiplications to implement the multiprecision arithmetic required to implement 64-bit arithmetic on a 32-bit machine. Describe the algorithm used to compute the product, and annotate the assembly code to show how it realizes your algorithm.

I don't understand line 8 and line 9 in assembly code above. Can anyone help?

Seems like it's doing something very tricky to not need add-with-carry instructions... — Mysticial, Jul 27 '12 at 02:47
@Mysticial Nothing tricky. You don't need ADC when computing truncated products as is the case here. — Alexey Frunze, Jul 27 '12 at 03:57
Standard SO response: what have you tried? From your comment below, what you're really asking is: what is the algorithm. Well, that's your homework; that's what you're supposed to figure out. What have you done so far in your attempts to figure it out? You say you know what the instructions do ... then perhaps start by writing that down. — Jim Balter, Jul 27 '12 at 04:00

score 6 · Accepted Answer · edited Jul 29 '12 at 22:31

I've converted it to intel syntax.

mov esi, y_low
mov eax, x
mov edx, eax
sar edx, 31
mov ecx, y_high

imul ecx, eax ; ecx = y_high *{signed} x

mov ebx, edx

imul ebx, esi ; ebx = sign_extension(x) *{signed} y_low

add ecx, ebx ; ecx = y_high *{signed} x_low + x_high *{signed} y_low

mul esi ; edx:eax = x_low *{unsigned} y_low

lea edx, [ecx + edx] ; edx = high(x_low *{unsigned} y_low + y_high *{signed} x_low + x_high *{signed} y_low)

mov ecx, dest
mov [ecx], eax
mov [ecx + 4], edx

What the above code does is multiplication of 2 64-bit signed integers that keeps the least-significant 64 bits of the product.

Where does the other 64-bit multiplicand come from? It's x sign-extended from 32 bits to 64. The sar instruction is used to replicate x's sign bit into all bits of edx. I call this value consisting only of the x's sign x_high. x_low is the value of x actually passed into the routine.

y_low and y_high are the least and most significant parts of y, just like x's x_low and x_high are.

From here it's pretty easy:

product = y *{signed} x =
(y_high * 2³² + y_low) *{signed} (x_high * 2³² + x_low) =
y_high *{signed} x_high * 2⁶⁴ +
y_high *{signed} x_low * 2³² +
y_low *{signed} x_high * 2³² +
y_low *{signed} x_low

y_high *{signed} x_high * 2⁶⁴ isn't calculated because it doesn't contribute to the least significant 64 bits of the product. We'd calculate it if we were interested in the full 128-bit product (full 96-bit product for the picky).

y_low *{signed} x_low is calculated using unsigned multiplication. It's legal to do so because 2's complement signed multiplication gives the same least significant bits as unsigned multiplication. Example:
-1 *{signed} -1 = 1
0xFFFFFFFFFFFFFFFF *{unsigned} 0xFFFFFFFFFFFFFFFF = 0xFFFFFFFFFFFFFFFE0000000000000001 (64 least significant bits are equivalent to 1)

Excellent answer, but probably too detailed for a `homework` tag - speaking from a TA perspective here. — atomicinf, Jul 27 '12 at 04:51
thank you very much, I see what happened. The algorithm I thought was x * (y_high * 2^32 + y_low)(x is 32 bit ), that's why I was confused about line 8 and line 9. I wonder is it necessary to extend x to 64 bit. — morsel.wang, Jul 27 '12 at 04:59
`cdq` is equivalent to `mov edx, eax` / `sar edx,31` - it sign-extends EAX into EDX:EAX. `lea edx, [ecx + edx]` might as well be `add edx, ecx`. — Peter Cordes, Dec 27 '20 at 05:45

atomicinf · Answer 2 · 2012-07-27T03:26:51.040

2

Consider the context of line 8 and 9.

By this time, ESI contains the lower half of y and EBX contains sgn(x). So line 8 is just computing sgn(x) * (y % 2^32) and storing it in EBX.

Line 9 draws upon that result. By the time Line 9 happens, ECX contains a partial upper half of the multiplication, that is, x * (y >> 32) signed. So EBX+ECX ends up being what we computed in the last step plus the partial upper half we found on a previous line.

The full algorithm itself is pretty neat ;)

EDIT: In response to a comment below...

Line 4: Consider what SAR EDX, 31 (or if you like, sar $31, %edx) really means. Since EDX is a 32-bit register, you'll end up with one of two values. Which two? Consider what they mean in the context of signed arithmetic.

Line 7: EDX by this point contains something pretty useful for the following operations. I'm just moving it where it needs to go.

edited Jul 27 '12 at 03:26

answered Jul 27 '12 at 03:12

atomicinf

3,596
19
17

thank you, but I still didn't get it. why should sgn(x) * (y % 2^32) be added to ECX? I think imull %eax %ecx has done the upper half of the multiplication. – morsel.wang Jul 27 '12 at 03:28
Good question. You're right that `IMUL %eax, %ecx` has taken care of the upper half of the multiplication, but `sgn(x) * (y % 2^32)` doesn't touch the upper half at all. Consider the mathematical bounds on `sgn(x)` and `foo % 2^32`. You might also consider that two-operation IMUL truncates its result to 32 bits, though I don't think that's _directly_ relevant for the step in question. – atomicinf Jul 27 '12 at 03:32
sgn(x) * (y % 2^32) doesn't touch the upper half at all, but its value is added to ecx where the upper half of multiplication is stored. – morsel.wang Jul 27 '12 at 03:41
This is absolutely true; we need the combined value later in the algorithm, so at this moment, we use `ECX` as an intermediate value essentially because it's there and we don't need the raw upper half anymore. – atomicinf Jul 27 '12 at 03:44
@atomicinf I believe you mean "two-operand" there instead of "two-operation" – harold Jul 27 '12 at 09:04
@harold That is in fact what I meant. Thanks for the correction :) – atomicinf Jul 27 '12 at 13:01

vishal129 · Answer 3 · 2012-07-27T03:03:00.123

0

What imul does is multiplies the contents of eax with ecx and saves the lower 32 bits in eax and higher 32 bits in edx.

addl as far as I remember adds the two registers and saves it on the first one so in this case ebx. (I am not sure if it does anything else and the l after addl stands for long)

edited Jul 27 '12 at 03:03

answered Jul 27 '12 at 02:57

vishal129

105
9

2

The right operand is the destination, so line 9 puts %ebx + %ecx in %ecx. – Jim Balter Jul 27 '12 at 03:08
this [link](http://www.cs.umd.edu/users/meesh/webpages/cmsc311/links/handouts/ia32.pdf) gives in depth knowledge and considering it is an homework. – vishal129 Jul 27 '12 at 03:08
1

That document doesn't match gnu's assembly language ... see http://stackoverflow.com/questions/2397528/mov-src-dest-or-mov-dest-src – Jim Balter Jul 27 '12 at 03:10
thank you for your answer. this code is form the book cs_app second edition(homework 3.55). I know what those assembly instructions mean, but I am confused by the algorithm. What are line 4,7,8,9 for? – morsel.wang Jul 27 '12 at 03:14
Two-operand IMUL signed-multiplies the two operands and stores the result in the destination operand. You've just described one-operand `IMUL ECX`. – atomicinf Jul 27 '12 at 03:15
@user1556375 see my updated answer, but since this is homework, you'll need to connect the dots yourself. – atomicinf Jul 27 '12 at 03:27
@morsel.wang If you wanted to know what the algorithm is, you should have just said that. But that's your homework and what you're supposed to be figuring out. What have you done to try to figure it out *yourself*? – Jim Balter Jul 27 '12 at 03:58

implement 64-bit arithmetic on a 32-bit machine

3 Answers3

Linked