How can I multiply two 32-bit numbers in assembly, or a 32-bit number by a 16-bit one? Does anyone know the algorithm?

data1 dw 32bit
data2 dw 32bit    
mov ax,data2
Mul data1
Peter Cordes
DeadlyDagger

1 Answer


First, dw is used to create a 16-bit ("word") value. It won't hold a 32-bit value. You'd need to use dd to store a 32-bit "dword", or use a pair of 16-bit values.

Multiplying two 32-bit values can produce a result of up to 64 bits (e.g. 0xFFFFFFFF * 0xFFFFFFFF = 0xFFFFFFFE00000001). A real 8086 (as opposed to real-mode code running on an 80386 or later, which can use 32-bit registers) does have a MUL instruction, but its widest form multiplies two 16-bit values and produces a 32-bit result in DX:AX. This means you need to treat each 32-bit value as a pair of 16-bit values.

If A is split into A_low (the lowest 16 bits of the first 32-bit number) and A_high (the highest 16 bits of the first 32-bit number), and B is split into B_low and B_high in the same way, then:

  A * B = A_low * B_low
          + ( A_high * B_low ) << 16
          + ( A_low * B_high ) << 16
          + ( A_high * B_high ) << 32

The code might look like this (NASM syntax):

         section .data
first:   dw 0x5678, 0x1234  ;0x12345678
second:  dw 0xDEF0, 0x9ABC  ;0x9ABCDEF0
result:  dw 0, 0, 0, 0      ;0x0000000000000000
         section .text

    mov ax,[first]          ;ax = A_low
    mul word [second]       ;dx:ax = A_low * B_low
    mov [result],ax
    mov [result+2],dx       ;Result = A_low * B_low

    mov ax,[first+2]        ;ax = A_high
    mul word [second]       ;dx:ax = A_high * B_low
    add [result+2],ax
    adc [result+4],dx       ;Result = A_low * B_low
                                     ; + (A_high * B_low) << 16

    mov ax,[first]          ;ax = A_low
    mul word [second+2]     ;dx:ax = A_low * B_high
    add [result+2],ax
    adc [result+4],dx       ;Result = A_low * B_low
                                     ; + (A_high * B_low) << 16
                                     ; + (A_low * B_high) << 16
    adc word [result+6], 0   ; carry could propagate into the top chunk

    mov ax,[first+2]        ;ax = A_high
    mul word [second+2]     ;dx:ax = A_high * B_high
    add [result+4],ax
    adc [result+6],dx       ;Result = A_low * B_low
                                     ; + (A_high * B_low) << 16
                                     ; + (A_low * B_high) << 16
                                     ; + (A_high * B_high) << 32

We don't need adc word [result+6], 0 after the second step ([first+2] * [second]) because the high half of that product is at most 0xfffe (0xffff * 0xffff = 0xfffe0001). [result+4] is still zero at that point (this code relies on result starting out zeroed, so it only works once), so adc [result+4],dx can't wrap and produce a carry out: it can at most produce 0xfffe + 1 = 0xffff.

(It could be done as adc dx, 0 / mov [result+4], dx to avoid depending on that part of result being already zeroed. Similarly, adc into a zeroed register could be used for the first write to [result+6], to make this code usable without first zeroing result.)


If you are actually using an 80386 or later, it's much simpler:

         section .data
first:   dd 0x12345678
second:  dd 0x9ABCDEF0
result:  dd 0, 0            ;0x0000000000000000
         section .text

    mov eax,[first]          ;eax = A
    mul dword [second]       ;edx:eax = A * B
    mov [result],eax
    mov [result+4],edx       ;Result = A * B
ecm
Brendan
  • I know this is old, but you missed a carry in the 8086 version. There should be an `ADC [result+6],00h` after step 2 and step 3 – Ahmed Aeon Axan Sep 22 '13 at 19:07
  • This can of course be optimized to do more of the `add`/`adc` work in registers. Using memory-destination add and adc only makes sense for showing the math most clearly, not what one should actually do in real code. (But as an answer to this question, +1) – Peter Cordes Jun 08 '21 at 08:16
  • @AhmedAeonAxan: well spotted. But only after step 3: see my edit. We can prove that it's impossible for carry to propagate that far after step 2, because `0xffff ^ 2 = 0xfffe0001` so the high half is at most `0xfffe + CF` – Peter Cordes Jun 08 '21 at 08:33
  • I also posted my own optimized version of this on [32-bit extended multiplication via stack](https://stackoverflow.com/a/67922154) (and a 64x64 => 128-bit version for 32-bit mode, using SSE2 `pmuludq` for two of the products to see if that's worthwhile and helps with register pressure.) – Peter Cordes Jun 23 '21 at 23:59