Which is the better approach to multiplying two bytes using only bit-shifting and adding?

Question

Initial Question:

A group of us (electronic engineering students - UK) have recently been getting to grips, in our own time, with programming the PIC16F84A microcontroller. The need has arisen to multiply two 8-bit numbers together, with no known min/max for each. A fellow student presented the following idea.

multiply_numbers:
; Takes numbers in Num1 and Num2, and returns product in OutH:OutL
    clrf    OutH            ; clear all non-input variables
    clrf    OutL
mult_loop
    bcf     STATUS,c        ; clear carry bit
    movfw   Num2
    addwf   OutL            ; add Num2 to OutL
    btfsc   STATUS,c        ; check carry bit
    incf    OutH            ; if set, increment OutH
    decfsz  Num1            ; decrement Num1
    goto    mult_loop       ; if Num1 is not zero, repeat loop
    return                  ; else return

I felt that this, although quite short in terms of lines of code, could take a relatively long time to execute for larger numbers. I did a bit of thinking myself and started along a route of shifting one number to the right, the other to the left, and adding the left-shifted number a certain amount of times, along the way, to the output to arrive at the final answer. I wasn't quite doing it right, but then stumbled across this question on SO which gave me the idea of expressing one of the input numbers as:

N = a_0 + a_1*2 + a_2*2^2 + a_3*2^3 + ... + a_7*2^7

From that starting point, I came up with this method for multiplying two 8-bit numbers to get a 16-bit output (stored in two 8-bit registers).

multiply_numbers:
; Takes numbers in Num1 and Num2L, and returns product in OutH:OutL
    clrf    Num2H           ; clear all non-input variables
    clrf    OutL
    clrf    OutH
mult_loop
    btfsc   Num1,0          ; test LSB of Num1
    call    add_num16       ; if set, add Num2H:Num2L to OutH:OutL
    call    shift_left      ; shift Num2H:Num2L left (multiply by 2)
    rrf     Num1,f          ; shift Num1 right
    clrw                    ; clear working register (0x00)
    bcf     STATUS,z        ; clear zero bit (3) of the STATUS register
    addwf   Num1,w          ; add 0x00 to Num1
    btfss   STATUS,z        ; if Num1 is zero, then exit loop
    goto    mult_loop       ; else, continue with another iteration
    return
add_num16
    movfw   Num2H
    addwf   OutH,f          ; add Num2H to OutH
    bcf     STATUS,c        ; clear carry bit (0) of the STATUS register
    movfw   Num2L
    addwf   OutL,f          ; add Num2L to OutL
    btfsc   STATUS,c        ; check carry bit
    incf    OutH,f          ; increment OutH if set (OutL overflowed)
    return
shift_left
    bcf     STATUS,c        ; clear carry bit
    rlf     Num2L,f         ; rotate Num2L left (carry -> LSB, MSB -> carry)
    rlf     Num2H,f         ; rotate Num2H left, using carry bit from Num2L
    return

I think this second example is quicker in most cases, simply because the loop will only repeat up to 8 times instead of up to 256 times.

Am I correct in my assumption of their relative speed/efficiency? And does the second block of code actually function as I intend it to (are there any potential problems with it that I've missed)? Lastly, can this multiplication be further optimized using techniques not already employed?

Thank you in advance.

P.S. All variables/registers have been properly defined with their own address. The extensive code commenting is because we are trying to compile a set of routines that we can refer back to in the future and still know at a glance what is going on and why.

P.P.S. This question is related to personal/hobby interest in programming this pic and has nothing to do with any current coursework, etc. Just to allay any suspicions you might have had!

Microcontroller: PIC16F84A
Development environment: MPLABX IDE v1.10
Compiler: mpasm (v5.43)

Edit #1:

Instead of testing the LSB of Num1 and adding a left-shifted Num2H:Num2L to OutH:OutL, test the MSB of Num1 and add Num2 to a left-shifted OutH:OutL.
Make 'shift_left' inline rather than a called sub-function.
Unroll 'mult_loop' to optimize the 8th iteration.

Method 2 improved:

multiply_numbers:
; Takes numbers in Num1 and Num2, and returns product in OutH:OutL
    clrf    OutL            ; clear all non-input variables
    clrf    OutH
; 1st iteration
    btfsc   Num1,7          ; test MSB of Num1
    call    add_num8        ; if set, add Num2 to OutH:OutL
    bcf     STATUS,c        ; clear carry bit
    rlf     OutL,f          ; rotate OutL left (carry -> LSB, MSB -> carry)
    rlf     OutH,f          ; rotate OutH left, using carry bit from OutL
    rlf     Num1,f          ; shift Num1 left
; 2nd iteration
    btfsc   Num1,7
    call    add_num8
    bcf     STATUS,c
    rlf     OutL,f
    rlf     OutH,f
    rlf     Num1,f
; 3rd iteration
    btfsc   Num1,7
    call    add_num8
    bcf     STATUS,c
    rlf     OutL,f
    rlf     OutH,f
    rlf     Num1,f
; 4th iteration
    btfsc   Num1,7
    call    add_num8
    bcf     STATUS,c
    rlf     OutL,f
    rlf     OutH,f
    rlf     Num1,f
; 5th iteration
    btfsc   Num1,7
    call    add_num8
    bcf     STATUS,c
    rlf     OutL,f
    rlf     OutH,f
    rlf     Num1,f
; 6th iteration
    btfsc   Num1,7
    call    add_num8
    bcf     STATUS,c
    rlf     OutL,f
    rlf     OutH,f
    rlf     Num1,f
; 7th iteration
    btfsc   Num1,7
    call    add_num8
    bcf     STATUS,c
    rlf     OutL,f
    rlf     OutH,f
    rlf     Num1,f
; 8th iteration
    btfss   Num1,7          ; test MSB of Num1
    return                  ; if not set, then return. else...
add_num8
    bcf     STATUS,c        ; clear carry bit (0) of the STATUS register
    movfw   Num2
    addwf   OutL,f          ; add Num2L to OutL
    btfsc   STATUS,c        ; check carry bit
    incf    OutH,f          ; increment OutH if set (OutL overflowed)
    return

There's [far more ways to multiply numbers than most people would guess](http://en.wikipedia.org/wiki/Multiplication_algorithm) -- though for 8-bit numbers I think lookup tables would be many people's first choice if you have the memory to spare. — sarnold, May 21 '12 at 01:28

Ira Baxter · Accepted Answer · 2012-05-21T02:40:02.660

Yes, but you can likely do better. There are a bunch of classic "tricks" to do this.

First, knowing that the multiplier can be interpreted as a sum of powers of two, you cleverly just add the multiplicand to the multiplier when a multiplicand bit is nonzero.

Second, the value added is only the size of the multiplicand. While you need a 16 (partial and) final product, you don't have to do 16 bit adds; you can do 8 bit adds and propagate any carry. This is usually easy to do in assembler.

To keep the time down, you don't want to call and ADD routine in the middle of you loop. Inline the code to save the time taken by the call, the return, and optimize any register shuffling. Finally, you loop only and exactly 8 times; it it worthwhile to unroll such a loop 8 times to avoid the overhead of the counter, and lower the "register pressure" caused by having to fiddle with counter, giving you more freedom to optimize.

Notice I've said nothing about the PIC controller, and in fact I don't know its instruction set. But what I have said is relevant to anybody implementing an 8 bit multiply. (There are equivlent tricks for 16, 32, and 64 bit multiplies). So on can abstractly write the following code:

 mul16: // computes M1 * M2 --> P where M1 and M2 are 8 bit values, P is 16 bits
        // P is represent by Plow and Phigh 8 bit values.
        // reset the (partial) product
        Plow=0; Phigh=0; // all 16 bits 
       // First iteration:          
       if msb(M1)==1 then { Plow+=M2; if carry then Phigh++; /* propagate carry */ }
       shift M1 left bit;
       shift (Phigh,Plow) left one bit
       // Second iteration
            <same as first>
       <3rd ..7th iteration, same as first>
       // 8th iteration
        if msb(M1)==1 then { Plow+=M2; if carry then Phigh++ }
       // dont bother: shift M1 left bit;
       // dont bother: shift (Phigh,Plow) left one bit   
       <done>

You may cleverly notice that what is written as "if msb(M1)..." and "shift M1 left one bit" are often easily implemented with assembly language "shift left" instructions, or in desperation, adding a value to itself :-} Similarly, the "if carry ... add one" is often implemented with an "add carry" instruction.

I leave it to you to recode this for the PIC.

I was left-shifting Num2H:Num2L on each iteration and adding it to OutH;OutL when LSB(Num1)=1. As you say, it is better to left-shift OutH:OutL on each iteration and when MSB(Num1)=1, add Num2 to OutL and propagate the carry to OutH if necessary. Also, unrolling the loop allows optimization of the last iteration. Thank you for pointing these things out. — Iakovosian, May 21 '12 at 07:07

score 4 · Answer 2 · answered May 21 '12 at 02:27

4

Oh my. I haven't written multiply code in assembler for about 30 years. That harks back to the days of writing code for the Apple II in 6502 assembler.

You are absolutely right that the second approach is much faster. 8 adds and 8 shifts is much, much faster than up to 256 adds.

However, I think you have it backward.

You want to start with the MSB of num1, and if that bit is 1, add num2 to your result. After each bit but the LSB of num1, shift the result left by 1.

answered May 21 '12 at 02:27

Duncan C

128,072
22
173
272

to my embarassment as an old assembly language hand, I got the shift direction in my answer wrong and you had it right. So, +1 to to you, and I fixed my answer. – Ira Baxter May 21 '12 at 02:41

Which is the better approach to multiplying two bytes using only bit-shifting and adding?

Initial Question:

Edit #1:

2 Answers2