Although it wasn't explicitly mentioned in the question, you did confirm in a comment that the assignment requires that you only use addition and subtraction. Otherwise, you would definitely want to use the mul
/imul
instructions for this. They are slower (less efficient) than addition/subtraction, but they are much faster than looping and doing repeated additions! Not to mention they make the code significantly easier to read and write, and therefore less buggy. With that stipulated, let's look at the core functions you have written.
;-----------------------------------------------------------
AddNumbers PROC
; Accepts two integers in EAX and EBX then adds the two
; integers together, storing the sum in EAX.
; Receives: EAX, EBX
; Returns: EAX
;-----------------------------------------------------------
add eax,ebx ; adds the two integers together, sum is stored in EAX
ret
AddNumbers ENDP
This is actually a really good start. You are commenting your function definitions with all the information that is required about their calling convention, including what registers parameters are passed in and what registers the results are returned in. Too many beginning assembly-language programmers don't do this, and their code turns into an inscrutable mess.
The AddNumbers
procedure is correct as written, but since it is so simple (only a single instruction other than the ret
), I would have written it as a macro. If you know C, a macro is just like a preprocessor macro there: the assembler basically just copy-pastes the stuff from the macro into the place where you use it. Basic text substitution. This is really powerful because it saves the overhead of a function call (no need for a call
or a ret
), but you want to use macros sparingly because they can also bloat the code. Fewer than 3 instructions, though, and it's a clear case for a macro. But maybe you haven't learned about macros yet and aren't supposed to be using them. If not, ignore this advice. :-)
;-----------------------------------------------------------
MultiplyNumbers PROC USES ecx
; Accepts two integers in EAX and EBX. Uses the AddNumbers
; procedure to multiply the two numbers, then stores the
; product in EAX.
; Receives: EAX, EBX
; Returns: EAX
;-----------------------------------------------------------
mov ecx,eax
dec ecx
mov eax,ebx
multiply:
call AddNumbers
loop multiply
ret
MultiplyNumbers ENDP
This doesn't return the correct result if one of the inputs is 0 or 1.
If the assignment guarantees that the inputs will be positive integers, then mathematically, 0 is not a positive integer, so you don't have to worry about that case. (In real-world code, I'd want to add an assertion that verifies the inputs are non-zero, like a test that is conditionally included only in debugging/testing builds.)
But you still need to worry about an input of 1. To see why, let's think through the flow of the code with an example. Say EAX
is 1 and EBX
is 4. This should return 4, because 1 × 4 = 4. Unfortunately, it actually returns 8. The counter, ECX
, is set to 0, but because AddNumbers
is called before the counter is checked (by the LOOP
instruction), AddNumbers
always runs once. To fix this, you need to check the counter before entering the loop.
If you know C, what you have here now is basically a do { … } while (condition)
loop. What you want is basically a while (condition) { … }
loop. The condition needs to be checked at the top of the loop, not at the bottom.
The correct code would look like this:
mov ecx, eax
dec ecx ; ECX = (EAX - 1)
jz finished ; bail out early if ECX == 0
mov eax, ebx ; EAX = EBX
multiply:
call AddNumbers ; EAX += EBX
loop multiply ; decrement counter and keep looping if != 0
finished:
ret
Except that you should avoid using the very slow LOOP
instruction. Instead, replace it with the equivalent expanded form (DEC
+ JNZ
), which is significantly faster:
mov ecx, eax
dec ecx ; ECX = (EAX - 1)
jz finished ; bail out early if ECX == 0
mov eax, ebx ; EAX = EBX
multiply:
call AddNumbers ; EAX += EBX
dec ecx ; decrement counter
jnz multiply ; keep looping if counter != 0
finished:
ret
This is a somewhat counter-intuitive case where more instructions actually result in faster code. Your teacher probably taught you LOOP
because it is conceptually simple, but no real-world code ever uses it, whether written by an assembly-language programmer or generated by a compiler, because the expanded form is much faster. You can basically forget that the LOOP
instruction even exists. It's only useful when you're optimizing for size above all else, which is rarely done, especially when hand-writing assembly code.
Another bonus optimization tip is code like:
mov ecx, eax
dec ecx
can sometimes be replaced by the multipurpose LEA
instruction, which does the move and decrement in a single step:
lea ecx, [eax - 1]
For complicated reasons, it varies whether this will actually be faster in the real world, but it won't be any slower, and it's a useful trick to know. (Compilers love to use it, as do expert assembly hackers.) The only caveat is that LEA
doesn't set flags like DEC
does, which is why you can't use it here (you need the flags set in order to conditionally branch—JZ
, JNZ
, etc.).
If you'd followed my earlier advice and turned the AddNumbers
function into a macro, the assembler would generate this code:
mov ecx, eax
dec ecx ; ECX = (EAX - 1)
jz finished ; bail out early if ECX == 0
mov eax, ebx ; EAX = EBX
multiply:
add eax, ebx ; inline expansion of AddNumbers macro
dec ecx ; decrement counter
jnz multiply ; keep looping if counter != 0
finished:
ret
That's about as optimal as you're going to get without either (A) being able to use a multiply instruction, or (B) unrolling the loop (and therefore making the code a lot more complicated, for very little gain). The necessity of looping is inevitably going to make this code dog-slow. That's a good lesson that micro-optimization can only go so far. If you've chosen a bad/inefficient algorithm, then you're doomed to slow code no matter how much you try and tweak it.
;-----------------------------------------------------------
CalculatePower PROC
; Accepts two integers in EAX and EBX, then uses the
; MultiplyNumbers procedure to calculate the integer in EAX
; to the power of the integer in EBX. The result is stored
; in the EAX register.
; Receives: EAX, EBX, ECX
; Returns: EAX
;-----------------------------------------------------------
mov ecx,eax
dec ecx
mov eax,ebx
exponent:
call MultiplyNumbers
loop exponent
ret
CalculatePower ENDP
The bug here is essentially the same as the bug we uncovered in MultiplyNumbers
, except compounded by the fact that CalculatePower
calls MultiplyNumbers
. Fortunately, the fix is also the same.
Also, you're basing your counter off of the wrong input. If we're calculating EAX
EBX
, then we want to base the counter off of EBX
.
mov ecx, ebx
dec ecx ; ECX = (EBX - 1)
jz finished ; bail out early if ECX == 0
mov ebx, eax ; EBX = EAX
exponent:
call MultiplyNumbers ; EAX *= EBX
dec ecx ; decrement counter
jnz exponent ; keep looping if counter != 0
finished:
ret
To see that this is correct, think through some examples—or better yet, single-step through some in a debugger!
- If we are calculating 51, then
ECX
will be set to 1−1 (== 0), the loop will be skipped, and the result (EAX
) will be 5.
- If we are calculating 52, then
ECX
will be set to 2−1 (== 1), the loop will iterate one time, and the result (EAX
) will be 5×5 (== 25).
- If we are calculating 15, then
ECX
will be set to 5−1 (== 4), the loop will iterate four times, and the result (EAX
) will be 1×1×1×1×1 (== 1).
Again, though this code will be very slow, there is no real way to optimize it further. Since it's logic is so complicated (so many instructions), you wouldn't even want to turn MultiplyNumbers
into a macro. The overhead of its internal loop vastly outweighs the overhead of a function call, so there wouldn't be any performance win to expanding the code inline, even if you were willing to trade code bloat for speed.
So again, we see that a good algorithm is the real key to writing efficient code. Your code implements an iterated-multiplication algorithm to do exponentiation. Computing ab this way requires b − 1 multiplications. But there are algorithms that can do it much more efficiently. The standard technique is called exponentiation by squaring. For example, if you wanted to compute a15, you could do:
a^3 = (a * a * a)
a^7 = (a^3) * (a^3) * a
a^15 = (a^7) * (a^7) * a
This requires a total of 6 multiplications, as compared to the 14 multiplications that would be required for iterated multiplication—an improvement of more than two-fold!
Just for fun, assuming that you could use multiplication instructions, you could apply this algorithm to write CalculatePower
as follows:
; Calculates EAX^EBX via exponentiation by squaring.
; Returns result in EAX.
; Clobbers: EBX, ECX
mov ecx, 1 ; ECX holds result; initialize to 1
test ebx, ebx ; is the exponent zero?
jz finished ; if so, skip everything and return result (1)
check_even:
test ebx, 1 ; test lowest bit (sets flags according to result of EBX % 2)
jz skip ; skip next instruction if exponent is even (EBX % 2 == 0)
imul ecx, eax ; result *= base
skip:
imul eax, eax ; base *= base
shr ebx, 1 ; exponent /= 2
jnz check_even ; keep looping if exponent != 0
finished:
mov eax, ecx ; put result in EAX
ret
But it turns out that this is not even the most efficient way. Going back to our original example of a15, this can actually be done in only 5 multiplications!
a^2 = (a * a)
a^3 = (a^2) * a
a^6 = (a^3) * (a^3)
a^12 = (a^6) * (a^6)
a^15 = (a^12) * (a^3)
Unfortunately, this is not a suitable general approach, since there are no efficient algorithms known for finding the minimal sequence of multiplications required (i.e., the minimal-length addition chain for the exponent). In computer science terms, this problem is NP-complete. Although there are heuristic algorithms available, I've found that even quality optimizing C compilers don't use them. Instead, they will just use the good old exponentiation-by-squaring algorithm to simplify chains of multiplications. For example:
unsigned int To15thPower(unsigned int a)
{
return (a * a * a *
a * a * a *
a * a * a *
a * a * a *
a * a * a);
}
would be compiled to code that required 6 multiplications:
To15thPower(unsigned int): ; parameter is in EAX
mov edx, eax
imul edx, eax
imul edx, eax
imul edx, edx
imul edx, eax
imul edx, edx
imul eax, edx
ret
even though, as we saw, it can actually be done in only 5. To get that optimization, you'd need to write the code to explicitly show the algorithm as it should be implemented:
unsigned int To15thPower_Optimized(unsigned int a)
{
unsigned int a2 = a * a;
unsigned int a3 = a2 * a;
unsigned int a6 = a3 * a3;
unsigned int a12 = a6 * a6;
return a12 * a3;
}
To15thPower_Optimized(unsigned int): ; parameter is in EAX
mov edx, eax
imul eax, edx
imul eax, edx
mov edx, eax
imul edx, eax
imul edx, edx
imul eax, edx
ret
Well, that was both complicated and fun. :-) Hopefully you learned something!