0

I am having a bit of trouble writing a recursive function in assembly. I would like to model the C function here:

int power(int base, int exponent)
{
    if (exponent==0)
        return 1;
    else {
        /* return base * power(base, --exponent); -- in normal form, writing in long-form below*/
        exponent--;
        int tmp1 = power(base, exponent);
        int tmp2 = base * tmp1;
        return tmp2;
    }
}

And what I have so far in assembly is as follows:

power:
    // int power(int base=rdi, int exponent=rsi)
    push %rbp
    mov %rsp, %rbp

    // if (exponent==0) return 1
    cmp $0, %rsi
    jnz exponentiate

 return_one:
    mov $1, %eax
    jmp cleanup

 exponentiate:
    dec %rsi
    push %rax
    call power
    pop %rbx
    imul %rbx, %rax

 cleanup:
    mov %rbp, %rsp
    pop %rbp
    ret

This produces the result zero. I think my error is in the exponentiate function, but am having a hard time debugging it. I know I can 'find the answer' using Compiler Explorer, but I'm wondering if someone could help point out the mistake I have in my code.


I've been able to simplify this to the following where I don't even have to set up a stack frame for the recursive case:

 exponentiate:
    dec %rsi
    call power
    imul %rdi, %rax

Why is this allowed? I thought for all recursive functions you need to set up its own stack frame, but why isn't it necessary in the above case?

samuelbrody1249
  • 4,379
  • 1
  • 15
  • 58
  • 2
    The first time you do `push %rax`, what value do you expect `%rax` to hold? – Joseph Sible-Reinstate Monica Mar 25 '21 at 03:56
  • @JosephSible-ReinstateMonica I think 1 if that's the return from the `return_one` ? – samuelbrody1249 Mar 25 '21 at 03:57
  • What order do you think your code runs in? – Joseph Sible-Reinstate Monica Mar 25 '21 at 03:58
  • @JosephSible-ReinstateMonica I see, the push should go after the `call power`, right? – samuelbrody1249 Mar 25 '21 at 03:58
  • Have you tried comparing GCC `-O1` output against your code, maybe with something like `-fno-inline-functions` or `__attribute__((noinline,noclone))` to try to stop it from turning recursion into iteration without disabling optimization entirely? Often instructive to see how it chooses to do things, and see if that's different from your overall strategy in any interesting ways, or just a different choice of regs. – Peter Cordes Mar 25 '21 at 04:03
  • @JosephSible-ReinstateMonica ok thanks for your suggestion, I simplified it a bit and updated my question. – samuelbrody1249 Mar 25 '21 at 04:15
  • @PeterCordes I did a bit, but thought it was a bit verbose in how it was set up: https://godbolt.org/z/s19crnjo4. So I didn't find it too helpful. Also, it seems to "know" beforehand how much to move the stack down by (well of course, because it's a compiler), which I find confusing sometimes trying to figure out why they're using those numbers -- whether for alignment, or which variables, or what other considerations it may have. – samuelbrody1249 Mar 25 '21 at 04:21
  • 1
    [Of course it is](https://stackoverflow.com/questions/53366394/why-does-clang-produce-inefficient-asm-with-o0-for-this-simple-floating-point), you ignored everything I said about using `-O1`, which makes nice clean asm in this case that only does one push, no other stack space. https://godbolt.org/z/EKbcTf15a. Turns out you don't need any `noinline` tricks at `-O1`, that alone works to make recursive asm without converting it to iterative. [How to remove "noise" from GCC/clang assembly output?](https://stackoverflow.com/q/38552116) – Peter Cordes Mar 25 '21 at 04:30
  • 1
    re: stack alignment or not: yes [GCC sometimes reserves 16 bytes more than it needs for alignment](https://stackoverflow.com/questions/63009070/why-does-gcc-allocate-more-space-than-necessary-on-the-stack-beyond-whats-need). But that's not really important, the fact that it reserves *some* and uses it is all that actually matters. When you write your own function, you can do your own layout of your stack frame and choices of what to have in registers when. Use `-fverbose-asm` to have it comment its asm with var names. – Peter Cordes Mar 25 '21 at 04:34

2 Answers2

4

Here is the cleanest conversion I could come up with for keeping the recursive case:

// int power(int base=rdi, int exponent=rsi)
power:
    cmp $0, %esi                  # if (exponent==0)
    jnz exponentiate                # // 
    mov $1, %eax                    # tmp = 1
    ret                             # return tmp
  exponentiate:                  # else
    dec %esi                       # exponent--
    call power                     # int tmp = power(base, exponent)
    imul %edi, %eax                # tmp = tmp * base
    ret                            # return tmp

It ends up looking relatively close to the C code you have.

samuelbrody1249
  • 4,379
  • 1
  • 15
  • 58
  • Ha, very close to mine. You save one instruction on the common path by keeping `mov $1, %eax` conditional, but I save one byte by only having `ret` in one place :-) – Nate Eldredge Mar 25 '21 at 04:41
  • @NateEldredge lol the only difference between yours and mine is mine took five hours to come up with ;) – samuelbrody1249 Mar 25 '21 at 04:42
  • Oh, I see you had already mentioned the version without saving `%edi` in your question... I didn't mean to plagiarize it, I actually just didn't read that part of your question carefully. But I've edited now to point out that it was really your idea first. – Nate Eldredge Mar 25 '21 at 04:50
  • @NateEldredge oh no worries, actually I had one version and then I used your suggestions to go from `rxx` to `exx` on things like `cmp $0 %esi`. Thanks again for your help! – samuelbrody1249 Mar 25 '21 at 04:55
  • @NateEldredge by the way, to practice recursion: what might be some sample functions that I could try and write where I'd need to create a stack frame? For example, this one is one where I do not... – samuelbrody1249 Mar 25 '21 at 05:12
  • 1
    @samuelbrody1249: [What are good examples that actually motivate the study of recursion?](https://cseducators.stackexchange.com/a/4361) (and other answers on that question) have some good examples where it wouldn't be easier just to write a loop. In this question, you're already half-way to making a loop by taking advantage of `power` not modifying EDI. It's arguably still pure recursion, just with a calling convention where RDI is call-preserved, and obviously it would be easier to write iteratively. – Peter Cordes Mar 25 '21 at 05:23
  • Doing this more efficiently with log2(exp) multiplies instead of exp, using right shifts, *could* be done recursively, but would be easier iteratively. https://youtu.be/Cege-OAuD1k?t=2070 shows converting your algorithm into the O(log2()) version using recursion (which in a later lecture he converts to tail-recursion). The algorithm for multiply-in-terms-of-add is the same as for exponentiation-in-terms-of-multiply, which is another point Alexander Stepanov makes in those a later lecture. (Some pretty easy algorithm stuff, but interesting and well-presented.) – Peter Cordes Mar 25 '21 at 05:42
3

You want to multiply the return value of the recursive call by the value of base. That's in %edi. For some reason, in your original version, you push %rax even though it only contains garbage on (the first) entry to the function. (The mov $1, %eax is followed by a jump to cleanup so it does not run before exponentiate.) And you pop into %rbx, which is a call-preserved register; if you want to modify it you must save its value and restore it later.

The more natural thing to do would be to simply push and pop %rdi around the call, since it's what contains the value you'll need afterwards. However, as you eventually noticed, you don't actually have any need to modify base, so you might as well just keep it intact, and think of %rdi as being "read only" throughout your function. Then you don't need to use the stack at all, except implicitly for the return address. (Here you can beat the compiler which I guess has to assume that %rdi is clobbered by any call, including a call to the very function it's compiling, even though it knows its own register usage...)

Thus you can indeed write the exponentiate section as:

exponentiate:
    dec %esi
    call power
    imul %edi, %eax

(You're right to use 32-bit registers because you have everything declared as int in your C example. If you want it to work on int64_t instead then feel free to change back to 64-bit registers.)

Other notes:

  • The stack frame setup is never really needed, especially not when you don't need the stack for anything at all. In particular there is no such rule that it's required for recursive functions (if you think you'll see it would not do anything). You can skip it. The worst that will happen is that a debugger will have trouble giving you tracebacks.

  • If using 32-bit values, you want to compare %esi against zero, not %rsi; it also saves a byte of REX prefix. And you can save another byte by writing test %esi, %esi in place of cmp $0, %esi since all you need to look at is the zero flag.

  • You can simplify the jumps a little bit by doing

power:
    mov $1, %eax
    test %esi, %esi
    jz cleanup
    // former label exponentiate was here, no longer needed
    dec %esi
    call power
    imul %edi, %eax
cleanup:
    ret

If you run the exponentiate section it will just overwrite the 1 that was in eax, so no harm done. Running a very cheap mov instruction unconditionally is probably better than taking an extra jump.

Nate Eldredge
  • 48,811
  • 6
  • 54
  • 82
  • 1
    *since all you need to look at is the zero flag.* - `test reg,reg` sets *all* flags identically to `cmp $0, reg`, except for possibly AF, the half-carry flag which you can only read via `pushf`. [Test whether a register is zero with CMP reg,0 vs OR reg,reg?](https://stackoverflow.com/a/33724806). So it doesn't matter what condition you want to check, you can always use `test` to compare a register against zero. – Peter Cordes Mar 25 '21 at 04:57
  • Taking advantage of the fact that `power` leaves RDI unmodified is half way into turning the recursion into iteration. In some ways that's good because recursion is 100% pointless for this function, making everything more complicated for zero benefit. (Unlike a tree traversal, or Ackermann, where it's actually natural and easier than the alternative, which would involve a manual stack data structure - [What are good examples that actually motivate the study of recursion?](https://cseducators.stackexchange.com/a/4361)) – Peter Cordes Mar 25 '21 at 05:03