Recursion Example in ARM Assembly

Question

Can someone give me an example of how recursion would be done in ARM Assembly with only the instructions listed here (for visUAL)?

I am trying to do a recursive fibonacci and factorial function for class. I know recursion is a function that calls a function, but I have no idea how to simulate that in ARM.

https://salmanarif.bitbucket.io/visual/supported_instructions.html

In case the link doesn't work, I am using visUAL and these are the only instructions I can use:

MOV
MVN
ADR
LDR
ADD
ADC
SUB
SBC
RSB
RSC
AND
EOR
BIC
ORR
LSL
LSR
ASR
ROR
RRX
CMP
CMN
TST
TEQ
LDR
LDM
STM
B
BL
FILL
END

This doesn't load an older value for R4, so R4 just doubles every time the function calls itself.

    ;VisUAL initializes all registers to 0 except for R13/SP, which is -16777216

    MOV     R4, #0
    MOV     R5, #1

    MOV     r0, #4

    MOV     LR, #16             ;tells program to move to 4th instruction


FIB


    STMDB   SP!, {R4-R6, LR}    ;Stores necessary values on stack (PUSH command)
    LDR     R4, [SP]            ;Loads older value for R4 from memory
    ADD     R4, R4, R5          ;Adds R5 to R4
    STR     R4, [SP], #8        ;stores current value for R4 to memory
    MOV     R5, R4              ;Makes R5 = R4


    CMP     R4, #144            ;If R4 >= 144:
    BGE     POP                 ;Branch to POP

    MOV     PC, LR              ;Moves to STMDB(PUSH) statement

POP
    LDMIA   SP!, {R4-R6, LR}    ;Pops registers off stack
    END                         ;ends program

Simulate calling yourself? `BL yourself`. Save and restore `LR` around it. Easy. Show pseudocode (or C) for what you are trying to do, and [what you attempted](http://idownvotedbecau.se/noattempt/) and where you got stuck. — Jester, Oct 20 '17 at 23:17
write a very simple recursion example in C like five lines or so total. compile and disassemble and see what is produced. If you follow the C EABI then you could just simply write the code without doing that, just bang it out. — old_timer, Oct 20 '17 at 23:50
@Jester Sorry, I've added what I've done so far. I don't know C or any high-level languages other than Python 3. — LuminousNutria, Oct 22 '17 at 01:23
Single-step your code in a debugger so you can see how register values are changing. Your `stmdb` / `ldmia` instructions look correct, pushing some regs + LR and popping the same regs + PC to return. I don't think your return value is correct, though. I think you're updating a counter that you clobber. Comment your code; I don't know how you *want* it to work, just that it doesn't. (But really, single-stepping with a debugger should sort you out.) — Peter Cordes, Oct 22 '17 at 02:07
That looks not far from something that would work. Remember to initialize SP to point to the top of your "stack segment" if your simulator doesn't already do that; and if there's no calling function then you need some initial code to set up R0 (start with small values) then call BL FIB then make the simulation terminate somehow. Other details: You'll need another BL FIB call for fib(n-1). The MOV R0, #1 should be conditional on n<2 otherwise the function can only ever return 1 (POP is where it returns from). — NickJH, Oct 22 '17 at 08:57
@Nick I've updated my code. I'm still really lost here. I think I've got an idea how to do it, but it isn't working. — LuminousNutria, Oct 22 '17 at 22:44
@Peter Thanks! I've updated my code, and added comments. I'm not sure how to use LR and PC or what logic is underneath them. — LuminousNutria, Oct 22 '17 at 22:59
Need to rewind a bit, I think. You need to start by knowing how to call a function (search the web for "ARM function calling convention", don't worry about the different variants). It might help to write an exact Python equivalent of what you want to do, keeping each line as simple as possible (every function call or arithmetic operator on its own line); then re-write it in assembler (Python isn't quite as good as "C" for this, as indentation changes are hard to "transcribe" but it may help). — NickJH, Oct 23 '17 at 09:13

NickJH · Accepted Answer · 2017-10-23T11:28:12.393

3

You need to use the stack, STMDB and LDMIA instructions. On real ARM tools with "unified" notation, they also have mnemonics PUSH and POP.

Fibonacci and factorial are not great examples as they don't "need" recursion. But let's pretend they do. I'll pick Fibonacci as you don't have a MUL instruction!? You want to do something like this:

START
   MOV R0, #6
   BL FIB
   END ; pseudo-instruction to make your simulator terminate

FIB                                 ; int fib(int i) {
   STMDB SP!, {R4,R5,R6,LR}         ;   int n, tmp;
   MOV R4, R0                       ;   n = i;
   CMP R0, #2                       ;   if (i <= 2) {
   MOV R0, #1                       ;     return 1;
   BLE FIB_END                      ;   }
   SUB R0, R4, #2                   ;   i = n-2;
   BL FIB                           ;   i = fib(i);
   MOV R5, R0                       ;   tmp = i;
   SUB R0, R4, #1                   ;   i = n-1;
   BL FIB                           ;   i = fib(i);
   ADD R0, R0, R5                   ;   i = i + tmp;
FIB_END                             ;   return i;
   LDMIA SP!, {R4,R5,R6,PC}         ;  }

It should terminate with R0 containing fib(6) == 8. Of course this code is very inefficient as it repeatedly calls FIB for the same values.
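
To put a number on "repeatedly calls FIB for the same values", here is a minimal C sketch of the same routine with a call counter added. The counter `fib_calls` and the helper `count_fib_calls` are hypothetical names introduced only for this illustration; they are not part of the assembly above.

```c
#include <assert.h>

/* Hypothetical instrumentation: counts how many times fib() is entered. */
static unsigned long fib_calls = 0;

/* Same shape as the FIB routine: base case for i <= 2,
 * otherwise fib(n-2) first, then fib(n-1), then add. */
int fib(int i)
{
    fib_calls++;
    if (i <= 2)
        return 1;
    int tmp = fib(i - 2);      /* kept across the next call, like R5 */
    return fib(i - 1) + tmp;
}

/* Resets the counter, runs fib(n), and returns the number of calls made. */
unsigned long count_fib_calls(int n)
{
    assert(n <= 46);           /* fib(47) would overflow a 32-bit signed int */
    fib_calls = 0;
    (void)fib(n);
    return fib_calls;
}
```

With this sketch, `fib(6)` returns 8 and `count_fib_calls(6)` reports 15 entries into the function, even though only six distinct arguments are ever needed.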

The STM is needed so you can use registers r4,r5, because another function call can change r0-r3 and LR. Pushing LR on entry and popping it into PC at the end has the same effect as returning with MOV PC, LR (BX LR on real ARM tools). If you were calling C code you should push an even number of registers to keep SP 64-bit aligned (we don't really need to do that here; ignore R6).

edited Oct 23 '17 at 11:28

answered Oct 21 '17 at 10:39

NickJH


Fibonacci and Factorial always bothered me as recursion examples, too. Especially in asm, it would be easy to miss the point and not actually save/restore your locals, but instead end up implementing a regular loop with a bunch of returning at the end. (especially on x86 where `call` and `ret` already use the stack, unlike RISC machines where you manually save LR in non-leaf functions.) Really what bugs me is that recursion is obviously less efficient, especially for Fib. IMO binary-tree traversal would be a better example, except then you need a data structure. – Peter Cordes Oct 21 '17 at 11:40
@PeterCordes Now I'm trying to think of a nice simple recursive function that really needs stack context and can't be optimized to iteration or tail-call. I can't think of one yet... – NickJH Oct 21 '17 at 18:07
Thank you! I've tried to do what you said, but I'm still really lost. I don't know what I'm doing wrong, but I think it has to do with the STMDB and LDMIA instructions. I've never done anything with stacks before, so this is all new to me. – LuminousNutria Oct 22 '17 at 01:25
@NickHollinghurst: An in-order tree traversal would still be my go-to. Instead of calling a `print` function, just append elements to an array in-order. As a homework assignment, the tree could be provided as static constant data (`dd` directives). – Peter Cordes Oct 22 '17 at 01:45
Non-data-structure alternatives: [The Ackermann function](https://en.wikipedia.org/wiki/Ackermann_function) (computed modulo 2^32 I guess :). I'm not up on my computability theory, but it's not "primitive recursive". I think that means you can't trivially turn it into a loop. This [iterative implementation](https://stackoverflow.com/questions/10742322/how-to-rewrite-ackermann-function-in-non-recursive-style) uses a stack data structure, so a recursive implementation is probably just as straightforward, or actually simpler. – Peter Cordes Oct 22 '17 at 01:46
I'm clueless about the stack and LDR, LDM, STR, and STM Instructions. I've learned that much. I'm also not completely sure what recursion is. It seems like a loop but instead of declaring a loop specifically you just call the function over and over again. I'm not sure what the difference is in ARM. – LuminousNutria Oct 22 '17 at 22:52
yes recursion is calling the function from the function. – old_timer Oct 23 '17 at 01:56
@RidiculousName Although you said you didn't want a ready-made answer, I have made one so you can see it working (I hope it's correct for VisUAL; I didn't actually try it...) – NickJH Oct 23 '17 at 11:08
@Nick Thank you for your patience with this. Unfortunately, your code doesn't work. R0 just becomes 8. I've already spent 24+ hours on this project. My professor has extended my class's deadline for this work three times now.. – LuminousNutria Oct 24 '17 at 21:05

old_timer · Answer 2 · 2017-10-23T02:43:17.030

some other recursive function:

unsigned int so ( unsigned int x )
{
    static unsigned int z=0;
    z+=x;
    if(x==0) return(z);
    so(x-1);
    return(z);
}

build/disassemble

arm-none-eabi-gcc -O2 -c Desktop/so.c -o so.o
arm-none-eabi-objdump -D so.o


00000000 <so>:
   0:   e92d4010    push    {r4, lr}
   4:   e59f4034    ldr r4, [pc, #52]   ; 40 <so+0x40>
   8:   e5943000    ldr r3, [r4]
   c:   e3500000    cmp r0, #0
  10:   e0803003    add r3, r0, r3
  14:   e5843000    str r3, [r4]
  18:   1a000002    bne 28 <so+0x28>
  1c:   e1a00003    mov r0, r3
  20:   e8bd4010    pop {r4, lr}
  24:   e12fff1e    bx  lr
  28:   e2400001    sub r0, r0, #1
  2c:   ebfffffe    bl  0 <so>
  30:   e5943000    ldr r3, [r4]
  34:   e8bd4010    pop {r4, lr}
  38:   e1a00003    mov r0, r3
  3c:   e12fff1e    bx  lr
  40:   00000000

If you dont understand it, then is it worth it? Is it cheating to let a tool do it for you?

push is a pseudo instruction for stm, pop a pseudo instruction for ldm, so you can use those.

I used a static local which I call a local global, it lands in .data not on the stack (well .bss in this case as I made it zero)

Disassembly of section .bss:

00000000 <z.4099>:
   0:   00000000

the first two loads are loading this value into r3.

the calling convention says that r0 will contain the first parameter on entry into the function (there are exceptions, but it is true in this case).

so we go and get z from memory, r0 already has the parameter x so we add x to z and save it to memory

the compiler did the compare out of order for who knows performance reasons, the add and str as written dont modify flags so that is okay,

if x is not equal to zero it branches to 28, which does the so(x-1) call and then reads r3 back from memory (the calling convention says that r0-r3 are volatile: a function you call can modify them at will and doesnt have to preserve them, so our version of z in r3 might have been destroyed, but r4 is preserved by any callee, so we read z back into r3). we pop r4 and the return address off the stack, we prepare the return register r0 with z and do the return.

if x was equal to zero (bne on 18 failed, so we run 1c, then 20, then 24) then we copy z (r3 version) into r0, which is the register used for returning a value from this function per the calling convention used by this compiler (arms recommendation), and return.

the linker is going to fill in the address of z to the offset 0x40, this is an object not a final binary...

arm-none-eabi-ld -Ttext=0x1000 -Tbss=0x2000 so.o -o so.elf
arm-none-eabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000001000
arm-none-eabi-objdump -D so.elf

so.elf:     file format elf32-littlearm


Disassembly of section .text:

00001000 <so>:
    1000:   e92d4010    push    {r4, lr}
    1004:   e59f4034    ldr r4, [pc, #52]   ; 1040 <so+0x40>
    1008:   e5943000    ldr r3, [r4]
    100c:   e3500000    cmp r0, #0
    1010:   e0803003    add r3, r0, r3
    1014:   e5843000    str r3, [r4]
    1018:   1a000002    bne 1028 <so+0x28>
    101c:   e1a00003    mov r0, r3
    1020:   e8bd4010    pop {r4, lr}
    1024:   e12fff1e    bx  lr
    1028:   e2400001    sub r0, r0, #1
    102c:   ebfffff3    bl  1000 <so>
    1030:   e5943000    ldr r3, [r4]
    1034:   e8bd4010    pop {r4, lr}
    1038:   e1a00003    mov r0, r3
    103c:   e12fff1e    bx  lr
    1040:   00002000    

Disassembly of section .bss:

00002000 <z.4099>:
    2000:   00000000

the point here is not to cheat and use a compiler, the point here is there is nothing magical about a recursive function, certainly not if you follow a calling convention or whatever your favorite term is.

for example

if you have parameters, r0 is the first, r1 the second, up to r3 (if they fit; make your code such that they do and you have four or fewer parameters). the return value is in r0 if it fits.

you need to push lr on the stack as you will be calling another function. r4 on up you preserve if you need to modify them.

if you want some local storage, use the stack, either by modifying the stack pointer accordingly or by doing pushes/stms. you can see that gcc instead saves what was in the register to the stack and then uses the register during the function, at least up to a few local variables worth; beyond that it would need to bang on the stack a lot, sp relative.

when you do the recursive call you do so as you would any other normal function call according to the calling convention. if you need to save r0-r3 before calling then do so, either in a register r4 or above or on the stack, and restore after the function returns. you can see it is easier just to put the values you want to keep before and after a function call in a register r4 or above.

the compiler could have done the compare of r0 just before the branch, it reads easier that way. likewise it could have done the mov to r0 of the return value before the pop.

I didnt put parameters, so my build of gcc here appears to be armv4t, if I ask for something a little newer

arm-none-eabi-gcc -O2 -c -mcpu=mpcore Desktop/so.c -o so.o
arm-none-eabi-objdump -D so.o

so.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <so>:
   0:   e92d4010    push    {r4, lr}
   4:   e59f402c    ldr r4, [pc, #44]   ; 38 <so+0x38>
   8:   e3500000    cmp r0, #0
   c:   e5943000    ldr r3, [r4]
  10:   e0803003    add r3, r0, r3
  14:   e5843000    str r3, [r4]
  18:   1a000001    bne 24 <so+0x24>
  1c:   e1a00003    mov r0, r3
  20:   e8bd8010    pop {r4, pc}
  24:   e2400001    sub r0, r0, #1
  28:   ebfffffe    bl  0 <so>
  2c:   e5943000    ldr r3, [r4]
  30:   e1a00003    mov r0, r3
  34:   e8bd8010    pop {r4, pc}
  38:   00000000

You can see the returns read a little easier

although an optimization was missed: it could have done an ldr r0,[r4] and saved an instruction. or leave that tail end as is, and the bne could have been a beq to 30 (mov r0,r3; pop {r4,pc}) and shared an exit.

a little more readable

so:
    push    {r4, lr}
    @ z += x
    ldr r4, zptr
    ldr r3, [r4]
    add r3, r0, r3
    str r3, [r4]
    @ if x==0 return z
    cmp r0, #0
    beq l30
    @ so(x - 1)
    sub r0, r0, #1
    bl so
    ldr r3, [r4]
l30:
    @ return z
    mov r0, r3
    pop {r4, pc}
zptr: .word z

.section .bss
z: .word 0

arm-none-eabi-as so.s -o so.o
arm-none-eabi-objdump -D so.o

so.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <so>:
   0:   e92d4010    push    {r4, lr}  (stmdb)
   4:   e59f4024    ldr r4, [pc, #36]   ; 30 <zptr>
   8:   e5943000    ldr r3, [r4]
   c:   e0803003    add r3, r0, r3
  10:   e5843000    str r3, [r4]
  14:   e3500000    cmp r0, #0
  18:   0a000002    beq 28 <l30>
  1c:   e2400001    sub r0, r0, #1
  20:   ebfffff6    bl  0 <so>
  24:   e5943000    ldr r3, [r4]

00000028 <l30>:
  28:   e1a00003    mov r0, r3
  2c:   e8bd8010    pop {r4, pc}  (ldmia)

00000030 <zptr>:
  30:   00000000    

Disassembly of section .bss:

00000000 <z>:
   0:   00000000

EDIT

So lets walk through this last one.

push {r4,lr}  which is a pseudo instruction for stmdb sp!,{r4,lr}

lr is r14, which holds the return address. look at the bl instruction, branch and link: we branch to some address, but lr (link register) is set to the return address, the address of the instruction after the bl. So when main or some other function calls so(4), lets assume so is at address 0x1000, so the program counter (r15, pc) gets 0x1000 and lr gets the address of the instruction after the bl in the caller, lets say that is 0x708. Lets also assume the stack pointer during this first call to so() from main is at 0x8000, and lets say that .bss is at 0x2000, so z lives at address 0x2000 (which also means the value at 0x1030, zptr, is 0x2000).

We enter the function for the first time with r0 (x) = 4.

When you read the arm docs for stmdb sp!,{r4,lr}, it decrements before (db), so sp, which on entry this time is 0x8000, is decremented by 8 for the two items to 0x7FF8, and the first item in the list is written there, so

0x7FF8 = r4 from main
0x7FFC = 0x708 return address to main

the ! means sp stays modified, so sp = 0x7FF8

then ldr r4,zptr  r4 = 0x2000
ldr r3,[r4]  this is an indirect load, so what is at the address in r4 is read and put in r3, so r3 = [0x2000] = 0x0000 at this point, the z variable.

z+=x;  add r3,r0,r3  r3 = r0 + r3 = 4 + 0 = 4
str r3,[r4]  [r4] = r3, [0x2000] = r3 write 4 to 0x2000

cmp r0,#0   4 != 0

beq to 28 nope, not equal so no branch

sub r0,r0,#1   r0 = 4 - 1 = 3

bl so  so this is so(3); pc = 0x1000 lr = 0x1024

so now we enter so for the second time with r0 = 3

stmdb sp!,{r4,lr}

0x7FF0 = r4 (saving from so(4) call but we dont care its value even though we know it)
0x7FF4 = lr from so(4) = 0x1024
sp=0x7FF0
ldr r4,zptr r4 = 0x2000
ldr r3,[r4] r3 = [0x2000] = 4
add r3,r0,r3  r3 = 3 + 4 = 7
str r3,[r4]  write 7 to 0x2000
cmp r0,#0 3 != 0
beq 0x1028 not equal so dont branch
sub r0,r0,#1   r0 = 3-1 = 2
bl so  pc=0x1000 lr=0x1024

so(2)

stmdb sp!,{r4,lr}
0x7FE8 = r4 from caller, just save it
0x7FEC = lr from caller, 0x1024
sp=0x7FE8
ldr r4,zptr  r4=0x2000
ldr r3,[r4]  r3 = read 7 from 0x2000
add r3,r0,r3  r3 = 2 + 7 = 9
str r3,[r4]  write 9 to 0x2000
cmp r0,#0  2 != 0
beq 0x1028  not equal so dont branch
sub r0,r0,#1  r0 = 2 - 1 = 1
bl 0x1000 pc=0x1000 lr=0x1024

so(1)

stmdb sp!,{r4,lr}
0x7FE0 = save r4
0x7FE4 = lr = 0x1024
sp=0x7FE0
ldr r4,zptr r4=0x2000
ldr r3,[r4]  r3 = read 9 from 0x2000
add r3,r0,r3  r3 = 1 + 9 = 10
str r3,[r4]  write 10 to 0x2000
cmp r0,#0  1 != 0
beq 0x1028  not equal so dont branch
sub r0,r0,#1  r0 = 1 - 1 = 0
bl 0x1000  pc=0x1000 lr=0x1024

so(0)

stmdb sp!,{r4,lr}
0x7FD8 = r4
0x7FDC = lr = 0x1024
sp = 0x7FD8
ldr r4,zptr  r4 = 0x2000
ldr r3,[r4]  r3 = read 10 from 0x2000
add r3,r0,r3  r3 = 0 + 10 = 10
str r3,[r4]  write 10 to 0x2000
cmp r0,#0  0 = 0  so it matches
beq 0x1028 it is equal so we finally take this branch
mov r0,r3  r0 = 10
ldmia sp!,{r4,pc}
increment after
r4 = [sp+0] = [0x7FD8] restore r4 from caller
pc = [sp+4] = [0x7FDC] = 0x1024
sp += 8 = 0x7FE0
(branch to 0x1024)(return from so(0) to so(1))
ldr r3,[r4]  read 10 from 0x2000
mov r0,r3  r0 = 10
ldmia sp!,{r4,pc}
r4 = [sp+0] = [0x7FE0] restore r4 from caller
pc = [sp+4] = [0x7FE4] = 0x1024
sp += 8 = 0x7FE8
(branch to 0x1024)(return from so(1) to so(2))
ldr r3,[r4]  read 10 from 0x2000
mov r0,r3  r0 = 10
ldmia sp!,{r4,pc}
r4 = [sp+0] = [0x7FE8] restore r4 from caller
pc = [sp+4] = [0x7FEC] = 0x1024
sp += 8 = 0x7FF0
(branch to 0x1024)(return from so(2) to so(3))
ldr r3,[r4]  read 10 from 0x2000
mov r0,r3  r0 = 10
ldmia sp!,{r4,pc}
r4 = [sp+0] = [0x7FF0] restore r4 from caller
pc = [sp+4] = [0x7FF4] = 0x1024
sp += 8 = 0x7FF8
(branch to 0x1024)(return from so(3) to so(4))
ldr r3,[r4]  read 10 from 0x2000
mov r0,r3  r0 = 10
ldmia sp!,{r4,pc}
r4 = [sp+0] = [0x7FF8] restore r4 from caller (main()'s r4)
pc = [sp+4] = [0x7FFC] = 0x708
sp += 8 = 0x8000
(branch to 0x708)(return from so(4) to main())

and we are done.
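
The walkthrough can be cross-checked with a small C sketch of the same function. The `depth` and `max_depth` counters and the `lowest_sp` helper are hypothetical additions for this illustration, z is moved to file scope so it can be inspected, and the 8 bytes per call assumes the two-register push (r4 and lr) used above.

```c
#include <assert.h>

static unsigned int z = 0;          /* the static "local global" from the answer */
static unsigned int depth = 0;      /* hypothetical: current nesting level */
static unsigned int max_depth = 0;  /* hypothetical: deepest nesting seen */

unsigned int so(unsigned int x)
{
    depth++;
    if (depth > max_depth)
        max_depth = depth;

    z += x;
    if (x != 0)
        so(x - 1);                  /* return value ignored, as in the original */

    depth--;
    return z;
}

/* Lowest stack pointer reached, assuming each call pushes 8 bytes
 * (r4 and lr) below the given starting sp. */
unsigned int lowest_sp(unsigned int initial_sp)
{
    assert(initial_sp >= 8u * max_depth);
    return initial_sp - 8u * max_depth;
}
```

Starting from z = 0, `so(4)` returns 10 after five nested calls (x = 4, 3, 2, 1, 0), and `lowest_sp(0x8000)` gives 0x7FD8, the same sp the walkthrough reaches in the so(0) call. Since z is never reset, a second call to `so(4)` would keep adding to the old total.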

A stack is like a dixie cup holder which might be before your time. A cup holder where you pull a cup down and the next and rest of the cups stay in the holder, well you can shove one back up in there.

So a stack is temporary storage for the function: write one data item on the cup, then shove it up into the holder (save r4 from caller), write another item and shove it up into the holder (lr, return address from caller). we only used two items per function here, so each function call pushes two cups up into the holder, and each call of the function gets two NEW AND UNIQUE storage locations to store this local information. As I exit the function I pull the two cups down out of the holder and use their values (and discard them). This is to some extent the key to recursion: the stack gives you new local storage for each call, separate from prior calls to the same function. if nothing else you need a return address (although I did make an even simpler recursion example that didnt need one: when optimized, the compiler was smart enough to basically make a loop out of it).
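
The "new and unique storage per call" point can be observed from C as well. This is only a sketch: `descend`, `frame_addr`, `MAX_DEPTH` and `all_frames_distinct` are hypothetical names, and it assumes an ordinary compiler that keeps each active call's local in its own stack slot.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DEPTH 5

/* Hypothetical log of where each call's local variable lived. */
static uintptr_t frame_addr[MAX_DEPTH];

unsigned int descend(unsigned int depth)
{
    volatile unsigned int local = depth;     /* this call's own "cup" */
    frame_addr[depth] = (uintptr_t)&local;

    unsigned int below = 0;
    if (depth + 1 < MAX_DEPTH)
        below = descend(depth + 1);

    /* The deeper calls wrote to their own locals, not to ours. */
    assert(local == depth);
    return below + local;
}

/* Returns 1 if every recorded local had a different address. */
int all_frames_distinct(void)
{
    for (int i = 0; i < MAX_DEPTH; i++)
        for (int j = i + 1; j < MAX_DEPTH; j++)
            if (frame_addr[i] == frame_addr[j])
                return 0;
    return 1;
}
```

Calling `descend(0)` sums the depths 0 through 4, and afterwards `all_frames_distinct()` reports whether the five nested calls each had their own location for `local`.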

ldr rd,[rn]: think of the brackets as saying "the item at that address", so read memory at the address in rn and save that value in rd.

str rd,[rn] is the one messed up arm instruction: for the rest, the first operand is the left side of the equals (add r1,r2,r3 is r1 = r2 + r3, ldr r1,[r4] is r1 = [r4]), but this one is backward, [rn] = rd: store the value in rd to the memory location described by the address in rn, one level of indirection.

stmdb sp!, means decrement the stack pointer before doing anything, by 4 bytes times the number of registers in the list, then write the first, lowest numbered register to [sp+0], the next to [sp+4] and so on; the last one will be at four less than the starting value of sp. The ! means the instruction finishes with sp being that decremented value. You can use ldm/stm for things other than stack pushes and pops, like memcpy, but that is another story...
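
The stmdb/ldmia behaviour described above can be modelled in a few lines of C. This is a sketch under stated assumptions, not how any real tool implements it: `mem`, `sp`, `stmdb_sp` and `ldmia_sp` are hypothetical names, memory is a tiny word array, and the caller passes the register values already ordered from lowest to highest register number.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64                        /* tiny hypothetical memory */

static uint32_t mem[MEM_WORDS];             /* word at byte address a is mem[a / 4] */
static uint32_t sp = MEM_WORDS * 4;         /* stack starts at the top and grows down */

/* Models stmdb sp!, {list}: decrement sp by 4 * n first, then store the
 * lowest numbered register at [sp+0], the next at [sp+4], and so on. */
void stmdb_sp(const uint32_t *regs, unsigned n)
{
    assert(sp >= 4u * n);                   /* no room left would be a stack overflow */
    sp -= 4u * n;
    for (unsigned i = 0; i < n; i++)
        mem[sp / 4 + i] = regs[i];
}

/* Models ldmia sp!, {list}: load from [sp+0], [sp+4], ... and then
 * increment sp by 4 * n, undoing the matching stmdb. */
void ldmia_sp(uint32_t *regs, unsigned n)
{
    assert(sp + 4u * n <= MEM_WORDS * 4u);  /* popping more than was pushed */
    for (unsigned i = 0; i < n; i++)
        regs[i] = mem[sp / 4 + i];
    sp += 4u * n;
}
```

Pushing two values such as a saved r4 and a return address of 0x708 lowers `sp` by 8 and leaves the first value at the lower address; the matching `ldmia_sp` reads them back in the same order and restores `sp`.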

All of this is in the arm documentation from infocenter.arm.com which you should already have (arm architectural reference manual, armv5 is the preferred first one if you have not read one).

Thank you, but this includes multiple instructions I can't use. A lot of what you were talking about is way over my head. — LuminousNutria, Oct 22 '17 at 22:47
Nope you are 100% clean all of these instructions are in your list, stm, ldr add, cmp b[cond], sub, bl, mov, ldm. And for the bx lr you just substitute that with mov pc, lr which is in your list. — old_timer, Oct 22 '17 at 23:43
@RidiculousName: push and pop are aliases for `stmdb` and `ldmia` using `SP!`. `andeq` is an AND instruction, it just has the conditional-execution bits set to the EQ condition instead of always-true. (Also, it doesn't really run, it's just the disassembly of the all-zero data / padding). — Peter Cordes, Oct 22 '17 at 23:43
andeq is not an instruction it is data that got disassembled. edited them out — old_timer, Oct 22 '17 at 23:44
It's instructions like "bne 24 " that I can't use. VisUAL won't handle anything in "<" or ">" signs. — LuminousNutria, Oct 23 '17 at 03:05
b[cond] is on the list of supported instructions, bne and beq are both b[cond] instructions — old_timer, Oct 23 '17 at 04:15
the greater or less than signs are just part of the disassembly is that not making sense to you? this is a disassembly, with some helpful information. I imagine the 18: 1a000001 isnt supported as written either — old_timer, Oct 23 '17 at 04:16
when you see bne 24 in that disassembly, replace that with a label for 24 and bne label as shown in the last example. — old_timer, Oct 23 '17 at 04:18
you are not quite ready to write your recursive code. you need to work through several/many simple programs to learn each of the instructions one at a time, then you can start to combine them into a program. — old_timer, Oct 23 '17 at 04:20
@old_timer I agree that I needed to work through many simpler programs first, it's just that my professor apparently doesn't think so. :/ — LuminousNutria, Oct 24 '17 at 20:57
well it may take you longer to do it that way, but whatever. Its been two days since you posted your question, more than enough time to have learned the instructions and how they work and then assemble them into a program. — old_timer, Oct 25 '17 at 00:27
