I am stuck on an ARM assembly exercise. The following program should calculate 2((x-1)^2 + 1), but a mistake in it leads to an infinite loop. I think I still don't completely understand subroutines, which is why I can't see where the mistake is.

_start:
 mov r0, #4 
 bl g
 mov r7, #1
 swi #0

f:
 mul r1, r0, r0
 add r0, r1, #1
 mov pc, lr

g: 
 sub r0, r0, #1
 bl f
 add r0, r0, r0
 mov pc, lr

The infinite loop starts in subroutine g. At the line mov pc, lr, instead of returning to _start, execution goes back to the previous line, add r0, r0, r0, and then reaches the last line of g again. So I guess the problem is the last line of g, but I can't find a way to return to _start without using mov pc, lr. As far as I know, this is the instruction to use after a branch with link.

Also, in this case r0 = 4, so the result of the program should be 20.
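
As a cross-check of the expected value, the two subroutines can be mirrored in C. This is only a sketch, assuming plain int arithmetic; the function names f and g are borrowed from the assembly labels, and nothing here models the link register.

```c
/* f(x) = x^2 + 1, mirroring the mul/add pair in subroutine f. */
int f(int x)
{
    return x * x + 1;
}

/* g(x) = 2 * f(x - 1), mirroring sub / bl f / add in subroutine g. */
int g(int x)
{
    return 2 * f(x - 1);
}
```

With x = 4 this gives 2 * (3 * 3 + 1) = 20, which matches the value the assembly program is expected to leave in r0.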

Peter Cordes
  • [ARM frame and link pointers](https://stackoverflow.com/questions/15752188/arm-link-register-and-frame-pointer) give some explanation. Your `bl f` overwrites the 'lr' that has the return address to `_start`. When **g()** runs `mov pc,lr`, it is returning to the `add` instruction in 'g'. Happy new year. – artless noise Dec 31 '19 at 19:18
  • @artlessnoise thank you for answering and happy new year! :) –  Dec 31 '19 at 19:35

1 Answer

This is because you don't save lr on the stack before calling f, so the initial return address is lost. If you only have one level of subroutine calls, using lr without saving it is fine, but if you have more than one, you need to preserve the previous value of lr.

For example, when compiling this C example using Compiler Explorer with ARM gcc 4.56.4 (Linux), and options -mthumb -O0,

void f()
{
}

void g()
{
    f();
}

void start()
{
    g();
}

The generated code will be:

f():
        push    {r7, lr}
        add     r7, sp, #0
        mov     sp, r7
        pop     {r7, pc}
g():
        push    {r7, lr}
        add     r7, sp, #0
        bl      f()
        mov     sp, r7
        pop     {r7, pc}
start():
        push    {r7, lr}
        add     r7, sp, #0
        bl      g()
        mov     sp, r7
        pop     {r7, pc}
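
To see why saving lr matters, the question's program can be modelled as a tiny state machine in C. This is a hypothetical toy model, not real ARM semantics: the enum values stand in for code addresses, `simulate` and its `save_lr` flag are names invented for this sketch, and the small `stack` array plays the role of a push/pop of lr around the call to f.

```c
/* Toy "addresses" for the instructions of the question's program. */
enum {
    START_BL_G,  /* _start: bl g                           */
    START_EXIT,  /* _start: mov r7, #1 ; swi #0            */
    G_ENTRY,     /* g: sub r0, r0, #1 ; bl f               */
    G_AFTER_F,   /* g: add r0, r0, r0                      */
    G_RETURN,    /* g: return to caller                    */
    F_BODY       /* f: mul ; add ; mov pc, lr              */
};

/* Returns 1 and stores r0 in *result if the exit call is reached within
 * max_steps; returns 0 if the program is still running (i.e. looping).
 * save_lr != 0 models g saving lr on entry and restoring it on return. */
int simulate(int save_lr, int max_steps, unsigned *result)
{
    unsigned r0 = 4, r1 = 0;  /* mov r0, #4 (unsigned so doubling wraps) */
    int lr = -1;              /* link register, initially unset          */
    int stack[4];             /* toy stack, only ever holds one lr here  */
    int sp = 0;
    int pc = START_BL_G;

    for (int step = 0; step < max_steps; step++) {
        switch (pc) {
        case START_BL_G:
            lr = START_EXIT;  /* bl stores the return address in lr */
            pc = G_ENTRY;
            break;
        case START_EXIT:
            *result = r0;
            return 1;
        case G_ENTRY:
            if (save_lr)
                stack[sp++] = lr;  /* models push {lr} */
            r0 -= 1;
            lr = G_AFTER_F;        /* bl f overwrites lr */
            pc = F_BODY;
            break;
        case F_BODY:
            r1 = r0 * r0;
            r0 = r1 + 1;
            pc = lr;               /* mov pc, lr */
            break;
        case G_AFTER_F:
            r0 += r0;
            pc = G_RETURN;
            break;
        case G_RETURN:
            /* Without the saved copy, lr still points back into g. */
            pc = save_lr ? stack[--sp] : lr;
            break;
        }
    }
    return 0;
}
```

In this model, `simulate(0, 100, &r)` never reaches the exit call because g keeps returning to its own add instruction, while `simulate(1, 100, &r)` exits with r equal to 20. The second case corresponds to what the push {r7, lr} / pop {r7, pc} pairs in the compiler output achieve.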

If you were running this on bare metal, not under Linux, you'd need your stack pointer to be initialized to a valid value. Assuming you are running from RAM on a bare-metal system/simulator, you could set up a minimal stack of 128 bytes:

       .text
       .balign 8
_start:
        adr r0, . + 128  // set top of stack at _start + 128
        mov sp, r0
        ...

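But it looks like you're writing a Linux executable that exits with the swi/r7=1 exit system call. In that case, don't do this: Linux already sets up the stack for you, and pointing sp into the read-only .text section would make your program crash when it tries to write to the stack.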

Frant
  • ARM [tag:linux] will initialize the stack for you. For an embedded micro-controller environment, this is a good point. I think the `swi` is a clue as to the OPs OS :-) – artless noise Dec 31 '19 at 19:40
  • I do agree 100%, and you were very polite by using the word `clue` :-) - an Aarch64 `HLT` (or a 6502 `BRK`...) would have more likely caught my eye... By the way, I have to tell that I am always learning something useful when I read your posts. Happy new year in the case this would apply. – Frant Dec 31 '19 at 20:24
  • You should probably edit your answer; setting SP to `_start+128` under Linux will lead to a segfault because the `.text` section will be loaded/mapped into a read-only page. So that part will be very confusing for a beginner. Actually I'll edit it for you. – Peter Cordes Jan 01 '20 at 04:48
  • I had a bare-metal system/qemu target in mind, not Linux at all for the exact reason you mentioned, but I agree this was not clear enough in my original wording - thanks for the edit. – Frant Jan 01 '20 at 09:40