Cortex M0+ ARM Assembly - How to implement a loop position independent

Question

I am working with an STM32 Nucleo board (Arm Cortex-M0+) and Keil MDK version 5.36. Heads up: I have an embedded background, but I am new to ARM assembly and still in the process of learning it.

The challenge: I would like to copy the machine code for a few lines of assembly into RAM while some other application is executing, and then execute that code from RAM by branching to it.

I am now stuck on implementing the loop as position-independent code, so that it still works after being copied to an arbitrary address in RAM.

This is the code, including the whole framework needed to test it. The relevant piece I would like to copy to RAM is the "copy_loop".

Stack   EQU 0x00000100  ;Define Stacksize of 256 Bytes
        AREA    STACK, NOINIT, READWRITE, ALIGN=3

StackMem    SPACE   Stack
    
        AREA    RESET,DATA, READONLY
        EXPORT __Vectors
    
__Vectors
        DCD StackMem+ Stack
        DCD Reset_Handler
        ALIGN

        AREA simpleProject, CODE, READONLY, ALIGN=2
        ENTRY
        EXPORT Reset_Handler

Reset_Handler
        LDR r0, =0x00000000 ; Source Address
        LDR r1, =0x20000300 ; Destination address
        LDR r2, =100    ;number of bytes to copy

copy_loop   LDRB    r3, [r0]    ;read 1 byte
            ADDS    r0, r0, #1  ;increment source pointer
            STRB    r3, [r1]    ; write 1 Byte
            ADDS    r1, r1, #1  ; increment destination pointer
            SUBS    r2, r2, #1  ; decrement loop counter (sets Z when it reaches 0)
            BNE     copy_loop   ; loop until all data is copied
        END

Running it in the debugger, I see in the disassembly that the conditional jump is realized with an absolute address.

28:                         BNE          copy_loop    ;loop until all data copied 
0x08000018 D1F9      BNE      0x0800000E

How can I turn this into a position-independent conditional jump (using the M0+ instruction set), so that it runs from whatever address it is copied to? I really appreciate your help! I have been reading tons of material but am still missing the eureka moment.

No, it isn't. That is just a service of your disassembler. You can see that an absolute address can't possibly fit into the two-byte machine code `D1F9`. It's relative. — Jester, May 10 '23 at 20:30
`LDR r1, =0x20000300` is an absolute address; do you also want to make that relative to the code's position? Also, copying only 1 byte at a time is 4x slower than it needs to be. And you could save instructions with post-increment addressing modes. — Peter Cordes, May 10 '23 at 20:39
@jester or all: As a follow-up question: how can I decode the specifics of the relative jump? From the M0+ TRM I get the encoding 1101(B) xxxx(cond) yyyyyyyy (imm8). For D1F9, is yyyyyyyy=11111001? — Anderle, May 10 '23 at 20:47
@Peter Cordes: Thanks for the optimization/speed tip, I will read into that! Regarding the absolute address, that is fine for now: this way I know and can define where the data is copied to. — Anderle, May 10 '23 at 20:53
[This post](https://stackoverflow.com/questions/58902628/what-are-data-segment-initializers/71849963#71849963) may interest you. It has alternate copy routines. Generally you would want to copy half words or words (32 bit). Also, you can auto-increment the addresses. But to the main topic: the majority of ARM assembler is PC relative. It is absolute addresses loaded with `ldr =0x....` that are the main source of absolute references. Also, see the [difference between `adr` for labels](https://stackoverflow.com/questions/15774581/getting-an-label-address-to-a-register-on-arm/15775249#15775249). — artless noise, May 10 '23 at 20:57
@artlessnoise: New learning path to follow :-) - really appreciate your help! THX! — Anderle, May 10 '23 at 21:01
Yes, `F9` is the distance to jump. It's `-7` in two's complement, which means `-14` bytes because the offset is counted in halfwords. Because the PC reads as the address of the branch plus 4, you need to adjust that by 4, so the jump is actually `-10` bytes from the branch instruction. `0x08000018 - 10 = 0x0800000E`. — Jester, May 10 '23 at 22:12
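
The arithmetic in the comment above can be checked on a host machine. The following is a minimal sketch in Python, not something posted in the thread; the helper name `decode_thumb_bcond` is made up for illustration, and it only handles the 16-bit conditional branch encoding (`1101 cond imm8`) discussed here.

```python
def decode_thumb_bcond(halfword, address):
    """Decode a 16-bit Thumb conditional branch (1101 cond imm8).

    Returns (cond, target_address). Illustrative helper only,
    not a full disassembler.
    """
    if (halfword >> 12) != 0b1101:
        raise ValueError("not a conditional branch encoding")
    cond = (halfword >> 8) & 0xF
    if cond >= 0xE:
        # cond values 1110 and 1111 are used for UDF and SVC instead
        raise ValueError("encoding is UDF/SVC, not a branch")
    imm8 = halfword & 0xFF
    # Sign-extend the 8-bit immediate, then scale to bytes (halfword units)
    offset = (imm8 - 0x100 if imm8 & 0x80 else imm8) * 2
    # In Thumb state the PC reads as the instruction address plus 4
    return cond, address + 4 + offset


cond, target = decode_thumb_bcond(0xD1F9, 0x08000018)
print(cond, hex(target))  # cond 1 means NE
```

Feeding the same halfword `0xD1F9` with a RAM address such as `0x20000318` yields a target 10 bytes before that address, which is the behaviour the question is asking for.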

score 2 · Accepted Answer · answered May 10 '23 at 22:41

All you need to do is read the instruction documentation: it shows that the branch target is encoded strictly as a PC-relative offset.

Or just try it:

.thumb
lab0: nop; nop; nop; bne lab0
lab1: nop; nop; nop; bne lab1
lab2: nop; nop; nop; bne lab2
lab3: nop; nop; nop; bne lab3
lab4: nop; nop; nop; bne lab4
lab5: nop; nop; nop; bne lab5
lab6: nop; nop; nop; bne lab6

arm-none-eabi-objdump -d so.o | grep bne
   6:   d1fb        bne.n   0 <lab0>
   e:   d1fb        bne.n   8 <lab1>
  16:   d1fb        bne.n   10 <lab2>
  1e:   d1fb        bne.n   18 <lab3>
  26:   d1fb        bne.n   20 <lab4>
  2e:   d1fb        bne.n   28 <lab5>
  36:   d1fb        bne.n   30 <lab6>

The encoding is `d1fb` at every address, so the branch is position independent.
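
To see why every line of the objdump listing above shows the same `d1fb`, the encoding can be recomputed from the addresses. This is a rough sketch in Python under the same assumptions as the instruction encoding `1101 cond imm8`; the function name `encode_thumb_bne` is hypothetical and it covers only the 16-bit `BNE` form.

```python
def encode_thumb_bne(branch_address, target_address):
    """Encode a 16-bit Thumb BNE (cond = 0001) for a branch/target pair."""
    # The offset is relative to the PC, which reads as branch address + 4
    offset = target_address - (branch_address + 4)
    if offset % 2 != 0 or not -256 <= offset <= 254:
        raise ValueError("target out of range for the 16-bit encoding")
    imm8 = (offset >> 1) & 0xFF  # halfword units, two's complement
    return 0xD100 | imm8


# (branch address, label address) pairs taken from the objdump listing
pairs = [(0x06, 0x00), (0x0E, 0x08), (0x16, 0x10), (0x1E, 0x18)]
encodings = {encode_thumb_bne(b, t) for b, t in pairs}
print([hex(e) for e in sorted(encodings)])
```

Only the distance between the branch and its label enters the calculation, so moving both by the same amount (for example to `0x20000300` in RAM) leaves the encoded halfword unchanged.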

The Cortex-M0+ is not an instruction set; it is an IP product. Its technical reference manual says that it implements ARMv6-M, so the document you then want is the architecture reference manual for ARMv6-M. In this case the instruction goes all the way back to the start of Thumb, so any of the architecture reference manuals that cover Thumb, full sized or otherwise (not the 64-bit ones), documents this instruction.

Thx for the simple but powerful example! — Anderle, May 11 '23 at 19:47
