
EDIT: I am fully aware that the function asmCopy might not be functional; my question is more about the behaviour of gcc regarding parameter passing in registers.

I'm working on an STM32H7 using STM32CubeIDE, whose toolchain is arm-none-eabi-gcc.

The optimisation level is -Os.

I see the following behaviour, which I cannot explain. I took screen captures showing the asm and C code side by side.

My C code calls 3 functions. The first and the third have exactly the same parameters.

The second one takes no parameters. Here is its code:

static void Reset_Cycle_Counter(void)
{
    volatile uint32_t *DWT_CYCCNT = (volatile uint32_t *)0xE0001004;
    volatile uint32_t *DWT_CONTROL = (volatile uint32_t *)0xE0001000;

    // Reset cycle counter
    *DWT_CONTROL = *DWT_CONTROL & ~0x00000001 ;
    *DWT_CYCCNT = 0;
    *DWT_CONTROL = *DWT_CONTROL | 1 ;
}

The third function is special: I am trying to write some assembly code (which may very well be wrong right now).

static void __attribute__((noinline)) asmCopy(void *dst, void *src, uint32_t bytes)
{
    while (bytes--)
    {
        asm("ldrb r12,[r1], #1"); // src param is stored in r1, r12 can be modified without being restored after
        asm("strb r12,[r0], #1"); // dst param is stored in r0
    }
}

Before the first function call (to memcpy), r0, r1 and r2 are loaded with the right values.

[Screenshot: C source and disassembly just before the call to memcpy, with r0, r1 and r2 holding the arguments]

Then, before the call to the third function, the parameters in r1 and r2 are wrong, as you can see below (qspi_addr should be 0x90000000).
[Screenshot: C source and disassembly just before the call to asmCopy]

My understanding of the AAPCS (the ARM Procedure Call Standard) is that before calling a subroutine, registers r0 to r3 must be loaded with the function's parameters (if any), and that the subroutine does not need to preserve or restore these registers. It is therefore normal that the second function modifies r1 and r2, so I would expect the compiler to reload r0, r1 and r2 before the third call.

If I change the optimisation level to -O0, I do get this expected behaviour.

What do you think?

Guillaume Petitjean
  • The issue without annotation is that the compiler sees nothing using R0 and R1 and may decide to make a copy of R2 (`bytes`). For instance, it might decide to unroll the loop. This fits your description of the unoptimised case working. Annotated inline assembler helps to resolve this. If the compiler inlines all the functions, it may decide that `src` and `dst` are useless. – artless noise Jun 27 '19 at 13:13
  • Well, the same issue occurs with a correctly annotated inline assembler function (see below). The function is not inlined and the loop is not unrolled. – Guillaume Petitjean Jun 27 '19 at 13:16
  • From the other 'answer': "My understanding is that the compiler "modifies" the prototype of the function to build a function taking only one parameter." The compiler modifying something is evidence that you have not annotated correctly. The function below may not be correct. Also, from [ARM link and frame pointer](https://stackoverflow.com/questions/15752188/arm-link-register-and-frame-pointer), a static function doesn't have to adhere to any ABI. You need to make this function global if you want the compiler to adhere to the ABI. This is not the best solution; neither is volatile. – artless noise Jun 27 '19 at 13:46
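A minimal sketch of the "make it global" suggestion, assuming GCC: the function (hypothetically named `byteCopy`, with the same signature as `asmCopy`) drops `static`, so without link-time optimisation, callers in other translation units can only reach it through its AAPCS entry point, with `dst`, `src` and `bytes` in r0, r1 and r2. The body is plain C instead of inline asm so the sketch also builds on a host. As the comment says, this is not the best solution: GCC may still specialise calls made from inside the same file.

```c
#include <stdint.h>

/* Hypothetical name; same parameters as the question's asmCopy.
 * No 'static': callers in other translation units (without LTO) can only
 * reach this function through its AAPCS entry point, where dst, src and
 * bytes arrive in r0, r1 and r2. Calls from inside this file may still be
 * specialised by GCC, so this is not a hard guarantee. */
void __attribute__((noinline)) byteCopy(void *dst, const void *src, uint32_t bytes)
{
    uint8_t *d = dst;          /* typed pointers: no GNU void* arithmetic */
    const uint8_t *s = src;

    while (bytes--) {
        *d++ = *s++;
    }
}
```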

3 Answers


You can't just open an inline assembly block and assume that r0 and r1 still contain the function arguments. There is no guarantee of that whatsoever. If you need to use the arguments, you must pass them properly as input and/or output operands:

static void __attribute__((noinline))
myAsmCopy(void* dst, void* src, uint32_t bytes) {
  asm volatile("1: cbz %[bytes], 1f \n"
               "ldrb r12, [%[src]], #1 \n"
               "strb r12, [%[dst]], #1 \n"
               "subs %[bytes], #1 \n"
               "b 1b \n"
               "1: \n"
               : [dst] "+&r"(dst), [src] "+&r"(src), [bytes] "+&l"(bytes) // cbz needs r0-r7
               :
               : "cc", "memory", "r12");
}

GCC has extensive documentation about inline assembly here: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

As you've obviously never used any of that before, I must strongly advise against it. If "C contains footguns", then inline assembly is putting a six-shot revolver with five bullets to your head.

Vinci
  • The point is exactly to play with inline assembly; of course I don't intend to use it in a real application... And my question is not about the asm function I'm trying to write but about the AAPCS. – Guillaume Petitjean Jun 26 '19 at 08:50
  • "You can't just open an inline assembly block and assume that r0 and r1 still contain the function arguments. There is no guarantee for that whatsoever" Fully agree. But in my case, r1 doesn't contain the right value just before the branch instruction (the function call). – Guillaume Petitjean Jun 26 '19 at 08:52
  • Which doesn't matter from the compiler's point of view, because you don't do anything with them. – Vinci Jun 26 '19 at 08:55
  • I think I don't understand what you mean. I've been debugging ARM assembly for years and I have always seen function parameters passed in r0, r1, r2, r3. This is what the procedure call standard from ARM says: "The first four registers r0-r3 (a1-a4) are used to pass argument values into a subroutine and to return a result value from a function. They may also be used to hold intermediate values within a routine (but, in general, only between subroutine calls)." So why do you say "you don't do anything with them"? – Guillaume Petitjean Jun 26 '19 at 09:05
  • All the compiler sees is a "meaningless" loop with some instructions in it. Your asm block says "I don't care what r0 and r1 are" so why should gcc? – Vinci Jun 26 '19 at 09:39
  • OK, I see what you mean. I thought the compiler would "respect" the procedure call standard (I mean, put the params in r0..r2) even if it looks useless in the end. This explains why it works fine at -O0. – Guillaume Petitjean Jun 26 '19 at 09:45
  • `subs` has set the condition codes, so the `b 1b` can be `bne 1b` with the local label moved to the `ldrb` instruction, saving one instruction per loop iteration. – artless noise Jun 27 '19 at 13:51
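A sketch of the tightened loop suggested in the last comment, assuming a Thumb-2 target such as the Cortex-M7 in the STM32H7: `cbz` tests for zero once before the loop, the `1:` label sits on the `ldrb`, and `bne` reuses the flags set by `subs`. Two details are assumptions of this sketch: the compiler chooses the scratch register through a `tmp` output operand instead of hard-coding r12, and `bytes` is constrained with `l` (Thumb low registers r0-r7), since `cbz` only accepts those. Builds that are not Thumb-2 fall back to plain C.

```c
#include <stdint.h>

static void __attribute__((noinline))
myAsmCopy(void *dst, const void *src, uint32_t bytes)
{
#if defined(__thumb2__)
    uint32_t tmp;  /* scratch register picked by the compiler, not r12 */

    __asm__ volatile(
        "cbz   %[bytes], 2f            \n"  /* nothing to copy: skip loop */
        "1:                            \n"
        "ldrb  %[tmp], [%[src]], #1    \n"  /* load byte, post-increment */
        "strb  %[tmp], [%[dst]], #1    \n"  /* store byte, post-increment */
        "subs  %[bytes], %[bytes], #1  \n"  /* decrement and set flags */
        "bne   1b                      \n"  /* loop while bytes != 0 */
        "2:                            \n"
        : [dst] "+r"(dst), [src] "+r"(src), [bytes] "+l"(bytes), [tmp] "=&r"(tmp)
        :
        : "cc", "memory");
#else
    /* Portable fallback so the sketch also compiles on non-Thumb-2 hosts. */
    uint8_t *d = dst;
    const uint8_t *s = src;

    while (bytes--) {
        *d++ = *s++;
    }
#endif
}
```

Because every register the asm touches is an operand, the compiler knows exactly which values flow in and out, which is what the answer means by passing the arguments "properly".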

If you ask the compiler how to achieve it, everything gets much easier:

https://godbolt.org/z/rXxeRe

void __attribute__((noinline)) asmCopy(void *dst, void *src, uint32_t bytes)
{
    while (bytes--)
    {
        asm("ldrb r12,[r1], #1"); // src param is stored in r1, r12 can be modified without being restored after
        asm("strb r12,[r0], #1"); // dst param is stored in r0
    }
}

void __attribute__((noinline)) asmCopy1(void *dst, void *src, uint32_t bytes)
{
    while (bytes--)
    {
        *(uint8_t *)dst++ = *(uint8_t *)src++;
    }
}

and the generated code:

asmCopy:
.L2:
        adds    r2, r2, #-1
        bcs     .L3
        bx      lr
.L3:
        ldrb r12,[r1], #1
        strb r12,[r0], #1
        b       .L2
asmCopy1:
        subs    r0, r0, #1
        add     r2, r2, r1
.L5:
        cmp     r1, r2
        bne     .L6
        bx      lr
.L6:
        ldrb    r3, [r1], #1    @ zero_extendqisi2
        strb    r3, [r0, #1]!
        b       .L5
0___________
  • I didn't understand the sentence "If you try to ask the compiler how to archive it". Also, what is wrong with the asm of the first function in your Compiler Explorer example? – Guillaume Petitjean Jun 26 '19 at 08:59
  • Also, how can the second function, asmCopy1, work if r1 does not contain the value of *src when you enter it? – Guillaume Petitjean Jun 26 '19 at 09:18
  • It will, as that is the ABI. – 0___________ Jun 26 '19 at 09:28
  • Well, apparently it's more complex than that. I had a look at the disassembly of @Vinci's code: at the very beginning of myAsmCopy (just after the BL), r1 and r2 do not contain the function parameters. HOWEVER, the first instructions of the function are MOV and LDR, which load r1 and r2 with hard-coded values (my code is just an example; the parameters are fixed). – Guillaume Petitjean Jun 26 '19 at 09:55
  • It is that simple. – 0___________ Jun 26 '19 at 11:22

I think I've found the answer.

In the function I am testing (whether it is the crappy one I implemented or the better one from @Vinci), some of the parameters passed to the function are global variables (arrays of dummy data used to run some tests).

My understanding is that the compiler "modifies" the prototype of the function to build a function taking only one parameter. The other parameters are treated as constants and simply loaded PC-relative at the beginning of the function.

So I modified the code to call the very same function with local volatile pointers, and the issue disappears: I can see registers r0, r1 and r2 loaded with the parameters, as I expected.

Does it make sense?
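A sketch of that workaround, assuming "local volatile pointers" means pointer objects declared `volatile` (not pointers to volatile data): each read of such a local must really happen, so GCC cannot assume that `dst` and `src` always hold the same global arrays, and it has to pass them in r0 and r1. The buffers `g_src`/`g_dst` and the functions `copyBytes`/`runCopyTest` are hypothetical stand-ins for the test code in the question.

```c
#include <stdint.h>

/* Hypothetical global test buffers whose addresses are known at compile
 * time, like the dummy-data arrays described above. */
uint8_t g_src[8] = {1, 2, 3, 4, 5, 6, 7, 8};
uint8_t g_dst[8];

static void __attribute__((noinline))
copyBytes(void *dst, const void *src, uint32_t bytes)
{
    uint8_t *d = dst;
    const uint8_t *s = src;

    while (bytes--) {
        *d++ = *s++;
    }
}

void runCopyTest(uint32_t bytes)
{
    /* The pointer objects themselves are volatile: the compiler must load
     * them at run time, so it cannot specialise copyBytes for constant
     * dst/src addresses. */
    void *volatile dst = g_dst;
    const void *volatile src = g_src;

    copyBytes(dst, src, bytes);
}
```

As the comments on the question point out, `volatile` costs extra loads and is better treated as a diagnostic than as a fix.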

Guillaume Petitjean
  • Does `Reset_Cycle_Counter` clobber r0-r2? Something is wrong with the images in your post (it is much better to give us text to help you). ARM loads all constants via the PC (unless they have many zero bits). It is difficult for anyone to help, as you haven't given a complete example that shows the behaviour. What you see is not normal, so you have some issue somewhere, which we can only guess at. It is 100% correct that the assembler was not right. Maybe it is not your only issue. – artless noise Jun 27 '19 at 15:38
  • What else would you need? I've given the source code of the functions, and you can see (despite the poor quality of the images) how they are called in main(). You keep saying that the function was not correct, but I've already written several times that this is not my question and that the issue I am referring to also occurs with the correct inline assembly function. – Guillaume Petitjean Jun 28 '19 at 06:03
  • Reset_Cycle_Counter modifies r0-r2, yes. – Guillaume Petitjean Jun 28 '19 at 06:08
  • Look at [how to ask](https://stackoverflow.com/help/how-to-ask) and [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). Your images are not very helpful. If someone can compile things and see what you are seeing, it is much easier to resolve. Otherwise people guess, which is why I have to ask about `Reset_Cycle_Counter`. If it is trashing r0-r2, then your main calling code is wrong, because r0 and r1 have the pointers set before `Reset_Cycle_Counter` and they will be trashed by the time asmCopy is called. I cannot understand why this happens without a minimal working example. – artless noise Jun 28 '19 at 13:56
  • How can the calling code be wrong regarding registers? (I mean, it's just basic C code calling 3 functions.) Also, why is it surprising that a function messes up r0-r2? It is quite normal according to the ABI, but of course the calling code is usually supposed to fill r0-r2 with the parameters, if needed, before the following function call. – Guillaume Petitjean Jun 28 '19 at 14:59
  • At least in your original image above, things are wrong. It makes sense that the compiler decided your arguments are unused with the original code. Maybe things changed when you replaced it with the annotated assembler. I cannot disassemble that because I don't have a complete example. It is not normal (or rather, not what you want) that myAsmCopy does not get r0, r1 and r2 set before being called. The original image does not have this. – artless noise Jun 28 '19 at 17:42
  • So I tried to reproduce a simple, complete example on Compiler Explorer (https://godbolt.org/z/ggH0Q6), but I couldn't: as you can see, registers r0-r2 are filled as expected before calling asmCopy. So something is different in my project, either the rest of the code or the build settings, and it triggers the behaviour. I will dig into this. – Guillaume Petitjean Jul 01 '19 at 08:43
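Since the behaviour could not be reproduced in isolation, one way to test the constant-propagation explanation inside the full project is to mark the callee `noipa`. This GCC attribute (GCC 8 and later) implies `noinline` and `noclone` and disables interprocedural optimisation between the function and its callers, so the callee cannot be specialised for constant arguments. A sketch, where the `NOIPA` macro, `copyBytes`, `copyFixedBuffers` and the buffers are hypothetical. In GCC's output, a specialised copy normally carries a `.constprop` suffix (for example `copyBytes.constprop.0`), so searching the disassembly for that suffix is another quick check.

```c
#include <stdint.h>

/* Hypothetical helper macro: 'noipa' exists in GCC 8 and later. Elsewhere
 * it falls back to 'noinline', which does NOT prevent cloning or
 * interprocedural constant propagation. */
#if defined(__GNUC__) && !defined(__clang__) && (__GNUC__ >= 8)
#define NOIPA __attribute__((noipa))
#else
#define NOIPA __attribute__((noinline))
#endif

uint8_t g_in[4] = {0xDE, 0xAD, 0xBE, 0xEF};
uint8_t g_out[4];

/* With noipa, callers should use the normal calling convention and pass
 * dst, src and bytes in r0, r1 and r2, even though the only caller below
 * always passes the same global arrays. */
static NOIPA void copyBytes(void *dst, const void *src, uint32_t bytes)
{
    uint8_t *d = dst;
    const uint8_t *s = src;

    while (bytes--) {
        *d++ = *s++;
    }
}

void copyFixedBuffers(void)
{
    copyBytes(g_out, g_in, sizeof g_in);
}
```

If the problem disappears with `noipa` but comes back with plain `noinline`, that points at GCC's interprocedural optimisations rather than at the AAPCS itself.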