
If I call an ARM assembly function from C, sometimes I need to pass in many arguments. If they do not fit in registers r0, r1, r2, r3, it is generally expected that the 5th, 6th, ... nth arguments are pushed onto the stack so that the ARM assembly can read them from there.

So in the ARM function I receive some arguments that are on the stack. After finishing the assembly function I can either remove these arguments from stack or leave them there and expect that the C program will deal with them later.

If we are talking about GCC C and ARM assembly, who is usually responsible for cleaning up the stack?

  • The function that made the call (A)
  • Or the function that was called (B)

I understand that when developing we could agree on either convention. But what is generally used as the default in this particular case (ARM assembly and GCC C)?

And how would a low-level piece of code generally describe which behavior it implements? It seems that there should be some kind of standard description for this. If there isn't one, it seems that you pretty much just have to try both and see which one does not crash.

If someone is interested in how the code could look:

arm_function:
    stmfd sp!, {r4-r12, lr}    @ Save callee-saved registers and lr (10 words = 40 bytes), SP->SAVED REGISTERS->PASSED ARGUMENTS
    add   r12, sp, #40         @ r12 -> first stack-passed argument (r12 is already saved, so it can serve as scratch)
    ldmia r12, {r4-r6}         @ Load 3 arguments that were passed through the stack

    @ The main function body.

    ldmfd sp!, {r4-r12, lr}    @ Restore saved registers, SP->PASSED ARGUMENTS
    add   sp, sp, #12          @ Increment stack pointer to remove passed arguments, SP->NOTHING
    bx    lr                   @ Return to the caller

    @ Without the `add sp, sp, #12` line, the caller would need to remove the arguments from the stack.

UPDATE: It seems that for C/C++, choice A is pretty standard. Compilers usually use calling conventions like cdecl that work much like the code in the answers below. More information can be found in this link about calling conventions. Changing the C/C++ calling convention for a function does not seem to be common or easy. With an older C standard I could not manage to change it, so using A looks like a decent default choice.

chris544
  • Low-level code doesn't describe itself. However the object files and binaries might have headers describing the whole thing (which doesn't mean all functions inside follow that convention...). There are tons of ABIs for ARM, but as far as I know, all use scenario **A**, that is caller-cleanup. – Jester May 08 '15 at 23:42
  • @jester: I would be surprised if not, as anything else would work against CPU mechanisms. Main differences are reserved registers and passing of larger than word arguments/results, compound types, etc. – too honest for this site May 09 '15 at 01:10
  • Just note that there are more problems than just the cleanup. Marshalling is also critical, as there are many variations. E.g. if you pass 3 words, then 1 long long: is that split between r3 and the stack, or pushed entirely onto the stack, omitting r3? Both versions were used for ARM. That's why you need the PCS. Also, the order is important (first argument pushed first and last argument last, or reversed?). This is part of the ABI for each language (Pascal, again, used the opposite of C, and _that_ is much more likely to be used than pdecl cleanup). – too honest for this site May 09 '15 at 01:41
  • You may want to look at [ARM Link and frame pointer](http://stackoverflow.com/questions/15752188/arm-link-register-and-frame-pointer). The AAPCS has several options (static base, etc) and they only apply to functions with **external linkage**. – artless noise Aug 28 '15 at 19:45

2 Answers


The current ARM procedure call standard is AAPCS.

The language-specific ABI can be found here. Relevant will be the document about C, but others should be similar (why reinvent the wheel?).

A good start for reading might be page 14 in the AAPCS.

It basically requires the caller to clean up the stack, as this is the simplest way: push additional arguments onto the stack, call the function, and after return simply adjust the stack pointer by adding an offset (the number of bytes pushed on the stack; this is always a multiple of 4, the "natural" 32-bit ARM word size).

But if you use gcc, you can avoid handling the stack yourself by using inline assembler. This provides features to pass C variables (etc.) to the assembler code. It will also automatically load a parameter into a register if required. Just have a look at the gcc documentation. It is a bit hard to figure out in detail, but I prefer this to having raw assembler stubs somewhere.
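As a minimal sketch of what such inline assembler can look like (the function name `add_with_asm` is hypothetical, and the ARM branch assumes GCC's extended asm syntax on a 32-bit ARM target, detected through the predefined `__arm__` macro): the operands are named in the constraint lists and gcc picks the registers, so no manual stack or register handling is needed. On other targets the sketch falls back to plain C so it still compiles.

```c
/* Hypothetical example: add two ints, using inline assembler on 32-bit ARM. */
int add_with_asm(int a, int b)
{
    int result;
#if defined(__arm__)
    /* %0 is the output operand, %1 and %2 are the inputs.
       The "r" constraints let gcc choose the registers and load a and b into them. */
    __asm__ ("add %0, %1, %2"
             : "=r" (result)
             : "r" (a), "r" (b));
#else
    /* Portable fallback so the sketch also builds on non-ARM hosts. */
    result = a + b;
#endif
    return result;
}
```

Called from C like any other function, e.g. `add_with_asm(2, 3)`; argument passing to the function itself is still handled entirely by the compiler.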

OK, I added this as there might be problems understanding the principle:

caller:
    ...
    push  {r5}          @ argument which does not fit into r0..r3 anymore
    bl    callee
    add   sp, sp, #4    @ adjust SP (the caller removes the argument)

callee:
    push  {r5-r7, lr}   @ temp. variables, return address
    sub   sp, sp, #8    @ local variables
    @ processing
    add   sp, sp, #8    @ restore previous stack frame
    pop   {r5-r7, pc}   @ restore temp. variables and return (replaces bx)

You can verify this by just disassembling some sample C functions. Note that the pre- and postamble may vary if no temp registers are used or the function does not call another function (no need to stack lr for this).
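A small sample that could be used for such a check might look like the following. The names `sum6` and `call_sum6` are made up for illustration. With six `int` arguments, the first four fit in r0..r3 and the last two have to go on the stack, so the generated code for `call_sum6` shows who sets up and who releases that stack space.

```c
/* Hypothetical callee with six int arguments.
   Under AAPCS, a..d are passed in r0..r3; e and f are passed on the stack. */
__attribute__((noinline))   /* keep a real call so it shows up in the disassembly */
int sum6(int a, int b, int c, int d, int e, int f)
{
    return a + b + c + d + e + f;
}

/* Hypothetical caller: the compiler emits the code that places e and f
   on the stack before the call and restores SP afterwards. */
int call_sum6(void)
{
    return sum6(1, 2, 3, 4, 5, 6);
}
```

Assuming a cross toolchain such as `arm-none-eabi-gcc` and a file named `sum6.c` (both are assumptions), something like `arm-none-eabi-gcc -O1 -mabi=aapcs -S sum6.c` writes the assembly to `sum6.s`. Note that gcc often reserves the argument space once in the caller's prologue and releases it in the epilogue, rather than pushing and adjusting around each call.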

Also, the caller might have to stack r0..r3 before the call. But that is a matter of compiler optimizations.

Disassembly can be done with gdb and objdump for example. I use -mabi=aapcs for gcc invocation; not sure if gcc would otherwise use a different standard. Note that all object files have to use the same standard.

Edit: Just had a peek in the AAPCS, which states that the SP needs only 4-byte alignment. I might have confused this with the Cortex-M interrupt handling system, which (for whatever reason, possibly for the M7, which has 64-bit buses) aligns the SP to 8 bytes by default (software-config option). However, SP must be 8-byte aligned at a public interface. OK, the standard actually is more complicated than I remembered. That's why I prefer letting gcc take care of this stuff.
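To make the arithmetic behind the 8-byte rule concrete, here is a small sketch. The helper name `stack_arg_bytes` is hypothetical, and it assumes that every stack-passed argument is one 32-bit word and that SP was already 8-byte aligned before the arguments were placed. It computes how many bytes a caller would reserve (and later release) so that SP is still 8-byte aligned at the call.

```c
#include <stddef.h>

/* Hypothetical helper: bytes to reserve for n_words stack-passed 32-bit words,
   rounded up to a multiple of 8 so SP stays 8-byte aligned at the call. */
size_t stack_arg_bytes(size_t n_words)
{
    size_t raw = n_words * 4u;           /* one 4-byte slot per argument word */
    return (raw + 7u) & ~(size_t)7u;     /* round up to the next multiple of 8 */
}
```

For a single stack argument this gives 8 rather than 4, i.e. one word of padding; for two arguments it gives 8 and for three it gives 16.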

too honest for this site
  • Thanks for some official documentation. For now let's consider that we want to do it manually. From this and the previous comment I understand that ARM functions should generally be written so that they leave the cleaning up to the caller. But what does GCC C expect by default? Is it also caller cleanup? Basically, if I compile everything with default settings, will caller cleanup generally not crash, but callee cleanup crash? – chris544 May 09 '15 at 00:06
  • @chris544: That's why I posted the link. You really should read that, as it explains in detail how, what and where (and what else) happens. It is not that complicated to read, but helps avoid actual pitfalls. – too honest for this site May 09 '15 at 00:09
  • Why do you add sp,4 not add sp, 8? It seems that the argument remains on the stack after adding only 4 to the stack pointer. – chris544 May 09 '15 at 00:37
  • @chris544: I like it when people really read what I write. Answer is: you are right; I initially forgot the alignment, so I did not adjust the add after inserting it. Thanks! – too honest for this site May 09 '15 at 00:41
  • Yeah, it looks OK. The only doubt that I still have is about how this applies to the real world. I assume in the end you should still look at the default calling conventions for specific compilers. It seems the reason why this holds true is that C++/C compilers usually use the __cdecl convention by default and that is caller cleanup. So by default this is correct. I am not sure if that is specific to the C/C++ languages. Maybe it is somewhere in some specification or standard. – chris544 May 09 '15 at 00:55
  • I thought I had stated this clearly. At least for ARM, the AAPCS is basically independent of the actual language. Its purpose actually _is_ to have one standard for all object files (I suppose that's the reason for the 8-byte SP alignment rule). _pdecl conventions actually would defy CPU internal mechanisms, as they would require first popping the return address to a temp register, then adjusting SP, then loading the PC. Quite some effort. Cleanup is not that strict for "real-world" code. The compiler can very well defer it to the end of the caller function or to where it has to adjust the SP anyway. – too honest for this site May 09 '15 at 01:03
  • Regarding "real-world": Just try it! All I can tell is that this is exactly how gcc does it for my code. And, yes, for ARM it is in a standard: "ARM Architecture Procedure Call Standard" - short: AAPCS. That's why I cited it. For x86, there are different standards: Linux-ABI, Windows-ABI and supposedly Intel also has a standard nowadays (and I would think Linux actually might use this). For ARM, there were other standards (just have a look at gcc-doc), but you should not use them anymore unless interfacing with legacy object files. Maybe you want to clarify this in your question. – too honest for this site May 09 '15 at 01:05
  • The last instruction in example should be `pop r5-r7,pc` (not `lr`) to return. **SP must be 8 byte aligned** for the reason that a few memory loading/storing instructions require that pointer is 8B aligned. If 'callee' uses one of these instructions with 4B aligned SP, it will generate unaligned access exception. – user3124812 Apr 03 '17 at 01:44
  • @user3124812: You are correct about the `pop`. The alignment however was exactly required as written at the time of writing (read the last paragraph). I'd appreciate if you can point out which Thumb2 instructions require 64 bit alignment and where AAPCS states different. The M3/4 at least have a bit to auto-align to 64 bits or live with 32 bit alignment. It does not, however, generate an exception. Unaligned access exceptions are generated if a 16 or 32 bit access is not "naturally" aligned. There are no 64 bit accesses in ARMv7M (LDRD/STRD are 32 bit accesses by definition). – too honest for this site Apr 03 '17 at 13:13
  • @Olaf, `LDREXD, STREXD` require doubleword alignment (ARMv7-A TRM, chapter A3.2). http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0211k/Cdfigfbe.html - "The address in memory must be 64-bit aligned". M3/M4, to the best of my knowledge, do not support these instructions. So for these particular architectures there would not be a problem – user3124812 Apr 04 '17 at 00:16
  • @user3124812: As you write, the point is that ARMv7**M** does not support these instructions, and that's what the question clearly is about. But even for ARMv7A (to stick with the current 32 bit architectures), the targets for these instructions are most unlikely on the stack, considering what they are used for. And this is not about other alignment (NEON would be a more problematic issue). That said: you still don't prove me wrong about AAPCS. Can you provide a reference to where I read it wrong? – too honest for this site Apr 04 '17 at 05:29

If space is allocated on the stack by the caller (for argument passing), the caller is also the one that cleans it up. How does that happen, you may ask? For ARM, @Olaf has explained it completely; on x86 it usually looks like this:

sub     esp, 8      ; make some room 
...                 ; move arguments on stack
call    func
add     esp, 8      ; clean the stack

or

push    eax          ; push the arguments
push    ebx          ; or pusha, then after call, popa

call func

add     esp, 8       ; assuming registers are 4 bytes each

How caller and callee interact in a system is also explained in the ABI (Application Binary Interface). You may find it useful.

AmirSojoodi
  • I did not downvote, but in this case the question is a little bit more specific. It is about stack argument cleanup specifically. Generally the called function really should leave the stack as it was received, but when arguments are passed, it seems to vary a lot more. – chris544 May 08 '15 at 23:53
  • Actually I meant arguments for line 2 of the code, not some data. better to edit it. – AmirSojoodi May 08 '15 at 23:55
  • I downvoted, as you actually confused caller and callee function. Your assembler code actually does the right thing, but it is the _caller_ that adjusts the stack. Anything else would defy code optimization and the normal CPU RTS automatism (older CISC CPUs like the 68000 have RTD, which actually included that stack adjustment; this was for Pascal-like languages which use callee-adjust as a standard). Oh, and the question was about the argument stack frame, not the one for local variables. – too honest for this site May 08 '15 at 23:56
  • @Olaf, You are right. Actually I was editing it before you downvoted me. – AmirSojoodi May 08 '15 at 23:58
  • So why is it still wrong? A downvote does not interrupt an edit. The arguments are simply pushed onto the stack. There is no need to pre-allocate a frame and then use costly `str rx,[sp, offset]`. This applies especially to ARMv[67]-M. – too honest for this site May 09 '15 at 00:00
  • Is it not correct now? I know it wasn't about local variables. But I make it clear that only local variable allocation should be managed by the called function. And you are right about performance of the code, however it is not wrong. – AmirSojoodi May 09 '15 at 00:04
  • But that was not the question! Removing the local variable frame is certainly the job of the callee, but this is for the caller. Also, even if we were talking about this subject, the code is wrong, as the add is done by the caller, after the PC has been popped. – too honest for this site May 09 '15 at 00:06
  • Come on! why the code is wrong?!! :| the return instruction in the called function POPs the PC. I know for sure this is true... I don't know why you are arguing with me this bad... – AmirSojoodi May 09 '15 at 00:09
  • I updated my answer. You might compare that with what you wrote. Basically, I downvoted as the answer was misleading (confusing caller and callee) and about a different topic (local variable stack frame). Sorry, but that justifies a downvote. But there is still time to either get it right or delete the answer (funny, but one gets back the rep, if that is what you are after). I would recommend getting it right, however; I'll remove the downvote then - promise. – too honest for this site May 09 '15 at 00:29
  • OK, removed the downvote. Still not completely correct, as you would not pop arguments from the stack after the call, but just adjust SP (even in x86). To be nit-picky: the question was about ARM;-)). (I'm an engineer. We always want it most precise - for good reasons) – too honest for this site May 09 '15 at 00:39