
Recent versions of GCC, including version 12, implement an inter-procedural analysis that fatally damages the following (GCC-only) code for a system call stub on ARM/Thumb.

typedef struct { int sender; int arg; } message;

#define syscall(op)  asm volatile ("svc %0" :: "i"(op))

#define SYS_SEND 9

#define NOINLINE __attribute((noinline))

void NOINLINE send(int dest, int type, message *msg)
{
    syscall(SYS_SEND);
}

void send_int(int d, int t, int v)
{
    message msg;
    msg.arg = v;
    send(d, t, &msg);
}

The intention is that the operating system's trap handler will find the three arguments of send by accessing the saved values of the argument registers r0--r2 in the exception frame for the trap. The issue is apparently that the optimiser thinks, on looking at the body of send, that the fields of its message argument are not used. Consequently, the assignment to msg.arg in the body of send_int is deleted. This is revealed by compiling the above source with

arm-none-eabi-gcc -mcpu=cortex-m0 -mthumb -O -g -Wall -ffreestanding -c trial.c 

and dumping the resulting object code.

What annotation is appropriate in order to disable this over-enthusiastic optimisation?

  • Giving send the additional attribute noipa works, but I am nervous about this because it seems especially non-standard, and GCC documentation describes it as provided mostly for compiler debugging. The existing attribute noinline should remain for the sake of earlier GCC versions that don't understand noipa. Unimplemented attributes are just ignored by GCC, aren't they?

  • It also works to label the msg buffer in send_int as volatile (and suppress a warning by adding a cast in the call to send). But this seems especially obscure.

  • Adding "memory" as a clobber on the svc instruction doesn't work.

It would also fix the problem if I wrote the system call stubs like send in a separate file of supporting assembly language routines, and that might be the best and most robust way forward.

Nate Eldredge
Mike Spivey
  • Note that the `volatile` in `asm volatile` here is implied by the lack of output annotations, so could be removed. – Mike Spivey Aug 07 '23 at 06:44
  • One obvious solution is to place `send` in its own file, so that it's compiled in isolation. – Tom Karzes Aug 07 '23 at 07:22
  • It shouldn't optimize inline assembly or it is broken. A dirty work-around in the meantime might be to add something like `volatile int dummy = dest;` at the top of the function. – Lundin Aug 07 '23 at 09:19
  • Does this help? https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html – Ian Abbott Aug 07 '23 at 15:29
  • @Lundin; The whole point of GCC's elaborate inline assembly mechanisms is that the compiler *can* optimize around it, within the bounds specified by the asm statement's declared operands. Here the problem is that the asm implicitly has operands which are undeclared. – Nate Eldredge Aug 08 '23 at 16:38
  • *What is appropriate in order to disable this over-enthusiastic optimisation?* Nothing, you need to improve the under-specified assembler. – artless noise Aug 10 '23 at 13:34

2 Answers


I think the issue is deeper. The basic assumption in this code is that upon executing the asm statement in send(), the contents of the relevant registers are still as they were on entry to the function. GCC doesn't promise that and never did. The compiler always has the freedom to modify registers arbitrarily anywhere except within an asm statement, and that includes, conceptually, the "empty code" between the opening brace and the asm statement.

Of course, when not doing IPA, the compiler didn't happen to take advantage of that freedom because there was no good reason to, but it could have. If at any time in past years the compiler had arbitrarily inserted mov r0, #42 before your svc instruction, this behavior would have been silly and inefficient, but not in any way incorrect or contrary to documentation.

If you need for the compiler to make sure that certain values are in certain registers when the asm is executed, then you have to tell it so, explicitly, by specifying appropriate input operands. This is the fundamental premise of GCC's inline assembly. Since your asm("svc %0") has no register input operands, you are effectively telling the compiler that you don't care what is in any of the registers, and it's taking you at your word.

Since in fact you do care that the values of dest, type and msg are in registers, you need an input operand for each one. Since moreover you want them to be in specific registers, you can achieve that with register asm("...") declarations. Note carefully that, despite what it may appear, this does not tell GCC to reserve a particular register for the variable in general; the value will be in the right register when your asm is executed, but not necessarily at any other time.

Other notes:

  • You need the memory clobber in addition to all this. Without it, the compiler assumes that the asm does not read or write any object in memory. So it would still ensure that you get the msg buffer pointer in a register, but the preceding stores that populate the buffer could be optimized out as dead.

    (There are more delicate ways to inform the compiler that the asm reads from the msg buffer but not from anywhere else in memory, see How can I indicate that the memory *pointed* to by an inline ASM argument may be used?, but they are particularly subtle and hard to get right.)

  • If your system call modifies any registers, including those used for the inputs as well as the condition code flags in CPSR, then you need to include clobbers or output operands for each one. If it modifies any input registers, e.g. leaving a return value in r0, then you would list dest as a read-write operand ("+r" constraint) and retrieve the return value from dest after the asm if you care about it.
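As a rough sketch of the points above, this is what a wrapper might look like for a system call that does leave something in r0. The name receive, the meaning of its arguments, the call number SYS_RECEIVE and the status return are all assumptions made for illustration, not the question's actual OS interface; the host-side stand-in under #else is hypothetical scaffolding so that the file also compiles off-target.

```c
typedef struct { int sender; int arg; } message;

/* Hypothetical call number and semantics, for illustration only. */
#define SYS_RECEIVE 10

#if defined(__thumb__)
/* Target version: the arguments are pinned to r0-r2 only for the asm. */
static inline int receive(int src, int type, message *msg)
{
    register int reg0 asm("r0") = src;          /* in: src, out: status */
    register int reg1 asm("r1") = type;
    register message *reg2 asm("r2") = msg;
    asm volatile ("svc %[op]"
                  : "+r"(reg0)                  /* kernel overwrites r0 */
                  : [op] "i"(SYS_RECEIVE), "r"(reg1), "r"(reg2)
                  : "memory",                   /* kernel reads/writes *msg */
                    "cc");                      /* conservative about flags */
    return reg0;
}
#else
/* Hypothetical host stand-in: pretends a message arrived from src. */
static int receive(int src, int type, message *msg)
{
    (void)type;
    msg->sender = src;
    msg->arg = 0;
    return 0;
}
#endif
```

Because r0 is a "+r" operand, GCC knows it holds a new value after the svc and will not assume that src is still there. If the call also changed r1 or r2, those operands would need "+r" as well.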

There is a carefully constructed example of an ARM system call in GCC inline asm at ARM inline asm: exit system call with value read from memory (the noreturn stuff will not apply to you, of course; and unlike that example, you will have to pay attention to outputs and clobbers).

Once this is done, all the noinline / noipa stuff will be unnecessary, because the compiler will be guaranteed to populate the registers correctly even when the function is inlined or IPAed, and you will get the benefit of avoiding the function call overhead.


Now, if you don't want all that hassle, then yes, you can write the send function in a pure asm .s file. Then it is opaque to the compiler, which will just have to follow the usual calling conventions when calling it, and that populates the registers as you want. You also have the freedom to read and write memory, modify any call-clobbered registers, and so on.

If you want to keep it in the .c file, you could use the naked attribute, effectively telling the compiler to simply emit a label and then pass through the contents of an asm statement. This will also have the effect of preventing inlining and IPA because the entire function becomes opaque to the compiler.

Strictly speaking, naked is only supported for basic asm (with no input or output operands), so technically your i operand for the system call number would not be allowed. My guess is that it would work without a problem, however.


Just as a general comment: I tend to find that "How do I disable this optimization that breaks my code?" is almost always an XY question. Usually the issue is really that the programmer has not given the compiler proper information about the intended semantics of the code, which it needs in order to correctly determine which optimizations are legitimate. The exception is cases where the optimization in question is actually a compiler bug, and disabling it serves as a workaround, but this is rare.

Nate Eldredge
  • The problem may be deeper, but it is also different from what you have understood: it isn't that the three arguments to `send` are not put in the right registers or kept there (though I take your point that I could laboriously say that the three arguments should be put in variables that live in the specified registers where they already are). The surprise is that IPA deletes the assignment `msg.arg = v` in the body of `send_int`. Perhaps `naked` is what I want, though with it the `i` operand is documented not to be reliable. – Mike Spivey Aug 09 '23 at 19:05
  • But when all is said and done, the simple and manly thing to do is write the stubs in assembly language. With a suitable macro definition, they come out at one line each. – Mike Spivey Aug 09 '23 at 19:06
  • @MikeSpivey: The deletion of `msg.arg = v` is an example of what I mentioned involving the `memory` clobber. Without it, the compiler assumes that your `asm` does not read any part of memory, including in particular the contents of `msg`. Since `msg` is local, there is nothing else that can read it either. Therefore, it concludes that `msg.arg` is never read again after being written: the write is a dead store, and can be removed. – Nate Eldredge Aug 10 '23 at 23:40
  • @MikeSpivey: I agree that writing stubs in assembly language does have many benefits, though I must say that I disagree with your choice of adjective. – Nate Eldredge Aug 10 '23 at 23:41
  • @MikeSpivey: *The surprise is that IPA deletes the assignment `msg.arg = v` in the body of `send_int`* - That's a symptom of forgetting to use a `"memory"` clobber or otherwise deal with [How can I indicate that the memory \*pointed\* to by an inline ASM argument may be used?](https://stackoverflow.com/q/56432259) . You haven't told GCC that the object is an input to the asm, so it deletes dead stores within the non-inline function. It's not IPA. The issue Nate points out first, about assuming that GCC won't step on other registers, *is* also a bug, just one that happens not to bite your code. – Peter Cordes Aug 11 '23 at 00:11
  • With assembly, especially inline asm, correct results on one system with one compiler don't guarantee correctness (working on other systems because the relevant standards like the calling convention and GCC manual promise it will). It's not rare that code will "happen to work", which is why testing isn't sufficient to find all bugs. (This is true in C in general, since it's a language where undefined behaviour is a thing.) – Peter Cordes Aug 11 '23 at 00:13
  • Given all that, and the fact that IPA can now make the questionable body of one function affect the correctness of others (and it *is* IPA that is doing that), I realise that the `naked` attribute is what I wanted all along. I will go away now and add it, or rather just move these stubs into the assembly language support, with only the mild administrative problem of matching up the symbolic names and the values of system call numbers in both C and assembler. For my purposes, it hardly matters whether the stubs can be inlined. – Mike Spivey Aug 11 '23 at 01:49
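One way to keep that bookkeeping small is a single header of call numbers included by both the C code and a preprocessed assembly file (.S), which GCC runs through the C preprocessor and for which it defines __ASSEMBLER__. A sketch, where the header name syscalls.h is hypothetical:

```c
/* syscalls.h (hypothetical name): call numbers shared by C and .S files. */
#ifndef SYSCALLS_H
#define SYSCALLS_H

#define SYS_SEND 9              /* value used in the question */

#ifndef __ASSEMBLER__
/* C-only declarations, hidden from the assembler, which would reject them. */
typedef struct { int sender; int arg; } message;
void send(int dest, int type, message *msg);
#endif

#endif /* SYSCALLS_H */
```

A .S stub file can then #include "syscalls.h" and use SYS_SEND as the svc operand, so each number is written down only once.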

To add to Nate Eldredge's overview, here are details of what happens under various conditions. All the examples were compiled with GCC 12 at the -O optimisation level (my students and I prefer it because it keeps the compiler output close to the source, aiding understanding), but the results with -O2 are identical.

All the versions of send compile to the same ARM code:

00000000 <send>:
   0:   df09        svc 9
   2:   4770        bx  lr

What differs is the code compiled for send_int. Correct code for it is obtained by giving send the attribute noipa (version 1):

00000016 <send_int1>:
  16:   b500        push    {lr}      @ Save return address
  18:   b083        sub sp, #12       @ Allocate stack space for msg
  1a:   9201        str r2, [sp, #4]  @ Fill in msg.arg
  1c:   466a        mov r2, sp        @ Put &msg in r2; d and t remain in r0 and r1
  1e:   f7ff fffe   bl  12 <send1>    @ Call send
  22:   b003        add sp, #12       @ Deallocate stack space
  24:   bd00        pop {pc}          @ Return

Using the attribute noinline in place of noipa results in the str instruction being deleted, causing the bug.

A similar, correct, result is obtained by using the attribute naked on send, but then it is necessary to write the return instruction explicitly (version 2):

void __attribute__((naked)) send2(int dest, int type, message *msg)
{
    syscall(SYS_SEND);
    asm volatile ("bx lr");
}

Apart from the fact that the mechanism for substituting the call number works but isn't guaranteed to work in a naked function, this seems the simplest option. (It's possible to avoid that mechanism at the expense of some added obscurity by exploiting the C pre-processor to turn the call number into a string and concatenate it with the rest of the instruction.) The same two instructions can, of course, be written in a separate assembly language source file, with the same effect.
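The pre-processor trick mentioned in the parenthesis might look like the following sketch. Because the call number is pasted into the instruction text, the naked function contains only basic asm, which is the form the GCC documentation supports there. The macro names STR_, STR and SYSCALL_STUB are made up for this example, and the #if keeps the Thumb-only function out of host builds.

```c
#include <string.h>   /* strcmp, for checking the expansion on a host */

typedef struct { int sender; int arg; } message;

#define SYS_SEND 9

/* Two-level stringification: STR(SYS_SEND) first expands SYS_SEND to 9,
   then turns it into "9". STR_(SYS_SEND) alone would give "SYS_SEND". */
#define STR_(x) #x
#define STR(x)  STR_(x)

/* Hypothetical helper: the whole body of a naked stub as one string. */
#define SYSCALL_STUB(op) "svc " STR(op) "\n\tbx lr"

#if defined(__thumb__)
void __attribute__((naked)) send(int dest, int type, message *msg)
{
    asm volatile (SYSCALL_STUB(SYS_SEND));   /* basic asm: no operands */
}
#endif

/* The text the assembler would see, kept so it can be checked anywhere. */
static const char send_stub_text[] = SYSCALL_STUB(SYS_SEND);
```

The extra level of macro is the whole trick: without it the assembler would be handed the literal text svc SYS_SEND.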

Following Nate's suggestion, let's try writing send in a way that explicitly puts the arguments in the registers where they belong (version 3).

void send3(int dest, int type, message *msg)
{
    register int dd asm("r0") = dest;
    register int tt asm("r1") = type;
    register message *mm asm("r2") = msg;
    asm ("svc %0" :: "i"(SYS_SEND), "r"(dd), "r"(tt), "r"(mm) : "memory");
}

The code compiled for send is the same as before, because the three arguments arrive in the right registers, and the initialisation of the three local variables can be achieved with no code at all. With this definition, however, GCC is able to inline the call to send into send_int and we get the following code.

0000003c <send_int3>:
  3c:   b082        sub sp, #8
  3e:   9201        str r2, [sp, #4]
  40:   466a        mov r2, sp
  42:   df09        svc 9
  44:   b002        add sp, #8
  46:   4770        bx  lr

The first two arguments of send_int arrive in r0 and r1, and can remain there while the message buffer is allocated and initialised and its address is put in r2. Only eight bytes of stack space need be allocated here: with no call to make, no return address is pushed, so the 8-byte buffer alone keeps the stack 8-byte aligned, whereas version 1 needed four bytes of padding on top of push {lr}.

Without the "memory" qualifier, GCC does not know that the system call accesses the message buffer pointed at by r2, so it omits the str instruction once again (version 4).
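For comparison, the narrower approach from the question linked in Nate's answer replaces the "memory" clobber with a dummy memory input, "m"(*mm), telling GCC that the asm reads the message object and nothing else in memory. This is a sketch whose code generation has not been checked against GCC 12 here; the names send5 and send_int5, and the host stand-in under #else (which merely records the message so send_int5 can be exercised off-target), are hypothetical.

```c
typedef struct { int sender; int arg; } message;

#define SYS_SEND 9

#if defined(__thumb__)
/* Version 3 with "memory" replaced by a memory input for *mm only. */
static inline void send5(int dest, int type, message *msg)
{
    register int dd asm("r0") = dest;
    register int tt asm("r1") = type;
    register message *mm asm("r2") = msg;
    asm volatile ("svc %0"
                  :
                  : "i"(SYS_SEND), "r"(dd), "r"(tt), "r"(mm),
                    "m"(*mm));          /* the asm reads the whole message */
}
#else
/* Hypothetical host stand-in: records the message instead of trapping. */
static message last_sent;
static void send5(int dest, int type, message *msg)
{
    (void)dest;
    (void)type;
    last_sent = *msg;
}
#endif

void send_int5(int d, int t, int v)
{
    message msg;
    msg.arg = v;                        /* must not be treated as a dead store */
    send5(d, t, &msg);
}
```

If the kernel also wrote into the buffer, as a receive call would, the operand would have to become an output, "+m"(*mm).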

Mike Spivey
  • Note that GNU C Extended asm (with a template and constraints) is not officially supported inside `naked` functions -- https://gcc.gnu.org/onlinedocs/gcc/ARM-Function-Attributes.html#index-naked-function-attribute_002c-ARM . But I doubt you'd have a problem when the only constraint is an immediate, not anything it could possibly need to emit its own instructions to evaluate into a register (`"r"(var)`) or memory. (Also, I guess this is a teaching OS or something, since modern Linux uses `svc 0` with a call number in a register where the OS can get at it more efficiently.) – Peter Cordes Aug 11 '23 at 00:19
  • Does your OS's `send` system call not have a return value? Your `asm` statement promises the compiler that all registers are unchanged by the `asm` statement. (including CPSR holding the condition codes, since there's no `"cc"` clobber). Normally syscall wrappers use a `"+r"` constraint for the operand that shares a register with the return value, or a separate `"=r"` output to a separate C variable. (On x86 that can be done with `"=a"` output and `"a"` input; on ARM IDK if you can have two `register asm("")` vars with the same register, or if you'd need a matching constraint like `"0"(in)` – Peter Cordes Aug 11 '23 at 00:25
  • e.g. for `write` system calls: [How to specify an individual register as constraint in ARM GCC inline assembly?](https://stackoverflow.com/q/3929442) for AArch64 Linux or [How to invoke a system call via syscall or sysenter in inline assembly?](https://stackoverflow.com/q/9506353) for i386 and x86-64, and [ARM inline asm: exit system call with value read from memory](https://stackoverflow.com/q/37358451) for an ARM `_exit` system call. – Peter Cordes Aug 11 '23 at 01:17
  • (If that `svc` truly returns with all of this program's registers still holding the same value, then yes, the `asm volatile` *and* `"memory"` clobber are all you need in terms of telling the compiler about stuff other than the explicit input operands.) – Peter Cordes Aug 11 '23 at 01:19
  • Peter: I'd wondered about Linux putting the call number in a register. On a microcontroller, there really isn't much difference in speed -- so perhaps it's the cost of accessing code in user space with an MMU present that makes the difference. It's right that `send` returns no result, and that's why the return type of the function is `void`. That and `receive` are message passing primitives that atomically copy a small message (16 bytes) from one process to another. It all works out nicely as a way of programming microcontrollers in an accessible and modular way. – Mike Spivey Aug 11 '23 at 02:13
  • ... and in a world where the performance benchmark is set by Python, everything goes pleasingly fast, even with the overhead of message passing! – Mike Spivey Aug 11 '23 at 02:15
  • Oh, and note that (on ARM MCUs anyway), the system call can't expect to find register values still in the registers, but must retrieve them from the exception frame in user (data) memory. That's because in very rare cases an interrupt will have been serviced between saving the state and invoking the system call handler, trashing the registers. – Mike Spivey Aug 11 '23 at 04:00
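To make that last point concrete, here is a sketch of the kernel side on a Cortex-M. On exception entry the hardware pushes r0, r1, r2, r3, r12, lr, pc and xPSR, in that order from the lowest address, onto the active stack, so the handler reads the arguments from that frame. The stacked pc points just past the 16-bit svc instruction, whose low byte is the call number (compare df09 svc 9 in the disassembly in the second answer). The structure and function names are hypothetical, and obtaining the frame address itself (from msp or psp in a small assembly entry stub) is not shown.

```c
#include <stdint.h>

/* Basic Cortex-M exception frame (no FP state), lowest address first. */
struct exc_frame {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
};

/* Call number of a 16-bit Thumb "svc #imm8" (encoding 0xDFxx), or -1. */
static int svc_number(uint16_t insn)
{
    return (insn & 0xff00u) == 0xdf00u ? (int)(insn & 0xffu) : -1;
}

/* Argument i (0-3) as saved in the frame, not as left in live registers. */
static uint32_t frame_arg(const struct exc_frame *f, int i)
{
    switch (i) {
    case 0:  return f->r0;
    case 1:  return f->r1;
    case 2:  return f->r2;
    default: return f->r3;
    }
}

#if defined(__thumb__)
/* On the target, the svc is the halfword just before the stacked pc. */
int svc_number_from_frame(const struct exc_frame *f)
{
    const uint16_t *pc = (const uint16_t *)(uintptr_t)f->pc;
    return svc_number(pc[-1]);
}
#endif
```

Reading from the frame rather than from live registers is what makes the handler immune to an interrupt that ran between stacking and the SVC handler.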