
I came across this surprising tidbit while translating jonesforth.s from x86 ASM to C with GCC extensions. depends_on_optimization.c is a "simple" C program whose behavior depends on whether it's compiled with optimizations (any -O level) or without them (-O0). It's entirely possible that the program invokes undefined behavior, since it uses _start(void) instead of int main(int argc, char *argv[]) and then pokes around the C stack to find argc. That's an artefact of translating ASM to C.

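```c
#include <stdint.h>   /* intptr_t */
#include <stdlib.h>   /* exit */
#include <unistd.h>

/* Direct-threaded dispatch: jump to the address stored at *ip, then advance ip. */
#define NEXT goto **ip++
/* Cells past the end of `stack` where argc happens to live; differs by -O level. */
#ifdef __OPTIMIZE__
#define OFFSET 3
#else
#define OFFSET 0
#endif

void _start()
{
    intptr_t *stack[1024];                      /* data stack, grows downward */
    register intptr_t **sp = stack + (sizeof stack / sizeof *stack), *p;
    register void **ip;                         /* instruction pointer into code[] */

    goto _start;

ADD:                                            /* pop n, advance the pointer below it by n cells */
    p = *sp++;
    sp[0] += (intptr_t)p;
    NEXT;
LIT:                                            /* push the literal that follows in code[] */
    *--sp = (intptr_t *)*ip++;
    NEXT;
S0:                                             /* push the address one past the end of stack */
    *--sp = (intptr_t *)(stack + (sizeof stack / sizeof *stack));
    NEXT;
EXIT:                                           /* pop an address, exit with the cell it points to */
    p = *sp++;
    exit(*p);

    static void *code[] = {&&S0, &&LIT, (void *)OFFSET, &&ADD, &&EXIT};
_start:
    ip = code;
    NEXT;
}
```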

What compiler flag should I use to make GCC produce the same stack layout regardless of optimization level? I tried various combinations of -fno-stack-check, -fno-stack-protector, and -fno-stack-clash-protection to no avail.

Jan Burgy
  • Please include your C code as text in the question, not a link to some external website which will be stale in the future. – pmacfarlane Jul 30 '23 at 09:53
  • That's a pretty nasty piece of code. Personally, I would throw it in the bin and start again by hand. For example, your stack is an array of (effectively) `int**`. Why? It's the data stack. It should be an array of `int`, or ideally a fixed-size int like `int32_t`. – JeremyP Jul 30 '23 at 10:09
  • *then poking around the C stack to find argc* - It's making assumptions about the stack layout and how far past the end of `intptr_t *stack[1024];` it will find `argc`. (That slot was pointed at by the stack pointer on entry to `_start`.) If you're on x86(-64), that's where a return address arrives for functions, so you could just use `__builtin_return_address(0)` to read the value. – Peter Cordes Jul 30 '23 at 10:12
  • [How Get arguments value using inline assembly in C without Glibc?](https://stackoverflow.com/a/50283880) shows how to do that for the x86-64 SysV ABI with some nasty hacks that nonetheless get GCC to make the asm we want without any inline `asm`, at any optimization level, and without making assumptions about stack layout beyond the ABI. (Not that it's well-defined behaviour to take the address of a function arg and offset from there, though.) For a RISC machine where the return address is passed in a link register (like ARM `lr`), that wouldn't work, but your `OFFSET` isn't portable either. – Peter Cordes Jul 30 '23 at 10:12
  • Anyway, the difference in stack layout should be clear from looking at the asm: part of it is from `push rbp` to make a frame pointer at `-O0`, since `-fomit-frame-pointer` is only on at `-O1` and higher. Without `-fstack-protector-strong`, GCC might put scalar locals above the array; you'd have to check. With optimization, it'll keep them in registers so no stack space at all for them, unless you use `volatile`. But that would still just be hard-coding an offset that happens to work with a specific compiler version and options, and could change in the future. – Peter Cordes Jul 30 '23 at 10:16
  • My recommendation would be to write a simple `_start` in asm that calls a `main` (as shown in other answers on the Q&A I linked), or let GCC link the CRT startup code so you can write a normal `main` that takes `argc`. Nothing you can do with dead-reckoning and hard-coding offsets based on some GCC version's stack-frame layout choices is at all suitable for anything except local experiments to see if you read the asm correctly. (See [How to remove "noise" from GCC/clang assembly output?](https://stackoverflow.com/q/38552116) re: looking at asm.) – Peter Cordes Jul 30 '23 at 10:19
  • Re: what unoptimized code-gen looks like: [Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?](https://stackoverflow.com/q/53366394) - every variable has its own address, and this isn't optimized away when you tell the compiler not to optimize. – Peter Cordes Jul 30 '23 at 10:21
  • It is one big Undefined Behaviour. Analyzing it makes little or no sense. If the output depends on optimizations, it is a sign of UB. Bin this garbage and write a proper one yourself. – 0___________ Jul 30 '23 at 12:42
  • The only good reason I can think of to even consider translating assembly to C is to arrive at a more portable version of the original. In that case, however, the best way forward is to *rewrite* in (standard) C. Assembly code of any complexity generally does not have a direct translation into C. Indeed, that's closely related to why C compilers have options for optimization. – John Bollinger Jul 30 '23 at 13:05
  • You're spot on, @JohnBollinger: see https://bur.gy/2023/02/24/what-forth-again.html for details of how translating jonesforth from assembly to C allowed me to compile it to WASM. And to the folks who suggest a different approach: thanks for the suggestion, but threaded code using GCC labels as values is the whole point. – Jan Burgy Jul 30 '23 at 21:41
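Combining the suggestion to let the CRT startup code call a normal `main` with the requirement to keep threaded code, one possible shape is a minimal sketch like the following. It keeps GCC labels as values and `goto **ip++` dispatch, but receives argc as an ordinary parameter, so nothing should depend on the frame layout at any `-O` level. The function name `run_threaded`, the `ARGC` primitive and the literal 40 are hypothetical, chosen only for illustration. It also uses `intptr_t` cells for the data stack, as suggested in the comments, so `ADD` is plain integer addition rather than pointer arithmetic. It relies on GNU C extensions, so it assumes GCC or Clang.

```c
#include <stdint.h>

/* Hypothetical sketch: direct-threaded interpreter using GCC labels as
   values, with argc passed in explicitly instead of located by offsetting
   past the end of a local array. Cells are intptr_t, not intptr_t *. */
int run_threaded(intptr_t argc)
{
    intptr_t stack[1024];
    intptr_t *sp = stack + sizeof stack / sizeof *stack; /* empty; grows down */
    void **ip;
    intptr_t top;

/* Jump to the address stored at *ip, then advance ip. */
#define NEXT goto **ip++

    /* Illustrative program: push argc, push 40, add, return the sum.
       Label addresses are constants, so a static initializer is allowed. */
    static void *code[] = {&&ARGC, &&LIT, (void *)40, &&ADD, &&EXIT};

    ip = code;
    NEXT;

ARGC:                       /* ( -- argc ) */
    *--sp = argc;
    NEXT;
LIT:                        /* ( -- n ) the literal follows inline in code[] */
    *--sp = (intptr_t)*ip++;
    NEXT;
ADD:                        /* ( a b -- a+b ) */
    top = *sp++;
    sp[0] += top;
    NEXT;
EXIT:                       /* ( n -- ) hand n back to the caller */
    return (int)*sp;

#undef NEXT
}
```

A program would then define an ordinary `int main(int argc, char *argv[]) { return run_threaded(argc); }` and link with the normal startup files (no `-nostartfiles`). Running it with one extra argument should then exit with status 42 whether it is built with `-O0` or `-O2`, because argc arrives through the ABI rather than through a hard-coded stack offset.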
