
I have a short snippet of code with some inline assembly. It prints argv[0] properly at -O0 but misbehaves at -O2: Clang prints nothing, while GCC prints the string stored in envp[0] when asked to print argv[0]. The problem is restricted to argv; the other two function parameters can be used as expected with or without optimizations enabled. I tested this with both GCC and Clang, and both compilers have this issue.

Here is the code:

void exit(unsigned long long status) {
    asm volatile("movq $60, %%rax;" //system call 60 is exit
        "movq %0, %%rdi;" //return code 0
        "syscall"
        : //no outputs
        :"r"(status)
        :"rax", "rdi");
}

int open(const char *pathname, unsigned long long flags) {
    asm volatile("movq $2, %%rax;" //system call 2 is open
        "movq %0, %%rdi;"
        "movq %1, %%rsi;"
        "syscall"
        : //no outputs
        :"r"(pathname), "r"(flags)
        :"rax", "rdi", "rsi");
        return 1;
}

int write(unsigned long long fd, const void *buf, size_t count) {
    asm volatile("movq $1, %%rax;" //system call 1 is write
        "movq %0, %%rdi;"
        "movq %1, %%rsi;"
        "movq %2, %%rdx;"
        "syscall"
        : //no outputs
        :"r"(fd), "r"(buf), "r"(count)
        :"rax", "rdi", "rsi", "rdx");
        return 1;
}

static void entry(unsigned long long argc, char** argv, char** envp);

/*https://www.systutorials.com/x86-64-calling-convention-by-gcc/: "The calling convention of the System V AMD64 ABI is followed on GNU/Linux. The registers RDI, RSI, RDX, RCX, R8, and R9 are used for integer and memory address arguments
and XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6 and XMM7 are used for floating point arguments.
For system calls, R10 is used instead of RCX. Additional arguments are passed on the stack and the return value is stored in RAX."*/

//__attribute__((naked)) defines a pure-assembly function
__attribute__((naked)) void _start() {
    asm volatile("xor %%rbp,%%rbp;" //http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html: "%ebp,%ebp sets %ebp to zero. This is suggested by the ABI (Application Binary Interface specification), to mark the outermost frame."
    "pop %%rdi;" //rdi: arg1: argc -- can be popped off the stack because it is copied onto register
    "mov %%rsp, %%rsi;" //rsi: arg2: argv
    "mov %%rdi, %%rdx;"
    "shl $3, %%rdx;" //each argv pointer takes up 8 bytes (so multiply argc by 8)
    "add $8, %%rdx;" //add size of null word at end of argv-pointer array (8 bytes)
    "add %%rsp, %%rdx;" //rdx: arg3: envp
    "andq $-16, %%rsp;" //align stack to 16 bytes (which is required on x86-64)
    "jmp %P0" //https://stackoverflow.com/questions/3467180/direct-c-function-call-using-gccs-inline-assembly: "After looking at the GCC source code, it's not exactly clear what the code P in front of a constraint means. But, among other things, it prevents GCC from putting a $ in front of constant values. Which is exactly what I need in this case."
    :
    :"i"(entry)
    :"rdi", "rsp", "rsi", "rdx", "rbp", "memory");
}

//Function cannot be optimized-away, since it is passed-in as an argument to asm-block above
//Compiler Options: -fno-asynchronous-unwind-tables;-O2;-Wall;-nostdlibinc;-nobuiltininc;-fno-builtin;-nostdlib; -nodefaultlibs;--no-standard-libraries;-nostartfiles;-nostdinc++
//Linker Options: -nostdlib; -nodefaultlibs
static void entry(unsigned long long argc, char** argv, char** envp) {
    int ttyfd = open("/dev/tty", O_WRONLY);

    write(ttyfd, argv[0], 9);
    write(ttyfd, "\n", 1);

    exit(0);
}

Edit: Added syscall definitions.

Edit: Adding rcx and r11 to the clobber list for the syscalls fixed the issue for Clang, but GCC continued to have the error.

Edit: GCC was not actually at fault. A problem in my build system (CodeLite) caused it to run a partially built program, even though GCC had reported errors about two compiler flags it did not recognize. For GCC, use these flags instead: -fomit-frame-pointer;-fno-asynchronous-unwind-tables;-O2;-Wall;-nostdinc;-fno-builtin;-nostdlib; -nodefaultlibs;--no-standard-libraries;-nostartfiles;-nostdinc++. These flags also work with Clang, since Clang supports the GCC options above.

cpp plus 1
  • arguments are not passed by the stack probably - did you debug it – 0___________ Mar 10 '20 at 01:29
  • @cppplus1 no, only 6 integer parameters are passed in registers https://en.wikipedia.org/wiki/X86_calling_conventions#System_V_AMD64_ABI – phuclv Mar 10 '20 at 01:44
  • @P__J__ I believe the default calling-convention for Linux states that the first 6 integer and first 8 float parameters must be passed into registers. – cpp plus 1 Mar 10 '20 at 01:46
  • @phuclv Thanks, changed the comment. Luckily, I didn't use more than 3 parameters in the code above. – cpp plus 1 Mar 10 '20 at 01:47
  • How do you even get `open`, `write` and `exit` with `-nostdlib`? Please post complete commands along with compiler versions. Most gcc versions don't even accept `rsp` clobber. Furthermore my gcc says _"warning: 'naked' attribute directive ignored"_. – Jester Mar 10 '20 at 01:47
  • @Jester Added syscall definitions. – cpp plus 1 Mar 10 '20 at 01:49
  • Also note that a system call is allowed to destroy `rcx` and `r11`, so you should add those to your clobber list. – Nate Eldredge Mar 10 '20 at 01:58
  • Your syscall asm are quite inefficient, but what's worse is that they don't list `rcx` and `r11` as clobbered. – Jester Mar 10 '20 at 01:58
  • The [gcc manual](https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html) says that you're not supposed to use extended asm for a "naked" function, only basic asm, so you need to modify `_start` accordingly. I'm surprised your code is accepted by the compiler. Passing the address of `entry` using the `i` constraint should be unnecessary; just do `jmp entry`. The clobbers are unnecessary too. – Nate Eldredge Mar 10 '20 at 02:01
  • @NateEldredge Thanks. Adding rcx and r11 to the clobber list fixed the issue while using clang, but GCC still has issues oddly. – cpp plus 1 Mar 10 '20 at 02:05
  • @NateEldredge Thanks. Using simple asm instead of extended asm along with adding rcx and r11 to the syscalls' clobber lists fixed the issue. Post your comments below as an answer so that I can accept them as the answer. – cpp plus 1 Mar 10 '20 at 02:08
  • @NateEldredge After testing the code further, I found that GCC now has the error on another parameter, but Clang seems to work fine, luckily. I believe the stack is in this format on startup, since a few sources suggested it to be this way: http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html, https://github.com/runtimejs/musl-libc/blob/master/crt/x86_64/crt1.s and http://asm.sourceforge.net/articles/startup.html – cpp plus 1 Mar 10 '20 at 02:17
  • Have you tried single stepping the `_start` function and inspecting its stack with your debugger, to see if what is actually there matches your expectations? That will probably help you understand what is going on, and find out why your processing isn't right. – Nate Eldredge Mar 10 '20 at 02:18
  • Note that the best reference for the stack layout, and practically everything else you need to know for this program, would be [the ABI itself](https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI), see Section 3.4.1. – Nate Eldredge Mar 10 '20 at 02:19
  • You seem to have left out necessary `#include` files that define `size_t` and `O_WRONLY`. What exactly is your real code? (And what exact gcc and clang command line options did you build with?) – Peter Cordes Mar 10 '20 at 02:45
  • @PeterCordes I made a header file with some Posix and Linux defines and typedefs, but without the function declarations. – cpp plus 1 Mar 10 '20 at 02:46
  • Also note that fd 1 is already open when your program starts, and (unless you redirected stdout) it will already be open on `/dev/tty`. If you just want to output to stdout like `echo(1)`, you shouldn't make any `open` system calls, just `write`. – Peter Cordes Mar 10 '20 at 03:47
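
For reference, the startup stack layout that the question and the comments above rely on (argc at the top of the stack, then the argv pointers, a null word, then the envp pointers) can be sketched in plain C. This is only an illustration, not the asker's code: `struct startup_args` and `parse_initial_stack` are hypothetical names, and the sketch assumes the stack is an array of pointer-sized words as described in Section 3.4.1 of the psABI.

```c
#include <stdint.h>

/* Hypothetical container for the three values a startup stub passes to C. */
struct startup_args {
    unsigned long argc;
    char **argv;
    char **envp;
};

/* sp is assumed to hold the value %rsp has on entry to _start.
 * Assumed layout: argc, argv[0..argc-1], null, envp[0..], null. */
static struct startup_args parse_initial_stack(char **sp) {
    struct startup_args a;
    a.argc = (unsigned long)(uintptr_t)sp[0]; /* first word is argc, not a pointer */
    a.argv = sp + 1;                          /* argv array starts right after argc */
    a.envp = a.argv + a.argc + 1;             /* skip argc entries plus the null terminator */
    return a;
}
```

Applied to the real initial stack pointer, this should yield the same three values that the assembly in the question computes with `pop`, `shl` and `add`.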

1 Answer

  1. You can't use extended asm in a naked function, only basic asm, according to the gcc manual. You don't need to inform the compiler of clobbered registers (since it won't do anything about them anyway; in a naked function you are responsible for all register management). And passing the address of entry in an extended operand is unnecessary; just do jmp entry.

    (In my tests your code doesn't compile at all, so I assume you weren't showing us your exact code - next time please do, so as to avoid wasting people's time.)

  2. On Linux x86-64, system calls made with the syscall instruction are allowed to clobber the rcx and r11 registers, so you need to add those to the clobber lists of your system calls.

  3. You align the stack to a 16-byte boundary before jumping to entry. However, the 16-byte alignment rule is based on the assumption that you will be calling the function with call, which would push an additional 8 bytes onto the stack. As such, the called function actually expects the stack to initially be, not a multiple of 16, but 8 more or less than a multiple of 16. So you are actually aligning the stack incorrectly, and this can be a cause of all sorts of mysterious trouble.

    So either replace your jmp with call, or else subtract a further 8 bytes from rsp (or just push some 64-bit register of your choice).

  4. Style note: unsigned long is already 64 bits on Linux x86-64, so it would be more idiomatic to use that in place of unsigned long long everywhere.

  5. General hint: learn about register constraints in extended asm. You can have the compiler load your desired registers for you, instead of writing instructions in your asm to do it yourself. So your exit function could instead look like:

    void exit(unsigned long status) {
        asm volatile("syscall"
            : //no outputs
            :"a"(60), "D" (status)
            :"rcx", "r11");
    }

    This in particular saves you a few instructions, since status is already in the %rdi register on function entry. With your original code, the compiler has to move it somewhere else so that you can then load it into %rdi yourself.

  6. Your open function always returns 1, which will typically not be the fd that was actually opened. So if your program is run with standard output redirected, your program will write to the redirected stdout, instead of to the tty as it seems to want to do. Indeed, this makes the open syscall completely pointless, because you never use the file you opened.

    You should arrange for open to return the value that was actually returned by the system call, which will be left in the %rax register when syscall returns. You can use an output operand to have this stored in a temporary variable (which the compiler will likely optimize out), and return that. You'll need to use a digit constraint since it is going in the same register as an input operand. I leave this as an exercise for you. It would likewise be nice if your write function actually returned the number of bytes written.
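
One possible shape for the exercise in the last item, combined with the clobber and register-constraint advice from the earlier items, is sketched below. It is a sketch under stated assumptions rather than a definitive implementation: the names `my_open` and `my_write` are hypothetical (chosen to avoid clashing with libc), the syscall numbers 2 and 1 are the x86-64 Linux values used in the question, `my_open` takes the third `mode` argument that the underlying system call accepts, and `__asm__` is spelled that way so the code also builds in strict ISO modes. The "memory" clobber is an addition not mentioned in the answer, included because the kernel reads through the pointers passed in.

```c
/* Hypothetical wrapper: returns the new fd, or a negative errno value. */
static long my_open(const char *pathname, long flags, long mode) {
    long ret;
    __asm__ volatile("syscall"
        : "=a"(ret)                 /* the kernel leaves its result in %rax */
        : "0"(2L),                  /* digit constraint: syscall number shares operand 0's register */
          "D"(pathname), "S"(flags), "d"(mode)
        : "rcx", "r11", "memory");  /* syscall destroys rcx and r11 */
    return ret;
}

/* Hypothetical wrapper: returns the byte count written, or a negative errno value. */
static long my_write(long fd, const void *buf, unsigned long count) {
    long ret;
    __asm__ volatile("syscall"
        : "=a"(ret)
        : "0"(1L), "D"(fd), "S"(buf), "d"(count)
        : "rcx", "r11", "memory");
    return ret;
}
```

With wrappers like these, the value returned by `my_open` can be passed straight to `my_write`, so the output goes to the file that was actually opened.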

Nate Eldredge
  • 3. or more simply, remove the `and` and keep the `jmp` if your `entry` "function" isn't a real function either and can't return. On entry to `_start`, the stack pointer is guaranteed by the ABI to be 16-byte aligned, so one `pop` sets you up for function entry to a noreturn function. Or just `mov` load instead of pop and then you can `call` normally. [How Get arguments value using inline assembly in C without Glibc?](https://stackoverflow.com/q/50260855) shows some hand-written bare bones `_start` implementations that call a C function with argc, argv. – Peter Cordes Mar 10 '20 at 02:53
  • 5. `#include <sys/syscall.h>` to get macros for call numbers like `SYS_exit` or the Linux `__NR_exit` macro names from `asm/unistd.h`. These headers are even safe to include in pure asm. – Peter Cordes Mar 10 '20 at 02:56
  • 6. you can use `unsigned long rax = __NR_write` and use a `"+a"(rax)` constraint to more simply express the dual input/output use of that register in a wrapper function, instead of bothering with a matching constraint to use the same register for different input and output C vars. Or just use `"a"(__NR_write)` and `"=a"(retval)` - specific-register constraints make matching constraints unnecessary because you know what register they're going to pick so you can manually hard-code the matching. – Peter Cordes Mar 10 '20 at 02:59
  • Additional code-review of the OP's code: their C wrapper types don't match the underlying syscalls. e.g. fd args are 32-bit `int`, not 64-bit `long`. This creates mismatches if you include the normal C headers that provide prototypes for these functions (https://godbolt.org/z/XuyMQK). You *should* be able to do that and still define your own inline versions, but the OP can't because they made a mess of the types. And it results in an extra `movslq` to sign-extend `int ttyfd` from `open` to the `unsigned long long fd` for `write` in the compiler-generated asm (https://godbolt.org/z/isdSYv) – Peter Cordes Mar 10 '20 at 03:04
  • (Compiling as C++ allows overloaded function names, so it's not an error to have two versions of `open` with different signatures, letting this mess compile. Since the asm uses `;` as instruction separators instead of `\n\t` every asm block becomes one long line which is a huge mess.) Anyway, I didn't think this code-review was worth a separate answer so leaving it here. – Peter Cordes Mar 10 '20 at 03:06
  • Thanks for the tips. Your answer worked for GCC as well (instead of only Clang). It was just that there was an error in CodeLite's build system when using GCC (described in more detail in question-edit above). – cpp plus 1 Mar 10 '20 at 17:44
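
The `"+a"` approach and the call-number macros suggested in the comments above can be sketched roughly as follows. This is an illustration only: `write_plus_a` is a hypothetical name, the sketch assumes a Linux x86-64 toolchain where `<sys/syscall.h>` provides `SYS_write`, and it takes `fd` as an `int` to match the type the comments say the real system call uses.

```c
#include <sys/syscall.h>   /* SYS_write (assumed available on Linux) */

/* Hypothetical wrapper using one read-write operand for %rax. */
static long write_plus_a(int fd, const void *buf, unsigned long count) {
    unsigned long rax = SYS_write;      /* input: the call number */
    __asm__ volatile("syscall"
        : "+a"(rax)                     /* output: the kernel's return value, same register */
        : "D"((long)fd), "S"(buf), "d"(count)
        : "rcx", "r11", "memory");      /* syscall destroys rcx and r11 */
    return (long)rax;                   /* bytes written, or a negative errno value */
}
```

Compared with a matching digit constraint, the single `"+a"` operand states the dual input/output role of the register in one place, at the cost of reusing one C variable for two meanings.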