I recently started dabbling in low-level programming and want to write a function somesyscall that accepts (CType rax, CType rbx, CType rcx, CType rdx). The struct CType looks like this:

/*
    TYPES:
        0 int
        1 string
        2 bool
*/
typedef struct {
    void* val;
    int typev;
} CType;

The function is a bit messy, but in theory it should work:

#include <errno.h>
#include <stdbool.h>
#include "ctypes.h"

//define functions to set registers
#define seteax(val) asm("mov %0, %%rax" :: "g" (val) : "%rax")
#define setebx(val) asm("mov %0, %%rbx" :: "g" (val) : "%rbx")
#define setecx(val) asm("mov %0, %%rcx" :: "g" (val) : "%rcx")
#define setedx(val) asm("mov %0, %%rdx" :: "g" (val) : "%rdx")
///////////////////////////////////

#define setregister(value, register)       \
switch (value.typev) {                     \
    case 0: {                              \
        register(*((double*)value.val));   \
        break;                             \
    }                                      \
    case 1: {                              \
        register(*((char**)value.val));    \
        break;                             \
    }                                      \
    case 2: {                              \
        register(*((bool*)value.val));     \
        break;                             \
    }                                      \
}

static inline long int somesyscall(CType a0, CType a1, CType a2, CType a3) {

    //set the registers
    setregister(a0, seteax);
    setregister(a1, setebx);
    setregister(a2, setecx);
    setregister(a3, setedx);
    ///////////////////

    asm("int $0x80"); //interrupt

    //fetch back the rax
    long int raxret;
    asm("mov %%rax, %0" : "=r" (raxret));

    return raxret;
}

When I run it with:

#include "syscall_unix.h"

int main() {
  CType rax;
  rax.val = 39;
  rax.typev = 0;
  
  CType rbx;
  rbx.val = 0;
  rbx.typev = 0;

  CType rcx;
  rcx.val = 0;
  rcx.typev = 0;

  CType rdx;
  rdx.val = 0;
  rdx.typev = 0;

  printf("%ld", somesyscall(rax, rbx, rcx, rdx));
}

and compile and run the binary with

clang test.c
./a.out

I get a segfault, even though everything looks correct to me. What am I doing wrong here?

Ank i zle
    Note also that the 32 bit `int $0x80` API (which you really [shouldn't use](https://stackoverflow.com/q/46087730/417501) in 64 bit code) takes its arguments in `eax`, `ebx`, `ecx`, and `edx`, not `rax`, `rbx`, `rcx`, or `rdx`. It does not take 64 bit arguments. – fuz Sep 26 '20 at 23:00

1 Answer

After macro expansion you will have something like

long int raxret;

asm("mov %0, %%rax" :: "g" (a0) : "%rax");
asm("mov %0, %%rbx" :: "g" (a1) : "%rbx");
asm("mov %0, %%rcx" :: "g" (a2) : "%rcx");
asm("mov %0, %%rdx" :: "g" (a3) : "%rdx");
asm("int $0x80");
asm("mov %%rax, %0" : "=r" (raxret));

This doesn't work because you haven't told the compiler that it's not allowed to reuse rax, rbx, rcx, and rdx for something else during the sequence of asm statements. For instance, the register allocator might decide to copy a2 from the stack to rax and then use rax as the input operand for the mov %0, %%rcx instruction -- clobbering the value you put in rax.

(asm statements with no outputs are implicitly volatile, so the first five can't be reordered relative to each other, but the final one can move anywhere. For example, it can be moved after later code, to wherever the compiler finds it convenient to generate raxret in a register of its choice. RAX might no longer hold the system call's return value at that point. You need to tell the compiler that the output comes from the asm statement that actually produces it, without assuming any registers survive between asm statements.)

There are two different ways to tell the compiler not to do that:

  1. Put only the int instruction in an asm, and express all of the requirements for what goes in what register with constraint letters:

    asm volatile ("int $0x80" 
        : "=a" (raxret)                              // outputs
        : "a" (a0), "b" (a1), "c" (a2), "d" (a3)     // pure inputs
        : "memory", "r8", "r9", "r10", "r11"         // clobbers
         // 32-bit int 0x80 system calls in 64-bit code zero R8..R11
         // for native "syscall", clobber "rcx", "r11".
     );
    

    This is possible for this simple example but not always possible in general, because there aren't constraint letters for every single register, especially not on CPUs other than x86.

    In 64-bit code it would be better to use the native syscall ABI; when building this for 32-bit mode, remove the r8..r11 clobbers.
    
  2. Put only the int instruction in an asm, and express the requirements for what goes in what register with explicit register variables:

     register long rax asm("rax") = a0;
     register long rbx asm("rbx") = a1;
     register long rcx asm("rcx") = a2;
     register long rdx asm("rdx") = a3;
    
     // Note that int $0x80 only looks at the low 32 bits of input regs
     // so `uint32_t` would be more appropriate than long
     // but really you should just use "syscall" in 64-bit code.
     asm volatile ("int $0x80" 
            : "+r" (rax)                   // read-write: in=call num, out=retval
            : "r" (rbx), "r" (rcx), "r" (rdx)   // read-only inputs
            : "memory", "r8", "r9", "r10", "r11"
           );
    
     return rax;
    

    This will work regardless of which registers you need to use. It's also probably more compatible with the macros you're trying to use to erase types.
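As a rough sketch of how the type erasure could feed those register variables, each CType can first be reduced to a plain integer. The helper name ctype_to_reg is hypothetical, and the storage convention is an assumption rather than something the question establishes: val points at the int or bool, while for a string val is the char* itself. (The question's main stores the integer directly in val, which a dereferencing macro would then treat as an address.)

```c
#include <stdbool.h>
#include <stdint.h>

/* Same layout as the question's CType. */
typedef struct {
    void *val;
    int   typev;   /* 0 int, 1 string, 2 bool */
} CType;

/* Hypothetical helper: reduce a CType to the integer a register would hold.
   Assumption: val POINTS AT the int or bool; for a string, val is the
   char* itself, so the pointer value is what goes into the register. */
static long ctype_to_reg(CType v)
{
    switch (v.typev) {
    case 0:  return *(const int *)v.val;            /* int: load and widen */
    case 1:  return (long)(intptr_t)v.val;          /* string: pass the address */
    case 2:  return *(const bool *)v.val ? 1 : 0;   /* bool: 0 or 1 */
    default: return 0;                              /* unknown tag: pass zero */
    }
}
```

If this is combined with explicit register variables, it is probably safest to compute all the values into ordinary locals first and only then initialise the register variables, since GCC's documentation warns that function calls made after a register variable is assigned may overwrite that register.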

Incidentally, if this is 64-bit x86/Linux then you should be using syscall rather than int $0x80, and the arguments belong in rdi, rsi, rdx, r10, r8, r9 in that order, not in rbx, rcx, rdx etc. That is the same order as the ABI-standard incoming-argument registers for function calls, except that r10 replaces rcx, because the syscall instruction itself overwrites rcx (and r11). The system call number still goes in rax, though. (Use call numbers from #include <asm/unistd.h> or <sys/syscall.h>, which will be appropriate for the native ABI of the mode you're compiling for, another reason not to use int $0x80 in 64-bit mode.)
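A minimal sketch of a native wrapper, assuming x86-64 Linux and GCC or clang, might combine both techniques from the list above: constraint letters where they exist, and a register variable for r10, which has no letter. The name raw_syscall4 is made up for illustration. It is spelled with `__asm__` because that keyword also works when compiling in strict ISO C modes.

```c
#include <sys/syscall.h>   /* SYS_* numbers for the native ABI */

/* Hypothetical helper: raw x86-64 Linux system call with up to four
   arguments. Unused arguments can simply be passed as 0. */
static inline long raw_syscall4(long nr, long a1, long a2, long a3, long a4)
{
    long ret;
    /* There is no constraint letter for r10, so pin it explicitly. */
    register long r10 __asm__("r10") = a4;

    __asm__ volatile ("syscall"
        : "=a" (ret)                            /* result comes back in rax */
        : "a" (nr), "D" (a1), "S" (a2), "d" (a3), "r" (r10)
        : "rcx", "r11", "memory");              /* syscall overwrites rcx, r11 */

    return ret;   /* a value in [-4095, -1] is a negated errno code */
}
```

For instance, `raw_syscall4(SYS_getpid, 0, 0, 0, 0)` returns the process ID, and `raw_syscall4(SYS_close, -1, 0, 0, 0)` returns -EBADF rather than setting errno, since errno handling is something the C library adds on top.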

Also, the asm statement for the system-call instruction should have a "memory" clobber and be declared volatile; almost all system calls access memory somehow.

(As a micro-optimization, I suppose you could have a list of system calls that don't read memory, write memory, or modify the virtual address space, and avoid the memory clobber for them. It would be a pretty short list and I'm not sure it would be worth the trouble. Or use the syntax shown in "How can I indicate that the memory *pointed* to by an inline ASM argument may be used?" to tell GCC which memory might be read or written, instead of a "memory" clobber, if you write wrappers for specific syscalls.

Some of the no-pointer cases include getpid where it would be a lot faster to call into the VDSO to avoid a round trip to kernel mode and back, like glibc does for the appropriate syscalls. That also applies to clock_gettime which does take pointers.)
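To illustrate the per-syscall approach, here is a sketch of a wrapper for one specific call that names the memory it touches instead of using a "memory" clobber. It assumes x86-64 Linux; raw_clock_gettime is a hypothetical name, and as noted above a real program would normally reach clock_gettime through the vDSO, so this only demonstrates the operand syntax.

```c
#include <sys/syscall.h>   /* SYS_clock_gettime */
#include <time.h>          /* struct timespec */

/* Hypothetical wrapper: the kernel writes exactly one object, *ts,
   so declare that object as a memory output operand. */
static inline long raw_clock_gettime(long clock_id, struct timespec *ts)
{
    long ret;

    __asm__ volatile ("syscall"
        : "=a" (ret), "=m" (*ts)                /* rax result, and *ts is written */
        : "a" ((long)SYS_clock_gettime), "D" (clock_id), "S" (ts)
        : "rcx", "r11");                        /* no "memory" clobber here */

    return ret;   /* 0 on success, negated errno code on failure */
}
```

The `"=m" (*ts)` operand is never referenced in the template; it exists only so the compiler knows that *ts changes, which lets it keep unrelated values cached in registers across the call.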


Also beware that the actual kernel interfaces do not always match the interfaces presented by the C library's wrappers. This is generally documented in the NOTES section of the man page, e.g. for brk(2) and getpriority(2).
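The getpriority(2) case can be observed directly. The sketch below uses the C library's generic syscall() function rather than inline asm, and the two helper names are hypothetical. According to the man page's notes, the kernel reports the nice value in the range 40..1 while the glibc wrapper converts it to -20..19 using unice = 20 - knice.

```c
#define _GNU_SOURCE        /* for the syscall() declaration */
#include <errno.h>
#include <sys/resource.h>  /* getpriority, PRIO_PROCESS */
#include <sys/syscall.h>   /* SYS_getpriority */
#include <unistd.h>        /* syscall() */

/* Kernel view: the raw system call returns 20 - nice (range 40..1),
   because negative return values are reserved for error codes. */
static long kernel_priority(void)
{
    return syscall(SYS_getpriority, PRIO_PROCESS, 0);
}

/* C library view: the wrapper returns the nice value itself (-20..19).
   Since -1 is a legitimate result, errors must be detected via errno. */
static int libc_priority(void)
{
    errno = 0;
    return getpriority(PRIO_PROCESS, 0);
}
```

For a process running at the default nice value of 0, kernel_priority() gives 20 while libc_priority() gives 0, so a hand-written wrapper has to apply the same translation.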

Peter Cordes
zwol
  • Thanks for the link to the canonical explanation of why to use `syscall` in 64-bit code, @fuz. – zwol Sep 26 '20 at 23:44
  • @Tim and zwol: I edited the asm statements to actually be "safe" for using the 32-bit int 0x80 ABI in 64-bit code, with correct clobbers. I think it's important not to have dangerous inline asm examples on SO that might appear to work, but have missing clobbers or other constraints. The result is really clunky and weird looking, and I'd suggest either rewriting for the 64-bit `syscall` ABI so it can support 64-bit pointers, or changing the var names to EAX etc. and pointing out that this should be used in 32-bit code only. But that would be more intrusive so I didn't do it. – Peter Cordes Oct 03 '20 at 02:54
  • As it stands now, these inline asm statements are basically showing part of the reason not to use `int 0x80` in 64-bit code, with basically no real use-cases. – Peter Cordes Oct 03 '20 at 02:55