
I have a strange situation that seems to be working well for me, but I need to know how to either make this better or how to live with it.

I am using C++ as a compiled scripting language for a game engine. The RISC-V system call ABI is the same as the C function calling convention, with the exception that instead of an 8th integer or pointer argument, A7 is used for the system call number. Yes, you know where this is going. Behold:

#include <utility>

extern "C" long syscall_enter(...);

template <typename... Args>
inline long syscall(long syscall_n, Args&&... args)
{
    asm volatile ("mv a7, %0" : : "r"(syscall_n));
    return syscall_enter(std::forward<Args>(args)...);
}

Here, syscall_enter is just a symbol in .text containing an ecall (the system call instruction) followed by a ret. The system call's return value also comes back in the same register as a normal function's return value (a0).

000103f0 <syscall_enter>:
syscall_enter():
   103f0:       00000073                ecall
   103f4:       00008067                ret
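For reference, a stub like the one in the disassembly can be written from C++ with a top-level asm statement. This is a minimal sketch assuming GCC or Clang targeting RISC-V with the GNU assembler; on any other target the #if leaves only the declaration.

```cpp
// Sketch: defines the syscall_enter stub from the disassembly in C++ source.
// Only emitted when targeting RISC-V; elsewhere just the declaration remains.
#if defined(__riscv)
asm(R"(
    .pushsection .text
    .global syscall_enter
    .type syscall_enter, @function
syscall_enter:
    ecall       # a7 holds the call number, the arguments are already in place
    ret         # the result is already in a0, the normal return register
    .size syscall_enter, . - syscall_enter
    .popsection
)");
#endif

extern "C" long syscall_enter(...);
```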

Before this, I had to create 20+ functions to cover all the various ways to make system calls with integer and pointer arguments, each with a compiler barrier. When I wanted to add a function that took floating-point values, the compiler said the call was ambiguous, since integers and floats can be implicitly converted back and forth. So I could either start giving the functions unique names, or solve this mess a better way. It was honestly irritating and put a damper on an otherwise excellent experience. I really love being able to use C++ on "both sides".

The instructions generated by the compiler seem alright. It calls syscall_enter with JAL or JALR, which is fine. The compiler seems a little bit confused, but I don't mind one extra instruction: it loads the number into a5 and then copies it into a7.

   10204:       1f500793                li      a5,501
   10208:       00078893                mv      a7,a5
   1020c:       00000513                li      a0,0
   10210:       1e0000ef                jal     ra,103f0 <syscall_enter>

And here is a call that centers the camera on a position:

   100d4:       19600793                li      a5,406
   100d8:       00078893                mv      a7,a5
   100dc:       000127b7                lui     a5,0x12
   100e0:       4207b587                fld     fa1,1056(a5) # 12420 <_exit+0x2308>
   100e4:       22b58553                fmv.d   fa0,fa1
   100e8:       010000ef                jal     ra,100f8 <syscall_enter>

Again, one extra move instruction. Looks alright. The API is already heavily used, and there is also a threading API that works with this.

Now, is there an even better way? I couldn't think of a better way to load a7 with a number and then force the compiler to set a function call up, without making an actual function call. I was thinking about using a template parameter for the system call number, but I'm not so sure about the rest. Maybe we can constrain the number of arguments to 7? It won't be correct when there are integer and floating-point arguments, but that's fine. Stack-stored structs are easy to pass.
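The template-parameter idea could look roughly like this. It is a sketch only, and syscall_tpl is a hypothetical name. Moving the number into a template argument fixes the constant-ness problem (an "i" operand must be a compile-time constant, which a template argument always is, even without optimizations), but it does not stop the compiler from reusing a7 between the li and the call.

```cpp
#include <utility>

extern "C" long syscall_enter(...);

#if defined(__riscv)
// Hypothetical variant: the call number N is a template parameter.
template <long N, typename... Args>
inline long syscall_tpl(Args&&... args)
{
    // a0-a6 carry arguments; a7 is reserved for the call number.
    static_assert(sizeof...(Args) <= 7, "at most 7 register arguments");
    // Still nothing ties a7 to the call below; a7 could be reused in between.
    asm volatile ("li a7, %0" : : "i"(N) : "a7", "memory");
    const long result = syscall_enter(std::forward<Args>(args)...);
    asm volatile ("" : : : "memory");
    return result;
}
#endif
```

A call would then read syscall_tpl<501>(0) instead of syscall(501, 0).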

After some testing, I have decided to use this:

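#include <utility>

extern "C" long syscall_enter(...);

template <typename... Args>
inline long syscall(long syscall_n, Args&&... args)
{
    // This will prevent some cases of too many arguments,
    // but not a mix of float and integral arguments.
    static_assert(sizeof...(args) < 8, "At most 7 arguments: a0-a6 carry arguments, a7 carries the system call number");
    // "i" needs syscall_n to be a compile-time constant once this call is
    // inlined, so in practice this relies on optimizations being enabled.
    // The memory clobber stops memory accesses from being moved across this
    // point; it does not stop the compiler from reusing a7 before the call.
    asm volatile ("li a7, %0" : : "i"(syscall_n) : "a7", "memory");
    const long result = syscall_enter(std::forward<Args>(args)...);
    // The barrier has to come before the return, or it is dead code.
    asm volatile ("" : : : "memory");
    return result;
}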

This should suffice, with no need for syscall function spam. The argument-count check is not optimal: it should only reject calls that would need an 8th integer register (which means counting only integral, pointer and reference parameters). But it does catch some cases.
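The finer-grained check described above can be done at compile time with type traits. This is a sketch assuming C++17 (for the fold expression); uses_int_reg and int_arg_count are hypothetical names. It implements only the counting rule stated here (integral, enum, pointer and reference parameters use integer registers) and does not model how the ABI actually places floating-point values passed through a variadic call.

```cpp
#include <cstddef>
#include <type_traits>

// Does an argument of type T need an integer register under the rule above?
// std::decay_t strips references and cv-qualifiers and turns arrays into
// pointers, mirroring what actually gets passed through "...".
template <typename T>
constexpr bool uses_int_reg()
{
    using U = std::decay_t<T>;
    return std::is_integral<U>::value || std::is_enum<U>::value ||
           std::is_pointer<U>::value || std::is_null_pointer<U>::value;
}

// Number of arguments in the pack that need an integer register (C++17 fold).
template <typename... Args>
constexpr std::size_t int_arg_count()
{
    return (std::size_t{0} + ... + static_cast<std::size_t>(uses_int_reg<Args>()));
}

// Compile-time checks: double does not count, const char* does.
static_assert(int_arg_count<int, double, const char*>() == 2, "int and pointer count");
static_assert(int_arg_count<float, double>() == 0, "floating-point does not count");
```

Inside the wrapper, sizeof...(args) < 8 would then become int_arg_count<Args...>() <= 7.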

gonzo
  • Normally, if you just write plain standard compliant C++ without inline ASM bits, the compiler will take care of the details... – Jesper Juhl Apr 10 '20 at 20:01
  • `asm volatile ("mv a7, %0" : : "r"(syscall_n));` You wrote a register without telling the compiler. When this function inlines, it will break things by stepping on `a7` if the compiler was using it for anything. Also, there's zero guarantee that `a7` will still be set before the `jal`. The usual way to do syscall wrapper macros is to make different ones for each possible number of args, and use a `"memory"` clobber on the asm statement. [RISC-V inline assembly struct optimized away](https://stackoverflow.com/q/61119506) was on the right track, just missing a `"memory"` clobber. – Peter Cordes Apr 10 '20 at 20:32
  • e.g. MUSL libc has `__syscall3` for a 3-arg syscall. IDK about syscalls taking FP args; I guess you'd need different macros for permutations of that. e.g. https://github.com/bpowers/musl/blob/master/arch/mips/syscall_arch.h is for MIPS, using asm register locals. – Peter Cordes Apr 10 '20 at 20:35
  • *The compiler seems a little bit confused, but I don't mind one extra instruction.* Confused how? Setting a0=0 is probably number of FP reg args for a variadic. And you asked for `"r"(syscall_n)` in a register so the compiler has to `li` into some register before the asm from your asm statement. You could have used an asm register local var with an empty asm statement to force the value into `a7`, or used an `"i"` constraint for an `li` instruction in your asm. Of course none of this can be safe unless you make the call from the asm statement. – Peter Cordes Apr 10 '20 at 20:43
  • @PeterCordes Thanks for all your help so far. Yes, I did add a clobber and everything has been going fine so far. However, I have found C++ answers for everything I have done so far except for this. I am wondering if I put a memory clobber in before moving into A7 and then again before calling the function, will that work? – gonzo Apr 10 '20 at 21:07
  • Huh? You talk about it like you'd use a separate asm statement for the memory clobber, instead of attaching it to the `asm` statement containing the `syscall` instruction. (Or containing a `jal syscall_enter` instruction but making calls from inline asm is a mess) That might happen to work, but why not just tell the compiler what's really going on? If you wanted to do a silly hack like that, you'd include a `"memory"` clobber in the asm statement that puts the callnum into `a7`. That might happen to be safe, especially if you put an `asm("":::"memory")` after the syscall_enter() as well, IDK. – Peter Cordes Apr 10 '20 at 21:28
  • @PeterCordes I have updated my question and added a new version that I'm using now. Hopefully this will prove to work in all cases. I will probably notice pretty quickly if it stops working in some cases though. – gonzo Apr 10 '20 at 22:52
  • That's true, I think the only failure mode left is having the compiler use `a7` for something. Which as you say should hopefully fail in an obvious, easy-to-debug way at any given call-site. Unless it's in a seldom-used error-handling codepath or something. This is an inline function so every single callsite has its own separate possibility of failure depending on surrounding code. It's probably unlikely that the compiler will ever want to do so much stuff with local vars that it ends up wanting to use `a7` after the `li` / mem clobber before the function call – Peter Cordes Apr 10 '20 at 22:55

2 Answers


There are two problems with this.

The first is that you aren't telling the compiler you are using a7, so it might try to put something else there, resulting in incorrect code. You need to add a7 to the clobbers list of the asm:

asm volatile ("mv a7, %0" : : "r"(syscall_n) : "a7");

The second is that the asm statement is not connected to the call, so the compiler may reorder things, and, in particular, move other code in between the asm mv instruction and the call. If that happens and the code in question modifies a7, you'll end up calling the wrong syscall.
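The usual way to fix the second problem, as in the MUSL __syscall3 helper mentioned in the comments, is to put the ecall itself in the asm statement and bind each value to its register with local register variables. This is a sketch assuming GCC or Clang targeting RISC-V; syscall_int3 and syscall_fp1 are hypothetical names, and the FP variant assumes the host reads its argument from fa0, as in the center-camera disassembly.

```cpp
#if defined(__riscv)
// Three integer arguments. Because ecall is inside the same asm statement
// that consumes a7, nothing can be scheduled between loading a7 and the trap.
static inline long syscall_int3(long n, long x0, long x1, long x2)
{
    register long a7 asm("a7") = n;
    register long a0 asm("a0") = x0;
    register long a1 asm("a1") = x1;
    register long a2 asm("a2") = x2;
    asm volatile ("ecall"
                  : "+r"(a0)                   // first argument in, result out
                  : "r"(a7), "r"(a1), "r"(a2)  // call number and other args
                  : "memory");                 // the host may touch memory
    return a0;
}

// One double argument, assumed (hypothetically) to be read from fa0.
static inline long syscall_fp1(long n, double d)
{
    register long a7 asm("a7") = n;
    register double fa0 asm("fa0") = d;
    register long a0 asm("a0");
    asm volatile ("ecall" : "=r"(a0) : "r"(a7), "f"(fa0) : "memory");
    return a0;
}
#endif
```

The cost is one wrapper per register pattern, which is exactly the overload set the question is trying to avoid.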

Chris Dodd

This is the function I'm using now. Many thanks to @PeterCordes for all the help.

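#include <utility>

extern "C" long syscall_enter(...);

template <typename... Args>
inline long apicall(long syscall_n, Args&&... args)
{
    // This will prevent some cases of too many arguments,
    // but not a mix of float and integral arguments.
    static_assert(sizeof...(args) < 8, "At most 7 arguments: a0-a6 carry arguments, a7 carries the system call number");
    // "i" needs syscall_n to be a compile-time constant once this call is
    // inlined, so in practice this relies on optimizations being enabled.
    // The memory clobber stops memory accesses from being moved across this
    // point; it does not stop the compiler from reusing a7 before the call.
    asm volatile ("li a7, %0" : : "i"(syscall_n) : "a7", "memory");
    const long result = syscall_enter(std::forward<Args>(args)...);
    // The barrier has to come before the return, or it is dead code.
    asm volatile ("" : : : "memory");
    return result;
}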

It works well for me. Again, the primary reason to avoid the syscall-function-spam solution is that if you have two functions, one taking an integral argument and another taking a floating-point argument, the function call becomes ambiguous, and now you need to start thinking about which function to call. I have tested this solution with a mix of float and integral arguments, and it works as it should. One drawback is that float arguments are promoted to double when passed through the variadic syscall_enter(...), so they travel as 64-bit values, which makes the system call a tiny amount slower.
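The 64-bit cost follows from C++'s default argument promotions: a float passed through ... is converted to double before the call, so whatever reads the argument sees a double. A host-independent illustration (first_vararg is a hypothetical helper):

```cpp
#include <cstdarg>

// Returns the first variadic argument, read back as a double.
// A float passed through "..." is promoted to double by the default
// argument promotions, so double is the correct type to read it as.
double first_vararg(int count, ...)
{
    std::va_list ap;
    va_start(ap, count);
    const double value = count > 0 ? va_arg(ap, double) : 0.0;
    va_end(ap);
    return value;
}
```

Here first_vararg(1, 1.5f) returns 1.5. Reading the argument with va_arg(ap, float) instead would be undefined behavior for the same reason.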

Again, there was a C++ solution!

gonzo
  • Having `syscall_enter` be a separate non-inline function seems to defeat most of the purpose of using inline asm, but I guess this custom calling convention (passing the callnum in a7 regardless of number of other args) lets the definition of that function be simpler than traditional wrapper functions that take the callnum first and have to copy every other possible arg over by one. – Peter Cordes Apr 11 '20 at 00:48
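For contrast, a traditional wrapper that takes the call number as its first argument has to shift every argument down one register before the ecall. This is a sketch using a top-level asm statement, assuming RISC-V and the GNU assembler; syscall_shifting is a hypothetical name.

```cpp
#if defined(__riscv)
// Hypothetical traditional wrapper: call number first, then up to 6 arguments.
extern "C" long syscall_shifting(long n, long x0, long x1, long x2,
                                 long x3, long x4, long x5);
asm(R"(
    .pushsection .text
    .global syscall_shifting
    .type syscall_shifting, @function
syscall_shifting:
    mv a7, a0       # call number: first argument -> a7
    mv a0, a1       # shift each remaining argument down by one register
    mv a1, a2
    mv a2, a3
    mv a3, a4
    mv a4, a5
    mv a5, a6
    ecall
    ret
    .size syscall_shifting, . - syscall_shifting
    .popsection
)");
#endif
```

With the call number passed directly in a7, all of those moves disappear and the stub shrinks to ecall plus ret.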