
I want to write something like this:

#include <stdint.h>

inline uint64_t with_rsp(uint64_t x, uint64_t y) {
  uint64_t z, w;
  uint64_t rsp;
  asm ("mov %%rsp, %[rsp]\t\n"
       "mov $0x13, %%rsp\t\n"
       "mov %[x], %%rdx\t\n"
       "mulx %[y], %[z], %[w]\t\n"
       "mov %[rsp], %%rsp\t\n"
       : [z] "=&r" (z), [w] "=&r" (w)
       : [x] "r" (x), [y] "r" (y), [rsp] "m" (rsp)
       : "rdx"
       );
  return z + w;
}

inline uint64_t with_rbp(uint64_t x, uint64_t y) {
  uint64_t z, w;
  uint64_t rbp;
  asm ("mov %%rbp, %[rbp]\t\n"
       "mov $0x13, %%rbp\t\n"
       "mov %[x], %%rdx\t\n"
       "mulx %[y], %[z], %[w]\t\n"
       "mov %[rbp], %%rbp\t\n"
       : [z] "=&r" (z), [w] "=&r" (w)
       : [x] "r" (x), [y] "r" (y), [rbp] "m" (rbp)
       : "rdx"
       );
  return z + w;
}

int main() {
  uint64_t x = 15, y = 3, zw;
  if (inline_asm_uses_rbp()) {
    zw = with_rsp(x, y);
  } else {
    zw = with_rbp(x, y);
  }
  return zw;
}

Ideally, the if statement should be resolved at compile time (but I don't think I can do this with preprocessor macros, because those are evaluated before the compiler decides how to address the stack). So I'm fine with needing some sort of jump to get it to work, though I'd prefer not to need one.

The reason I need this is that I have some inline assembly that needs to be able to use 15 registers, plus some memory locations on the stack, and gcc is choosing rsp-based offsets in some locations where the function is inlined, and it's choosing rbp-based offsets in other locations. (A separate assembly module isn't a good match for this because I'd like to avoid the overhead of a function call.)

Jason Gross
  • Don't use inline assembly then. I believe you have already been told that? – Jester Sep 15 '17 at 23:00
  • Why inline assembly? – fuz Sep 15 '17 at 23:00
  • Anyway, compiling with `-fomit-frame-pointer` (which is turned on by optimization too) you should get `rsp` based offsets. – Jester Sep 15 '17 at 23:01
  • This reeks of premature optimization. – o11c Sep 15 '17 at 23:07
  • If your function is so complicated that you need 15 registers and stack space, the function call overhead very likely does not matter at all. – fuz Sep 15 '17 at 23:17
  • This is not a duplicate of https://stackoverflow.com/questions/1415414/is-there-a-gcc-macro-for-determining-that-frame-pointers-are-not-eliminated; I am not asking whether or not gcc has allocated a frame pointer; I'm instead asking how to test whether gcc is compiling stack memory references based on rsp or rbp. – Jason Gross Sep 16 '17 at 00:23
  • @Jester I'm already compiling with `-fomit-frame-pointer`. gcc uses `rbp` in some places and `rsp` in other places in the same binary, depending on where the inline assembly function is called from. – Jason Gross Sep 16 '17 at 00:25
  • If frame pointer is off, you can use `rbp` in your asm constraints hence the offsets will be relative to `rsp` because the compiler has no other choice. Provide a [mcve] if you got a counterexample. – Jester Sep 16 '17 at 00:25
  • @Jester `-fomit-frame-pointer` does not force gcc to omit the frame pointer, it merely permits it to. Here's an example where gcc is doing rbp-based memory addressing even though I passed `-O3 -fomit-frame-pointer` (and even though it uses rsp-based addressing in other places the same function is inlined): https://gist.github.com/JasonGross/af1a2a59bbdaa57378116fed0cbbb449#file-measure-s-L6524-L6527 – Jason Gross Sep 16 '17 at 00:41
  • That's not using `rbp` in the asm constraints. The point is, with `-fomit-frame-pointer` you can force the compiler to give up `rbp`, so you should be able to use all 15 registers. If you don't need it, then of course the compiler is free to use it. – Jester Sep 16 '17 at 00:43
  • How can I use `rbp` in the asm constraints? If I add it to the list of clobbers, I get ` measure.c: In function ‘measure’: measure.c:102:1: error: bp cannot be used in asm here } ^ lto-wrapper: fatal error: gcc returned 1 exit status compilation terminated. /usr/bin/ld: error: lto-wrapper failed collect2: error: ld returned 1 exit status ` And https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints doesn't list a letter in the x86 family for the bp registers – Jason Gross Sep 16 '17 at 00:44
  • The only way I could get that error is if I used `alloca` (I consider that a bug, the compiler should work around it). Clobbering `rbp` is normally allowed. Maybe there is something else that also needs dynamic stack, such as VLA? Update: yeah VLA also prohibits it. – Jester Sep 16 '17 at 00:49
  • We do indeed have `unsigned char *buf = aligned_alloc(64, 1024); long long* cycles = calloc(n + 1, sizeof(long long));`. Is there a bug report open for this? – Jason Gross Sep 16 '17 at 00:50
  • Ah! And, indeed, factoring out the call to a separate function from the one that allocates the buffer and array fixes the issue! – Jason Gross Sep 16 '17 at 00:54
  • @JasonGross I am sorry. I reopened your question. Typically, you'd use *extended asm* for that sort of thing. – fuz Sep 16 '17 at 11:11
  • Using a `"r"(x)` constraint on an input and then using a `mov` to copy it to `%rdx` is just shooting yourself in the foot if you're running low on registers. Also, you have `[rsp] "m" (rsp)` as an input operand when you're actually using it as a dummy *output* operand for saving `rsp` to the stack. Also keep in mind that other than `"=m"` operands, you can't safely write stack memory in x86-64 code (so don't try to `push`; there's no way to tell it you clobber the red-zone. See https://stackoverflow.com/questions/34520013/using-base-pointer-register-in-c-inline-asm) – Peter Cordes Sep 16 '17 at 12:39
  • Why are you using inline-asm for this? gcc knows how to use `mulx` when you write `unsigned __int128 foo = (uint64_t)a * (uint64_t)b;`. If you're willing to consider the overhead of stashing `rsp` into the stack and using 15 registers, probably the overhead of a function call would be lower. Also, **you could stash `rsp` and `rbp` in `xmm0` / `xmm1` and have 16 integer registers**. (As long as you don't have any signal handlers installed; you will have a bad time if a signal tries to use the stack while `rsp` isn't pointing there.) – Peter Cordes Sep 16 '17 at 12:44
