What happens when you need to call a function that is more than 2^32 bytes away?

Question

I just found out that the call instructions we usually use are actually program-counter relative. Yet the x86 `call` instruction only has a 32-bit wide offset to encode that relative distance.

What if I want to jump > 4GB away?
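To make the limit concrete, here is a minimal C sketch (not from the thread) that encodes a direct `call rel32` (opcode `E8`) as raw bytes and reports when the target is out of range. `encode_call_rel32` is a hypothetical helper name. The displacement is signed and counted from the end of the 5-byte instruction, so the actual reach is roughly ±2 GiB in either direction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Encode "call rel32": opcode E8 followed by a signed 32-bit displacement,
 * measured from the end of the 5-byte instruction (call_addr + 5).
 * Returns false when the target is out of rel32 reach. */
static bool encode_call_rel32(uint8_t out[5], uint64_t call_addr, uint64_t target)
{
    assert(out != NULL);
    int64_t disp = (int64_t)(target - (call_addr + 5));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return false;   /* too far: an indirect call through a 64-bit pointer is needed */

    uint32_t u = (uint32_t)(int32_t)disp;
    out[0] = 0xE8;
    for (int i = 0; i < 4; i++)          /* displacement is stored little-endian */
        out[1 + i] = (uint8_t)(u >> (8 * i));
    return true;
}
```

For example, a call at `0x400000` to `0x400010` encodes as `E8 0B 00 00 00`, while a target `0x80000000` bytes past the end of the instruction is rejected.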

According to [this](http://eli.thegreenplace.net/2012/01/03/understanding-the-x64-code-models), you would need to either use absolute 64-bit addresses or address relative to a register that holds the 64-bit offset. — Mysticial, Aug 04 '15 at 20:56
In practice this usually isn't an issue. I don't think anyone has written a program where the executable part of the program is bigger than 2GB. Shared libraries (DLLs), which might be more than 2GB away, are accessed through pointers anyway. — Ross Ridge, Aug 04 '15 at 21:12
Thanks, both of you. Ross, can you explain what you mean by "anyway"? I'm new to this, so I don't see why the DLL needs to be accessed via a pointer. What if the compiler emits an E8 opcode to call this DLL method? Or are you saying that the compiler could never do that, because it doesn't know where this DLL is, and therefore it would emit a call via an indirection? — halivingston, Aug 04 '15 at 21:23
@halivingston: I am not Ross, but what you say is exactly right: a compiler does not emit direct calls to external modules. Instead, the call goes through an indirection, a stub in the import table, which does use a full 64-bit pointer for the target address. — 500 - Internal Server Error, Aug 04 '15 at 21:26

Peter Cordes · Accepted Answer · score 1 · 2018-04-27T07:43:18.087

I guess this could come up if you JIT some code into a buffer allocated more than 2GiB away (the reach of a signed rel32) from some functions it needs to call. The simple answer is: don't do that.

On Linux, for example, use `mmap(MAP_32BIT)` to allocate memory in the low 2GiB of virtual address space if you want the JITed code to call functions in the main executable (assuming a position-dependent executable).
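A minimal C sketch of the `MAP_32BIT` approach, assuming Linux on x86-64 (`alloc_low_2g` is a hypothetical helper name). It maps read/write memory; a real JIT would write its code there and then `mprotect` the range to `PROT_READ | PROT_EXEC` before running it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Map len bytes of read/write memory in the low 2 GiB of the address space.
 * MAP_32BIT is a Linux flag supported only on x86-64; elsewhere this returns NULL. */
static void *alloc_low_2g(size_t len)
{
#ifdef MAP_32BIT
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
    return p == MAP_FAILED ? NULL : p;
#else
    (void)len;
    return NULL;
#endif
}
```

In a position-dependent executable, code and buffer then both sit below 2 GiB, so a plain `call rel32` between them always fits.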

In a PIE executable or a shared library (which typically won't be mapped in the low 32 bits of virtual address space), you might try to allocate memory near your own code by calling `mmap` with an address hint but without `MAP_FIXED`, and retrying with different addresses in range if the first attempt lands too far away: `mmap(hint_address, ...)` / check whether the result is within ±2GiB of the code and/or data it needs to reach / `munmap` and retry with a different hint.
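A minimal C sketch of that hint-and-retry loop, assuming Linux `mmap` semantics (a non-`MAP_FIXED` address is only a hint, which the kernel normally honors when that range is free). `alloc_near`, `in_reach`, the 64 MiB spacing between hints, and the slightly-under-2 GiB limit are illustrative choices, not anything from the thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Conservative limit: a bit under 2 GiB, leaving slack for instruction lengths. */
#define NEAR_LIMIT 0x7FF00000ULL

static uint64_t distance(uintptr_t a, uintptr_t b)
{
    return a > b ? (uint64_t)(a - b) : (uint64_t)(b - a);
}

/* True if both ends of [p, p + len) are within NEAR_LIMIT bytes of target. */
static int in_reach(uintptr_t p, size_t len, uintptr_t target)
{
    return distance(p, target) < NEAR_LIMIT && distance(p + len, target) < NEAR_LIMIT;
}

/* Try hints at growing distances above and below target. Without MAP_FIXED the
 * hint is only a suggestion, so nothing already mapped gets clobbered; if the
 * kernel places the mapping elsewhere, unmap it and try the next hint. */
static void *alloc_near(uintptr_t target, size_t len)
{
    const uintptr_t page_mask = ~(uintptr_t)(sysconf(_SC_PAGESIZE) - 1);
    const uintptr_t step = (uintptr_t)64 << 20;   /* 64 MiB between hints (arbitrary) */

    for (uintptr_t off = step; off < NEAR_LIMIT; off += step) {
        uintptr_t hints[2] = { target + off, target - off };
        for (int i = 0; i < 2; i++) {
            void *p = mmap((void *)(hints[i] & page_mask), len,
                           PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (p == MAP_FAILED)
                continue;
            if (in_reach((uintptr_t)p, len, target))
                return p;
            munmap(p, len);   /* landed too far away: give it back and retry */
        }
    }
    return NULL;
}
```

A JIT would pass the address of a function its generated code needs to call as `target`. Because there is no check-then-map step, this avoids the race that `MAP_FIXED` plus parsing `/proc/self/maps` would have.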

The reason is that the only workaround is an absolute-address indirect call (see Call an absolute pointer in x86 machine code): you'd need to load the target address into a register, or have the address stored in memory as a pointer, and call through that. See Intel's instruction set reference manual, where all the available encodings of `call` are listed.
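For reference, here is a hedged C sketch of the register form, encoding `mov rax, imm64` / `call rax` as the raw bytes a JIT might emit. `emit_call_abs_rax` is a hypothetical name. RAX is used here because it is call-clobbered in both the System V and Windows x64 calling conventions, but any free scratch register would work (with a different encoding).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emit "mov rax, imm64" (48 B8 imm64) followed by "call rax" (FF D0).
 * Reaches any 64-bit target but clobbers RAX. Returns the byte count (12). */
static size_t emit_call_abs_rax(uint8_t *out, uint64_t target)
{
    assert(out != NULL);
    size_t n = 0;
    out[n++] = 0x48;              /* REX.W prefix: 64-bit operand size */
    out[n++] = 0xB8;              /* mov r64, imm64 with the register = RAX */
    for (int i = 0; i < 8; i++)   /* imm64, little-endian */
        out[n++] = (uint8_t)(target >> (8 * i));
    out[n++] = 0xFF;              /* call r/m64 (opcode FF /2) */
    out[n++] = 0xD0;              /* ModRM: mod=11, reg=/2, rm=RAX */
    return n;
}
```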

Also the x86 tag wiki links to https://www-ssl.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html

If you don't need it to be super-efficient, one way to JIT the absolute-indirect calls is to put a table of pointers at a known location relative to the JITed code, so it can use an indirect `call [rel pointer_to_func1]` (RIP-relative addressing). This is like the global offset table used by Unix shared libraries, and it's how compiler-generated code calls shared-library functions when compiled with `gcc -fno-plt`.
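A minimal C sketch of the RIP-relative form `call [rel pointer_to_func1]`, encoded as `FF 15 disp32`. The helper name `emit_call_via_slot` is hypothetical. Only the pointer table has to be within ±2 GiB of the JITed code; each slot holds a full 64-bit address, so the callee itself can be anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emit "call qword [rip + disp32]" (FF 15 disp32), which calls through the
 * 64-bit pointer stored at slot_addr. disp32 is relative to the end of this
 * 6-byte instruction. Returns 6, or 0 if the slot is out of rel32 reach. */
static size_t emit_call_via_slot(uint8_t out[6], uint64_t insn_addr, uint64_t slot_addr)
{
    assert(out != NULL);
    int64_t disp = (int64_t)(slot_addr - (insn_addr + 6));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return 0;

    uint32_t u = (uint32_t)(int32_t)disp;
    out[0] = 0xFF;   /* call r/m64 (FF /2) */
    out[1] = 0x15;   /* ModRM: mod=00, reg=/2, rm=101 -> [RIP + disp32] in 64-bit mode */
    for (int i = 0; i < 4; i++)
        out[2 + i] = (uint8_t)(u >> (8 * i));
    return 6;
}
```

Before running the code, the JIT would store each callee's absolute address into its table slot (for example with `memcpy`).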

edited Apr 27 '18 at 07:43

answered Aug 04 '15 at 21:25

Peter Cordes


You've figured out what I'm doing :) I have code sitting in memory that is almost always more than 2^32 away, and I'm willing to copy it into a buffer close to other code, but I need to understand how to do that, which is why I'm asking these questions :-) – halivingston Aug 04 '15 at 21:28
Isn't your "workaround" how methods in shared libraries are called? – halivingston Aug 04 '15 at 21:30
Shared libraries get relocated when they're loaded, by the dynamic linker. RIP-relative addressing means they only need relocations for external symbols (i.e. calls to functions in *other* libraries), which is a big improvement from x86 to x86-64. I forget the details; calls actually call a stub that loads a value from a global offset table, or something. Try single-stepping in ASM for the first call to a library function (and you'll see it actually do symbol lookup which takes a ton of insns). Then try single-stepping a subsequent call to the same function, and it's a lot faster. – Peter Cordes Aug 04 '15 at 22:01
@halivingston: try allocating your buffer so it's close enough to your other code. I'm sure you're not the first person to run into this with JIT. It might require some more platform-specific calls, like `mmap(..., MAP_FIXED|MAP_ANONYMOUS, ...)` to allocate memory at a specific address in memory. Avoiding stepping on already-mapped parts of your address space might require parsing `/proc/self/maps` on Linux, or something else on other platforms. Maybe there's an API for this already. – Peter Cordes Aug 04 '15 at 22:05
You might just want to ask your ultimate question of "How do I set things up so I can call other functions from JIT-compiled code?". This looks like a case of the *X Y problem*. (Trying to do something hard / weird because you picked the wrong solution to an earlier problem.) – Peter Cordes Aug 04 '15 at 22:08

@PeterCordes This is not an appropriate situation to use [MAP_FIXED](http://stackoverflow.com/a/24393264/904148). – Timothy Baldwin Aug 05 '15 at 09:01
@TimothyBaldwin: ok, good point. No way to avoid the race condition between checking and mapping. Better to just try it with an address hint, and `munmap`/try again somewhere else on failure. – Peter Cordes Aug 05 '15 at 11:13

score 0 · Answer 2 · answered Aug 07 '15 at 23:37


I just happened to have this very need: to emit a call to an absolute 64-bit address into some dynamically created code. It turns out not to be hard at all, albeit slightly kludgy compared to the direct absolute jump we have in 32-bit mode (your exact asm syntax may vary depending on which assembler you use), e.g.:

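```asm
  call qword ptr [rel @1]   ; indirect call through the 64-bit pointer stored at @1
  jmp @2                    ; on return, jump over the embedded pointer
@1:
  dq <64-bit address>       ; absolute address of the callee
@2:
```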
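At the machine-code level, that fragment encodes to 16 bytes. The following C sketch (the helper name `emit_call_inline_ptr` is hypothetical) shows what a JIT might emit, with the labels resolved to fixed displacements: the call reads its pointer 2 bytes past its own end (skipping the 2-byte short `jmp`), and the `jmp` skips the 8 pointer bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emit the self-contained 16-byte fragment:
 *   FF 15 02 00 00 00   call qword [rip + 2]  ; pointer starts 2 bytes after this insn
 *   EB 08               jmp short +8          ; skip the pointer on return
 *   <8 bytes>           dq target             ; absolute callee address, little-endian
 * Uses no scratch register and is position-independent. */
static size_t emit_call_inline_ptr(uint8_t out[16], uint64_t target)
{
    static const uint8_t head[8] = { 0xFF, 0x15, 0x02, 0x00, 0x00, 0x00, 0xEB, 0x08 };
    assert(out != NULL);
    for (size_t i = 0; i < sizeof head; i++)
        out[i] = head[i];
    for (int i = 0; i < 8; i++)
        out[8 + i] = (uint8_t)(target >> (8 * i));
    return 16;
}
```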


500 - Internal Server Error


That's inefficient compared to `mov rax, 64-bit address` / `call rax`. [Call an absolute pointer in x86 machine code](//stackoverflow.com/q/19552158). Or if you load the pointer from memory, put the address data somewhere else so you don't have to jump over it, like after the `ret` in your function. (And preferably with other data so you don't waste a dTLB entry and L1d cache miss on loading data from a page of code.) – Peter Cordes Apr 27 '18 at 07:28
@Peter Cordes: Noted. In my scenario, though, rax (as well as all other registers) were already in use. – 500 - Internal Server Error Apr 27 '18 at 09:35

Ok, then `[rel @1]` is good, but put the data somewhere you don't have to `jmp` over it. It might be slightly less convenient to not have a single stand-alone fragment of machine code, but at worst you just put it before / after the whole function block. And BTW, 32-bit x86 doesn't have direct absolute `jmp` or `call`. A `rel32` can reach any 32-bit address, so you can always use `call rel32` in position-dependent code, though. – Peter Cordes Apr 27 '18 at 18:43

