
I've created a static library with about 2 million small functions, but I'm having trouble linking it to my main function using GCC (tested with 4.8.5 and 7.3.0) under Linux x86_64.

The linker complains about relocation truncations, very much like those in this question.

I've already tried using -mcmodel=large, but as the answer to that same question says, I would "need a crt1.o that can handle full 64-bit addresses". I've then tried compiling one, following this answer, but recent glibc won't compile under -mcmodel=large, even if libgcc does, which accomplishes nothing.

I've also tried adding the flags -fPIC and/or -fPIE to no avail. The best I get is this sole error:

ld: failed to convert GOTPCREL relocation; relink with --no-relax

and adding that flag also doesn't help.

I've searched around the Internet for hours, but most posts are very old and I can't find a way to do this.

I'm aware this is not a common thing to try, but I think it should be possible to do this. I'm working in an HPC environment, so memory or time constraints are not the issue here.

Has anyone been successful in accomplishing something similar with a recent compiler and toolchain?

Seirios
  • Millions of small functions should still be less than 2GiB of code size, unless they're not actually small or you have some large static data to go with them. Try `-mcmodel=medium` and similar before going straight to `large`, especially if you care about performance! (movabs imm64 / indirect call instead of just `call rel32` is horrible.) Using 64-bit absolute only for large static arrays is not nearly as bad. – Peter Cordes Jan 18 '22 at 19:45

1 Answer


Either don't use the standard library or patch it. As of version 2.34, Glibc doesn't support the large code model. (See also the Glibc mailing list and Red Hat Bugzilla.)

Explanation

Let's examine the Glibc source code to understand why recompiling with -mcmodel=large accomplished nothing. It replaced the relocations originating from C files. But Glibc contained hardcoded 32-bit relocations in raw Assembly files, such as in start.S (sysdeps/x86_64/start.S).

call *__libc_start_main@GOTPCREL(%rip)

start.S emitted R_X86_64_GOTPCREL for __libc_start_main, which used RIP-relative addressing. The x86_64 CALL instruction doesn't support relative displacements wider than 32 bits; see the AMD64 Architecture Programmer's Manual, Volume 3. So ld couldn't resolve the R_X86_64_GOTPCREL relocation once the code size surpassed 2GB.
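To make the range limit concrete, here is a minimal C sketch of the check a linker effectively performs for a 32-bit PC-relative field. The function name `fits_rel32` is hypothetical and the addresses are made up for illustration; this is not ld's actual code.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if the signed distance from the address of
 * the next instruction to the target fits in a 32-bit displacement field
 * (the only relative form `call rel32` offers), 0 otherwise. */
int fits_rel32(uint64_t next_insn, uint64_t target)
{
    /* Unsigned subtraction wraps; reinterpret the result as signed. */
    int64_t disp = (int64_t)(target - next_insn);
    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

With an assumed call site at 0x400000, a target 0x7fffffff bytes ahead still fits, while a target 0x80000000 bytes (2GiB) ahead does not. That second case is what produces the "relocation truncated to fit" class of errors.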

Adding -fPIC didn't help due to the same ISA constraints. For position-independent code, the compiler still generated relative jumps.

Patching

In short, you have to replace 32-bit relocations in the Assembly code. See the System V Application Binary Interface AMD64 Architecture Processor Supplement for more info about implementing 64-bit relocations. See also this for a more in-depth explanation of code models.

Why don't 32-bit relocations suffice for the large code model? Because we can't rely on other symbols being in a range of 2GB. All calls must become absolute. Contrast with the small PIC code model, where the compiler generates relative jumps whenever possible.

Let's look closely at the R_X86_64_GOTPCREL relocation. It contains the 32-bit difference between RIP and the symbol's GOT entry address. It has a 64-bit substitute, R_X86_64_GOTPCREL64, but I couldn't find a way to use it in Assembly.

So, to replace the GOTPCREL, we have to compute two things: the offset of the symbol's entry from the GOT base, and the GOT address itself. We can calculate the GOT location once in the function prologue because it doesn't change.

First, let's get the GOT base (code lifted wholesale from the ABI Supplement). The _GLOBAL_OFFSET_TABLE_ relocation specifies the offset relative to the current position:

leaq 1f(%rip), %r11
1: movabs $_GLOBAL_OFFSET_TABLE_, %r15
leaq (%r11, %r15), %r15

With the GOT base in the %r15 register, we now have to find the symbol's GOT entry offset. The R_X86_64_GOT64 relocation specifies exactly this. With it, we can rewrite the call to __libc_start_main as:

movabs $__libc_start_main@GOT, %r11
call *(%r11, %r15)

We replaced R_X86_64_GOTPCREL with _GLOBAL_OFFSET_TABLE_ and R_X86_64_GOT64. Replace others in the same vein.
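The difference between the two schemes can be modeled as plain address arithmetic. The sketch below is my reading of the psABI relocation table (R_X86_64_GOTPCREL computes G + GOT + A - P into a 32-bit field, R_X86_64_GOT64 computes G + A into a 64-bit field); the function names and the example addresses are hypothetical, and this is only an illustration, not linker code.

```c
#include <stdint.h>

/* Model of R_X86_64_GOTPCREL: value = G + GOT + A - P, stored in 32 bits.
 *   got: address of the GOT            g: offset of the entry within the GOT
 *   a:   addend (typically -4)         p: address of the field being patched
 * Returns 1 if the value fits in a signed 32-bit field, 0 on overflow. */
int gotpcrel_value_fits(uint64_t got, uint64_t g, int64_t a, uint64_t p)
{
    int64_t value = (int64_t)(got + g + (uint64_t)a - p);
    return value >= INT32_MIN && value <= INT32_MAX;
}

/* Model of the large-model sequence: the prologue leaves the GOT base in a
 * register (%r15 above), movabs loads the 64-bit GOT64 offset (G + A, with
 * A = 0 here), and the indirect call adds the two. No range limit applies. */
uint64_t large_model_got_entry(uint64_t got_base, uint64_t got64_offset)
{
    return got_base + got64_offset;
}
```

For example, with an assumed GOT at 0x120000000 and a call site near 0x401000, `gotpcrel_value_fits` reports an overflow, while `large_model_got_entry` still yields a valid 64-bit entry address. The cost is the extra movabs and the register reserved for the GOT base.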

N.B.: Replace R_X86_64_GOT64 with R_X86_64_PLTOFF64 for functions from dynamically linked executables.

Testing

Verify the patch's correctness using the following test, which requires the large code model. It doesn't contain a million small functions; instead it has one huge function and one small one.

Your compiler must support the large code model. If you use GCC, you'll need to build it from the source with the flag -mcmodel=large. Startup files shouldn't contain 32-bit relocations.

The foo function takes more than 2GB, rendering 32-bit relocations unusable. Thus, the test will fail with the overflow error if compiled without -mcmodel=large. Also add the flags -O0 -fPIC -static and link with gold.

extern int foo();
extern int bar();
int foo(){
    bar();
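    // Call sys_exit(0), then pad the function with 4GiB of zero bytes
    // so that foo alone exceeds the 2GB reach of 32-bit relocations.
    asm( "mov $0x3c, %%rax \n"
         "xor %%rdi, %%rdi \n"
         "syscall \n"
         ".zero 1 << 32 \n"
    // rax and rdi are written above; syscall itself clobbers rcx and r11.
    : : : "rax", "rdi", "rcx", "r11", "memory");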
    return 0;
}
int bar(){
    return 0;
}
int __libc_start_main(){
    foo();
    return 0;
}
int main(){
    return 0;
}

N.B. I used patched Glibc startup files without the standard library itself, so I had to define both __libc_start_main and main.

  • Posted the [patch](https://gist.github.com/cheremnov/74c6fa8998635634b9b5be3832f02862) for Glibc startup files on Gist, so as not to clutter this answer. – Andrei_Cheremnov Jan 18 '22 at 19:44
  • I no longer have the setup to test if your answer solves my original problem. However, you do provide an explanation and what seems like a proper solution, so answer accepted! – Seirios May 19 '22 at 21:46