Either don't use the standard library or patch it. As of version 2.34, Glibc doesn't support the large code model. (See also the Glibc mailing list and Red Hat Bugzilla.)
Explanation
Let's examine the Glibc source code to understand why recompiling with -mcmodel=large accomplished nothing. It replaced the relocations originating from C files. But Glibc contained hardcoded 32-bit relocations in raw Assembly files, such as in start.S (sysdeps/x86_64/start.S).
call *__libc_start_main@GOTPCREL(%rip)
start.S emitted R_X86_64_GOTPCREL for __libc_start_main, which used RIP-relative addressing. The x86_64 CALL instruction didn't support relative displacements wider than a signed 32-bit value, roughly ±2GB (see the AMD64 manual, volume 3). So ld couldn't resolve the R_X86_64_GOTPCREL relocation because the code size surpassed 2GB.
Adding -fPIC didn't help due to the same ISA constraints. For position-independent code, the compiler still generated relative jumps.
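To make the limit concrete, here is a minimal C sketch of the range check that a 32-bit PC-relative relocation effectively has to pass. The function name fits_rel32 and its parameters are hypothetical, chosen for illustration; they are not taken from ld or Glibc.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true when the distance from the end of the
   instruction (next_rip) to the target fits in a signed 32-bit
   displacement field, which is all a RIP-relative operand can encode. */
bool fits_rel32(uint64_t next_rip, uint64_t target)
{
    /* The subtraction wraps modulo 2^64; reinterpreting the result as
       signed gives the forward or backward distance. */
    int64_t disp = (int64_t)(target - next_rip);
    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

Once a single function spans more than 2GB, the distance from a call site on one side of it to a GOT entry on the other side can fall outside this range, and that is when the linker reports an overflow.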
Patching
In short, you have to replace 32-bit relocations in the Assembly code. See the System V Application Binary Interface AMD64 Architecture Processor Supplement for more info about implementing 64-bit relocations. See also this for a more in-depth explanation of code models.
Why don't 32-bit relocations suffice for the large code model? Because we can't rely on other symbols being within 2GB of the call site. All calls must become absolute. Contrast with the small PIC code model, where the compiler generates relative jumps whenever possible.
Let's look closely at the R_X86_64_GOTPCREL relocation. It contains the 32-bit difference between RIP and the symbol's GOT entry address. It has a 64-bit substitute, R_X86_64_GOTPCREL64, but I couldn't find a way to use it in Assembly.
So, to replace the GOTPCREL, we have to compute two things: the offset of the symbol's entry from the GOT base, and the GOT address itself. We can calculate the GOT location once in the function prologue because it doesn't change.
First, let's get the GOT base (code lifted wholesale from the ABI Supplement). The _GLOBAL_OFFSET_TABLE_ relocation specifies the offset relative to the current position:
leaq 1f(%rip), %r11
1: movabs $_GLOBAL_OFFSET_TABLE_, %r15
leaq (%r11, %r15), %r15
With the GOT base residing in the %r15 register, we now have to find the symbol's GOT entry offset. The R_X86_64_GOT64 relocation specifies exactly this. With this, we can rewrite the call to __libc_start_main as:
movabs $__libc_start_main@GOT, %r11
call *(%r11, %r15)
We replaced R_X86_64_GOTPCREL with _GLOBAL_OFFSET_TABLE_ and R_X86_64_GOT64. Replace others in the same vein.
N.B.: Replace R_X86_64_GOT64 with R_X86_64_PLTOFF64 for functions from dynamically linked executables.
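The address arithmetic behind the patched call can be modelled in plain C. This is only a sketch of the three steps (GOT base, entry offset, indirect call), not the code the assembler or linker runs; every name here (entry_fn, sample_entry, got_base_from_label, got_slot_addr, call_through_slot) is hypothetical, and an ordinary array of function pointers would stand in for the real GOT.

```c
#include <stdint.h>

typedef int (*entry_fn)(void);

/* Hypothetical function whose address would sit in a GOT slot. */
int sample_entry(void)
{
    return 42;
}

/* Step 1: GOT base = address of a known label + 64-bit distance from that
   label to the GOT. This mirrors the leaq/movabs/leaq prologue. */
uintptr_t got_base_from_label(uintptr_t label_addr, int64_t got_minus_label)
{
    return label_addr + (uintptr_t)got_minus_label;
}

/* Step 2: slot address = GOT base + 64-bit offset of the symbol's entry,
   the value an R_X86_64_GOT64 relocation supplies. */
uintptr_t got_slot_addr(uintptr_t got_base, int64_t slot_offset)
{
    return got_base + (uintptr_t)slot_offset;
}

/* Step 3: load the target from the slot and call it indirectly,
   like call *(%r11, %r15). */
int call_through_slot(uintptr_t slot_addr)
{
    entry_fn target = *(entry_fn *)slot_addr;
    return target();
}
```

Both additions use full 64-bit operands, so nothing in the chain depends on the GOT or the callee being within 2GB of the call site.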
Testing
Verify the patch's correctness using the following test, which requires the large code model. Instead of a million small functions, it contains one huge function and one small function.
Your compiler must support the large code model. If you use GCC, you'll need to build it from source with the flag -mcmodel=large. Startup files shouldn't contain 32-bit relocations.
The foo function takes more than 2GB, rendering 32-bit relocations unusable. Thus, the test will fail with an overflow error if compiled without -mcmodel=large. Also add the flags -O0 -fPIC -static and link with gold.
extern int foo();
extern int bar();
int foo(){
    bar();
    // Call sys_exit(0); the syscall instruction itself clobbers rcx and r11
    asm( "mov $0x3c, %%rax \n"
         "xor %%rdi, %%rdi \n"
         "syscall \n"
         // Pad foo with 4GB of zeros so it exceeds the 2GB range
         ".zero 1 << 32 \n"
         : : : "rax", "rdi", "rcx", "r11");
    return 0;
}
int bar(){
    return 0;
}
int __libc_start_main(){
    foo();
    return 0;
}
int main(){
    return 0;
}
N.B. I used patched Glibc startup files without the standard library itself, so I had to define both __libc_start_main and main.