
Trying to compile non-PIC code into a shared library on x64 with gcc results in an error, something like:

/usr/bin/ld: /tmp/ccQ2ttcT.o: relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC

This question is about why this is so. I know that x64 has RIP-relative addressing which was designed to make PIC code more efficient. However, this doesn't mean load-time relocation can't be (in theory) applied to such code.

Some online sources, including this one (which is widely quoted on this issue), claim that there's some inherent limitation prohibiting non-PIC code in shared libs because of RIP-relative addressing. I don't understand why this is true.

Consider "old x86": a call instruction also has an IP-relative operand. And yet x86 code containing call compiles just fine into a shared lib without PIC, using the load-time relocation R_386_PC32 instead. Can't the same be done for the data RIP-relative addressing in x64?

Note that I fully understand the benefits of PIC code, and the performance penalty RIP-relative addressing helps alleviate. Still, I'm curious about the reason for not allowing using non-PIC code. Is there a real technical reasoning behind it, or is it just to encourage writing PIC code?

Eli Bendersky
  • @user786653: I'm well familiar with that (excellent) document but I don't think it has an answer for my question. If you think otherwise, please tell me where to find it – Eli Bendersky Oct 23 '11 at 08:28
  • @user786653: thanks, you're extremely useful today - I mentioned this very link in my question. I'm glad you've read it before answering – Eli Bendersky Oct 23 '11 at 15:42
  • Sorry, I stumbled on it later on and had forgotten it was linked in your q. – user786653 Oct 23 '11 at 16:01

3 Answers


Here is the best explanation I've read from a post on comp.unix.programmer:

Shared libs need PIC on x86-64, or more accurately, relocatable code has to be PIC. This is because a 32-bit immediate address operand used in the code might need more than 32 bits after relocation. If this happens, there is nowhere to write the new value.

Mat
  • Hmm... this makes sense (although it's hard to believe this would be a real limitation these days) – Eli Bendersky Oct 23 '11 at 08:26
  • @EliBendersky Your comment "real limitation these days" makes *no* sense. There are only 32 bits available in `CALL absolute-address` instruction encoding. If the call needs to go more than 2GB away (in either direction), then the runtime loader does not have enough bits to write the new destination address to. And changing x86 instruction encoding is the thing that AMD tried to avoid when they came up with x86_64. – Employed Russian Oct 23 '11 at 15:41
  • @EmployedRussian: What I meant to say is that it's hard to believe there are programs needing more than 2GB of code space. Are you familiar with any? – Eli Bendersky Oct 23 '11 at 16:08
  • @EliBendersky: on linux, check out `/proc//maps`. There are huge holes in the address space between regular text segment, shared libraries and kernel vsyscall area. – Mat Oct 23 '11 at 16:12

Just to add something.

The URL provided in the question mentions that you can pass -mcmodel=large to gcc to tell the compiler to generate 64-bit immediate address operands for your code.

So, gcc -mcmodel=large -shared a.c will generate a non-PIC shared object.


Demos:

a.c:

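#include <stdio.h>

/* Declared so that its address can be taken; main is defined elsewhere. */
int main(void);

void foo(void)
{
    printf("%p\n", (void *)main);
}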

The 32-bit immediate address operand prevents you from generating a non-PIC shared object:

xiami@gentoo ~ $ cc -shared -o a.so a.c
/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.4/../../../../x86_64-pc-linux-gnu/bin/ld: /tmp/cck3FWeL.o: relocation R_X86_64_32 against `main' can not be used when making a shared object; recompile with -fPIC
/tmp/cck3FWeL.o: error adding symbols: Bad value
collect2: error: ld returned 1 exit status

Use -mcmodel=large to solve it. (The warnings only appear on my system because modification of .text is forbidden by my PaX kernel.)

xiami@gentoo ~ $ cc -mcmodel=large -shared -o a.so a.c
/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.4/../../../../x86_64-pc-linux-gnu/bin/ld: /tmp/ccZ3b9Xk.o: warning: relocation in readonly section `.text'.
/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.4/../../../../x86_64-pc-linux-gnu/bin/ld: warning: creating a DT_TEXTREL in object.

Now you can see the relocation entry's type is R_X86_64_64 instead of R_X86_64_32, R_X86_64_PLT32, R_X86_64_PLTOFF64.

xiami@gentoo ~ $ objdump -R a.so
a.so:      file format elf64-x86-64
DYNAMIC RELOCATION RECORDS
OFFSET           TYPE              VALUE 
...
0000000000000758 R_X86_64_64       printf
...

On my system, linking this shared object into a normal program and running it produces errors like: ./a.out: error while loading shared libraries: ./a.so: cannot make segment writable for relocation: Permission denied

This shows that the dynamic loader is trying to apply relocations to .text, which it never has to do for a PIC library.

Xiami
  • Just for the record, **`-mcmodel=large` is *much* less efficient than `-fPIC`**, so don't use it just to avoid `-fPIC`. But yes, the dynamic linker will do text relocations on 64-bit absolute addresses, and the reason why 32-bit absolute addresses aren't allowed in x86-64 ELF shared objects is that they're only 32-bit, not that they're absolute. – Peter Cordes Jan 23 '19 at 17:15

The thing is, PIC and non-PIC code are still different.

C source:

extern int x;
void func(void) { x += 1; }

Assembly, not PIC:

addl    $1, x(%rip)

Assembly, with PIC:

movq    x@GOTPCREL(%rip), %rax
addl    $1, (%rax)

So PIC code has to go through the global offset table (GOT) to access global variables. It actually has to do the same thing for functions, but calls can go through stubs (the PLT) created at link time. This is transparent at the assembly level, while accessing globals is not. (If you need the address of a function, however, then PIC and non-PIC are different, just like globals.) Note that you can change the declaration as follows:

__attribute__((visibility("hidden"))) extern int x;

In this case, since GCC knows that the symbol must reside in the same object as the code, it emits the same code as the non-PIC version.

Dietrich Epp
  • Interesting, but honestly I'm not sure how it answers the question. Could you please clarify? – Eli Bendersky Oct 23 '11 at 08:26
  • Pretty sure x86_64 Linux doesn't do load-time relocations (to code), which is why PIC is different. Or were you asking why x86_64 doesn't do load-time relocations? Because it could... – Dietrich Epp Oct 23 '11 at 08:30
  • I was asking why x86_64 doesn't do load-time relocations. I know it can be forced to do them, but doesn't really want to, by default. Or to be more precise, `gcc` doesn't want to – Eli Bendersky Oct 23 '11 at 08:32
  • I think it was originally a design choice not to do load-time relocations, at which point support for them was never built into the toolchain. They would need to be supported in both the compiler and linker, after all. – Dietrich Epp Oct 23 '11 at 08:37
  • @EliBendersky: `addl $1, x(%rip)` is position-independent, referencing `x` relative to the location of the ADD instruction. This is what you'd get in a PIE executable. But **`-fPIC` also enables symbol-interposition**, and `x` is global (not "hidden" or `static`), so it's not known until runtime whether the `x` in this shared-object's data section is actually the right `x` to use. See https://www.macieira.org/blog/2012/01/sorry-state-of-dynamic-libraries-on-linux/. – Peter Cordes Jan 23 '19 at 17:19
  • To see the difference between position-dependent vs. PIE, look at `int*foo(){return &x;}`, putting the address in a register. non-PIE will use `mov $x, %eax` (5 bytes), PIE will use `lea x(%rip), %rax` (7 bytes, but still just ALU), `-fPIC` will use `mov x@GOTPCREL(%rip), %rax` (also 7 bytes), a load from the GOT. See also [32-bit absolute addresses no longer allowed in x86-64 Linux?](https://stackoverflow.com/q/43367427). Indexing a static array is another case where position-independence hurts even without symbol interposition: `mov arr(,%rcx,4), %eax` uses an absolute disp32. – Peter Cordes Jan 23 '19 at 17:23