I've seen a lot of questions on SO asking why not all code is compiled as PIC, or why we can't always use -fPIC.

However, all of the answers lack an explanation of what happens when your objects are compiled with -fPIC but you link them into an executable that is not a PIE (position-independent executable).

From my understanding (using a few small examples and disassembling/inspecting them with readelf), it looks like compiling with -fPIC does not result in a different binary when linked without -pie -fPIE.

My (simplistic?) explanation would be that at link time it is known that the final executable is not intended to be relocatable, so the linker can resolve all addresses as in a non-PIC build and get rid of the GOT and PLT completely. That is also my observation: if I build a PIE, readelf displays a GOT/PLT section. If I don't build a PIE, the GOT/PLT is gone, regardless of whether I used -fPIC.

My question is whether this observation

  • is correct (as well as my explanation) and
  • if not, why is my reasoning wrong then?

I found it surprisingly difficult to find a concrete answer to this simple question, which is why I'm asking here.

andreee

1 Answer

> it looks like compiling with -fPIC does not result in a different binary when linked without -pie -fPIE.

It is trivial to prove that this is not the case.

#include <stdio.h>

const char *p = "Hello";
int x[1024];

int main()
{
  printf("%d: %s\n", __LINE__, p);
  x[1] = 42;
  return x[2];
}

Using gcc (GCC) 12.3.1 20230508 (Red Hat 12.3.1-1) and GNU ld version 2.38-27.fc37:

gcc -c -fPIC x.c && gcc -no-pie x.o -o x1
gcc -no-pie x.c -o x2

Disassembly for x1:

   0x0000000000401126 <+0>:     push   %rbp
   0x0000000000401127 <+1>:     mov    %rsp,%rbp
   0x000000000040112a <+4>:     mov    $0x404028,%rax
   0x0000000000401131 <+11>:    mov    (%rax),%rax
   0x0000000000401134 <+14>:    mov    %rax,%rdx
   0x0000000000401137 <+17>:    mov    $0x8,%esi
   0x000000000040113c <+22>:    lea    0xed3(%rip),%rax        # 0x402016
   0x0000000000401143 <+29>:    mov    %rax,%rdi
   0x0000000000401146 <+32>:    mov    $0x0,%eax
   0x000000000040114b <+37>:    call   0x401030 <printf@plt>
   0x0000000000401150 <+42>:    mov    $0x404060,%rax
   0x0000000000401157 <+49>:    movl   $0x2a,0x4(%rax)
   0x000000000040115e <+56>:    mov    $0x404060,%rax
   0x0000000000401165 <+63>:    mov    0x8(%rax),%eax
   0x0000000000401168 <+66>:    pop    %rbp
   0x0000000000401169 <+67>:    ret

For x2:

   0x0000000000401126 <+0>:     push   %rbp
   0x0000000000401127 <+1>:     mov    %rsp,%rbp
   0x000000000040112a <+4>:     mov    0x2ef7(%rip),%rax        # 0x404028 <p>
   0x0000000000401131 <+11>:    mov    %rax,%rdx
   0x0000000000401134 <+14>:    mov    $0x8,%esi
   0x0000000000401139 <+19>:    mov    $0x402016,%edi
   0x000000000040113e <+24>:    mov    $0x0,%eax
   0x0000000000401143 <+29>:    call   0x401030 <printf@plt>
   0x0000000000401148 <+34>:    movl   $0x2a,0x2f12(%rip)        # 0x404064 <x+4>
   0x0000000000401152 <+44>:    mov    0x2f10(%rip),%eax        # 0x404068 <x+8>
   0x0000000000401158 <+50>:    pop    %rbp
   0x0000000000401159 <+51>:    ret

That is not an identical binary -- using -fPIC added some overhead.

The amount of overhead depends on the platform, and may also depend on the amount of linker optimizations which can be performed.
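When comparing builds like these, it can help to confirm how each translation unit was actually compiled, since some distributions enable PIE by default. The following is a minimal sketch, assuming GCC or Clang, which predefine `__PIC__` and `__PIE__` when position-independent code generation is on. The helper names `pic_level`, `pie_level` and `print_codegen_mode` are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: 0 if compiled without PIC, otherwise the value
   of __PIC__ (1 for -fpic, 2 for -fPIC on GCC). */
int pic_level(void)
{
#ifdef __PIC__
  return __PIC__;
#else
  return 0;
#endif
}

/* Hypothetical helper: 0 if compiled without PIE, otherwise the value
   of __PIE__ (1 for -fpie, 2 for -fPIE on GCC). */
int pie_level(void)
{
#ifdef __PIE__
  return __PIE__;
#else
  return 0;
#endif
}

/* Prints the code generation mode this file was compiled with. */
void print_codegen_mode(void)
{
  printf("PIC level: %d, PIE level: %d\n", pic_level(), pie_level());
}
```

Calling `print_codegen_mode()` from `main` in a file built with `-fPIC` and again in one built without it shows which objects carry the PIC code sequences. It says nothing about whether the final link produced a PIE.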

Update:

> if not, why is my reasoning wrong then?

Your reasoning is not wrong, but it requires the linker to perform a non-trivial amount of work rewriting GOT-relative instructions and relocations into absolute form. This kind of rewriting is known as linker relaxation.

Whether the linker can optimize all of the GOT-relative instructions out ... depends on how smart the linker is, and how hard it tries.
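As a rough illustration of the two access shapes, here is a conceptual model in C. It is not what the compiler emits: real GOT entries are created by the linker, and `got_slot_x` is a hypothetical stand-in for one. In the x1 disassembly the linker has already replaced the load from the slot with an immediate address, but the two-step "get the address, then access through it" shape is still there.

```c
/* Conceptual model only; got_slot_x is a hypothetical name. */
int x[1024];

/* Stand-in for a GOT slot holding the address of x. `volatile` keeps
   the compiler from folding the pointer load away. */
static int *volatile got_slot_x = x;

/* Non-PIC shape: x is addressed directly. */
int access_direct(void)
{
  x[1] = 42;
  return x[2];
}

/* -fPIC shape: obtain the address first, then access through it. */
int access_via_slot(void)
{
  int *base = got_slot_x;  /* the extra step */
  base[1] = 42;
  return base[2];
}
```

Both functions do the same work on the same array. The second one only adds the step of obtaining the address, which is the kind of overhead visible when comparing x1 with x2.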

Employed Russian
  • That's interesting. Maybe my example was too simple; I think I did not include any function calls to the standard library in my case. Do you know _why_ it is different? From my understanding, if I'm not creating a PIE, the linker should be able to resolve everything at link time. So the question remains why there is a difference in the end. – andreee Jun 09 '23 at 08:33
  • @andreee It's access to global data that requires extra steps on `x86_64` with `-fPIC`. As to why your example didn't show the difference, I _can't tell_ because you didn't show what your example was. – Employed Russian Jun 09 '23 at 14:33
  • Okay, I guess my example was _too_ minimal (just a `main` function that calls some `void foo(){}` from a different TU). What I still don't understand: why is the "access to global data" necessary when linking to a non-PIC/PIE, and what is this data anyway? I still wonder why my explanation (from my original question) does not make sense. What am I overlooking? – andreee Jul 24 '23 at 09:11
  • @andreee I've updated the answer. – Employed Russian Jul 24 '23 at 15:04