I was reading https://eli.thegreenplace.net/2011/11/03/position-independent-code-pic-in-shared-libraries and I am confused about how the compiler computes the offset to the GOT entry for a global variable reference. All snippets are from the above-mentioned link.

Original C code

int myglob = 42;

int ml_func(int a, int b)
{
    return myglob + a + b;
}

Compiled into libmlpic_dataonly.so

 0000043c <ml_func>:
 43c:   55                      push   ebp
 43d:   89 e5                   mov    ebp,esp
 43f:   e8 16 00 00 00          call   45a <__i686.get_pc_thunk.cx>
 444:   81 c1 b0 1b 00 00       add    ecx,0x1bb0
 44a:   8b 81 f0 ff ff ff       mov    eax,DWORD PTR [ecx-0x10]
 450:   8b 00                   mov    eax,DWORD PTR [eax]
 452:   03 45 08                add    eax,DWORD PTR [ebp+0x8]
 455:   03 45 0c                add    eax,DWORD PTR [ebp+0xc]
 458:   5d                      pop    ebp
 459:   c3                      ret

0000045a <__i686.get_pc_thunk.cx>:
 45a:   8b 0c 24                mov    ecx,DWORD PTR [esp]
 45d:   c3                      ret

The offset added to the instruction pointer is 0x1bb0 (the add at 0x444, applied to the value the thunk returns in ecx).

Disassembly of a driver program linked against libmlpic_dataonly.so

   0x0013143f <+3>:   call   0x13145a <__i686.get_pc_thunk.cx>
   0x00131444 <+8>:   add    ecx,0x1bb0
=> 0x0013144a <+14>:  mov    eax,DWORD PTR [ecx-0x10]
   0x00131450 <+20>:  mov    eax,DWORD PTR [eax]
   0x00131452 <+22>:  add    eax,DWORD PTR [ebp+0x8]

The offset from the instruction pointer is still 0x1bb0.

How does the compiler of libmlpic_dataonly.so know that the GOT entry will be 0x1bb0 bytes away from the instruction at compilation time?

I am confused because I read that multiple .text segments from multiple object modules are glued together to make a single .text segment at run-time. Same for .data. Essentially, the size of the eventual .text segment is unknown and the instruction lives within the .text segment whereas the GOT entry is in the .data segment. How is it possible to know the offset when compiling the shared lib?

EDIT:

Actually, I'm not sure: are all .text segments from different object modules really glued together into a single .text segment? I tried creating 2 shared objects from the original C code, except that in the second one I renamed myglob to myglob2 and ml_func to ml_func2. I then used gdb to check the addresses of ml_func, ml_func2, myglob and myglob2.

Dump of assembler code for function ml_func2:
=> 0x00007ffff7fc00f9 <+0>: endbr64 

   0x00007ffff7fc0107 <+14>:    mov    0x2eca(%rip),%rax        # 0x7ffff7fc2fd8

Dump of assembler code for function ml_func:
=> 0x00007ffff7fc50f9 <+0>: endbr64 

   0x00007ffff7fc5107 <+14>:    mov    0x2ed2(%rip),%rax        # 0x7ffff7fc7fe0

The layout seems to be ml_func2, GOT entry for myglob2, ml_func, GOT entry for myglob.

  • A `.so` file is already linked, i.e. this “glueing together” (linking) has already happened. Each `.so` file has exactly one GOT and the offsets don't change anymore. – fuz Jul 18 '21 at 14:10
  • Shared objects are really not comparable to normal object files. A shared object is linked from multiple object files and has one GOT. If you have multiple shared objects in your program, each of course has its own GOT. – fuz Jul 18 '21 at 14:12
  • Thanks for answering. What I'm confused about is how the instructions and GOT entries are laid out when the program runs. I originally expected all the instructions to be in the .text section and the GOT entries to be in the .data section, as per the diagram at https://www.geeksforgeeks.org/memory-layout-of-c-program/. But when I have shared objects, the layout seems to be .text for sharedObjectA, .data for sharedObjectA, etc., then .text for sharedObjectB, .data for sharedObjectB, rather than just one .text for both sharedObjectA and sharedObjectB and just one .data for both A and B. – curious-carp Jul 18 '21 at 14:16
  • Each shared object has its own text, data, rodata, ... sections. The sections of different shared objects are not mixed. Make sure not to confuse normal objects (whose sections are merged into one at link time) with shared objects. A shared object is really more like a separate complete program loaded into the address space of your program than a normal object. – fuz Jul 18 '21 at 14:21
  • Oh! That really clears up my confusion, thank you! I didn't realize shared objects and regular object files were treated differently. – curious-carp Jul 18 '21 at 15:20
  • That's 32-bit code, so it can't use RIP-relative addressing. That was a new feature in x86-64 machine code. 32-bit position-independent code sucks because it has to do crap like that to get a GOT address, instead of just directly referencing the symbol (because the relative distance between .text and .rodata is fixed when linking the .so - [Why are global variables in x86-64 accessed relative to the instruction pointer?](https://stackoverflow.com/q/56262889)) – Peter Cordes Jul 18 '21 at 21:33
