I was reading https://eli.thegreenplace.net/2011/11/03/position-independent-code-pic-in-shared-libraries and I am confused about how the compiler computes the offset to the GOT entry for a global variable reference. All snippets below are from that article.
Original C code
int myglob = 42;
int ml_func(int a, int b)
{
    return myglob + a + b;
}
Compiled into libmlpic_dataonly.so
0000043c <ml_func>:
43c: 55 push ebp
43d: 89 e5 mov ebp,esp
43f: e8 16 00 00 00 call 45a <__i686.get_pc_thunk.cx>
444: 81 c1 b0 1b 00 00 add ecx,0x1bb0
44a: 8b 81 f0 ff ff ff mov eax,DWORD PTR [ecx-0x10]
450: 8b 00 mov eax,DWORD PTR [eax]
452: 03 45 08 add eax,DWORD PTR [ebp+0x8]
455: 03 45 0c add eax,DWORD PTR [ebp+0xc]
458: 5d pop ebp
459: c3 ret
0000045a <__i686.get_pc_thunk.cx>:
45a: 8b 0c 24 mov ecx,DWORD PTR [esp]
45d: c3 ret
The offset from the instruction pointer (0x444, the return address of the call) is 0x1bb0.
Disassembly of a driver program linked against libmlpic_dataonly.so
0x0013143f <+3>: call 0x13145a <__i686.get_pc_thunk.cx>
0x00131444 <+8>: add ecx,0x1bb0
=> 0x0013144a <+14>: mov eax,DWORD PTR [ecx-0x10]
0x00131450 <+20>: mov eax,DWORD PTR [eax]
0x00131452 <+22>: add eax,DWORD PTR [ebp+0x8]
The offset from the instruction pointer is still 0x1bb0.
How does the compiler of libmlpic_dataonly.so know that the GOT entry will be 0x1bb0 bytes away from the instruction at compilation time?
I am confused because I read that the .text segments from multiple object modules are glued together to make a single .text segment at run-time, and likewise for .data. So the size of the eventual .text segment is unknown, yet the instruction lives in the .text segment whereas the GOT entry lives in the .data segment. How is it possible to know the offset between them when compiling the shared library?
EDIT:
Actually, I'm not sure: are all the .text segments from different object modules really glued together into a single .text segment? I tried creating two shared objects from the original C code, except that in the second one I renamed myglob to myglob2 and ml_func to ml_func2. I then used gdb to check the addresses of ml_func, ml_func2, myglob and myglob2.
Dump of assembler code for function ml_func2:
=> 0x00007ffff7fc00f9 <+0>: endbr64
0x00007ffff7fc0107 <+14>: mov 0x2eca(%rip),%rax # 0x7ffff7fc2fd8
Dump of assembler code for function ml_func:
=> 0x00007ffff7fc50f9 <+0>: endbr64
0x00007ffff7fc5107 <+14>: mov 0x2ed2(%rip),%rax # 0x7ffff7fc7fe0
The layout seems to be ml_func2, GOT entry for myglob2, ml_func, GOT entry for myglob.