I am trying to understand the RIP-relative offset used in the small code model. Perhaps the only approachable resource on the internet on this topic is https://eli.thegreenplace.net/2012/01/03/understanding-the-x64-code-models, but even that post leaves a few things unclear. I am using this simple program to work through them:
// sample.cc
int arr[10] = {0};
int arr_big[100000] = {0};
int arr2[500] = {0};
int main() {
    int t = 0;
    t += arr[7];
    t += arr_big[6];
    t += arr2[10];
    return 0;
}
Compilation: g++ -c sample.cc -o sample.o
Object code for the .text section (objdump -dS sample.o):
Disassembly of section .text:
0000000000000000 <main>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
b: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # 11 <main+0x11>
11: 01 45 fc add %eax,-0x4(%rbp)
14: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # 1a <main+0x1a>
1a: 01 45 fc add %eax,-0x4(%rbp)
1d: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # 23 <main+0x23>
23: 01 45 fc add %eax,-0x4(%rbp)
26: b8 00 00 00 00 mov $0x0,%eax
2b: 5d pop %rbp
2c: c3 ret
Relocation table (readelf -r sample.o):
Relocation section '.rela.text' at offset 0x1a8 contains 3 entries:
Offset Info Type Sym. Value Sym. Name + Addend
00000000000d 000300000002 R_X86_64_PC32 0000000000000000 arr + 18
000000000016 000400000002 R_X86_64_PC32 0000000000000040 arr_big + 14
00000000001f 000500000002 R_X86_64_PC32 0000000000061ac0 arr2 + 24
From this answer, what I understand is that Offset is the position of the first byte in .text that has to be modified. The compiler does not know the final address of any relocatable symbol in advance, which is why it leaves these fields filled with 00 bytes, to be filled in by the linker.
This explanation makes sense once we look at the objdump output. The first relocation has offset 0xd, and byte 0xd of .text is where the zeros in 8b 05 00 00 00 00 begin (that instruction starts at 0xb, so 0xd is its third byte).
So the linker will fill in a value derived from the address of arr at this position. R_X86_64_PC32 means "take the symbol value, add the addend and subtract the offset". I do not understand this calculation. What do they mean by "symbol value"?
In the small code model, data accesses like these are encoded relative to the instruction pointer (RIP). So for the line mov 0x0(%rip),%eax, the RIP value will be the address of the next instruction (0x11). The offset is 0xd and the addend is 0x18. So if we add the addend to RIP and subtract the offset (0x11 + 0x18 - 0xd), we get 0x1c = 28, which is the byte offset of the element at index 7 (1 int = 4 bytes, 7 * 4 = 28). That makes sense, because the instruction is accessing arr[7]. What I don't understand is:
- How is the relative offset between RIP and arr calculated? Is it something the linker calculates at link time?
- Why does it need to be 32 bits?
- What does Sym. Value signify in the relocation table? I am assuming it is the position of each symbol within its section. E.g. arr has Sym. Value 0 as it is the first entry in the .bss section; arr_big has 0x40 (64) as it is the second entry, placed after the 40-byte arr plus what is presumably alignment padding; and arr2 has 0x61ac0 as it comes after arr and arr_big (0x40 + 400000 bytes, since arr_big holds 100000 four-byte ints).
Thanks in advance.