I'm trying to figure out where jump tables (a data table pairing subroutine names with their addresses) are placed within an executable, and whether that placement is determined by the language, by the compiler, or by some standard location, perhaps within the headers of PE / ELF binaries. Which is it? And how can I locate the tables or find documentation on where they're placed?
What I tried so far:
First I read about each section described by the PE / ELF headers, but couldn't tell for sure which one, if any, would hold a jump table.
The most straightforward way I could think of to figure it out, since I find the prospect of learning how compilers work pretty intimidating, was to disassemble a binary containing a subroutine and look for a section that references that jump destination along with others. Near the start of this C program compiled to ELF format, I found the following section:
```
0000000000001020 <.plt>:
1020: ff 35 1a 2f 00 00 pushq 0x2f1a(%rip) # 3f40 <_GLOBAL_OFFSET_TABLE_+0x8>
1026: f2 ff 25 1b 2f 00 00 bnd jmpq *0x2f1b(%rip) # 3f48 <_GLOBAL_OFFSET_TABLE_+0x10>
102d: 0f 1f 00 nopl (%rax)
1030: f3 0f 1e fa endbr64
1034: 68 00 00 00 00 pushq $0x0
1039: f2 e9 e1 ff ff ff bnd jmpq 1020 <.plt>
103f: 90 nop
1040: f3 0f 1e fa endbr64
1044: 68 01 00 00 00 pushq $0x1
1049: f2 e9 d1 ff ff ff bnd jmpq 1020 <.plt>
...
```
I thought this might be what a jump table looks like, with these addresses being offsets into various dynamically linked libraries. I had previously seen .plt listed among the ELF sections but was not initially sure whether it was a jump table. Further research indicated:
PLT stands for Procedure Linkage Table, which is, put simply, used to call external procedures/functions whose address isn't known at link time and is left to be resolved by the dynamic linker at run time.
GOT stands for Global Offset Table and is similarly used to resolve addresses. The PLT, the GOT and other relocation information are explained at greater length in this article.
I'm still working on finding which jump in this section (if any) points to the subroutine from my program. Perhaps the GOT is where I need to look next.
If more context is needed, here's why I'm asking:
I've been studying binary patching, particularly the hooking techniques used to track malware behavior and how malware can prevent that tracking. Hooks (instructions that redirect control flow to an intermediary function and then on to the originally intended destination) can go in many places, such as patched into shared libraries in memory or even into kernel subroutines, but if I understand correctly, they're also sometimes injected directly into the subroutines of an executable binary.
What I'm studying is whether an attacker can defeat hooks placed within the binary. Let's say the attacker uses a jump destination that is unknown (from the victim's perspective) right from the beginning of the malware's execution. Now say an analyst or automated heuristic analysis tool tries to disassemble the program, perhaps within a sandbox, to work out its behavior, but the web server the program contacts for this jump destination address will only return an entry point into the malicious control flow when the program executes on a certain future date. Until then it returns an address that makes the program behave benignly. This is textbook evasion, made possible by the variable-length instruction encoding of x86/x86-64: without knowing the real entry point, a disassembler cannot be sure where instruction boundaries lie. I recently published a diagram visualizing the problem set to the best of my understanding.
But if the compiler has built jump tables into the program, the analyst or threat detection system can still know the locations of the entry points to jump to and analyze those subroutines. Once those subroutines are executed at run time under the targeted conditions, the stack can also be inspected to find the address each routine was called from (the x86 `call` instruction pushes the return address onto the stack so the subroutine knows where to return to), and from that the analyst can identify further valid instruction boundaries to begin disassembly at.
I know very little about how compilers work. I have read through the PE / ELF file header specs, but perhaps I missed something. I'd really appreciate a pointer in the right direction.