
I'm trying to figure out where jump tables (a data table pairing subroutine names with their addresses) are placed within an executable, and whether it's based on the language, the compiler, or if there's a standard placement perhaps within the headers of PE / ELF binaries. Which is it? And how can I locate the tables or find documentation on where they're placed?


What I tried so far:

First I read about each section of PE / ELF headers and didn't recognize for sure which one would be a jump table.

The most straightforward way I could think of to figure it out, since I find the prospect of learning how compilers work pretty intimidating, was to disassemble a binary containing a subroutine and look for a part of the binary that references that jump destination and others. Toward the start of this C program compiled to ELF format, I found the following section:

0000000000001020 <.plt>:
    1020:   ff 35 1a 2f 00 00       pushq  0x2f1a(%rip)        # 3f40 <_GLOBAL_OFFSET_TABLE_+0x8>
    1026:   f2 ff 25 1b 2f 00 00    bnd jmpq *0x2f1b(%rip)        # 3f48 <_GLOBAL_OFFSET_TABLE_+0x10>
    102d:   0f 1f 00                nopl   (%rax)
    1030:   f3 0f 1e fa             endbr64 
    1034:   68 00 00 00 00          pushq  $0x0
    1039:   f2 e9 e1 ff ff ff       bnd jmpq 1020 <.plt>
    103f:   90                      nop
    1040:   f3 0f 1e fa             endbr64 
    1044:   68 01 00 00 00          pushq  $0x1
    1049:   f2 e9 d1 ff ff ff       bnd jmpq 1020 <.plt>
    ...

I thought this might be what a jump table looks like, with these addresses being offsets of various dynamically linked libs. I had previously seen a reference to an ELF section named .plt but was not initially clear about whether it was a jump table. Further research indicated:

PLT stands for Procedure Linkage Table which is, put simply, used to call external procedures/functions whose address isn't known in the time of linking, and is left to be resolved by the dynamic linker at run time.

GOT stands for Global Offsets Table and is similarly used to resolve addresses. Both PLT and GOT and other relocation information is explained in greater length in this article.

I'm still working on finding which jump in this section (if any) points to the subroutine from my program. Perhaps that GOT is where I need to look next.


If more context is needed, here's why I'm asking:

I've been studying binary patching and particularly hooking techniques used to track malware behavior and how malware can prevent that tracking. Hooks (which are just instructions which redirect control flow to an intermediary function, then to the originally intended destination) can go many places, such as patched into shared binaries (libs) in memory, or even patched into kernel subroutines, but if I understand correctly, they're also sometimes injected directly into the subroutines within an executable binary.

What I'm studying is the potential for an attacker to prevent these hooks placed within the binary. Let's say the attacker uses an uncertain (from the perspective of the victim) jump destination right from the beginning of the malware's execution. Now let's say an analyst or automated heuristic analysis tool tries to disassemble a program, perhaps within a sandboxed environment, to ascertain the behavior of the program, but the web server the program reaches out to for this jump destination address will only return an entry point to the malicious control flow of the program when it executes on a certain date in the future. Until then it returns an address which makes the program behave in a benign manner. This is textbook evasion, made possible by the variable-length nature of x86/-64 architecture. I recently published a diagram visualizing the problem-set to the best of my understanding.

But if the compiler has built jump tables into the program, the analyst or threat detection system can still know the locations of the entry points to jump into and analyze those subroutines. Once those subroutines get executed at run-time under the targeted conditions, the stack can also be analyzed to find the address that the routines were called from (on x86, the call instruction pushes this return address so that the subroutine knows where to return to), and from that information the analyst can also know other valid instruction boundaries to begin disassembly at.

I know barely anything about the workings of compilers and have read through the specs on the PE / ELF file headers, but perhaps I missed something. I'd really appreciate a pointer in the right direction.

J.Todd
  • "a data table pairing subroutine names with their addresses" - Compiled languages rarely use _names_. Names are problematic for CPUs - variable length, bad information density, parsing required. Jump tables can be implemented in multiple other ways, though - there is not a single solution. – MSalters Dec 12 '21 at 14:43
  • @MSalters right, well in the assembly we would reference a name, but surely the assembler converts it to a more efficient binary representation. I suppose my semantics were bad there. – J.Todd Dec 12 '21 at 14:46
  • Well, yes, that name stands for a function and the efficient representation is the address of the function. So a jump table generally does not pair anything. – MSalters Dec 12 '21 at 14:48
  • @MSalters then what is the purpose of the jump table? Perhaps I'm having a dull moment here, but I'm failing to understand why the compiler would have any need for a jump table at all. – J.Todd Dec 12 '21 at 14:51
  • @MSalters Oh edit: perhaps the answer is in a paragraph I already quoted "PLT stands for Procedure Linkage Table which is, put simply, used to call external procedures/functions whose address isn't known in the time of linking, and is left to be resolved by the dynamic linker at run time." - so subroutines statically defined as part of the program with known locations at the time of compilation would have no need to be included in any jump table, right? – J.Todd Dec 12 '21 at 14:54
  • Well, they're also sometimes used for Finite State Machines. The functions might be known, but which function will be taken at runtime could be data-dependent. – MSalters Dec 12 '21 at 14:58
  • Something like a **jump table** in PE-COFF dynamically linked libraries is the [Export Directory Table](https://learn.microsoft.com/en-us/windows/win32/debug/pe-format?redirectedfrom=MSDN#the-edata-section-image-only), whose position in `library.dll` is defined by a directory record in the Optional Header. It tells the Windows dynamic linker which functions the library exports and what their addresses are. – vitsoft Dec 12 '21 at 16:12

1 Answer


A jump table is not necessarily a data table pairing subroutine names with their addresses, as @MSalters pointed out in the comments. Often it implements control flow within a single subroutine, specifically switch / case statements. Consider Duff's Device, a classic example where a jump table can be expected (if not a computed jump without a table):

void send(int *to, int *from, size_t count)
{
    size_t n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}

MSVC compiles it as follows; the jump targets are the case labels:

        mov     eax, DWORD PTR $LN17@send[r10+r8*4]
        add     rax, r10
        jmp     rax
$LN10@send:
        mov     eax, DWORD PTR [rdx]
        add     rdx, 4
        mov     DWORD PTR [rcx], eax
$LN11@send:
        mov     eax, DWORD PTR [rdx]
        add     rdx, 4
        mov     DWORD PTR [rcx], eax

https://godbolt.org/z/e64vKzq4d


The PE file format does not define anything related to jump tables. There are some pointer tables, such as the import address table for imported functions/data, or the TLS callbacks table, but no jump tables.

MSVC happens to place jump tables in the code section, near the function that uses them. This makes the jump tables read-only and harder to overwrite.

Although there's no required section for jump tables, they might still be annotated somehow. 32-bit x86 would use absolute addresses (rather than RIP-relative), so these jump table entries would end up in the relocation table as contiguous ranges of pointers, if a relocation table is present at all. I'm not sure about SEH or Control Flow Guard data; they might include jump table annotations as well.

Alex Guteniev
  • I see that I misunderstood jump tables to mean something more broad than they do. – J.Todd Dec 12 '21 at 19:47
  • @J.Todd, did you expect that an arbitrary subroutine call would go via a jump table? No, at least on Windows it is usually a direct call (potentially fixed up for 32-bit x86 by the relocation table) – Alex Guteniev Dec 12 '21 at 20:00
  • Yes I did. And now I know the difference between pointer tables and jump tables, thanks. – J.Todd Dec 12 '21 at 20:18
  • @J.Todd: Just for the record, yes on ELF systems like Linux, the GOT mentioned in your question is the pointer-table written by the dynamic linker, used by calls to library functions. (Via PLT stubs, [or directly if you compile with `-fno-plt`](https://stackoverflow.com/questions/65554551/why-gcc-generates-a-plt-when-it-is-apparently-not-needed) like some modern GNU/Linux distros do.) – Peter Cordes Dec 13 '21 at 04:53
  • @PeterCordes, interesting. I assume it is avoided for x86-64 where direct calls are relative anyway. Right? – Alex Guteniev Dec 13 '21 at 07:15
  • @AlexGuteniev: Huh? The relative offset between the text of the executable and the text of a shared library is not constant, and usually greater than 2^31 so even text relocation runtime fixups wouldn't be able to make `call rel32` work. That's why the normal way is `call foo@plt` or `call [RIP + foo@GOTPCREL]` (See also [Can't call C standard library function on 64-bit Linux from assembly (yasm) code](https://stackoverflow.com/a/52131094)) – Peter Cordes Dec 13 '21 at 07:18
  • @PeterCordes Ah, I see. I thought the PLT is used by a shared library to call its own functions – Alex Guteniev Dec 13 '21 at 07:22
  • @AlexGuteniev: It is if you're not careful about setting ELF visibility to `hidden` for symbols that shouldn't be externally visible, and making `hidden` aliases for the functions you do export. This is intentional to support symbol interposition (e.g. via `LD_PRELOAD=./my_wrapper_functions.so`), but not every library wants that cost, or that ability for preloads to mess around with stuff. https://www.macieira.org/blog/2012/01/sorry-state-of-dynamic-libraries-on-linux/. GCC has an option to make the default visibility `hidden` for global-scope functions, so you only put an `__attribute__` on exports. – Peter Cordes Dec 13 '21 at 08:32