
In the report I received from Agner Fog's objconv, I see notes suggesting changes in the .plt (procedure linkage table) section, for example:

SECTION .plt    align=16 execute                        ; section number 9, code

?_001:  push    qword [rel ?_086]                       ; 10F0 _ FF. 35, 00201F12(rel)
    jmp     near [rel ?_087]                        ; 10F6 _ FF. 25, 00201F14(rel)

; Filling space: 4H
; Filler type: Multi-byte NOP
;       db 0FH, 1FH, 40H, 00H

ALIGN   8
?_002:  jmp     near [rel ?_088]                        ; 1100 _    FF. 25, 00201F12(rel)

; Note: Immediate operand could be made smaller by sign extension
        push    0                                       ; 1106 _ 68, 00000000
; Note: Immediate operand could be made smaller by sign extension
        jmp     ?_001                                   ; 110B _ E9, FFFFFFE0

In two cases (and others in code not shown) it suggests "Immediate operand could be made smaller by sign extension." How can I access the procedure linkage table to make those changes? Is it possible?

Peter Cordes
RTC222
    The PLT is generated automatically by the linker. I don't think you can change how that is done. Also note that the performance impact of this is minimal. – fuz Apr 17 '20 at 19:19

1 Answer


PLT stubs intentionally use longer immediates and jump displacements than necessary so that every stub has the same size. Once you have enough PLT entries, the jmp ?_001 in the fall-through path of later entries needs a rel32 to reach the first entry, so the linker uses the rel32 form in all of them.
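To see what objconv's note is comparing, here is a minimal sketch that assembles one stub in the long form shown in the question and one in the shortest form. The labels `got_slot` and `resolver_stub` are hypothetical stand-ins for a real GOT entry and the first PLT entry, and the long `push` is written as raw bytes so the encoding does not depend on assembler optimization settings.

```nasm
; plt_sizes.asm - compare stub encodings in the listing file.
; Assumed command: nasm -f elf64 -l plt_sizes.lst plt_sizes.asm
default rel

section .data
got_slot:       dq 0                    ; hypothetical stand-in for a GOT entry

section .text
resolver_stub:                          ; hypothetical stand-in for the first PLT entry
        ret

long_entry:                             ; fixed-size form: 6 + 5 + 5 = 16 bytes
        jmp     qword [got_slot]        ; FF 25 rel32  (6 bytes)
        db      0x68                    ; push imm32 opcode, as raw bytes
        dd      0                       ; 4-byte immediate (5 bytes with the opcode)
        jmp     near resolver_stub      ; E9 rel32     (5 bytes)
long_entry_size equ $ - long_entry

short_entry:                            ; shortest form: 6 + 2 + 2 = 10 bytes
        jmp     qword [got_slot]        ; FF 25 rel32  (6 bytes)
        push    0                       ; 6A ib        (2 bytes)
        jmp     short resolver_stub     ; EB rel8      (2 bytes)
short_entry_size equ $ - short_entry
```

The short form only works while the index fits in a signed byte and the first entry is within rel8 range, which is why a linker that wants uniform entries cannot use it for every stub.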

They're automatically generated by the linker when linking code that used call printf wrt ..plt, or when linking a non-PIE that just used call printf.

You can avoid the PLT entirely by writing call [rel printf wrt ..got], like GCC does when you compile with -fno-plt. This does early binding (instead of lazy), resolving all GOT entries at startup before your _start runs. See Can't call C standard library function on 64-bit Linux from assembly (yasm) code. Using default rel lets you leave out the explicit rel part of the addressing mode. The equivalent AT&T syntax is call *printf@GOTPCREL(%rip).
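A minimal sketch of that PLT-free call in a complete program follows. The file name, format string, and build commands are assumptions for illustration; it uses `main` and links through gcc so that libc is initialized before `printf` is called.

```nasm
; hello_got.asm - call printf through the GOT instead of a PLT stub.
; Assumed build: nasm -f elf64 hello_got.asm && gcc hello_got.o -o hello_got
default rel                             ; RIP-relative addressing without writing "rel"

extern printf
global main

section .rodata
fmt:    db "value = %d", 10, 0          ; hypothetical format string

section .text
main:
        sub     rsp, 8                  ; realign the stack to 16 bytes before the call
        lea     rdi, [fmt]              ; 1st arg: format string
        mov     esi, 42                 ; 2nd arg: integer to print
        xor     eax, eax                ; AL = 0: no vector registers used by this variadic call
        call    [printf wrt ..got]      ; indirect call through the GOT entry, no PLT stub
        add     rsp, 8
        xor     eax, eax                ; return 0
        ret

section .note.GNU-stack noalloc noexec nowrite progbits
```

Replacing the call with `call printf wrt ..plt` should make the linker generate a PLT stub again, so disassembling both builds is one way to compare the two approaches.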


I don't know if this fixed-width array of PLT stubs is strictly necessary for anything at run time. For example, lazy dynamic linking only modifies the GOT, not the PLT itself, because modern PLTs use an indirect jump. The push 0 is pushing an index of the PLT entry, but I don't think anything uses it to actually find the address of the machine code of that PLT stub, only to index a GOT entry.

At this point it might just be a missed optimization in the linker. NASM isn't generating it so you can't really do anything about it.

I seem to recall historically seeing a jmp rel32 as the first instruction of PLT stubs in 32-bit code, not a jmp [mem], but maybe that was just a guess at how PLT stubs worked before I really knew much. If they ever worked that way, lazy dynamic linking would modify the actual PLT itself to fix up the relative jump target, so indexing the machine code of the PLT entry would be important. (And thus having every entry be fixed width would be important).

But even 32-bit code doesn't use jmp rel32 these days so the PLT stubs are read-only. And in 64-bit code, jmp rel32 can only reach +-2GiB so wouldn't be usable to reach libraries mapped to a random address.


Note that those longer-than-needed instructions only ever run once for each PLT stub. After the first call, the indirect jmp target will be the function in the library. (On the first call, the jmp target will be the next instruction after the jmp.)

The padding might possibly be a good thing: too many jmp instructions in a single 16-byte block of code is bad for branch predictors on some CPUs. But I think the limit is like 3 or 4 jumps in a 16-byte block of machine code for some AMD or Core 2, so that wouldn't be hit anyway with 6-byte jmp [RIP+rel32] + 2-byte push imm8 + 2-byte jmp rel8.

Peter Cordes
  • Thanks for the long explanation of how plt tables work. I think the best would be to use call [rel printf wrt ..got] and do away with the .plt. I suspect that would improve performance because we don't have to jump to the plt. I will try that and re-objconv to see. – RTC222 Apr 17 '20 at 19:52
    @RTC222: yes, more distros are starting to use `-fno-plt` as a standard build option to remove a level of indirection on every call. It can slightly slow down running `big_program --help` because lots of dynamic linking has to be done for library functions that aren't called, but low downside other than that. See [32-bit absolute addresses no longer allowed in x86-64 Linux?](https://stackoverflow.com/q/43367427) - I included a section about `-fno-plt` there, with a link about performance. – Peter Cordes Apr 17 '20 at 20:02