Why does linkage affect whether relocations are needed for relative jumps in the same section?

Question

In this simple program, I get a relocation in main for compute, but not for compute2:

static int compute2()
{
    return 2;
}

int compute()
{
    return 1;
}


int main()
{
    return compute() + compute2();
}

I compile this with gcc -c main.cpp using gcc 11.2.0 on Ubuntu 21.10.

Here's what objdump says about main:

000000000000001e <main>:
  1e:   f3 0f 1e fa             endbr64 
  22:   55                      push   rbp
  23:   48 89 e5                mov    rbp,rsp
  26:   53                      push   rbx
  27:   e8 00 00 00 00          call   2c <main+0xe>
                        28: R_X86_64_PLT32  compute()-0x4
  2c:   89 c3                   mov    ebx,eax
  2e:   e8 cd ff ff ff          call   0 <compute2()>
  33:   01 d8                   add    eax,ebx
  35:   48 8b 5d f8             mov    rbx,QWORD PTR [rbp-0x8]
  39:   c9                      leave  
  3a:   c3                      ret

As you can see, the call to compute2 (internal linkage) is a relative call with no relocation. But the call to compute (external linkage) has a relocation, even though all three functions are in the same section of the same object file.

Why is that relocation needed? I thought the linker would never split up a section, so no matter where this section gets loaded, relative addresses should still be the same? Why does linkage seemingly affect this?

[This](https://stackoverflow.com/questions/68832394/static-function-vs-non-static-in-static-linking) and [this](https://stackoverflow.com/questions/68824579/why-calling-to-local-functions-need-relocation) are related. — Eric Postpischil, Sep 04 '21 at 11:17

Peter Cordes · Answer 1 · 2021-09-03T18:49:26.470

It's not that a relocation is needed per se, it's that the compiler chooses to do indirection through the PLT (because of possible symbol interposition, or in case the main executable or an earlier shared lib define the symbol). Note the relocation type R_X86_64_PLT32.

If you look at the compiler's asm output (not disassembly of the .o), you'd see call compute@plt.

A static function definitely always uses the definition in the same translation unit, but other definitions of global symbols can take precedence.

For symbols defined in the same translation unit (.c file), this should only happen with -fPIC, not when building the main executable itself (-fPIE is on by default in most modern distros).

https://godbolt.org/z/qYYWsYf6a shows GCC -fPIE still using call compute. Apparently Ubuntu enables some other options that make this different? (Godbolt's GCC doesn't enable by default several things that most distros do, so you need some options to match how GCC is configured on Ubuntu. -fstack-protector-strong isn't relevant, and IDK what else would be.)

Note that when linking an executable (not a shared lib), the call should get "relaxed" to a direct call that doesn't go through the PLT. So it's ok for GCC to emit all calls as call foo@plt.

If you were using -fno-plt as well, calls would be emitted as call *foo@GOTPCREL(%rip), which takes 6 bytes, so relaxing it to a direct 5-byte call rel32 needs a byte of filler; ld uses a meaningless address-size prefix. (See my answer on Can't call C standard library function on 64-bit Linux from assembly (yasm) code for an example.)

If you don't want this PLT indirection in the first place, you can set ELF visibility = hidden for that symbol. This is a really good idea when making a shared library, since in that case the linker won't be able to relax all the indirection through the PLT for internal calls to functions you don't intend to allow symbol-interposition for.

You can use -fvisibility=hidden to make that the default for all prototypes, so calls will use call rel32, not indirect through the PLT (or GOT with -fno-plt). Then for any function or variable a shared library does want to export, use __attribute__((visibility("default"))).
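As a minimal sketch of what those attributes look like in source, assuming GCC or Clang targeting ELF: the file name and the `exported_compute` function are hypothetical and only there for illustration; `compute` is borrowed from the question.

```cpp
// visibility_sketch.cpp (hypothetical file name, for illustration only).
// Compile to an object file as in the question: g++ -c visibility_sketch.cpp

// Hidden visibility: compute() still has external linkage, so other
// translation units linked into the same executable or shared library can
// call it, but it is not exported for dynamic symbol lookup and so cannot
// be interposed by another module.
__attribute__((visibility("hidden"))) int compute()
{
    return 1;
}

// Default visibility, stated explicitly. This is what an exported function
// would need if the whole build used -fvisibility=hidden.
// exported_compute is a made-up name for this sketch.
__attribute__((visibility("default"))) int exported_compute()
{
    return compute() + 1;
}
```

With -fvisibility=hidden on the command line, the first attribute becomes redundant and only the functions meant for export need annotating.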

For your case, -fvisibility=hidden may solve the problem you're having, with GCC unnecessarily indirecting even though you're not building code that can go into a shared library (with -fPIC).

See also

https://gcc.gnu.org/wiki/Visibility
How to use the __attribute__((visibility("default")))?
https://unix.stackexchange.com/questions/472660/what-are-difference-between-the-elf-symbol-visibility-levels
Can't call C standard library function on 64-bit Linux from assembly (yasm) code (link-time relaxation example when linking an executable rather than a shared lib.)
Sorry state of dynamic libraries on Linux on Thiago Macieira's blog, from 2012. (Before -fno-plt existed; so at least that idea has been implemented, and is now the default for some distros' binary packages, like Arch GNU/Linux.)

Daniel Kleinstein · Accepted Answer · 2021-09-03T12:28:25.373

I believe this behavior is implemented to enable symbol interposition – by exposing the compute call as a relocatable opcode, you can run your code like

> LD_PRELOAD=custom_compute.so ./main

and your compute call will be relocated to a custom compute function defined in the .so.

This functionality is disabled for static functions like compute2 - which have internal linkage and shouldn't be available for symbol interposition.

As mentioned in comments, this behavior is not just for LD_PRELOAD but is more generally relevant for shared libraries - for instance, in this example, if two shared libraries were to be loaded, both defining compute - the second library's call to compute would be relocated to the first library's function.
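A rough sketch of the shared-library case, using the functions from the question. The file name, the build command in the comment, and the `call_both` helper are assumptions for illustration, not something from the original post.

```cpp
// libcompute_sketch.cpp (hypothetical library source).
// Assumed build: g++ -fPIC -shared libcompute_sketch.cpp -o libcompute_sketch.so

// External linkage, default visibility: when this code is in a shared
// library, a definition of compute() that comes earlier in the dynamic
// symbol search order (for example one injected with LD_PRELOAD) can be
// used instead of this one.
int compute()
{
    return 1;
}

// Internal linkage: calls from this file always bind to this definition.
static int compute2()
{
    return 2;
}

// Hypothetical helper that makes both calls from inside the library.
int call_both()
{
    return compute() + compute2();
}
```

When nothing interposes compute, call_both() returns 3. Only the compute() part of that sum is a candidate for interposition; the compute2() part is fixed when the file is compiled.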

edited Sep 03 '21 at 12:28

answered Sep 03 '21 at 10:26

Right, that sounds probable indeed! Then this relocation will be kept through the linking stage as a load time relocation which can be overridden by the dynamic linker if you use `LD_PRELOAD`. – knatten Sep 03 '21 at 10:31
On what do you base this belief? Have you tested it? – Eric Postpischil Sep 03 '21 at 10:34
@EricPostpischil Not sure how to test this - but in one of the duplicates you posted (https://stackoverflow.com/questions/68832394/static-function-vs-non-static-in-static-linking), the only (non-accepted) answer mentions an interesting observation - the answerer was able to replicate the behavior in Linux but not in Windows. This would strengthen the theory, given that `LD_PRELOAD` is a non-Windows mechanism (and Windows' hotpatching works entirely differently from how `LD_PRELOAD` works). – Daniel Kleinstein Sep 03 '21 at 10:40
@DanielKleinstein: You would test it by building a `.so` with one implementation of `compute`, say one that prints “library version”, building a program with another implementation of `compute`, say one that prints “program version”, and running the program with the command you show. I would do it, but I do not have GCC or a Linux system handy. – Eric Postpischil Sep 03 '21 at 10:58
@EricPostpischil Barring bugs in my Linux distro, I'm sure the test you describe will work because it's just testing if `LD_PRELOAD` works. What I'm not sure how to test is that this is the explicit rationale for the relocatable opcode, for completeness we'd probably need to delve into GCC source code.. but as I mentioned - I think the answer in the question you linked provides fairly strong evidence that this is the rationale. – Daniel Kleinstein Sep 03 '21 at 11:02
Re “I'm sure”: Have you tested it? – Eric Postpischil Sep 03 '21 at 11:03
I feel no more need to run that test than I feel the need to compile `printf("Hello, world");` and see that it really prints "Hello, world" to console. i.e. if it doesn't work then I have much bigger issues on my machine :) – Daniel Kleinstein Sep 03 '21 at 11:04
The term for this mechanism is “symbol interposition” and it's not just there for `LD_PRELOAD` but also for shared libraries in general. – fuz Sep 03 '21 at 11:16
Good point fuz. Imagine that this code goes in a static library. Then someone links against that static library, but also against a dynamic library which provides the same symbol. I believe if you link to the shared library first and then the static library, the one in the dynamic one is actually supposed to be called. – knatten Sep 03 '21 at 11:19
@DanielKleinstein If multiple shared libraries provide the same symbol, there's an order that decides which library's symbol is used. This symbol is then also used for other libraries using the symbol. This is to emulate the effects of static linking with respect to linker operand order. – fuz Sep 03 '21 at 11:20
@fuz Thanks! I expanded my answer for completeness. – Daniel Kleinstein Sep 03 '21 at 11:23
As a side note - this is consistent with the observed behavior that the relocatable opcode isn't replicated in Windows, because Windows linking works entirely differently and only explicitly marked functions are exported from DLLs. – Daniel Kleinstein Sep 03 '21 at 11:28
I tested this on Ubuntu 20.04 and it didn't work. I get the value from the function defined in the main.cpp file, with or without LD_PRELOAD in effect. (I had to add `-fPIC` to get it to call `compute()` through the PLT, and it still didn't get the interposed version.) – Nate Eldredge Sep 03 '21 at 14:52
@NateEldredge This only affects shared libraries, not the main program. Functions defined in the main program cannot be overridden like this. – fuz Sep 03 '21 at 15:04
@NateEldredge You have to move the `compute` function from the code to a separate library and add that library to the compilation of OP's code. What would happen without `LD_PRELOAD` is that the compute function would be linked from that shared library - with `LD_PRELOAD` it's linked from the injected library - the important point (with regards to this question) being that it's done with the relocatable opcode. – Daniel Kleinstein Sep 03 '21 at 15:05
Per Nate Eldredge’s comment above, your claim has been tested and been found to be false. This answer is wrong. Refusing to test code is bad engineering. – Eric Postpischil Sep 03 '21 at 23:30
@EricPostpischil Both fuz and me explained the change Nate had to do - as I explained, the test you proposed is not a relevant test at all but just a sanity check that `LD_PRELOAD` works as documented. The answer is not wrong, but downvote as you please. – Daniel Kleinstein Sep 04 '21 at 05:02
