
I am currently wrapping my head around function binding and what binding actually is in C++. Is function binding the assignment of an address to a function? Does binding occur when the function's address is assigned to a function pointer, or when the function is called?

Here is the program

#include <iostream>

int one() { return 1; }

int main()
{
    // 1. Does binding occur here when the function pointer is assigned the address of the function
    int (*one_ptr)() = one;
    // 2. or does binding occur here when the function is called
    one_ptr();
}

Does binding occur when the function pointer is assigned the address of the function:

int (*one_ptr)() = one;

or does the binding occur when the function is called:

one_ptr();

Here is the relevant objdump of the program:

0000000000001169 <_Z3onev>:
    1169:   f3 0f 1e fa             endbr64
    116d:   55                      push   rbp
    116e:   48 89 e5                mov    rbp,rsp
    1171:   b8 01 00 00 00          mov    eax,0x1
    1176:   5d                      pop    rbp
    1177:   c3                      ret

0000000000001178 <main>:
    1178:   f3 0f 1e fa             endbr64
    117c:   55                      push   rbp
    117d:   48 89 e5                mov    rbp,rsp
    1180:   48 83 ec 10             sub    rsp,0x10
    1184:   48 8d 05 de ff ff ff    lea    rax,[rip+0xffffffffffffffde]        # 1169 <_Z3onev>
    118b:   48 89 45 f8             mov    QWORD PTR [rbp-0x8],rax
    118f:   48 8b 45 f8             mov    rax,QWORD PTR [rbp-0x8]
    1193:   ff d0                   call   rax
    1195:   b8 00 00 00 00          mov    eax,0x0
    119a:   c9                      leave
    119b:   c3                      ret

This is the assembly version of the function pointer being declared and initialized

lea    rax,[rip+0xffffffffffffffde]        # 1169 <_Z3onev>
mov    QWORD PTR [rbp-0x8],rax

Here, RIP-relative addressing is used to compute the address of the function, which is then stored in the local variable. The address of the function is held in rax, as we can see here:

lea    rax,[rip+0xffffffffffffffde]        # 1169 <_Z3onev>

So calling rax makes sense. It is an indirect function call (I believe).

call   rax

So, is the function bound to 0x1169, the address of one()? And in this case, is it statically bound, since the objdump shows that the address of the function could be determined at compile time?

Happy Jerry
  • I worked with C++ for a long time and never heard of function binding (except https://en.cppreference.com/w/cpp/utility/functional/bind, and that's not close to what you are describing). I have heard of function pointers and using them to make calls. This `one_ptr` is such a function pointer (not a binding), and using it to call a function leads to the indirection in your assembly. (https://www.learncpp.com/cpp-tutorial/function-pointers/) – Pepijn Kramer Nov 01 '22 at 14:40
  • "Binding" is not a thing in C++. "Function binding" (as in late-binding vs. early-binding) is a conceptual notion about how a function call translates into what function actually gets called. – Nicol Bolas Nov 01 '22 at 14:41
  • @NicolBolas The standard does actually use the term "binds" in one or two places. For example, [forward](https://timsong-cpp.github.io/cppwp/n4861/forward#6): *"This binds to the constructor `A(const A&)`, which copies the value from `a`."* – Jason Nov 01 '22 at 14:54
  • @PepijnKramer I have seen this usage of the term "bind" in some books (though I would have to search if you ask which book). For now, I can think of at least one usage from the standard: [forward](https://timsong-cpp.github.io/cppwp/n4861/forward#6): *"This binds to the constructor `A(const A&)`, which copies the value from `a`."* – Jason Nov 01 '22 at 14:57
  • @JasonLiam naming things (consistently) is still hard (even for the standard) :) No need for the book references, thanks for sharing the link though. – Pepijn Kramer Nov 01 '22 at 15:02
  • @NicolBolas I hadn't linked "function binding" to "late binding", which I do know. Late binding requires runtime reflection of some kind, and C++ doesn't have that – Pepijn Kramer Nov 01 '22 at 15:03
  • Assigning addresses to functions is usually the linker's job, although the compiler can resolve an address itself during compilation when it has enough information. In general, the compiler builds a table of call locations and the functions they refer to, and the linker uses that table to resolve all symbols after the compilation phase. – Thomas Matthews Nov 01 '22 at 17:10

2 Answers

"Binding" is not really a thing in C++ as a language. When dealing with function calling, the term tends to be used with "early" and "late". Early binding typically refers to compile-time determination of exactly which code will be executed for a given function call. For C++, this means overload resolution. Such calls are implemented as direct calls: the address to which execution will jump is written directly into the assembly.

Late binding is contrasted with early binding in that you cannot know at compile time exactly which function will be called purely from the call itself (i.e., the name of the function and its arguments). C++ has two features that allow for late binding: virtual functions and function pointers. Both are implemented as indirect calls: you read some memory to determine where execution will jump.

Using a function pointer is, at least conceptually, a form of late binding. In your exact code example, a decent optimizing compiler can detect that one_ptr only ever assumes one value and therefore optimize out the indirect call.

Nicol Bolas
  • I received my definition from learncpp.com lesson "Early and Late Binding". "Binding refers to the process that is used to convert identifiers (such as variable and function names) into addresses." So, in my example, `int one()` is bound to `0x1169` and `one_ptr` is bound to the same address. Though, the `one_ptr` call translates to an indirect call. So, an indirect call is a form of late binding even though the address is assigned at compile time? – Happy Jerry Nov 02 '22 at 18:39
  • @HappyJerry: Is it assigned at compile time? I mean, as I pointed out, in your simple example, the compiler can see exactly what that variable contains at the time of the call. But that is a matter of *implementation*. Whether it is "late binding" or not is not about implementation. It is an indirect call, as function pointers *generally* could point to any function of the appropriate signature. – Nicol Bolas Nov 02 '22 at 19:42
  • @HappyJerry: Similarly, there are usage scenarios where a compiler can "de-virtualize" `virtual` function calls. That doesn't change the fact that they are `virtual` functions. – Nicol Bolas Nov 02 '22 at 19:43
  • "Is it assigned at compile time?" I had assumed it was because of the static objdump `mov QWORD PTR [rbp-0x8], rax`. Though, looking at the rip-relative addressing, `lea rax,[rip+0xffffffffffffffde]`, this may be a temporary address that gets resolved at run time. In that case, it would be late binding, correct? – Happy Jerry Nov 02 '22 at 19:53
  • @HappyJerry: It's *assembly*; it isn't C++ anymore. So the question of early vs. late binding in C++ is *moot* because it is no longer C++. – Nicol Bolas Nov 02 '22 at 20:28
  • Ah okay. So back to the original point, it's considered late binding because it's an indirect call, correct? If so, then I'm still confused about how that relates to the address being known at run time. – Happy Jerry Nov 03 '22 at 18:32
  • @HappyJerry: If it's an indirect call, that means that the function stored in that memory in theory could be a different value from one indirect call to the next. And therefore, it cannot generally be known until runtime. In your *specific* example, the compiler can see exactly what's going on. It can see that you never change the variable. So it could compile it down to a direct call, or even inline the function and not have a call at all. – Nicol Bolas Nov 03 '22 at 19:50
  • @HappyJerry: It should also be noted that "early binding" vs. "late binding" just isn't all that important. I mean, it's important to know when you're using function pointers or virtual functions. But you know when you're using them because neither is the *default* case. You have to ask for them, and you therefore have to have a *reason* to ask for them. You don't say "I need late binding"; you say "I'm building an abstract class interface." – Nicol Bolas Nov 03 '22 at 19:52

I would say:

  1. The address of one() is ultimately set at link time.
  2. In your example, the compiler generates code for setting one_ptr to something (let's say "address of one", that the linker would resolve later, and it will appear as a constant address in the final binary). During runtime, one_ptr will be set to the address of one and called.
  3. You could do the same at compile time (get "address of one", later to be resolved by the linker, and assign that to a constant expression). During runtime, we would just see a call to one. [Demo]
rturrado
  • So, in my original sample code, would the value of `rax` at `call rax` in the static objdump be different from the value of `rax` I would see in a debugger when the program is being dynamically analyzed? – Happy Jerry Nov 02 '22 at 18:46
  • `rax` value at runtime will depend on the value of `rip`, the instruction pointer, whose value, at the same time, will depend on where your code is loaded into the system's memory by the operating system. The code layout at runtime, however, will be the same as in the objdump. That's why you can say: load `rax` with the effective address of `rip+0xffffffffffffffde` and use that as the address of `one` (i.e., both in the objdump and at runtime, at the point `rax` is set, the relative distance between the value of `rip`, a code address itself, and the address of the function `one`, is known). – rturrado Nov 02 '22 at 22:07
  • I tested, and `rip+0xffffffffffffffde` is indeed used at run time. So the location is determined relative to `rip` at run time, but the exact address is not fixed beforehand? – Happy Jerry Nov 03 '22 at 02:57
  • That's it. The exact address depends on where the program is loaded into memory by the operating system (let's say, the value of `rip` at `main`). But the distance between `main` and `one` should remain the same as what you see in the objdump. [This answer](https://stackoverflow.com/a/15648680/260313) should give you some more information. – rturrado Nov 03 '22 at 11:55
  • The [SO post you linked](https://stackoverflow.com/questions/15648582/does-the-address-of-a-function-change-per-runtime/15648680#15648680) suggests that the late binding is due to ASLR. If that's the case, aren't all functions late binding if the OS employs ASLR? – Happy Jerry Nov 03 '22 at 18:21