
My questions:

  1. Is function pointer equality guaranteed by the C standard?
  2. If the answer to (1) is yes, does that hold even when the pointers are obtained in separately linked modules (e.g. the main executable and a shared library)?
  3. How does the dynamic loader deal with that? I can think of a few reasons why this might be tricky, all related to PIC (e.g. the GOT in ELF and whatever equivalent COFF uses). Regardless of (1) and (2), the Linux loader seems to guarantee this.

Here is an example. The questions above boil down to whether C guarantees that main.c prints "Function equality: 1" rather than "Function equality: 0" and, if it does, how the dynamic loader makes that happen.

common.h:

extern void * getc_main;
extern void * getc_shared;
void assign_getc_shared(void);

main.c:

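#include <stdio.h>
#include "common.h"

/* Definitions of the pointers declared extern in common.h. */
void *getc_main;
void *getc_shared;

int main(void)
{
  /* Address of getc as seen from the main executable. */
  getc_main = (void*) getc;
  /* Address of getc as seen from the shared library. */
  assign_getc_shared();
  printf("Function equality: %d\n", getc_main == getc_shared);
  return 0;
}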

shared.c:

#include <stdio.h>
#include "common.h"

void assign_getc_shared(void)
{
   getc_shared = (void*) getc;
}

In Unix this would be compiled with the following commands:

cc -shared -fPIC -o libshared.so shared.c
cc -o main main.c -L. -lshared

And executed with:

LD_LIBRARY_PATH=. ./main
fons
  • That's a rather long way to ask "is it guaranteed that standard library functions only get included once in the executable" – Mr Lister Feb 20 '13 at 16:58
  • And I think the answer to Mr Lister's question is "No, it is not guaranteed". Functions may be inlined, for example - and if you take the address of an inline function, it will be included as a "real" function in the code, which means that there will potentially be multiple functions for the same source function. – Mats Petersson Feb 20 '13 at 17:04
  • @MrLister If I were interested in knowing only that, I would have asked only that. I asked the extra questions because I want to know the details of how the dynamic loader deals with this problem. From your comment I guess you do not, and that's fine. – fons Feb 20 '13 at 17:52

1 Answer


C 2011 (N1570 Committee Draft) 6.5.9, paragraph 6: “Two pointers compare equal if and only if … both are pointers to the same … function ….” So, yes, two pointers to the same function compare equal.

When the address of a function is taken in two different object modules, the compiler puts a placeholder in the object code. That placeholder is filled in when the object modules are linked into an executable or linked with a dynamic library at run-time.

For dynamic libraries, there are two possibilities. Either the dynamic loader fills in all placeholders in the executable as necessary, or the address of each function is actually the location of some stub code that jumps to the actual function, and a placeholder in or used by that stub code is filled in by the dynamic loader.

Additionally, note that an executable can contain more than one instance of a function. The compiler might insert the function inline in several places or might, for reasons of its own, include a specialization of the function as well as a general version. However, when the address of the function is taken, the compiler must provide the address of a single general version. (Or the compiler must ensure the program behaves as if that were done. E.g., if the compiler can detect that the program does not compare pointers, then it might, in theory, be able to use a different address for some instances of the address of the function.)

Eric Postpischil
  • This means then that it is impossible to implement a C 2011 conformant C compiler on 80x86 real mode. As any point in memory can be accessed via 4096 different (far/huge) pointers. – Patrick Schlüter Feb 20 '13 at 17:10
  • @tristopia: I do not see how your conclusion follows. The fact that each address has 4096 potential representations does not prevent the compiler from ensuring that different representations of the same address compare equal. The compiler is not required to implement `a == b` with a single instruction; it is free to perform arithmetic to convert each of `a` and `b` from whatever format they are in to a complete unique address and then to compare the resulting complete addresses. – Eric Postpischil Feb 20 '13 at 17:12
  • Additionally, the compiler has control over the taking of addresses, so it may ensure that a particular representation of a function address is used and that others are not. – Eric Postpischil Feb 20 '13 at 17:14
  • Yes, indeed. It may implement what we did by hand at that time with the `FP_SEG`, `FP_OFF` macros. – Patrick Schlüter Feb 20 '13 at 17:26
  • @EricPostpischil Regarding the relocations (*placeholders*, as you refer to them in your answer), I can see a problem with what value to fill in for each compilation unit when using PIC, at least in ELF (that's why I asked about the dynamic loader). Each compilation unit has its own PLT slots and GOT, which means that the dynamic loader, in order to ensure pointer equality, must make sure to fill in the same value in the GOTs of all the processes. But what value should it put there? And how does it ensure that calling the PLT slot of another compilation unit doesn't cause problems? – fons Feb 20 '13 at 17:41
  • @EricPostpischil Can you inline a function whose symbol you export? I really don't think so; otherwise LD_PRELOAD wouldn't work on Unix. – fons Feb 20 '13 at 17:43
  • @fons: Did you mean to say “of all the processes”? The C standard only guarantees that pointers to the same function compare equal within one execution of one program. They are not necessarily equal within different processes or even different executions of the same program. – Eric Postpischil Feb 20 '13 at 17:56
  • @EricPostpischil No, no, of course not :) I meant just one process. – fons Feb 20 '13 at 17:57
  • Certainly you (or the compiler) can inline a function for which you export the symbol. There would be both an inline version and a regular stand-alone version, and the exported symbol would be that of the regular version. – Eric Postpischil Feb 20 '13 at 18:00
  • @fons: I am not familiar with the details of how Linux manages dynamic loading. However, I expect that, when the dynamic loader loads a module, it then knows where each function in the module has been placed, and it stores the function’s address everywhere that it is referenced, including in the GOT of each module that refers to the function. So every GOT contains the same address for the function. – Eric Postpischil Feb 20 '13 at 18:19
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/24844/discussion-between-fons-and-eric-postpischil) – fons Feb 20 '13 at 18:23