
Suppose we have some C code that calls a function through a function pointer, whether via a function pointer table, a function pointer passed as a parameter, or some other means, like so:

/* ... some other code .. */
void (*f)(void) = something; // f function pointer to some function
(*f)();

This should be compiled to something equivalent to:

mov  something(%rip), %rcx   # here f is in %rcx
callq *%rcx

Question: does %rcx always point to a function prologue, or can it point to a small piece of code at the end of a function?

Example:

void big_func(){
    /* lots of code here */
    printf("bar");
    printf("foo");
}
void small_func(){
    printf("foo");
}

With big_func compiled to:

# some more code up here
1: 48 8d 3d c4 0e 00 00    lea    0xec4(%rip),%rdi   # ptr to "bar"
2: b8 00 00 00 00          mov    $0x0,%eax
3: e8 e6 fe ff ff          callq  1030 <printf@plt>
4: 48 8d 3d b7 0e 00 00    lea    0xeb7(%rip),%rdi   # ptr to "foo"
5: b8 00 00 00 00          mov    $0x0,%eax
6: e8 d5 fe ff ff          callq  1030 <printf@plt>
7: b8 00 00 00 00          mov    $0x0,%eax
8: 5d                      pop    %rbp
9: c3                      ret

Is it possible for a call to small_func to use 4: as its entry point? Does this ever happen with a generic compiler like gcc, or only when a human modifies the assembly code behind the scenes?

Question limits:

  • Consider only assembly code produced by a non-specialised compiler (gcc, clang, etc.), without hand-written asm tricks for performance behind the scenes
  • Only consider normal behaviour. Let's imagine the developer is a nice person and doesn't implement any pesky undefined behaviour

Additional mini-question: What would happen if one intentionally modified the function pointer to skip some bytes of a function prologue? Is this considered undefined behaviour?
Example:

void (*f)(void) = something; // f function pointer to some function
f = (void (*)(void))((char *)f + 2);
(*f)(); //skips `push ebp`

EDIT: I should have been clearer about why this question is asked. It comes up in the context of a research master's project seeking a new way to mitigate ROP-based attacks at a very low software or hardware level. If indirect calls could point somewhere other than a function prologue, that could break one of our tag-based implementations (the call would miss the tag, and the program would be terminated after an attack was incorrectly detected).

TheD0ubleT
  • I would say your question is unclear. What exactly does *something* point to, and what is the point of the question? As for what happens if you skip some bytes of a function prologue: anything is possible, from nothing (if the skipped bytes are `nop; nop` or `mov eax,eax`) to a crash (in most cases). – RbMm May 13 '21 at 12:56
  • In your example it can't point to `4:` because then the `pop %rbp` at the end would not work and you would also have a misaligned stack for the `printf`. Given the right circumstances it is possible to write such functions by hand. – Jester May 13 '21 at 13:06
  • Not all functions even have a prologue (the first instruction belongs to the body), is that an important part of the question or are you using "prologue" more like "the start of the function"? – harold May 13 '21 at 15:44
  • I am using prologue in the general sense of "beginning of the function"; my bad if that was not sufficiently clear – TheD0ubleT May 13 '21 at 15:46
  • If you defined `big_func() { ...; small_func(); }`, it could compile to a tailcall of the other function. Or even better, in theory execution could just fall into it without a `jmp` if the compiler put big_func right before small_func, but I don't think real compilers look for that optimization. e.g. [print a newline using tail call optimization](https://stackoverflow.com/a/67405215) shows doing that by hand, where `print_newline` is just `print_char('\n')`. – Peter Cordes May 13 '21 at 15:54

2 Answers


The function pointer will always reference the beginning of the function. The C standard does not define conversions between function pointers and other pointer types, so calling through a pointer produced that way invokes undefined behaviour.

A particular implementation may still happen to generate working code, especially if the function does not set up a stack frame, but in most cases it will fail.

It also makes no sense to do this: you should simply split the `big` function into two smaller ones and call them as needed, without such tricks.
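
A minimal sketch of that split, reusing the question's `big_func` and `small_func` names. The `foo_calls` counter is a hypothetical addition, included only so the effect can be observed; it is not part of the question's code.

```c
#include <stdio.h>

/* Hypothetical counter, for illustration only. */
static int foo_calls = 0;

/* The shared tail lives in its own function, with its own entry point. */
void small_func(void) {
    printf("foo");
    foo_calls++;
}

void big_func(void) {
    /* lots of code here */
    printf("bar");
    small_func();   /* last action: a compiler may emit this as a jmp (tail call) */
}
```

Both `big_func` and `small_func` can then be stored in function pointers, and each pointer refers to the start of a real function rather than to the middle of another one.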

0___________
  • Or have `big` tail-call `small_func` instead of manually inlining. (As I commented elsewhere, in theory a compiler could make asm that just falls into small_func without even a JMP, but in practice they don't.) – Peter Cordes May 13 '21 at 16:08

In C, like most languages, functions have a very specific meaning, purpose and implementation.  In invocation, a function pointer has to work just like a function.  We cannot invoke code that isn't a function, whether direct call or function pointer.  The function generally doesn't know whether it was invoked directly or by pointer — it must accept the arguments passed, perform its function and return to the caller (usually), no matter how it was called (directly or by function pointer).  In C, functions have a single entry point.

There is no concept in C of a snippet of code that isn't a function, that can be transferred to (or invoked) by function pointer.  (C has labels and goto's, but there are no label pointers, or label variables or label variable goto's, for example.)

Not all functions require a prologue. They only have to be able to accept their arguments, do something, and return to the caller. That doesn't necessarily require a prologue or an epilogue, but they are still functions (with a single entry point).
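
As a rough illustration of the single-entry-point idea, here is a small sketch with a function pointer table. All names in it (`binop_fn`, `add`, `sub`, `ops`, `apply`) are hypothetical and chosen only for this example.

```c
/* Hypothetical names throughout; a sketch, not code from the question. */
typedef int (*binop_fn)(int, int);

static int add(int a, int b) { return a + b; }
static int sub(int a, int b) { return a - b; }

/* Every slot holds the address of a whole function, i.e. its entry point. */
static const binop_fn ops[] = { add, sub };

int apply(unsigned idx, int a, int b) {
    /* Indirect call: the callee cannot tell it was reached through a pointer. */
    return ops[idx](a, b);
}
```

In well-defined C, the only values that can end up in `ops` are addresses of functions, so `apply(0, 2, 3)` enters `add` exactly where a direct call `add(2, 3)` would.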

Erik Eidt
  • Related: GNU C does let you take the address of a label, for use with `goto` *within* a function. But it's definitely not a function pointer, and it would be UB (and very likely to break in practice) to cast a label pointer to a function pointer and deref (call) it. https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html. And using it with goto *must* be from within the same function. [An SO answer has an example of the asm](https://stackoverflow.com/a/17796727). – Peter Cordes May 13 '21 at 15:57
  • @PeterCordes, thanks for pointing that out. That's non-standard, right? – Erik Eidt May 13 '21 at 15:57
  • yeah, it's a GNU C extension. So standard, but not ISO standard. – Peter Cordes May 13 '21 at 16:01
  • In theory a C compiler could create one block of code that had multiple function entry-points in it. (And you can do that in asm). But you're right that it wouldn't be visible at the language level. e.g. [print a newline using tail call optimization](https://stackoverflow.com/a/67405215) shows doing that by hand, where `print_newline` is just `print_char('\n')`, and you can optimize the tailcall to just fall into the other function without even a `jmp`. But in practice compilers like GCC and clang will at best `jmp`, not fall through, I think. – Peter Cordes May 13 '21 at 16:02
  • Also related: [Does a function with instructions before the entry-point label cause problems for anything (linking)?](https://stackoverflow.com/q/35468465) - the entry point for a function doesn't have to be the lowest-address instruction. I think that's safe if you pretend the preceding code is a different function with its own `.size`. Compilers don't make asm like that, and it's certainly not something you can take advantage of from pure C. – Peter Cordes May 13 '21 at 16:06
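
For reference, the GNU C labels-as-values extension mentioned in the comments can be sketched as below. This assumes a compiler that supports the extension (GCC or Clang); it is not ISO C, and the function name `classify` and its labels are hypothetical.

```c
/* GNU C extension (GCC, Clang); not ISO C. */
int classify(int n) {
    /* Label addresses are void * values, valid only for goto in this function. */
    static void *targets[] = { &&is_zero, &&is_nonzero };

    goto *targets[n != 0];   /* computed goto, stays inside classify */

is_zero:
    return 0;
is_nonzero:
    return 1;
}
```

The label addresses never leave `classify` and are never converted to function pointers, so callers still enter the function only at its single entry point.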