A fun academic problem I am trying to solve:
In C code, I am trying to dynamically rebind symbols at runtime, much like Facebook's fishhook repo, which rebinds function symbols. I mainly care about going after symbols referenced in the __DATA.__la_symbol_ptr section of a Mach-O executable. With the fishhook implementation, you provide your new function to replace the original one, a string naming the function you want to replace, and a global function pointer through which you can still call the original, replaced function.
For example, taken from the README in the fishhook repo...
static int (*orig_close)(int);
int my_close(int fd) {
    return orig_close(fd);
}
... then in main
rebind_symbols((struct rebinding[1]){{"close", my_close, (void *)&orig_close}}, 1);
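As a rough mental model of what that call does (not fishhook's actual implementation), the sketch below simulates a single lazy symbol pointer entry in portable C. The names `struct slot`, `table`, `rebind`, `call_close`, `real_close`, and `rebind_demo` are hypothetical, and `real_close` only stands in for libc's `close`; the `+ 100` in `my_close` is just a marker to show that the replacement ran.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*close_fn)(int);

/* Hypothetical stand-in for one __la_symbol_ptr entry: a symbol name plus
   the pointer slot that calls through a stub would jump via. */
struct slot {
    const char *name;
    close_fn target;
};

/* Pretend libc close: just echoes the descriptor back. */
static int real_close(int fd) { return fd; }

/* Filled in by rebind(), like orig_close in the fishhook README. */
static close_fn orig_close;

/* Replacement; adds 100 so the caller can tell it was invoked. */
static int my_close(int fd) { return orig_close(fd) + 100; }

static struct slot table[] = { { "close", real_close } };

/* Find the slot by name, save the old target in *replaced (if non-NULL),
   and install the replacement. Returns 0 on success, -1 if not found. */
static int rebind(const char *name, close_fn replacement, close_fn *replaced) {
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(table[i].name, name) == 0) {
            if (replaced) *replaced = table[i].target;
            table[i].target = replacement;
            return 0;
        }
    }
    return -1;
}

/* A call site that goes through the slot, as a stub call would. */
static int call_close(int fd) { return table[0].target(fd); }

int rebind_demo(void) {
    table[0].target = real_close;            /* reset so the demo is repeatable */
    orig_close = NULL;
    assert(call_close(3) == 3);              /* before: reaches real_close */
    assert(rebind("close", my_close, &orig_close) == 0);
    return call_close(3);                    /* after: routed through my_close */
}
```

Only call sites that go through the slot are affected by the rebinding; a call compiled as a direct jump to a fixed address never consults the table.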
This is awesome, but I want to be able to completely switch all calls to my_close with all calls to close, and vice versa, in my module. For example, instead of a global function pointer that points to the original close, I'd want my implementation to look like this:
int my_close(int fd) {
    return my_close(fd);
}
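To spell out the intended effect, here is a small portable model of the swap, assuming every call to either name is routed through a pointer slot. `slot_close`, `slot_my_close`, `swap_slots`, `real_close`, `my_close_impl`, and `swap_demo` are hypothetical names for illustration; on Mach-O the slots would correspond to lazy symbol pointer entries rather than C globals.

```c
#include <assert.h>

typedef int (*close_fn)(int);

static int real_close(int fd);
static int my_close_impl(int fd);

/* Hypothetical pointer slots, one per symbol name. */
static close_fn slot_close    = real_close;
static close_fn slot_my_close = my_close_impl;

static int calls_to_real;

/* Pretend libc close: counts invocations and echoes the descriptor. */
static int real_close(int fd) { calls_to_real++; return fd; }

/* Source-level "my_close calls my_close", but through the slot.
   Adds 100 so the caller can tell the wrapper ran. Without the swap
   this would recurse forever. */
static int my_close_impl(int fd) { return slot_my_close(fd) + 100; }

/* Exchange the two slots so each name resolves to the other's code. */
static void swap_slots(void) {
    close_fn tmp = slot_close;
    slot_close = slot_my_close;
    slot_my_close = tmp;
}

int swap_demo(void) {
    /* Reset state so the demo is repeatable. */
    slot_close = real_close;
    slot_my_close = my_close_impl;
    calls_to_real = 0;

    swap_slots();
    int via_close = slot_close(3);        /* wrapper, then real_close */
    int via_my_close = slot_my_close(3);  /* real_close only */
    assert(via_my_close == 3);
    assert(calls_to_real == 2);
    return via_close;
}
```

The model only terminates because the recursive-looking call inside `my_close_impl` is indirect, which is exactly the property the rest of this question is about.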
Unfortunately, since this symbol is referenced in the same module, it will get called via a direct call instead of through a symbol stub. Here's the assembly when calling this function from main:
0x100001e00 <+0>: push rbp
0x100001e01 <+1>: mov rbp, rsp
0x100001e04 <+4>: sub rsp, 0x20
0x100001e08 <+8>: xor eax, eax
0x100001e0a <+10>: mov dword ptr [rbp - 0x4], 0x0
0x100001e11 <+17>: mov dword ptr [rbp - 0x8], edi
0x100001e14 <+20>: mov qword ptr [rbp - 0x10], rsi
0x100001e18 <+24>: mov edi, eax
0x100001e1a <+26>: call 0x100001da0 ; my_close at main.m:42
0x100001e1f <+31>: xor edi, edi
0x100001e21 <+33>: mov dword ptr [rbp - 0x14], eax
0x100001e24 <+36>: mov eax, edi
0x100001e26 <+38>: add rsp, 0x20
0x100001e2a <+42>: pop rbp
0x100001e2b <+43>: ret
OK, easy enough fix: I can use an assembler directive to mark the function as weak and use a weakref to shut the compiler up about a potential stack overflow. Changing my_close to:
static int f(int) __attribute__ ((weakref ("my_close")));
__attribute__((weak))
int my_close(int fd) {
    return f(fd);
}
will then produce the following assembly in main:
0x100001df0 <+0>: push rbp
0x100001df1 <+1>: mov rbp, rsp
0x100001df4 <+4>: sub rsp, 0x20
0x100001df8 <+8>: xor eax, eax
0x100001dfa <+10>: mov dword ptr [rbp - 0x4], 0x0
0x100001e01 <+17>: mov dword ptr [rbp - 0x8], edi
0x100001e04 <+20>: mov qword ptr [rbp - 0x10], rsi
0x100001e08 <+24>: mov edi, eax
0x100001e0a <+26>: call 0x100001e5e ; symbol stub for: my_close
0x100001e0f <+31>: xor edi, edi
0x100001e11 <+33>: mov dword ptr [rbp - 0x14], eax
0x100001e14 <+36>: mov eax, edi
0x100001e16 <+38>: add rsp, 0x20
0x100001e1a <+42>: pop rbp
0x100001e1b <+43>: ret
So here's the part I am stuck on: when referencing my_close inside my_close, it always results in a direct call. For example, here's the assembly for my_close:
0x100001dd0 <+0>: push rbp
0x100001dd1 <+1>: mov rbp, rsp
0x100001dd4 <+4>: sub rsp, 0x10
0x100001dd8 <+8>: mov dword ptr [rbp - 0x4], edi
0x100001ddb <+11>: mov edi, dword ptr [rbp - 0x4]
0x100001dde <+14>: call 0x100001dd0 ; <+0> at main.m:44
0x100001de3 <+19>: add rsp, 0x10
0x100001de7 <+23>: pop rbp
0x100001de8 <+24>: ret
Are there any assembler directives I can use (that I've missed) to tell my_close to be treated as a stub when being called inside my_close? Yeah, I know I can use dlsym to get the original, but I am being stubborn :]