
I'm trying to write a function that copies a function (and eventually modifies its machine code) and returns the copy. This works fine for one level of indirection, but with two levels I get a segfault.

Here is a minimal (not) working example:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define BODY_SIZE 100

int f(void) { return 42; }
int (*G(void))(void) { return f; }
int (*(*H(void))(void))(void) { return G; }

int (*g(void))(void) {
    void *r = mmap(0, BODY_SIZE, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    memcpy(r, f, BODY_SIZE);
    return r;
}

int (*(*h(void))(void))(void) {
    void *r = mmap(0, BODY_SIZE, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    memcpy(r, g, BODY_SIZE);
    return r;
}

int main() {
    printf("%d\n", f());
    printf("%d\n", G()());
    printf("%d\n", g()());
    printf("%d\n", H()()());
    printf("%d\n", h()()()); // This one fails - why?

    return 0;
}

I can memcpy into an mmap'ed area once to create a valid function that can be called (g()()). But if I try to apply it again (h()()()) it segfaults. I have confirmed that it correctly creates the copied version of g, but when I execute that version I get a segfault.

Is there some reason why I can't execute code in one mmap'ed area from another mmap'ed area? From exploratory gdb-ing with x/i checks, it seems like I can call down successfully, but when I return, the function I came from has been erased and replaced with 0s.

How can I get this behaviour to work? Is it even possible?

BIG EDIT:

Many have asked for my rationale, as I am obviously posing an XY problem here. That is true and intentional. A little under a month ago this question was posted on Code Golf Stack Exchange, and it also got a nice bounty for a C/Assembly solution.

I gave the problem some idle thought and realized that by copying a function's body while stubbing out an address with some unique value, I could search the copy's memory for that value and replace it with a valid address. This effectively lets me create lambda functions that take a single pointer as an argument. Using this I got single currying working, but I need more general currying.

My current partial solution is linked here; it is the full code that exhibits the segfault I am trying to avoid. While this is pretty much the definition of a bad idea, I find it entertaining and would like to know whether my approach is viable. The only thing I'm missing is the ability to run a function created from a function, and I can't get that to work.
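The sentinel-patching step described in the edit can be sketched on a plain byte buffer. This is a minimal sketch under assumptions that are not in the question: the sentinel value `0xDEADBEEFCAFEF00D` and the name `patch_sentinel` are hypothetical, and it assumes the compiler stored the placeholder as one contiguous 8-byte immediate in host byte order (as an x86-64 `movabs` would), which optimizers do not guarantee. In real use `buf` would be the mmap'ed copy of the template function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical placeholder compiled into the template function.
 * It only needs to be unlikely to occur elsewhere in the copied bytes. */
#define SENTINEL UINT64_C(0xDEADBEEFCAFEF00D)

/* Scan buf[0..len) for the 8-byte sentinel (host byte order) and
 * overwrite the first match with `value`. Returns the offset of the
 * patched bytes, or -1 if the sentinel was not found. */
ptrdiff_t patch_sentinel(unsigned char *buf, size_t len, uint64_t value)
{
    const uint64_t needle = SENTINEL;

    if (len < sizeof needle)
        return -1;
    for (size_t i = 0; i + sizeof needle <= len; i++) {
        if (memcmp(buf + i, &needle, sizeof needle) == 0) {
            memcpy(buf + i, &value, sizeof value);
            return (ptrdiff_t)i;
        }
    }
    return -1;
}
```

Note that this only repairs the placeholder; any relative calls or RIP-relative accesses inside the copied body still point at the wrong place, which is what the answers below address.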

LambdaBeta

3 Answers


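The code is using relative calls to invoke mmap and memcpy, so the copied code ends up calling an invalid location. A relative call encodes its target as a displacement from the address of the next instruction, so once the bytes are copied to a different address, the same displacement points somewhere else.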
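To make the failure concrete: an x86 `call rel32` (opcode `E8` followed by a signed 32-bit little-endian displacement) jumps to the address of the next instruction plus the displacement. The sketch below decodes such a call and shows that identical bytes resolve to different targets at different load addresses. The function name `call_rel32_target` and the example addresses are made up for illustration; nothing is executed.

```c
#include <stdint.h>

#define CALL_REL32_LEN 5  /* E8 xx xx xx xx */

/* Decode the target of the `call rel32` whose bytes are at `code`,
 * assuming code[0] == 0xE8 and that the bytes live at `insn_addr`. */
uintptr_t call_rel32_target(const unsigned char *code, uintptr_t insn_addr)
{
    /* Assemble the little-endian displacement, then sign-extend it. */
    uint32_t u = (uint32_t)code[1] | (uint32_t)code[2] << 8 |
                 (uint32_t)code[3] << 16 | (uint32_t)code[4] << 24;
    int64_t disp = (u & 0x80000000u) ? (int64_t)u - INT64_C(0x100000000)
                                     : (int64_t)u;

    /* Target = address of the next instruction + displacement.
     * Unsigned arithmetic wraps, matching what the CPU does. */
    return insn_addr + CALL_REL32_LEN + (uintptr_t)disp;
}
```

For example, the bytes `E8 00 FF FF FF` (displacement -0x100) target `0x400F05` when located at `0x401000`, but `0x20000F05` when the same bytes are copied to `0x20001000`. This is why memcpy'ing g into a fresh mmap'ed area breaks its calls to mmap and memcpy, while copying f (which calls nothing) works.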

You can invoke them through a pointer, e.g.:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define BODY_SIZE 100

void* (*mmap_ptr)(void *addr, size_t length, int prot, int flags,
                  int fd, off_t offset) = mmap;
void* (*memcpy_ptr)(void *dest, const void *src, size_t n) = memcpy;

int f(void) { return 42; }
int (*G(void))(void) { return f; }
int (*(*H(void))(void))(void) { return G; }

int (*g(void))(void) {
    void *r = mmap_ptr(0, BODY_SIZE, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    memcpy_ptr(r, f, BODY_SIZE);
    return r;
}

int (*(*h(void))(void))(void) {
    void *r = mmap_ptr(0, BODY_SIZE, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    memcpy_ptr(r, g, BODY_SIZE);
    return r;
}

int main() {
    printf("%d\n", f());
    printf("%d\n", G()());
    printf("%d\n", g()());
    printf("%d\n", H()()());
    printf("%d\n", h()()()); // works with the pointer indirection on 32-bit; see comments for 64-bit

    return 0;
}
Jester
  • That fails exactly the same way. – Jean-Baptiste Yunès May 04 '18 at 18:03
  • Works fine here. Are you using 32 or 64 bit? – Jester May 04 '18 at 18:04
  • 64 bit may be using rip-relative addressing for data, so the same problem persists as with relative calls. – Jester May 04 '18 at 18:05
  • Interesting, wasn't aware of this... Need to dig into 64-bit addressing seriously. – Jean-Baptiste Yunès May 04 '18 at 18:09
  • Not accepting because I already tried it, but it's a very good thing to notice either way (the very first thing I realized was that any functions I pass to these copied functions had to be stored in a pointer, as they would be relatively jumped to). – LambdaBeta May 04 '18 at 19:08
  • In 64 bit it also works if you can get your compiler to not emit rip-relative addressing, for example by using `-mcmodel=large`. Note that I had to increase `BODY_SIZE` to accommodate the larger code. – Jester May 04 '18 at 22:34

I'm trying to write a function that copies a function

I think that is pragmatically not the right approach, unless you know the machine code of your platform very well (and then you would not be asking this question). Be aware of position-independent code (useful because mmap(2) generally uses ASLR and returns somewhat randomized addresses). BTW, genuine self-modifying machine code (i.e. changing some bytes of existing valid machine code) is nowadays unfriendly to caches and branch predictors and should be avoided in practice.

I suggest two related approaches (choose one of them).

  • Generate some temporary C file (see also this), e.g. /tmp/generated.c, then fork a compilation of it into a plugin using gcc -Wall -g -O -fPIC /tmp/generated.c -shared -o /tmp/generated.so, then dlopen(3) (for dynamic loading) that /tmp/generated.so shared-object plugin (and probably use dlsym(3) to find function pointers in it). For more about shared objects, read Drepper's How To Write Shared Libraries paper. Today, you can dlopen many hundreds of thousands of such shared libraries (see my manydl.c example), and C compilers (like recent GCC) are fast enough to compile a few thousand lines of code in a time compatible with interaction (e.g. less than a tenth of a second). Generating C code is a widely used practice. In practice you would build an in-memory AST of the generated C code before emitting it.

  • Use some JIT compilation library, such as GCCJIT, LLVM, libjit, asmjit, etc., which would generate a function in memory, do the required relocations, and give you a pointer to it.
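The first approach (generate C, compile it into a shared object, dlopen it) can be sketched roughly as follows. This is a minimal sketch assuming a POSIX system with `gcc` on the `PATH` and a writable `/tmp`; the helper name `compile_and_load` and the `int (void)` function type are hypothetical, error handling is minimal, and on older glibc you need to link with `-ldl`.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

typedef int (*int_fn)(void);

/* Write `source` to /tmp/generated.c, compile it into /tmp/generated.so
 * with gcc, load it with dlopen(3) and look up `name` with dlsym(3).
 * Returns NULL on any failure. */
int_fn compile_and_load(const char *source, const char *name)
{
    FILE *out = fopen("/tmp/generated.c", "w");
    if (!out)
        return NULL;
    fputs(source, out);
    if (fclose(out) != 0)
        return NULL;

    /* Same command line as in the answer. */
    if (system("gcc -Wall -g -O -fPIC /tmp/generated.c -shared -o /tmp/generated.so") != 0)
        return NULL;

    void *handle = dlopen("/tmp/generated.so", RTLD_NOW);
    if (!handle)
        return NULL;

    /* POSIX allows converting dlsym's result to a function pointer.
     * The handle is deliberately left open so the pointer stays valid. */
    return (int_fn)dlsym(handle, name);
}
```

For example, `compile_and_load("int answer(void) { return 42; }\n", "answer")()` should return 42. dlopen(3) returns the already-loaded object when given the same path again, so each newly generated unit should get its own file name.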

BTW, instead of coding in C, you might consider using some homoiconic language implementation (such as SBCL for Common Lisp, which compiles to machine code at every REPL interaction, or any implementation working on a dynamically constructed S-expression program representation).

The notions of closures and callbacks are worth knowing. Read SICP and perhaps Lisp In Small Pieces (and of course the Dragon Book, for general compiler culture).

Basile Starynkevitch
  • Astute of you to notice that I'm effectively trying to make a closure. As per my edit, I actually name the structure that I end up populating in the generated code a closure so yeah - I'm trying to perform JIT recursively. – LambdaBeta May 04 '18 at 19:06
  • Then consider using some JIT compiling library – Basile Starynkevitch May 04 '18 at 20:41

this question was posted on code golf.SE

I updated the 8086 16-bit code-golf answer on the sum-of-args currying question to include commented disassembly.

You might be able to use the same idea in 32-bit code with a stack-args calling convention to make a modified copy of a machine code function that tacks on a push imm32. It wouldn't be fixed-size anymore, though, so you'd need to update the function size in the copied machine code.

In normal calling conventions, the first arg is pushed last, so you can't just append another push imm32 before a fixed-size call target / leave / ret trailer. If writing a pure asm answer, you could use an alternate calling convention where args are pushed in the other order. Or you could have a fixed-size intro, then an ever-growing sequence of push imm32 + call / leave / ret.
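The "fixed-size intro, then an ever-growing sequence of push imm32 + call / leave / ret" layout can be sketched as a byte emitter that only writes into a buffer and executes nothing. Assumptions beyond the answer: 32-bit x86, a cdecl-style convention with all arguments 32 bits wide and already known, and hypothetical helper names `emit_thunk` and `put32`. It does not forward the thunk's own incoming arguments, and it ignores the 16-byte stack alignment that current Linux i386 toolchains expect at calls.

```c
#include <stddef.h>
#include <stdint.h>

/* Append a 32-bit value in little-endian order. */
static unsigned char *put32(unsigned char *p, uint32_t v)
{
    p[0] = v & 0xFF;
    p[1] = (v >> 8) & 0xFF;
    p[2] = (v >> 16) & 0xFF;
    p[3] = (v >> 24) & 0xFF;
    return p + 4;
}

/* Emit an i386 thunk that calls `target` with `n` stack arguments:
 *   push ebp; mov ebp, esp     fixed-size intro
 *   push imm32 (n times)       last argument pushed first
 *   call rel32                 displacement depends on `load_addr`
 *   leave; ret                 leave also discards the pushed args
 * `load_addr` is the address the bytes will run at.
 * Returns the number of bytes written (3 + 5*n + 5 + 2). */
size_t emit_thunk(unsigned char *out, uint32_t load_addr,
                  uint32_t target, const uint32_t *args, size_t n)
{
    unsigned char *p = out;

    *p++ = 0x55;                   /* push ebp */
    *p++ = 0x89; *p++ = 0xE5;      /* mov ebp, esp */
    for (size_t i = n; i-- > 0; ) {
        *p++ = 0x68;               /* push imm32 */
        p = put32(p, args[i]);
    }
    *p++ = 0xE8;                   /* call rel32 */
    /* The next instruction starts right after the 4-byte displacement. */
    uint32_t next = load_addr + (uint32_t)(p - out) + 4;
    p = put32(p, target - next);   /* wraps modulo 2^32 like the CPU */
    *p++ = 0xC9;                   /* leave */
    *p++ = 0xC3;                   /* ret */
    return (size_t)(p - out);
}
```

Because the `call` displacement depends on `load_addr`, the bytes have to be emitted for the address they will actually run at (for example the address mmap returned), which is the same relocation problem as in the question. Growing the thunk by one argument means re-emitting it with `n + 1`, which also changes its size.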

The currying function itself could use a register-arg calling convention, even if you want the target function to use i386 System V for example (stack args).

You'd definitely want to simplify by not supporting args wider than 32 bit, so no structs by value, and no double. (Of course you could chain multiple calls to the currying function to build up a larger arg.)

Given the way the new code-golf challenge is written, I guess you'd compare the total number of curried args against the number of args the target "input" function takes.
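The bookkeeping side (counting curried arguments and comparing the total against the target's arity) can be modeled in plain C without any machine code. This is a hypothetical sketch, not the machine-code approach: the names `curry_state`, `curry`, `curry_push`, `curry_done`, `curry_call`, the example target `sum3`, the `MAX_ARGS` limit of 8, and the convention that the target receives its arguments as an `int` array are all assumptions for illustration.

```c
#include <stddef.h>

#define MAX_ARGS 8

/* Target functions receive all their arguments as an array. */
typedef int (*target_fn)(const int *args);

typedef struct {
    target_fn fn;
    size_t arity;          /* args the target takes; must be <= MAX_ARGS */
    size_t count;          /* args curried so far */
    int args[MAX_ARGS];
} curry_state;

curry_state curry(target_fn fn, size_t arity)
{
    curry_state s = { .fn = fn, .arity = arity, .count = 0 };
    return s;
}

/* Returns a new state with one more argument; the input is passed by
 * value, so a partial application can be reused with different args. */
curry_state curry_push(curry_state s, int arg)
{
    if (s.count < s.arity && s.count < MAX_ARGS)
        s.args[s.count++] = arg;
    return s;
}

/* Ready to call once the curried count matches the arity. */
int curry_done(const curry_state *s) { return s->count == s->arity; }

int curry_call(const curry_state *s) { return s->fn(s->args); }

/* Example target in the spirit of the sum-of-args challenge. */
int sum3(const int *a) { return a[0] + a[1] + a[2]; }
```

For example, pushing 1, 2 and 3 onto `curry(sum3, 3)` makes `curry_done` true and `curry_call` return 6, while the one-argument state can still be reused separately.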


I don't think there's any chance you can make this work in pure C with just memcpy; you have to modify the machine code.

Peter Cordes