Using C++ with assembly to allocate and create new functions at runtime

Question

I've been working on a (C++) project, which requires completely dynamically allocated functions, which means malloc/new and mprotect and then modify the buffer manually to assembly code. Because of this I've wondered exactly, what is required in this "buffer" of mine, for it to be a replicate of any other _cdecl function. For example:

int ImAcDeclFunc(int a, int b)
{
     return a + b;
}

If I would like to literally create a duplicate of this function, but completely dynamically, what would that require (and remember it's C++ with inline assembly)? For starters, I guess I would have to do something like this (or a similar solution):

// My main....
byte * ImAcDeclFunc = new byte[memory];
mprotect(Align(ImAcDeclFunc), pageSize, PROT_EXEC | PROT_READ | PROT_WRITE);

After this I would have to find out the assembly code for the ImAcDeclFunc(int a, int b);. Now I'm still lousy at assembly, so how would this function be in AT&T syntax? Here's my bold attempt:

push %ebp
movl %%ebp, %%esp
movl 8(%ebp), %%eax
movl 12(%ebp), %%edx
addl edx, eax
pop ebp
ret

Now if this code is correct (which I highly doubt, please correct me) would I only need to find this code's value in hex (for example, 'jmp' is 0xE9 and 'inc' is 0xFE), and use these values directly in C++? If I continue my previous C++ code:

*ImAcDeclFunc = 'hex value for push'; // This is 'push' from the first line
*(uint *)(ImAcDeclFunc + 1) = 'address to push'; // This is %ebp from the first line
*(ImAcDeclFunc + 5) = 'hex value for movl'; // This is movl from the second line
// and so on...

After I've done this for the whole code/buffer, would that be enough for a completely dynamic _cdecl function (i.e. could I just cast it to a function pointer and do int result = ((int (*)(int, int))ImAcDeclFunc)(firstArg, secondArg)?). And I'm not interested in using boost::function or something similar, I need the function to be completely dynamic, therefore my interest :)

NOTE: This question is a continuation on my previous one, but with far more specifics.

Why would you need to copy a function? The original one is just as good. Do you want to generate a completely new function out of some higher-level representation? — n. m. could be an AI, May 04 '12 at 21:50
@n.m. Yes, this was all just an example for me to understand, and to easily present everything for you. I will easily need about twenty of these. If you read my link (to my other question) you would exactly understand why :) — Elliott Darfink, May 04 '12 at 22:18
I tried to understand that question the first time around, with no success whatsoever. — n. m. could be an AI, May 04 '12 at 22:34
I think you're in for a lot of pain going down this route... you might save yourself the trouble by either using a scripting language instead (like SigTerm recommended) or if you *must* use C/C++ as the language, perhaps have your program write C/C++ source code text to a file, then have your program run g++ (or whatever) to convert that source code into a shared library that it can then dynamically link to. — Jeremy Friesner, May 04 '12 at 23:10

Mike Kwan · Accepted Answer · 2012-05-04T23:29:51.273

6

If you take this lala.c:

int ImAcDeclFunc(int a, int b)
{
    return a + b;
}

int main(void)
{
    return 0;
}

You can compile it with gcc -Wall lala.c -o lala. You can then disassemble the executable with objdump -Dslx lala >> lala.txt. You will find ImAcDeclFunc is assembled to:

00000000004004c4 <ImAcDeclFunc>:
ImAcDeclFunc():
  4004c4:   55                      push   %rbp
  4004c5:   48 89 e5                mov    %rsp,%rbp
  4004c8:   89 7d fc                mov    %edi,-0x4(%rbp)
  4004cb:   89 75 f8                mov    %esi,-0x8(%rbp)
  4004ce:   8b 45 f8                mov    -0x8(%rbp),%eax
  4004d1:   8b 55 fc                mov    -0x4(%rbp),%edx
  4004d4:   8d 04 02                lea    (%rdx,%rax,1),%eax
  4004d7:   c9                      leaveq 
  4004d8:   c3                      retq

Actually this function is relatively easy to copy elsewhere. In this case, you are perfectly correct in saying that you can copy the bytes and it would just work.

Problems will occur when you start to make use of instructions that use relative offsets as part of the opcode. For example, a relative jump or a relative call. In these cases, you need to relocate the instruction properly unless you happen to be able to copy it to the same address as where it was originally.

Briefly, to relocate, you need to find where it was originally based, and calculate the difference to where you are going to base it and relocate each relative instruction with regard to this offset. This in itself is feasible. Your real difficulty is handling calls to other functions, particularly function calls to libraries. In this case you will need to guarantee that the library is linked and then call it in the way defined by the executable format you are targeting. This is highly non-trivial. If you are still interested I can point you in the direction of where you should be reading up on for this.
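
The relocation arithmetic described above can be sketched roughly as follows. This is a minimal sketch, assuming the common 5-byte x86 forms of `call rel32` (opcode 0xE8) and `jmp rel32` (opcode 0xE9), where the 4-byte displacement is measured from the end of the instruction. The names `emit_call_rel32` and `relocate_rel32` are hypothetical helpers made up for illustration, not part of any library.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write `call rel32` at `site` so that it calls `target`.
 * Returns 0 on success, -1 if the target is out of rel32 range. */
int emit_call_rel32(unsigned char *site, const void *target)
{
    assert(site != NULL);

    /* The displacement is relative to the address of the NEXT instruction. */
    intptr_t delta = (intptr_t)target - (intptr_t)(site + 5);
    if (delta < INT32_MIN || delta > INT32_MAX)
        return -1;

    int32_t rel = (int32_t)delta;
    site[0] = 0xE8;                      /* opcode of call rel32 */
    memcpy(site + 1, &rel, sizeof rel);  /* displacement, host byte order */
    return 0;
}

/* Hypothetical helper: `new_site` holds a byte-for-byte copy of the 5-byte
 * call/jmp rel32 that originally lived at `old_site`. Adjust the copied
 * displacement so that it still reaches the original target. */
int relocate_rel32(unsigned char *new_site, const unsigned char *old_site)
{
    int32_t rel;
    memcpy(&rel, new_site + 1, sizeof rel);

    /* Recover the absolute target from the old location ... */
    intptr_t target = (intptr_t)(old_site + 5) + rel;
    /* ... and express it relative to the new location. */
    intptr_t delta = target - (intptr_t)(new_site + 5);
    if (delta < INT32_MIN || delta > INT32_MAX)
        return -1;

    rel = (int32_t)delta;
    memcpy(new_site + 1, &rel, sizeof rel);
    return 0;
}
```

Both helpers refuse to patch when the distance does not fit in 32 bits, which can happen on x86-64 when the copy lands far away from the original code. Finding which bytes in a copied function are relative instructions in the first place still requires a disassembler and is not covered by this sketch.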

In your simple case above, you can do this:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    /* Machine code of ImAcDeclFunc, copied from the objdump listing above. */
    unsigned char func[] = {0x55, 0x48, 0x89, 0xe5, 0x89, 0x7d, 0xfc,
    0x89, 0x75, 0xf8, 0x8b, 0x45, 0xf8,
    0x8b, 0x55, 0xfc, 0x8d, 0x04, 0x02,
    0xc9, 0xc3};

    /* Anonymous mappings should be given fd -1; the result is page-aligned. */
    void *mem = mmap(NULL, sizeof(func),
        PROT_WRITE | PROT_READ | PROT_EXEC,
        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) {
        perror("mmap");
        return EXIT_FAILURE;
    }

    /* Copy the code into the executable mapping, then call it. */
    memcpy(mem, func, sizeof(func));
    int (* func_copy)(int,int) = mem;
    printf("1 + 2 = %d\n", func_copy(1,2));

    munmap(mem, sizeof(func));
    return EXIT_SUCCESS;
}

This works fine on x86-64. It prints:

1 + 2 = 3
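
Some hardened systems refuse mappings that are writable and executable at the same time. A possible variation on the example above, offered as a sketch rather than a definitive implementation, is to map the memory writable first, copy the code in, and only then switch it to executable with `mprotect`. The helper name `make_executable_copy` is hypothetical, and the `_DEFAULT_SOURCE` define is an assumption about glibc when compiling in a strict `-std=c99`/`-std=c11` mode.

```c
#define _DEFAULT_SOURCE /* assumption: exposes MAP_ANONYMOUS on glibc in strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: copy `len` bytes of machine code into a fresh
 * anonymous mapping, then make that mapping read+execute only.
 * Returns the mapping, or NULL on failure. */
void *make_executable_copy(const void *code, size_t len)
{
    assert(code != NULL && len > 0);

    /* Step 1: writable but not executable. */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    memcpy(mem, code, len);

    /* Step 2: executable but no longer writable. mmap returns a
     * page-aligned address, which is what mprotect requires. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return NULL;
    }
    return mem;
}
```

On x86-64 the result could be used like `func_copy` in the example above, for instance `int (*f)(int, int) = make_executable_copy(func, sizeof(func));`, and released with `munmap(f, sizeof(func))` once it is no longer needed. The bytes passed in must still match the architecture the program runs on.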

edited May 04 '12 at 23:29

answered May 04 '12 at 21:57

Mike Kwan

I would appreciate it _so much_ if you could supply me with a working example. That would be solid gold for me! About the relative calls, from what I know it's just like this; `targetAddress - currentAddress -/+ any offsets`? About the 'handling library calls', would that be any problem, if I will only call member functions? Since I use GCC it's _exactly_ like a cdecl call but with an additional pointer (the 'this' pointer). Or would it create issues if I then call library functions with maybe _stdcall from the member function; i.e dynamic_func->member_func->library_func? – Elliott Darfink May 04 '12 at 22:24
Oh, by the way, doesn't mprotect fail because you don't align the memory? I will try myself :) – Elliott Darfink May 04 '12 at 22:27
@ElliottDarfink: Yep, I just noticed the alignment too. It still segfaults after changing that so will need to play some more. Yes, relative offsets work mostly from taking the delta of targetAddress and currentAddress. – Mike Kwan May 04 '12 at 22:29
This doesn't segfault for me, but the result is zero; i.e 1 + 2 = 0... http://pastebin.com/ELWf33cu – Elliott Darfink May 04 '12 at 22:33
@ElliottDarfink: I just found this page when googling: http://www.opensource.apple.com/source/xnu/xnu-1456.1.26/tools/tests/xnu_quick_test/helpers/data_exec.c It seems to suggest that only some architectures allow execution of stack and heap data areas. I am running x86-64 Linux btw. – Mike Kwan May 04 '12 at 22:36
@ElliottDarfink: After adding `#include <stdbool.h>` to the code you came up with, it shows `1 + 2 = 3`. However, the same code returns 269028048 on ideone: http://ideone.com/GNGnO – Mike Kwan May 04 '12 at 22:38
Hmmm... weird. Even though I added 'stdbool' I still have the same output (zero). How come stdbool solved it for you anyhow? About what you previously mentioned, I don't think that the 'execution restriction' will be any problem. I only target Linux, and it doesn't seem [that big of a problem](http://en.wikipedia.org/wiki/Executable_space_protection#Linux). About the code failing, is the assembly perhaps 64bit specific? If you disassembled your own compiled code? I'm running Ubuntu-32bit... – Elliott Darfink May 04 '12 at 22:48
@ElliottDarfink: Ah of course! My code is 64 bit. For the 32 bit version, you'll need to recompile and dump on your machine. Unfortunately I can't use the `-m32` switch on the box I'm SSH'ing into. Also, I just realised we can use `mmap` instead. Like so: http://pastebin.com/ArwyPW0R – Mike Kwan May 04 '12 at 23:00
Haha how cool! Executing functions dynamically in C++. Although I decompiled the "lala.c" to ASM and copied all bytes, I couldn't get it to work with your sample? (the mmap one). Judging by the code, your sample seems much smoother and a far better approach, so I would really prefer that one. Anyway if I use your code (with my ASM code of course) it crashes with segmentation fault at the 'func_copy' function call. This is my code (the code that works is commented out): http://pastebin.com/JxUchcqN. EDIT: The 'mmap' call returns successfully (valid memory), so there must be something else? – Elliott Darfink May 04 '12 at 23:13
@ElliottDarfink: Well that one actually segfaults because you're missing the memcpy ;) Changing that, it now works fine. To get it working on 32 bit, just get the byte array for `-m32`. – Mike Kwan May 04 '12 at 23:20
You sir are absolutely correct! Works flawlessly. This is _exactly_ what I wanted. And way better with mmap than my approach. Thanks so much for your assistance, it's been a great help. I guess I'll try to implement these dynamically now with different arguments (count and type) and different return types. I bet that won't be as fun :( – Elliott Darfink May 04 '12 at 23:26
@ElliottDarfink: Can I just ask what your aim in doing all this is (I have read your other question too)? There is perhaps something out already which does what you want. – Mike Kwan May 04 '12 at 23:27
I can try to explain, but it needs a quite extensive explanation (which will be tough to fit in a comment). I create plugins for a game, an old game, which consists of an _engine_ and modifications to this engine. When you create plugins for this engine, you have the SDK for it, but _not_ if you create plugins for the 'modifications' of the engine. Because of this, you have to do reverse-engineering to obtain full access, and reverse-engineering means detours! As mentioned in the other thread, because of the nature of detours, it is impossible to have a _method_ of a _class_......... – Elliott Darfink May 04 '12 at 23:39
as a callback hook. The problem is I love object-oriented C++ and my whole (nearly completed plugin) consist of pure OO-design. Therefore I want to be able to use detours in an object-oriented way, and no matter how hard I thought about it (and others, I've posted several threads...) the only design versatile enough would be one like this! I have no other idea how to implement this else wise, if I want to completely avoid static functions and global variables. If you're interested here's my function header so far http://pastebin.com/bXtvhz3t. As you might notice, I'm in for a hell of a ride :) – Elliott Darfink May 04 '12 at 23:40
@ElliottDarfink: Good luck :) What you're attempting is certainly possible. Doing it in a portable way, perhaps not so much. – Mike Kwan May 04 '12 at 23:42

score 1 · Answer 2 · answered May 04 '12 at 21:45

You might want to check out GNU lightning: http://www.gnu.org/software/lightning/. It might help you with what you are trying to do.

answered May 04 '12 at 21:45

Burton Samograd

Yes, I've read about it but I haven't quite understood how it works. Not to mention how thin the documentation is. You don't know if there is any documentation resources which could offer a helping hand? It seems to be what I want, I just don't know _how_. – Elliott Darfink May 04 '12 at 22:17

score 1 · Answer 3 · answered May 04 '12 at 23:03

I think it would be a better idea to embed a scripting language in your project instead of writing a self-modifying program. It'll take less time and you'll gain greater flexibility.

If I would like to literally create a duplicate of this function, but completely dynamically, what would that require (and remember it's C++ with inline assembly)?

It would require human with disassembler. Technically, function should start at one address and end at return statement. However, it is unknown what exactly compiler did with the function during optimization phase. I wouldn't be surprised if function entry point was located in some kind of weird place (like in the end of function, after return statement), or if function were split into multiple parts that were shared with other functions.

answered May 04 '12 at 23:03

SigTerm

'It would require human with disassembler' - This is incorrect. There are automated tools which perform static analysis which contradicts this (such as Dyninst). – Mike Kwan May 04 '12 at 23:19
@MikeKwan: There's no contradiction, and I am correct. While there are automated tools, they are not 100% reliable, may require human assistance, and they frequently pull helper data out of debug information. Something like IDA Pro takes minutes to split a file into routines, and can still miss several of them. It'll get even more funny if you try to analyze software that was obfuscated to confuse a disassembler. – SigTerm May 04 '12 at 23:24
And you believe a human with a disassembler can do better in such cases? Mostly static analysis falls over with indirect branching. In these cases human analysis is not much better. There are more inaccuracies in your answer as well. The size of a function can actually be determined (on ELF at least) with the help of symbol information. – Mike Kwan May 04 '12 at 23:26
@MikeKwan: Obviously. Sufficiently determined person in the end will unavoidably decipher the program or write a replacement routine that will produce same result (i.e. reverse-engineer). It is unavoidable, the only question is how much time it will take. Automatic tool cannot do that. It'll fail to process routine, and situation will not improve with time. – SigTerm May 04 '12 at 23:29
I'm not really sure how accurate that is. In static analysis, the same heuristics a human uses can be applied with a machine, even more effectively with techniques like symbolic execution. If by human with disassembler you actually meant human with disassembler level debugger then that is a different matter all together. – Mike Kwan May 04 '12 at 23:32
@MikeKwan: " (on ELF at least)" Try doing same thing on *.com file. The bottom line is that there are no warranties. Automatic analysis will give you method with non-zero probability of failure, and the whole "with the help of symbols" is not related to C++ - it is platform-specific/compiler-specific trickery. C++ is not very suitable for copying functions. If you need self-modifications, scripting language will be better choice. – SigTerm May 04 '12 at 23:33
@MikeKwan: There are multiple techniques of reverse-engineering, and you're aware of that. Do you want to argue? I'm not in the mood for that, and I won't change my answer. Remember Murphy's law: "If something can go wrong, it will". That's one of the possible ideas that could be used as a basis for programming style. And according to Murphy's law, statistical analysis will fail on the first routine you encounter. Instead of low-level *exe hacking, OP needs the flexibility of a scripting language. Writing an in-memory compiler will be reinventing the wheel anyway... – SigTerm May 04 '12 at 23:39
@MikeKwan: Post is not tagged "linux", and the only "linuxish" thing there is mprotect. Also, linux supports multiple architectures, which will be a VERY good reason to stay away from any assembly from the start. – SigTerm May 04 '12 at 23:44
He mentions it in the comments to my answer. Anyway, let's agree to disagree. – Mike Kwan May 04 '12 at 23:44
@MikeKwan: As I already said, I'm not in the mood for chatting. Have a nice day. "let's agree to disagree" Ok, it is reasonable enough. – SigTerm May 04 '12 at 23:46
I think using a scripting language would be the way to go, and it was a great suggestion, although I didn't reveal everything about my project. There are two reasons I do not want to use a scripting language. The first is that the closest thing I know is PHP (or perhaps Pawn), and I'm more interested in going low-level than high-level :) The second reason is why I really want to avoid a scripting language. I'm scripting for a game, and yes, it is old (~10 years), but using a scripting language in a function which is called ~200 times a second just doesn't seem healthy, especially compared to inline assembly. – Elliott Darfink May 04 '12 at 23:52
@ElliottDarfink: "I'm scripting for a game" [Lua](http://www.lua.org/). "~200 times a second" Once it becomes at least 10000 function calls per second, you can start worrying. Until that happens, there's no reason to. – SigTerm May 05 '12 at 00:24
