My examples below are for Linux x86_64 with gcc, but similar considerations should apply on other systems.
Can we let a function body live on heap?
Yes, absolutely we can. This is usually called JIT (just-in-time) compilation. See this for the basic idea.
Because we can manipulate memory on heap more freely, we may gain more freedom to manipulate functions.
Exactly, that's why higher level languages like JavaScript have JIT compilers.
In the following C code, I copy the text of hello function onto heap and then point a function pointer to it. The program compiles fine by gcc but gives "Segmentation fault" when running.
Actually, you have multiple "Segmentation fault"s in that code.
The first one comes from this line:
int size = 10000; // large enough to contain hello()
If you look at the x86_64 machine code that gcc generated for your hello function, it compiles down to a mere 17 bytes:
0000000000400626 <hello>:
400626: 55 push %rbp
400627: 48 89 e5 mov %rsp,%rbp
40062a: bf 98 07 40 00 mov $0x400798,%edi
40062f: e8 9c fe ff ff call 4004d0 <puts@plt>
400634: 90 nop
400635: 5d pop %rbp
400636: c3 retq
So when you try to copy 10,000 bytes, memcpy reads far past the end of the function, runs into memory that is not mapped, and you get a "Segmentation fault".
Secondly, you allocate memory with malloc, which gives you a slice of memory that is marked non-executable on Linux x86_64 (the CPU enforces this through the NX bit), so jumping into it would give you another "Segmentation fault".
Under the hood, malloc obtains memory from the kernel with system calls like brk (which the sbrk library function wraps) and mmap. What you need to do is allocate executable memory yourself, using the mmap system call with PROT_EXEC protection.
Thirdly, when gcc compiles your hello function, you don't really know what optimisations it will use and what the resulting machine code will look like.
For example, look at line 4 of the compiled hello function:
40062f: e8 9c fe ff ff call 4004d0 <puts@plt>
gcc optimised it to use the puts function instead of printf, but that is not even the main problem.
On x86 architectures you normally call functions using the call assembly mnemonic. However, call is not a single machine instruction: there are several different encodings that call can assemble to; see the Intel manual, Vol. 2A, page 3-123, for reference.
In your case the compiler has chosen to use relative addressing for the call instruction.
You can see that because your call instruction has the e8 opcode:
E8 - Call near, relative, displacement relative to next instruction. 32-bit displacement sign extended to 64-bits in 64-bit mode.
In other words, the call jumps by a signed number of bytes measured from the address of the next instruction (the value the instruction pointer holds while the call executes).
Now, when you relocate your code to the heap with memcpy, you copy that relative call verbatim. Its displacement is then applied relative to the new location on the heap, so the call jumps to an address that most likely is not mapped at all, and you get another "Segmentation fault".
If my program can not be repaired, could you provide a way to let a function live on heap? Thanks!
Below is working code. Here is what I do:
- Execute printf once to make sure gcc includes it in our binary.
- Copy the correct number of bytes to the heap, so as not to access memory that does not exist.
- Allocate executable memory with mmap and the PROT_EXEC option.
- Pass the printf function as an argument to our heap_function, to make sure that gcc calls it through an absolute address (an indirect call) instead of a relative call instruction.
Here is the code:
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <sys/mman.h>

// Variadic, like the real printf, so calling through the pointer is well-defined.
typedef int (*printf_t)(const char* format, ...);
typedef int (*heap_function_t)(printf_t myprintf, const char* str, int a, int b);

int heap_function(printf_t myprintf, const char* str, int a, int b) {
    // The call goes through the pointer argument, not a relative `call`.
    myprintf("%s", str);
    return a + b;
}

// Marks the end of `heap_function`. This assumes `gcc` emits the two
// functions back to back in source order, which is not guaranteed.
int heap_function_end() {
    return 0;
}

int main(void) {
    // By printing something here, `gcc` will include `printf`
    // function at some address (`0x4004d0` in my case) in our binary.
    printf("%s", "Just including printf in binary\n");

    // Allocate the correct size of
    // executable `PROT_EXEC` memory.
    size_t size = (size_t) ((intptr_t) heap_function_end - (intptr_t) heap_function);
    void* buffer = mmap(NULL, size,
                        PROT_EXEC | PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buffer == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    memcpy(buffer, (void*) heap_function, size);

    // Call our function
    heap_function_t fp = (heap_function_t) buffer;
    int res = fp(printf, "Hello world, from heap!\n", 1, 2);
    printf("a + b = %i\n", res);

    munmap(buffer, size);
    return 0;
}
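Save in main.c and run with:
gcc -o main main.c && ./main
If your gcc builds position-independent executables by default, the "%s" literal inside heap_function is likely addressed relative to the instruction pointer and will break after the copy, for the same reason as the relative call. In that case, try:
gcc -fno-pie -no-pie -o main main.c && ./main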