Segmentation fault when trying to fprintf after executing machine code with jit

Question

Hi devs! Could you help me? The project's goal is to translate bytecode from a fictional architecture into an array of real machine code and run it with a JIT, but I get a segmentation fault when I try to save a certain part of the output to a file. This is the part of the code responsible:

uint32_t length = sysconf(4096);
void * memory = mmap(0 , length , PROT_NONE , MAP_PRIVATE | MAP_ANONYMOUS , -1 , 0);

//{machine array receives the translated machine code here...}

mprotect ( memory , length , PROT_WRITE ) ;
// copying the machine code array to the memory
memcpy ( memory , ( void *) ( machine ) , sizeof ( machine ) ) ;
mprotect ( memory , length , PROT_EXEC ) ;

const uint32_t (* jit ) (int32_t*, uint8_t*) = ( uint32_t (*) (int32_t*, uint8_t*) ) ( memory );

// running the machine code to produce the outputs
// &R is the array of registers that stores the output, and &mem contains the original byte
// code, so that it can receive input from an instruction that changes the original code

(*jit)((int *)&R, (unsigned char *)&mem);

munmap(memory,length);

// printf/fprintf that causes the segmentation fault if we try to print n and ic[n]
// n = 0;   - does not work to print the correct starting value for n
// fflush(stdout); - works to print the correct starting value for n
for(n = 0; n < 16; n++) {
    // fprintf(output,"%02x:\n",n);
    // fprintf(output,":%d\n",ic[n]);
    fprintf(output,"%02x:%d\n",n,ic[n]);      
    // printf("%02x:%d\n",n,ic[n]);
    // fflush(stdout);
}

for (k = 0; k < 16; k++) {
    fprintf(output,"R[%d]=0x%08x\n",k,R[k]);
}

The original bytecode translates to the instructions in the pseudo-assembly code below. In this code, the Rs represent an array of registers that is passed to the real assembly code: R0 is at %rdi, R1 at %rdi+0x4, ..., R15 at %rdi+0x3C. Some of these pseudo-instructions translate to one or more actual assembly instructions, and [Rn] represents the memory location that contains the bytecode for the original architecture. So when the code accesses [Rn], it uses the current value of the register as the position from which to read the next 4 bytes (an instruction on the fantasy architecture is 4 bytes long).

mov R0, 0x006C
mov R1, 0x0001
mov R2, [R0]
cmp R15, R2
je 0x0030
mov R14, R2
jg 0x0000
jl 0x0000
add R13, R14
and R12, R13
or R11, R12
xor R10, R11
shl R10, 0x00
shr R10, 0x00
sub R2, R1
mov [R0], R2
jmp 0xFFC8
mov R1, 0x0004
add R0, R1
mov R2, [R0]
add R0, R1
mov R4, [R0]
add R0, R1
mov R8, [R0]
add R0, R1
mov R12, [R0]
jmp 0x0004
mov R3, R12
or R6, R13
add R10, R8
sub R15, R14
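The register mapping described above (R0 at %rdi, R1 at %rdi+0x4, ..., R15 at %rdi+0x3C) amounts to a displacement of n * sizeof(int32_t) for Rn. A minimal sketch of that arithmetic, assuming R is an array of int32_t; `guest_reg_disp` and `emit_load_guest_reg` are hypothetical helpers for illustration and are not taken from the actual translator, which is not shown in the post:

```c
#include <stddef.h>
#include <stdint.h>

/* Byte displacement of guest register Rn from the pointer passed in %rdi,
   assuming R is an array of int32_t (4 bytes per register). */
static inline uint8_t guest_reg_disp(unsigned n)
{
    return (uint8_t)(n * sizeof(int32_t)); /* R15 -> 15 * 4 = 0x3C */
}

/* Hypothetical helper: emit the x86-64 encoding of `mov eax, [rdi + disp8]`
   into out and return the number of bytes written (always 3). */
static inline size_t emit_load_guest_reg(uint8_t *out, unsigned n)
{
    out[0] = 0x8B;              /* opcode: MOV r32, r/m32 */
    out[1] = 0x47;              /* ModRM: mod=01 (disp8), reg=eax, rm=rdi */
    out[2] = guest_reg_disp(n); /* 8-bit displacement, 0x00..0x3C */
    return 3;
}
```

Since the largest displacement is 0x3C, every guest register fits in the 8-bit displacement form, so all 16 can be addressed with the same three-byte pattern.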

For the original architecture's instructions (00 to 0F) and 16 registers (R[0] to R[15]), the output should follow this model:

original instruction opcode: number of times executed

array of registers: value stored. Something like this:

00:2
01:1
...
0e:1
0f:1
R[0]=0x0000006c
R[1]=0x00000001
...
R[13]=0x03885533
R[14]=0x03885533
R[15]=0x00000000

The problem is that I keep getting a segmentation fault when I try to save the "opcode: number of executions" pairs. If I print only the "opcode:" part and the register:value pairs, there is no segmentation fault, but instead of printing the first opcode value as "0:", it prints "6C:", which is the value of R[0] and of r12 (the asm register) according to gdb:

I have tried inserting push rbp, mov rbp, rsp before the assembly code and pop rbp, ret after it, but nothing works. Any ideas that could help? Is there any more information I could provide?

Thanks for the help and have a good day.
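Saving and restoring rbp alone does not cover the other callee-saved registers. Under the System V AMD64 ABI, rbx, rbp and r12 to r15 must be preserved by the callee, so if the translated code uses r12 as scratch (gdb showing 0x6C in it suggests this may be the case), the caller can lose whatever it kept there. A minimal sketch of a wrapper that preserves all six registers follows; `wrap_translated_code` and the two byte arrays are hypothetical names, and the sketch assumes the translated body contains no `ret` of its own and leaves the stack pointer balanced:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Push every callee-saved general-purpose register (System V AMD64 ABI). */
static const uint8_t prologue[] = {
    0x53,       /* push rbx */
    0x55,       /* push rbp */
    0x41, 0x54, /* push r12 */
    0x41, 0x55, /* push r13 */
    0x41, 0x56, /* push r14 */
    0x41, 0x57  /* push r15 */
};

/* Pop them in reverse order, then return to the C caller. */
static const uint8_t epilogue[] = {
    0x41, 0x5F, /* pop r15 */
    0x41, 0x5E, /* pop r14 */
    0x41, 0x5D, /* pop r13 */
    0x41, 0x5C, /* pop r12 */
    0x5D,       /* pop rbp */
    0x5B,       /* pop rbx */
    0xC3        /* ret */
};

/* Hypothetical helper: write prologue + body + epilogue into out.
   Returns the number of bytes written, or 0 if out is too small. */
static inline size_t wrap_translated_code(uint8_t *out, size_t cap,
                                          const uint8_t *body, size_t body_len)
{
    size_t total = sizeof prologue + body_len + sizeof epilogue;
    if (total > cap)
        return 0;
    memcpy(out, prologue, sizeof prologue);
    memcpy(out + sizeof prologue, body, body_len);
    memcpy(out + sizeof prologue + body_len, epilogue, sizeof epilogue);
    return total;
}
```

Relative jumps inside the body stay valid because the body is copied as one contiguous block. A body that calls C functions would also need to keep the stack 16-byte aligned at each call, which this sketch does not address.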

Why `mov ebp, esp` and not `mov rbp, rsp` ? Moving into `ebp` zeroes the upper half of `rbp`. — sj95126, Oct 25 '21 at 16:59
What arch is the target? x86? If so, what are `R0` etc vs (e.g.) `%rdi` etc? Are you preserving the callee preserved registers according to the AMD64 ABI? Do you want `PROT_WRITE | PROT_READ` and `PROT_EXEC | PROT_READ`? [respectively] in your `mprotect` calls? Are you ensuring that the I-cache is flushed/updated before calling the JIT code? Are you ensuring that all instructions have been executed before doing the `munmap`? — Craig Estey, Oct 25 '21 at 17:10
`sysconf(4096)` is wrong and should be `sysconf(_SC_PAGESIZE)`. If that's returning -1, then probably mmap and the rest of it fails. You should be checking the return values of all your system calls to make sure they succeeded before going on. — Nate Eldredge, Oct 25 '21 at 17:11
So your problem is in printing `ic[n]`? Where did you define and initialize `ic`? And how does the JIT'ed code know where to find it? — Nate Eldredge, Oct 25 '21 at 17:13
@sj95126 my bad, it was a typo when I was writing the post. It should be rbp, rsp. — Victor, Oct 25 '21 at 22:01
@CraigEstey I completely forgot to write on the post that R0...R15 is an int32_t array that simulates the registers from the source architecture. This array is passed as the first argument (%rdi) to the assembly code. The %rsi is an array containing the byte code for the original architecture so that I can use one of the instructions (mov mem[Rx], Ry) that can change the original byte code. From what I understood, we should give only one permission at a time. First we write and then we execute. — Victor, Oct 25 '21 at 22:11
@CraigEstey I'm gonna read about the l-cache fflush and update to use it before the jit. I tried using it inside that for loop as a suggestion I saw on a post here. All the instructions are executed before the munmap, since the array of R's contains the correct answer. — Victor, Oct 25 '21 at 22:16
@NateEldredge Thanks for your suggestion. I'm gonna change this on the code and check the return values, even though this part is working as it should with the 4096. About the ic[n], it's defined as global value right at the start of the program and I increment it when I'm translating the source machine code. Since the opcodes go from 0 to F, when it reads an opcode, ic[opcode]++. It's the same that was working on the previous part of the project, which was to develop an interpreter for the same architecture. — Victor, Oct 25 '21 at 22:23
fprintf crashing is almost always due to passing it a bogus pointer -- either the FILE * or a pointer with a `%s` format. Given your format (no %s) it's likely that `output` is corrupt -- probably by your JIT code smashing the stack and corrupting the calling frame. Or perhaps by not properly saving and restoring some callee-save register. — Chris Dodd, Oct 26 '21 at 02:14
@CraigEstey: x86 has coherent I-cache, but C doesn't define the behaviour of storing bytes into an array and then casting it to a function pointer. To make sure GCC doesn't optimize away "dead" stores by the JIT into the buffer, use `__builtin___clear_cache`. On x86 it doesn't actually do anything with cache, just tells the optimizer the stores aren't dead, and are "visible" to later code-fetch. See [How to get c code to execute hex machine code?](https://stackoverflow.com/q/9960721) re: copying/storing into an executable mmap buffer. — Peter Cordes, Oct 26 '21 at 02:48
@CraigEstey: But given the mprotect call, that's almost certainly not the problem. GCC doesn't know anything about `mprotect`, so passing a pointer to the buffer to it will force GCC to have the architectural state matching the abstract machine for that buffer, i.e. any C assignments must have actually happened as stores in asm before that call. So that's 100% sufficient on x86 to ensure that later code-fetch can see those new insns: [Observing stale instruction fetching on x86 with self-modifying code](https://stackoverflow.com/q/17395557) — Peter Cordes, Oct 26 '21 at 02:50
`PROT_EXEC` without `PROT_READ` might be something the HW can actually do; I forget. But that should be fine here; code-fetch shouldn't count as a data read, so unless there are data loads from that buffer (e.g. RIP-relative or embedding the absolute address), it should be fine that neither mprotect has `PROT_READ`. Craig's right that it's odd, though. — Peter Cordes, Oct 26 '21 at 02:52
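The suggestions in the comments above (use `sysconf(_SC_PAGESIZE)`, check every system call, add `PROT_READ` to both protections, and call `__builtin___clear_cache`) can be combined into a minimal sketch. It assumes x86-64 Linux and GCC or Clang; `jit_fn`, `demo_code` and `run_demo` are hypothetical names, and the three-instruction payload only stands in for the translated code, which is not shown in the post. The buffer is never writable and executable at the same time.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef uint32_t (*jit_fn)(int32_t *, uint8_t *);

/* Stand-in payload, x86-64 only:
   mov dword [rdi], 0x6C ; mov eax, 0x6C ; ret */
static const uint8_t demo_code[] = {
    0xC7, 0x07, 0x6C, 0x00, 0x00, 0x00,
    0xB8, 0x6C, 0x00, 0x00, 0x00,
    0xC3
};

/* Returns 0 on success and stores the JIT return value in *result,
   or -1 if any system call fails. */
static inline int run_demo(int32_t *regs, uint8_t *mem, uint32_t *result)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page < 0 || sizeof demo_code > (size_t)page)
        return -1;
    size_t length = (size_t)page;

    /* Map the page readable and writable first. */
    void *memory = mmap(NULL, length, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (memory == MAP_FAILED) {
        perror("mmap");
        return -1;
    }

    memcpy(memory, demo_code, sizeof demo_code);

    /* Drop write permission before making the page executable. */
    if (mprotect(memory, length, PROT_READ | PROT_EXEC) != 0) {
        perror("mprotect");
        munmap(memory, length);
        return -1;
    }

    /* Tells the compiler the stores above are visible to code fetch. */
    __builtin___clear_cache((char *)memory, (char *)memory + sizeof demo_code);

    jit_fn jit = (jit_fn)memory;
    *result = jit(regs, mem);

    /* Unmap only after the generated code has returned. */
    if (munmap(memory, length) != 0) {
        perror("munmap");
        return -1;
    }
    return 0;
}
```

Called as `run_demo(R, mem, &result)` with a 16-entry `int32_t R[16]`, the stand-in payload writes 0x6C to `R[0]` and returns the same value. This sketch does not by itself address the register-preservation issue raised by Chris Dodd.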

0 Answers