How can I call inlined machine code in Python on Linux?

Question

I'm trying to call inlined machine code from pure Python code on Linux. To this end, I embed the code in a bytes literal

code = b"\x55\x89\xe5\x5d\xc3"

and then call mprotect() via ctypes to allow execution of the page containing the code. Finally, I try to use ctypes to call the code. Here is my full code:

#!/usr/bin/python3

from ctypes import *

# Initialise ctypes prototype for mprotect().
# According to the manpage:
#     int mprotect(const void *addr, size_t len, int prot);
libc = CDLL("libc.so.6")
mprotect = libc.mprotect
mprotect.restype = c_int
mprotect.argtypes = [c_void_p, c_size_t, c_int]

# PROT_xxxx constants
# Output of gcc -E -dM -x c /usr/include/sys/mman.h | grep PROT_
#     #define PROT_NONE 0x0
#     #define PROT_READ 0x1
#     #define PROT_WRITE 0x2
#     #define PROT_EXEC 0x4
#     #define PROT_GROWSDOWN 0x01000000
#     #define PROT_GROWSUP 0x02000000
PROT_NONE = 0x0
PROT_READ = 0x1
PROT_WRITE = 0x2
PROT_EXEC = 0x4

# Machine code of an empty C function, generated with gcc
# Disassembly:
#     55        push   %ebp
#     89 e5     mov    %esp,%ebp
#     5d        pop    %ebp
#     c3        ret
code = b"\x55\x89\xe5\x5d\xc3"

# Get the address of the code
addr = addressof(c_char_p(code))

# Get the start of the page containing the code and set the permissions
pagesize = 0x1000
pagestart = addr & ~(pagesize - 1)
if mprotect(pagestart, pagesize, PROT_READ|PROT_WRITE|PROT_EXEC):
    raise RuntimeError("Failed to set permissions using mprotect()")

# Generate ctypes function object from code
functype = CFUNCTYPE(None)
f = functype(addr)

# Call the function
print("Calling f()")
f()

This code segfaults on the last line.

Why do I get a segfault? The mprotect() call signals success, so I should be permitted to execute code in the page.
Is there a way to fix the code? Can I actually call the machine code in pure Python and inside the current process?

(Some further remarks: I'm not really trying to achieve a goal -- I'm trying to understand how things work. I also tried to use 2*pagesize instead of pagesize in the mprotect() call to rule out the case that my 5 bytes of code fall on a page boundary -- which should be impossible anyway. I used Python 3.1.3 for testing. My machine is a 32-bit i386 box. I know one possible solution would be to create an ELF shared object from pure Python code and load it via ctypes, but that's not the answer I'm looking for :)

Edit: The following C version of the code is working fine:

#include <sys/mman.h>

char code[] = "\x55\x89\xe5\x5d\xc3";
const int pagesize = 0x1000;

int main()
{
    mprotect((int)code & ~(pagesize - 1), pagesize,
             PROT_READ|PROT_WRITE|PROT_EXEC);
    ((void(*)())code)();
}

Edit 2: I found the error in my code. The line

addr = addressof(c_char_p(code))

first creates a ctypes char* pointing to the beginning of the bytes instance code. addressof() applied to this pointer does not return the address this pointer is pointing to, but rather the address of the pointer itself.

The simplest way I have found to get the address of the beginning of the code is

addr = addressof(cast(c_char_p(code), POINTER(c_char)).contents)

Hints for a simpler solution would be appreciated :)

Fixing this line makes the above code "work" (meaning it does nothing instead of segfaulting...).
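
The difference described in this edit can be checked directly. The following is a minimal sketch, not part of the original question: the variable names (pointer_addr, target_addr, stored, first_bytes) are illustrative, and the cast to c_void_p is one possible way to read the pointed-to address, shown here as an assumption about what a simpler solution could look like.

```python
import sys
from ctypes import addressof, c_char_p, c_void_p, cast, sizeof, string_at

code = b"\x55\x89\xe5\x5d\xc3"
ptr = c_char_p(code)

# addressof() gives the location of the char* variable itself ...
pointer_addr = addressof(ptr)
# ... while casting to c_void_p exposes the address the char* points to.
target_addr = cast(ptr, c_void_p).value

# The memory at pointer_addr holds an address, not the machine code.
stored = int.from_bytes(string_at(pointer_addr, sizeof(c_void_p)), sys.byteorder)
# The memory at target_addr holds the machine code bytes.
first_bytes = string_at(target_addr, len(code))
```

Calling a function at pointer_addr therefore executes the bytes of an address as if they were instructions, which is consistent with the segfault.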

I've only used ctypes a little bit, but skimming the docs, it looks like CFUNCTYPE is used to take a Python function and wrap it to be called by C (e.g. for a qsort comparison function). You're trying to do the inverse, making a C function callable from Python, so I think CFUNCTYPE is the wrong thing. — Russell Borogove, May 26 '11 at 18:10
@Russell: `CFUNCTYPE` can be used for different purposes. According to the docstring of `CFUNCTYPE`, one possible constructor call is `prototype(integer address) -> foreign function`. — Walter, May 26 '11 at 18:14
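
Both uses of CFUNCTYPE mentioned in these two comments can be shown side by side. This is a small sketch added for illustration; the names double, callback and foreign are hypothetical and do not come from the thread.

```python
from ctypes import CFUNCTYPE, c_int, c_void_p, cast

# Prototype for: int f(int)
prototype = CFUNCTYPE(c_int, c_int)

def double(x):
    return 2 * x

# Use 1: wrap a Python callable so that C code can call it.
callback = prototype(double)

# Use 2: build a foreign function from an integer address.
# Here the address is simply that of the callback created above.
addr = cast(callback, c_void_p).value
foreign = prototype(addr)

result = foreign(21)
```

The callback object has to stay alive for as long as the address is in use, because ctypes frees the underlying thunk when the object is garbage collected.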

score 7 · Accepted Answer · answered May 26 '11 at 21:24

I did a quick debug on this and it turns out the pointer to the code is not being correctly constructed, and somewhere internally ctypes is munging things up before passing the function pointer to ffi_call() which invokes the code.

Here is the line in ffi_call_unix64() (I'm on 64-bit) where the function pointer is saved into %r11:

movq    %r8, %r11               /* Save a copy of the target fn.  */

When I execute your code, here is the value loaded into %r11 just before it attempts the call:

(gdb) x/5b $r11
0x7ffff7f186d0: -108    24      -122    0       0

Here is the fix to construct the pointer and call the function:

raw = b"\x55\x89\xe5\x5d\xc3"
code = create_string_buffer(raw)
addr = addressof(code)

Now when I run it I see the correct bytes at that address, and the function executes fine:

(gdb) x/5b $r11
0x7ffff7f186d0: 0x55    0x89    0xe5    0x5d    0xc3
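
Putting the fix together with the mprotect() call from the question gives roughly the following. This is a sketch under stated assumptions, not a definitive implementation: it assumes Linux with glibc available as libc.so.6 and an x86 CPU for the machine code bytes, and page_span and run_machine_code are hypothetical helper names. The page span is derived from the buffer length instead of a fixed single page, so a buffer that straddles a page boundary is still covered.

```python
import ctypes
import mmap

# PROT_* values as quoted in the question
PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4

def page_span(addr, length, pagesize=mmap.PAGESIZE):
    """Return (start, size) of the whole pages covering [addr, addr + length)."""
    start = addr & ~(pagesize - 1)
    end = (addr + length + pagesize - 1) & ~(pagesize - 1)
    return start, end - start

def run_machine_code(raw):
    """Copy raw into a ctypes buffer, make its pages executable, and call it."""
    libc = ctypes.CDLL("libc.so.6", use_errno=True)  # assumption: glibc on Linux
    libc.mprotect.restype = ctypes.c_int
    libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]

    buf = ctypes.create_string_buffer(raw)  # mutable copy owned by ctypes
    addr = ctypes.addressof(buf)            # address of the bytes themselves
    start, size = page_span(addr, len(raw))
    if libc.mprotect(start, size, PROT_READ | PROT_WRITE | PROT_EXEC):
        raise OSError(ctypes.get_errno(), "mprotect() failed")

    func = ctypes.CFUNCTYPE(None)(addr)     # void f(void) at addr
    func()
    return buf                              # returned so the caller can keep it alive
```

On an x86 machine, run_machine_code(b"\x55\x89\xe5\x5d\xc3") should simply return. On systems that forbid memory that is both writable and executable, mprotect() may fail and the function raises OSError instead.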

Thanks a lot! I already figured this out about an hour ago -- see the edit to my question. Your solution looks nicer than mine, though. (I did not post an answer myself because I'm not allowed to until 8 hours after posting my question.) — Walter, May 26 '11 at 21:34
Oops, didn't see your edit, but glad to hear you've got it working! — samplebias, May 26 '11 at 21:35

score 3 · Answer 2 · edited May 23 '17 at 11:59

You might have to flush the instruction cache.

It is unclear (to me, anyway) whether mprotect() automatically does this.

[update]

Of course, had I read the documentation for cacheflush(), I would have seen that it only applies on MIPS (according to the man page).

Assuming this is x86, you might have to invoke the WBINVD (or CLFLUSH) instruction.

In general, self-modifying code needs to flush the i-cache, but as far as I can tell there is no remotely portable way to do so.

answered May 26 '11 at 18:08

Nemo

This is not really self-modifying code. My code is somewhere in the data segment, so the instruction cache shouldn't be an issue here. – Walter May 26 '11 at 18:16
Once you're tying yourself into the ABI, there's no remotely portable way to do anything :-) – Nicholas Riley May 26 '11 at 18:16
@Walter Sorry, I should have said "just-in-time compilers need to flush the i-cache". The fact that it is in the data segment does not matter; consider (e.g.) a write-back d-cache that is not synchronized with the i-cache. You _must_ invoke an appropriate cache flush after writing the code before you can execute it safely. This is actually [documented by Intel](http://fixunix.com/questions/14592-how-do-i-flush-invalidate-cpu-instruction-cache.html#post56802) – Nemo May 26 '11 at 18:43
@Nemo: Thanks for the explanation. Unfortunately, the necessity of calling the mentioned machine code instructions leaves me facing the problem of how to call inline machine code from Python... :) – Walter May 26 '11 at 18:51
@Nemo: Honestly, I don't think this is an issue for two reasons: 1. The above C code works. 2. When calling the code for the first time, the corresponding memory shouldn't be mapped to the instruction cache. JIT compilers will reuse the memory for new code, so the situation is different. – Walter May 26 '11 at 18:52
@Walter It may not be an issue for you, but if so, you are getting lucky... Modern CPUs have "write-back" data caches which are separate from the instruction cache. So when you write to memory, your write can sit in the d-cache and be "invisible" to the i-cache until the d-cache is flushed. So JITs do not only have to worry about a stale i-cache; they also have to worry about the i-cache being _inconsistent_ with the d-cache... Even when the memory is being written for the very first time. – Nemo May 26 '11 at 19:03
@Nemo: Thanks again -- I finally got the point. This could even explain why it is working in C, but not in Python, since the C code initialises the variable `code` while loading the binary, while the Python code does so some time during execution. – Walter May 26 '11 at 19:25
@Nemo: Seems I can get away without flushing the cache -- see my second edit. Probably, `mprotect()` implicitly flushes the cache; at least, the call to `flush_cache_range()` in [`mm/mprotect.c`](http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=mm/mprotect.c;h=5a688a2756bec54435adbd5f3c13a33fbdd2c11e;hb=HEAD) seems to be a strong indication that it does so. – Walter May 26 '11 at 20:09
@Walter: Yup, that appears to handle it. Good to know for future reference :-) – Nemo May 26 '11 at 20:42

score 2 · Answer 3 · answered May 26 '11 at 18:15

I'd suggest you try to get your code working in C first, then translate to ctypes. There's also something like CorePy if you just want to be able to execute assembly from Python.

Nicholas Riley