Using this answer (and this follow-up) as inspiration, I was looking at ways to do some functional programming in C (for which there are already plenty of interesting discussions on this site). What I'd like to know is how and when it's possible to use the approach taken in the linked code: casting a string to a function pointer and executing it. For example, on my machine (OSX 10.10, Darwin 14.0.0, GCC 4.8.3) I can compile and run

int eax = ((int(*)())("\xc3 <- This returns the value of the EAX register"))();

(always returning 0, which is what I'd expect if the program does nothing else) but

#include <stdio.h>

int main() {
  const char* lol = "\x8b\x5c\x24\x4\x3d\xe8\x3\x0\x0\x7e\x2\x31\xc0\x83\xf8\x64\x7d\x6\x40\x53\xff\xd3\x5b\xc3\xc3 <- Recursively calls the function at address lol.";
  int i = ((int(*)())(lol))(lol);
  printf("i: %d\n",i);
  return 0;
}

segfaults. On the other hand codepad successfully runs the second example giving the correct answer i: 100.

When is it possible to execute from strings? And is there a way to make it (relatively) consistent?

(I can reasonably guess this is undefined behaviour and I know I'm going to increase worldwide unemployment by using it.)

  • Relevant search term: [tag:shellcode]. – DCoder Jan 03 '15 at 17:54
  • Assuming the language and compiler allow this, the other important prerequisite is that the OS allow execution of data. Many OS disable this on security grounds, as they should by default. It is possible that OSX allows you to execute the former code but not the latter because of a nuanced security policy, but I am speculating wildly about this. – Jeff Hammond Jan 03 '15 at 18:01
  • Thanks @DCoder, it's amazing how much of a difference knowing the name for something can make. –  Jan 03 '15 at 18:38

2 Answers

Formally, this is certainly undefined behavior; in practice, what happens is implementation-specific.

You need several things to have this executed successfully.

  • First, the machine code inside your string literal must be correct. This is obviously processor- and ABI-specific, but I trust you on that.
  • Then, you depend on the protocol used to call through a function pointer, i.e. on the ABI specification.
  • Finally, on several processors (notably x86-64) the machine code must live in an executable segment. I guess that is not usually the case for string literals (but that might be operating-system specific). Read more about the NX bit and ASLR (and also PIC). Sometimes this can be circumvented, e.g. by appropriately mmap-ing some segment with execute permissions and copying the machine code there.
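The mmap workaround from the last point can be sketched roughly as follows. This is only an illustration under stated assumptions: a POSIX system that provides `MAP_ANONYMOUS` and permits turning a writable page into an executable one, and an x86 or x86-64 CPU for the six hard-coded bytes (`mov eax, 42; ret`). The names `make_executable` and `run_answer` are made up for this example.

```c
#define _DEFAULT_SOURCE /* asks glibc to expose MAP_ANONYMOUS in strict modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: copy `len` bytes of machine code into a fresh
   anonymous mapping, then switch that mapping from writable to executable.
   Returns NULL on failure. */
static void *make_executable(const unsigned char *code, size_t len) {
    assert(code != NULL && len > 0);
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;
    memcpy(mem, code, len);
    /* Drop write permission before adding execute permission. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return NULL;
    }
    return mem;
}

/* Hypothetical driver: returns 42 when the code ran, -1 when the mapping
   failed or the CPU is not x86 (the bytes below are x86-only). */
int run_answer(void) {
#if defined(__x86_64__) || defined(__i386__)
    /* b8 2a 00 00 00  mov eax, 42
       c3              ret            */
    static const unsigned char code[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };
    void *mem = make_executable(code, sizeof code);
    if (mem == NULL)
        return -1;
    /* Object-to-function pointer conversion: not ISO C, but it works on
       POSIX systems, which rely on it for dlsym. */
    int (*fn)(void) = (int (*)(void))mem;
    int result = fn();
    munmap(mem, sizeof code);
    return result;
#else
    return -1;
#endif
}
```

Calling `run_answer()` from `main` and printing the result is enough to try it. Some hardened systems refuse the `mprotect` call, in which case the sketch returns -1 instead of crashing.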

BTW, you might be interested in JIT compilation techniques and libraries (libjit, GNU lightning, asmjit, LLVM ...).

As DCoder commented, read more about shellcode and, more generally, code injection.

A more portable approach might be (as I do in MELT) to generate some C (or C++) code on the fly, forking a compilation of that code into a shared object, then dlopen-ing that shared object (& dlsym-ing appropriately).
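A minimal sketch of that generate, compile and dlopen cycle might look like the following. It is not how MELT does it; it only shows the general shape. The assumptions are loud ones: a compiler reachable as `cc` on the PATH, a writable `/tmp`, the made-up file names `gen_add.c` and `gen_add.so`, and the made-up function names `generated` and `compile_and_call`. On older glibc versions the program must also be linked with `-ldl`.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: writes a tiny C function to disk, compiles it into a
   shared object, loads it and calls it with `arg`.
   Returns -1 if any step fails. */
int compile_and_call(int arg) {
    /* Illustrative fixed paths; real code should create unique temp files. */
    const char *src = "/tmp/gen_add.c";
    const char *lib = "/tmp/gen_add.so";

    /* 1. Generate the C source on the fly. */
    FILE *f = fopen(src, "w");
    if (f == NULL)
        return -1;
    fputs("int generated(int x) { return x + 1; }\n", f);
    fclose(f);

    /* 2. Compile it into a shared object (assumes `cc` is installed). */
    if (system("cc -shared -fPIC -o /tmp/gen_add.so /tmp/gen_add.c") != 0)
        return -1;

    /* 3. Load the shared object and look up the generated symbol. */
    void *handle = dlopen(lib, RTLD_NOW);
    if (handle == NULL)
        return -1;
    int (*fn)(int) = (int (*)(int))dlsym(handle, "generated");

    /* 4. Call through the pointer, then unload the library. */
    int result = (fn != NULL) ? fn(arg) : -1;
    dlclose(handle);
    return result;
}
```

With those assumptions met, `compile_and_call(41)` runs the freshly compiled `generated` function. The compiler and dynamic loader take care of the ABI and memory permissions, which is what makes this route more portable than hand-written machine code.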

Basile Starynkevitch
  • Even if one places a pattern of bits in memory that would be valid executable code, has all the necessary permissions to run it, and consults the ABI documentation to create a function pointer to it, there's still no guarantee a compiler won't decide that, since the Standard doesn't require compilers to behave in any particular fashion when trying to execute data, it has no obligation to generate the machine instructions your code would otherwise imply. – supercat Jul 22 '16 at 16:48

Generally speaking, the contents of string literals in Linux and OSX are stored in a read only segment which also happens to be executable (this may not necessarily be the case in Windows or other platforms). That is why you can do things like

((void(*)(void))L"\xfeeb")();

on x86 and x86_64 Linux and OSX and not get a compiler error. However, if the machine language instructions you put in the string literal do not conform to the requirements of the way functions are supposed to be structured according to your operating system and hardware platform, you are likely to experience a segfault. An executable string literal that works on Linux Aarch64 may not work on OSX on x86_64 and vice versa.

If you want to explore programmatic generation of executable machine code, you can (on POSIX) allocate a region of executable memory with the mmap() function, place your code there and experiment to your heart's content.

At some point, you may find disassemble <addr>,+<range> useful in gdb and disassemble --start-address <addr> --end-address <addr> useful in lldb.

ceilingcat