I'd like to write my own loader for binary code on x64 Linux. In the future I want to be able to perform the linking step myself and thus be able to call code from .o object files. For now, though, I just want to call a function from an executable binary that has already been linked.

To create some function that should be callable from "outside", I started with the following piece of source code:

void foo(void)
{
  int a = 2;
  int b = 3;
  a + b;
}

int main(void)
{
  foo();
  return 0;
}

It is the foo() function that I want to call using my loader. Using the following chain of commands

gcc -o /tmp/main main.c
strip -s /tmp/main
objdump -D /tmp/main

I obtained the assembly code of the foo() function, which looks like this:

...
0000000000001125 <foo>:
    1125:   55                      push   %rbp
    1126:   48 89 e5                mov    %rsp,%rbp
    1129:   c7 45 fc 02 00 00 00    movl   $0x2,-0x4(%rbp)
    1130:   c7 45 f8 03 00 00 00    movl   $0x3,-0x8(%rbp)
    1137:   90                      nop
    1138:   5d                      pop    %rbp
    1139:   c3                      retq
...

That means that the foo() function starts at offset 0x1125 in /tmp/main. I verified this using a hex editor.

The following is my loader. There is no error handling yet and the code is very ugly. However, it should demonstrate what I want to achieve:

#include <stdio.h>
#include <stdlib.h>

typedef void(*voidFunc)(void);

int main(int argc, char* argv[])
{
  FILE *fileptr;
  char *buffer;
  long filelen;
  voidFunc mainFunc;

  fileptr = fopen(argv[1], "rb");  // Open the file in binary mode
  fseek(fileptr, 0, SEEK_END);          // Jump to the end of the file
  filelen = ftell(fileptr);             // Get the current byte offset in the file
  rewind(fileptr);                      // Jump back to the beginning of the file

  buffer = (char *)malloc((filelen+1)*sizeof(char)); // Enough memory for file + \0
  fread(buffer, filelen, 1, fileptr); // Read in the entire file
  fclose(fileptr); // Close the file

  mainFunc = ((voidFunc)(buffer + 0x1125));

  mainFunc();

  free(buffer);

  return 0;
}

Executing this program as objloader /tmp/main results in a segmentation fault.

The mainFunc variable points to the correct place. I verified this using gdb.

Is it a problem that the machine code lives on the heap? I deliberately made the function I want to call as simple as possible (no side effects, no parameters passed on the stack or in registers, ...). But still, there is something I don't really get.

Can anyone please point me in the right direction here? Any hints on helpful literature in that regard are also highly appreciated!

dubbaluga
  • Why do you want to load *unlinked object* files? That makes no sense. Do you mean you want to load *dynamic libraries* ending in `.so`? – Some programmer dude Aug 02 '18 at 13:27
  • Yes, you need to mark the memory executable. Simplest is to use [`mmap`](http://man7.org/linux/man-pages/man2/mmap.2.html) not `malloc`/`fread`. Also your code should be position independent if you can not guarantee the load address (this one is). You might want to look at [`dlopen`](http://man7.org/linux/man-pages/man3/dlopen.3.html) too. – Jester Aug 02 '18 at 13:28
  • As for your problem, *why* do you want to write your own loader? And *why* don't you want to parse the file format? Because the offsets to e.g. the `main` function might not be fixed. You need to learn the underlying file format (probably [ELF (Executable and Linkable Format)](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format)). – Some programmer dude Aug 02 '18 at 13:29
  • I thought I'd need to mark the memory executable but I didn't know that I'd need to do that using mmap. Will give that a try! Thx – dubbaluga Aug 02 '18 at 13:29
  • Calling unlinked object-files might make sense in the following situation: clang/LLVM can generate code for a huge set of platforms. In case there is some compiler/linker available that costs $$$, it might be possible to do the linking/loading myself and thus create the loader using the $$$ compiler/linker, but actually execute some other code that has been compiled using free clang/LLVM. – dubbaluga Aug 02 '18 at 13:32
  • Cross-ref: https://stackoverflow.com/questions/6315296/whats-the-protection-flags-of-memory-allocated-by-malloc – dubbaluga Aug 02 '18 at 13:39
  • You are reading the executable file into memory and attempting to execute at offset 0x1125 in the file because `foo` is at offset 0x1125 in the code section. But the executable file is not merely an image of what the program, or its code, should look like in memory. It is a structured file. There is information at the beginning saying what is in the file, and there are multiple sections. Each section has information saying what type it is, how long it is and so on. And there is information on relocatable symbols and whatnot. To load an executable, you must parse and process the file contents. – Eric Postpischil Aug 02 '18 at 13:41
  • Yes, I know there are several sections. But in this case, the function should not rely on any other section. I still expect it to be callable 'as-is'. – dubbaluga Aug 02 '18 at 13:43
  • As it is what? The code for `foo`, which is at 0x1125 in the code section when properly loaded, is **not** at offset 0x1125 in the file. If you dump the file with `od` and compare the bytes to the bytes shown in the disassembly for `foo`, you will see they do not match. Reading the file into memory and calling the code at offset 0x1125 from the buffer will not work. – Eric Postpischil Aug 02 '18 at 13:51
  • @EricPostpischil he said: _"The mainFunc variable points to the correct place. I verified this using gdb"_. – Jester Aug 02 '18 at 13:53
  • @Jester: I do not know what OP did with GDB. They may have verified that `mainFunc` points to 0x1125 bytes beyond their buffer. That would, of course, not verify that it points to code for `foo`. The fact remains that executable files are not flat data that can be loaded into memory and executed. – Eric Postpischil Aug 02 '18 at 15:26
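
The point debated in the comments above, whether an address shown by objdump is also an offset into the file, can be checked by walking the ELF program headers. The following is only a minimal sketch under some assumptions: the helper name vaddr_to_file_offset is hypothetical, it handles ELF64 files in the host's byte order only, and it relies on the Linux <elf.h> header. For many builds the code segment's file offset and virtual address happen to coincide, which may be why the hex editor check in the question matched, but that is not guaranteed.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: translate a virtual address (as printed by objdump)
 * into an offset within an ELF64 file image held in memory.
 * Returns -1 if the image is not ELF64 or if no PT_LOAD segment's
 * file-backed part covers the address. */
long vaddr_to_file_offset(const unsigned char *image, size_t size, uint64_t vaddr)
{
  Elf64_Ehdr ehdr;

  assert(image != NULL);
  if (size < sizeof ehdr)
    return -1;
  memcpy(&ehdr, image, sizeof ehdr); /* copy to avoid unaligned access */

  if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0)
    return -1;
  if (ehdr.e_ident[EI_CLASS] != ELFCLASS64)
    return -1;
  if (ehdr.e_phentsize != sizeof(Elf64_Phdr))
    return -1;

  for (size_t i = 0; i < ehdr.e_phnum; i++) {
    Elf64_Phdr phdr;
    uint64_t pos = ehdr.e_phoff + (uint64_t)i * sizeof phdr;

    /* The program header must lie completely inside the image. */
    if (pos > size || size - pos < sizeof phdr)
      return -1;
    memcpy(&phdr, image + pos, sizeof phdr);

    /* Only loadable segments map file contents to virtual addresses. */
    if (phdr.p_type == PT_LOAD && vaddr >= phdr.p_vaddr &&
        vaddr - phdr.p_vaddr < phdr.p_filesz)
      return (long)(phdr.p_offset + (vaddr - phdr.p_vaddr));
  }
  return -1;
}
```

With the whole file read into buffer as in the question, vaddr_to_file_offset((unsigned char *)buffer, filelen, 0x1125) would return the file offset to add to buffer, or -1 if the address is not backed by file contents.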

2 Answers

Heap memory obtained from malloc is not mapped as executable, which is why the call faults. To get an executable buffer, allocate it with mmap instead. Try

#include <sys/mman.h>
...
buffer = (char *)mmap(NULL, filelen /* + 1? Not sure why. */, PROT_EXEC | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);

That should give the memory region the permissions you want and have it work with the surrounding code. In fact, if you want to use mmap the way it was meant to be used, go for

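#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
...
int fd = open(argv[1], O_RDONLY);
struct stat myfilestats;
fstat(fd, &myfilestats);  // st_size holds the file length
buffer = (char*)mmap(NULL, myfilestats.st_size, PROT_EXEC, MAP_PRIVATE, fd, 0);
close(fd);  // fd is a file descriptor, not a FILE*, so close() rather than fclose()
...
munmap(buffer, myfilestats.st_size);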

Using MAP_ANONYMOUS makes the memory region unassociated with any file descriptor, but if the region represents a file, the file descriptor should be associated with it. When you map the file itself, Linux does all kinds of cool tricks: it loads only the parts of the file that you actually end up accessing (this lazy loading also keeps the program responsive when the file is large), and if multiple programs map the same file, they all share the same physical memory.
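
If the bytes do have to be copied into an anonymous mapping first, as in the first variant above, one common pattern is to map the region writable, copy the code in, and only then switch it to executable with mprotect, so the region is never writable and executable at the same time. This is only a sketch: the helper name copy_to_executable is hypothetical, and it assumes a Linux system whose security policy allows turning a private anonymous mapping executable.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS in glibc under strict -std modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: copy len bytes of machine code into a fresh anonymous
 * mapping and make that mapping read+execute. Returns NULL on failure.
 * The caller releases the region with munmap(ptr, len). */
void *copy_to_executable(const void *code, size_t len)
{
  void *mem;

  assert(code != NULL && len > 0);

  /* Map the region writable first, so the bytes can be copied in. */
  mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
  if (mem == MAP_FAILED)
    return NULL;
  memcpy(mem, code, len);

  /* Then drop write permission and add execute permission. */
  if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
    munmap(mem, len);
    return NULL;
  }
  return mem;
}
```

In the loader from the question, the result of copy_to_executable(buffer, filelen) would take the place of the malloc'ed buffer before the function pointer is computed.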

Nicholas Pipitone

This is the final version of my 'loader', which is based on Nicholas Pipitone's answer. Again: there is no error handling, it is simplified, and it does not consider that real-world scenarios are much more difficult:

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

#include <stdlib.h>

typedef void(*voidFunc)(void);

int main(int argc, char* argv[])
{
  char* buffer;
  voidFunc mainFunc;
  struct stat myfilestats;
  int fd;

  fd = open(argv[1], O_RDONLY);
  fstat(fd, &myfilestats);
  buffer = mmap(NULL, myfilestats.st_size, PROT_EXEC, MAP_PRIVATE, fd, 0);
  close(fd);

  mainFunc = ((voidFunc)(buffer + 0x1125));

  mainFunc();

  munmap(buffer, myfilestats.st_size);

  return EXIT_SUCCESS;
}
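
For completeness, the mapping step of the loader above could be wrapped with basic error checks roughly as follows. This is a sketch rather than a definitive implementation: the helper name map_file_exec is hypothetical, and PROT_READ is requested in addition to PROT_EXEC so that the mapped bytes can also be inspected as data.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read+execute.
 * On success returns the mapping and stores its length in *len_out;
 * on failure returns NULL and leaves *len_out untouched. */
void *map_file_exec(const char *path, size_t *len_out)
{
  struct stat st;
  void *mem;
  int fd;

  assert(path != NULL && len_out != NULL);

  fd = open(path, O_RDONLY);
  if (fd < 0)
    return NULL;

  /* An empty file cannot be mapped, so treat it as an error too. */
  if (fstat(fd, &st) != 0 || st.st_size <= 0) {
    close(fd);
    return NULL;
  }

  mem = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_EXEC,
             MAP_PRIVATE, fd, 0);
  close(fd); /* the mapping stays valid after the descriptor is closed */
  if (mem == MAP_FAILED)
    return NULL;

  *len_out = (size_t)st.st_size;
  return mem;
}
```

The caller would then compute the function pointer from the returned address exactly as before and call munmap with the stored length when done.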
dubbaluga