
Disclaimer: I do not have a formal developer/programmer education, which would probably have helped here. What I know is gathered from various sources and pieced together in the form shown.

I am importing a piece of shellcode (or any other binary blob) into a C program by referencing, from my code, the symbols that result from running ld -b binary. The idea comes from embedding-binary-blobs-using-gcc-mingw/embedding-resources-in-executable-using-gcc.

The code is as follows:

extern unsigned int _binary_file_size; // Declaring the external size symbol
extern char _binary_file_start[];      // Declaring the external start symbol

memcpy(exec, _binary_file_start, _binary_file_size);

The problem is that the program does not work: the debugger shows it treating the location of "size" as a memory address and trying to load data from it.

RAX is loaded with the address of the size symbol, and then the data at that memory location ([0x82]) is read; but that address is actually the size and not a memory address

The content of the memory where the size is

Since I could not find documentation on these symbols, only references on SO, I experimented blindly: I tried making the variable a pointer and dereferencing it at the call site. As I don't have a full grasp of pointers, double pointers, dereferencing and memory addressing, I just made empirical changes to see the output.

At some point I wanted to inspect the variable's memory from C rather than from the debugger and, to my surprise, &_binary_file_size returned 0x80 (the size of the data). I have now changed my code to use &_binary_file_size and it works (but with some type warnings at compilation).

The question is: What is the correct way of using the _binary_file_size symbol?

There are two workarounds: using the address (&) of the variable, or calculating the size by subtracting the _start symbol from the _end symbol.

It seems that in one of the SO threads I used for inspiration, someone else had issues with this variable as well.

Eric Postpischil
  • To the linker, symbols have values. `_binary_file_start` is a symbol for which the linker sets the value to the address of the start of the BLOB you linked in. To C, those symbols are addresses. One thing that may work is `extern char _binary_file_size;` followed by using `(size_t) &_binary_file_size` for the size, as you seem to have done. Another is that the linker may define an `_end` symbol too, so you could use `extern char _binary_file_start[], _binary_file_end[];` followed by `_binary_file_end - _binary_file_start` for the size. – Eric Postpischil Feb 21 '23 at 00:40
  • (The latter falls afoul of pointer arithmetic rules in the C standard. I do not know whether GCC and Clang’s ever-increasing pointer provenance features might break it or whether there is a compiler switch to avoid that. If that ever does not work, `(uintptr_t) _binary_file_end - (uintptr_t) _binary_file_start` might work.) – Eric Postpischil Feb 21 '23 at 00:43
  • @EricPostpischil How do I know its type and size (size_t)? Is it documented, found by looking at the memory, or just knowledge? – user21251750 Feb 21 '23 at 11:20
  • There is no actual C type because there is no object there, as the C standard defines it. In C, we merely declare something so we can take its address. Any type whose alignment requirement is only one byte would work, so a character type or array of character type is fine. Its type is not `size_t`; we just convert the address (the linker “value” of the symbol) to `size_t` because that is the type to use in C for object sizes. – Eric Postpischil Feb 21 '23 at 12:43

1 Answer


When you include a file using ld -b binary example, the linker supplies three symbols in the symbol table: _binary_example_start, _binary_example_end, and _binary_example_size. These symbols do not have a value, but merely an address. To reference them from C, you need to declare them as extern identifiers.

extern char _binary_example_start;
extern char _binary_example_end;
extern char _binary_example_size;

Because these symbols do not have a value, they also do not really have a type. To avoid potential issues with alignment, it is probably best to declare them as char, which has an alignment requirement of 1.

In order to use these symbols, you need to tell the C compiler to look them up in the symbol table. This is done using the address-of operator: &_binary_example_start. To better understand this, you could think of the address of a symbol as being its value. This is explained in the documentation of ld here, albeit in the context of linker scripts.

Using the address-of operator results in a pointer; in this case a pointer to char. In the case of _binary_example_size this is not what you need, and therefore you should cast it to the appropriate type:

(size_t) &_binary_example_size

Alternatively, you can also get the size by subtracting the start address from the end address:

&_binary_example_end - &_binary_example_start

This will result in a value of type ptrdiff_t. Note that when you declare the symbols as arrays, you do not need the address-of operator.

Both these methods are equivalent in result, and both are correct.
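As a rough illustration of the two methods, here is a minimal sketch. It assumes the blob was turned into an object file with something like `ld -r -b binary example -o example.o`. The helper names `size_from_symbol` and `size_from_range` and the macro `HAVE_EMBEDDED_BLOB` are made up for this example; the macro only exists so the file still compiles and links when no blob object is present. The subtraction is done on `uintptr_t` values, following the suggestion in the comments under the question.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Symbols created by the linker for a file named "example".
 * Declared as char arrays, so the bare name already yields the address. */
extern char _binary_example_start[];
extern char _binary_example_end[];
extern char _binary_example_size[];

/* The "address" of the size symbol is the size itself, so we only
 * convert the pointer to an integer and never dereference it. */
static size_t size_from_symbol(const char *size_symbol)
{
    return (size_t)(uintptr_t)size_symbol;
}

/* Subtracting the addresses as integers avoids relying on pointer
 * arithmetic between two separately declared symbols. */
static size_t size_from_range(const char *start, const char *end)
{
    return (size_t)((uintptr_t)end - (uintptr_t)start);
}

#ifdef HAVE_EMBEDDED_BLOB /* hypothetical macro: define only when example.o is linked in */
int main(void)
{
    printf("size via _size symbol: %zu\n",
           size_from_symbol(_binary_example_size));
    printf("size via end - start:  %zu\n",
           size_from_range(_binary_example_start, _binary_example_end));
    return 0;
}
#endif
```

When built with `-DHAVE_EMBEDDED_BLOB` and linked against the blob object, both lines should report the same number of bytes.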

I could not find documentation on these symbols either. They are defined in binary.c in the source of libbfd, which is part of GNU binutils.

As a final side note: characters in the filename that are not valid symbol characters are replaced by an underscore. Notably, this means that if the filename is example.txt, the resulting symbols are _binary_example_txt_start, etc.
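To predict which symbol names to declare, the renaming rule can be sketched in a few lines of C. This is only an approximation under two assumptions: that every non-alphanumeric character becomes an underscore, and that the name is built from the file name exactly as it was passed to ld (including any directory part). The function `binary_symbol_name` is a hypothetical helper, not part of binutils.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes the symbol name expected for `filename`
 * and `suffix` ("start", "end" or "size") into `out`.
 * Returns 0 on success, -1 if `out` is too small. */
static int binary_symbol_name(const char *filename, const char *suffix,
                              char *out, size_t out_len)
{
    int n = snprintf(out, out_len, "_binary_%s_%s", filename, suffix);
    if (n < 0 || (size_t)n >= out_len)
        return -1;

    /* Assumed rule: anything that is not a letter or digit becomes '_'. */
    for (char *p = out; *p != '\0'; p++) {
        if (!isalnum((unsigned char)*p))
            *p = '_';
    }
    return 0;
}
```

For example.txt and the suffix start, this yields _binary_example_txt_start, matching the name given above. If in doubt, listing the symbols of the generated object file with nm shows the actual names.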

Emanuel P