
My command-line program's build process generates a binary file (over 500KB) that currently has to be referenced by path from argv. I would like to embed this file in the executable instead.

On Linux, it appears possible to use objcopy to make an object file from a binary file:

objcopy --input binary --output elf32-i386 --binary-architecture i386 myfile.dat myfile.o

However, the OS X developer toolchain doesn't include an objcopy command. Short of installing binutils, what are the possibilities?

I build my project from Xcode and the file is generated with a custom build rule.

zneak

2 Answers


During the link phase, pass the arguments -sectcreate <segname> <sectname> <file> to the linker. If you're driving the linker via an invocation of the compiler, which is pretty common, you'd pass it as -Wl,-sectcreate,<segname>,<sectname>,<file>.

You make up the segment and section names yourself; Mach-O limits each of them to 16 characters.

You would use the getsectdata() function along with _dyld_get_image_vmaddr_slide() to get a pointer to the data at runtime.
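A minimal sketch of that runtime lookup, assuming the data was linked into the main executable. The helper name `embedded_section` is hypothetical, and the `#if` guard is an addition so the file still compiles on other platforms (where it simply reports that nothing was found). Image index 0 is assumed to be the main executable.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__APPLE__)
#include <mach-o/dyld.h>     /* _dyld_get_image_vmaddr_slide() */
#include <mach-o/getsect.h>  /* getsectdata() */
#endif

/* Hypothetical helper: returns a pointer to the section contents, or NULL
 * if the section does not exist. *size_out receives the size in bytes. */
const void *embedded_section(const char *segname, const char *sectname,
                             size_t *size_out)
{
#if defined(__APPLE__)
    unsigned long size = 0;
    /* getsectdata() reports the address recorded in the Mach-O file. */
    char *unslid = getsectdata(segname, sectname, &size);
    if (unslid == NULL) {
        *size_out = 0;
        return NULL;
    }
    *size_out = (size_t)size;
    /* Add the ASLR slide of image 0 (assumed: the main executable). */
    return unslid + _dyld_get_image_vmaddr_slide(0);
#else
    /* Not a Mach-O platform: nothing to look up. */
    (void)segname;
    (void)sectname;
    *size_out = 0;
    return NULL;
#endif
}
```

With a link flag such as `-Wl,-sectcreate,__DATA,__myfile,myfile.dat` (the names `__DATA` and `__myfile` are only examples), the call would be `embedded_section("__DATA", "__myfile", &size)`.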

Ken Thomases
  • Any truth in `getsectiondata` avoiding the need of underscored functions? – zneak Jan 06 '16 at 20:25
  • Also, is there a prettier way to get the Mach header than calling _dyld_get_image_header? – zneak Jan 06 '16 at 20:36
  • The only way you'd be able to rely on `getsectiondata()` would be if it were documented, which it doesn't seem to be. Why are you superstitious about "underscored" functions? They are documented and perfectly safe to use. If you want the Mach header of the executable, as opposed to any dynamic library, and you're writing code in the executable, you can simply refer to `&_mh_execute_header`. See /usr/include/mach-o/ldsyms.h. By the way, you can pass the index 0 to the dyld(3) functions (e.g. `_dyld_get_image_vmaddr_slide()`) to refer to the main executable. – Ken Thomases Jan 06 '16 at 21:02
  • If we're gonna dig into what's documented and what isn't, it says nowhere that `getsectdata` doesn't adjust the value for the ASLR slide, so maybe I should be afraid that Apple will eventually fix it and my code will break. I'm sure that we both suspect that they won't, but I also suspect that `getsectiondata` isn't going away exactly because of that. Anyway I look at it, this feels as hacky and weird as using `dlsym` to get a handle to a global variable (because that's basically what it is), so I might as well take the function that makes it feel the least bad about it. – zneak Jan 06 '16 at 21:14
  • The last-mile stretch is that there seems to be no convenient way to link in the output files automatically regardless of their name. If I rename the file or add new ones, I also need to go change the linker flags. I'll accept the answer if I find a way to do it but I may leave it open if something else comes up. – zneak Jan 06 '16 at 21:28
  • The man page for `getsectdata()` (which is not what I linked to because it doesn't seem to be online) says "Getsectdata is the same as getsectdatafromheader with its first argument being the link editor defined symbol _mh_execute_header." That same man page says that you have to add the slide to the result from `getsectdatafromheader()`. Although I guess it does specify that that's only for dynamic libraries, but executables are slid by ASLR, too, unless they weren't built position-independent. – Ken Thomases Jan 06 '16 at 22:00

As evidenced in this other question about objcopy, another way to embed a binary file in an executable is to use the .incbin assembler directive. This solution has two main advantages over objcopy: the developer is in control of the symbol names (objcopy appears to name them with a fixed scheme), and, well, it doesn't require objcopy.

The solution also has advantages over the linker-based -sectcreate solution. It's cross-platform and accessing the data is much, much more straightforward.

I'm using this Xcode build rule script to generate the file to be included and an assembly file with the .incbin directive:

# Generate the binary payload from the input file.
my_generation_tool -o "$DERIVED_FILE_DIR/$INPUT_FILE_NAME.out" "$INPUT_FILE_PATH"

# Write an assembly stub that brackets the payload with start and end symbols.
AS_PATH="$DERIVED_FILE_DIR/$INPUT_FILE_NAME.out.s"
{
    printf '\t.global _data_start_%s\n' "$INPUT_FILE_BASE"
    printf '\t.global _data_end_%s\n' "$INPUT_FILE_BASE"
    printf '_data_start_%s:\n' "$INPUT_FILE_BASE"
    printf '\t.incbin "%s.out"\n' "$INPUT_FILE_NAME"
    printf '_data_end_%s:\n' "$INPUT_FILE_BASE"
} > "$AS_PATH"

Then, given a file "somefile.gen" that is processed with this rule, the assembly will look like:

    .global _data_start_somefile
    .global _data_end_somefile
_data_start_somefile:
    .incbin "somefile.gen.out"
_data_end_somefile:

The data can be accessed in C using the data_start_somefile and data_end_somefile symbols (on macOS, the compiler prefixes C symbol names with an underscore, which is why the labels in the assembly file start with _):

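// The stub has no section directive, so the bytes land in the default (read-only) text section.
extern const char data_start_somefile, data_end_somefile;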

for (const char* c = &data_start_somefile; c != &data_end_somefile; ++c)
{
    // do something with character
}

The answer on the other thread has more bells and whistles that some people may find useful (for instance, a length symbol).
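Without a separate length symbol, the size can also be computed in C from the two bracketing symbols. The sketch below is only an illustration: the `embedded_view` struct and the `embedded_view_from` function are hypothetical names, not part of either answer.

```c
#include <stddef.h>

/* Hypothetical convenience type: a pointer to the embedded bytes plus
 * their length. */
struct embedded_view {
    const char *data;
    size_t size;
};

/* Builds a view from the addresses of the start and end labels.
 * `end` must point one past the last byte, as the end label does. */
struct embedded_view embedded_view_from(const char *start, const char *end)
{
    struct embedded_view view;
    view.data = start;
    view.size = (size_t)(end - start);
    return view;
}
```

For the file above, this would be called as `embedded_view_from(&data_start_somefile, &data_end_somefile)`.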

zneak