111

Does anybody have an idea how to statically compile any resource file right into the executable or the shared library file using GCC?

For example I'd like to add image files that never change (and if they do, I'd have to replace the file anyway) and wouldn't want them to lie around in the file system.

If this is possible (and I think it is because Visual C++ for Windows can do this, too), how can I load the files that are stored in the binary itself? Does the executable parse itself, find the file, and extract the data from it?

Maybe there's an option for GCC which I haven't seen yet. Using search engines didn't really spit out the right stuff.

I would need this to work for shared libraries and normal ELF-executables.

janw
Atmocreations
  • 3
    Possible duplicate of http://stackoverflow.com/questions/1997172/is-there-a-linux-equivalent-of-windows-resource-files – blueberryfields Feb 01 '11 at 16:09
  • The objcopy link in the question blueberryfields pointed to is a good, generic solution to this too – Flexo Feb 01 '11 at 16:19
  • @blueberryfields: sorry for duplicating. You're right. Normally I would vote for close as duplicate. But because they all posted so nice answers, I'll just accept one. – Atmocreations Feb 01 '11 at 18:03
  • 1
    possible duplicate of [Embedding resources in .exe using GCC](http://stackoverflow.com/questions/4158900/embedding-resources-in-exe-using-gcc) – jww Jan 30 '14 at 03:17
  • Can I add that John Ripley's method is probably the best one here for one huge reason - alignment. If you do a standard objcopy or "ld -r -b binary -o foo.o foo.txt" and then look at the resulting object with objdump -x it looks like the alignment for the block is set to 0. If you want alignment to be correct for binary data other than char, I can't imagine this is a good thing. – carveone Jan 11 '12 at 16:38

7 Answers

91

Update: I have grown to prefer the control that John Ripley's assembly .incbin-based solution offers, and now use a variant of it.

I have used objcopy (GNU binutils) to link the binary data from a file foo-data.bin into the data section of the executable:

objcopy -B i386 -I binary -O elf32-i386 foo-data.bin foo-data.o

This gives you a foo-data.o object file which you can link into your executable. The C interface looks something like

#include <stdint.h>

/** created from binary via objcopy */
extern uint8_t foo_data[]      asm("_binary_foo_data_bin_start");
extern uint8_t foo_data_size[] asm("_binary_foo_data_bin_size");
extern uint8_t foo_data_end[]  asm("_binary_foo_data_bin_end");

so you can do stuff like

for (uint8_t *byte=foo_data; byte<foo_data_end; ++byte) {
    transmit_single_byte(*byte);
}

or

size_t foo_size = (size_t)((void *)foo_data_size);
void  *foo_copy = malloc(foo_size);
assert(foo_copy);
memcpy(foo_copy, foo_data, foo_size);

If your target architecture has special constraints as to where constant and variable data is stored, or you want to store that data in the .text segment to make it fit into the same memory type as your program code, you can play with the objcopy parameters some more.
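The malloc/memcpy snippet above can also be written so that the length comes from the start and end symbols rather than from the _size symbol. This is only a sketch: blob_dup is a hypothetical helper name, not something objcopy or binutils provides, and in a real build you would pass foo_data and foo_data_end to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of objcopy or binutils): returns a heap copy
 * of the bytes in [start, end) and stores the length in *out_len.
 * Returns NULL if allocation fails. The caller frees the result. */
static void *blob_dup(const uint8_t *start, const uint8_t *end, size_t *out_len)
{
    assert(start != NULL && end != NULL && start <= end);
    size_t len = (size_t)(end - start);
    void *copy = malloc(len ? len : 1);  /* avoid the malloc(0) ambiguity */
    if (copy != NULL) {
        memcpy(copy, start, len);
    }
    if (out_len != NULL) {
        *out_len = len;
    }
    return copy;
}
```

With the declarations from this answer, a call would look like `void *copy = blob_dup(foo_data, foo_data_end, &n);`. Deriving the length from the two addresses also means the code does not depend on how the _size symbol is resolved.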

ndim
  • good idea! In my case it's not very useful. But this is something that I'm really gonna put into my snippet-collection. Thanks for sharing this! – Atmocreations Feb 01 '11 at 18:04
  • 3
    It's a bit easier to use `ld` as the output format is implied there, see http://stackoverflow.com/a/4158997/201725. – Jan Hudec Mar 11 '14 at 18:40
63

With imagemagick:

convert file.png data.h

Gives something like:

/*
  data.h (PNM).
*/
static unsigned char
  MagickImage[] =
  {
    0x50, 0x36, 0x0A, 0x23, 0x43, 0x72, 0x65, 0x61, 0x74, 0x65, 0x64, 0x20, 
    0x77, 0x69, 0x74, 0x68, 0x20, 0x47, 0x49, 0x4D, 0x50, 0x0A, 0x32, 0x37, 
    0x37, 0x20, 0x31, 0x36, 0x32, 0x0A, 0x32, 0x35, 0x35, 0x0A, 0xFF, 0xFF, 
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 

....

For compatibility with other code you can then use either fmemopen to get a "regular" FILE * object, or alternatively std::stringstream to make an iostream. std::stringstream is not great for this, though, and you can of course just use a pointer anywhere you can use an iterator.
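As a rough illustration of the fmemopen route, here is a minimal sketch, assuming a POSIX system. The demo_image array and the first_line helper are made up for this example. The array stands in for a generated one such as MagickImage[].

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for fmemopen */
#endif
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for a generated array such as MagickImage[]; the bytes are made
 * up for illustration ("P6\n2 1\n255\n", the start of a PNM header). */
static unsigned char demo_image[] = {
    0x50, 0x36, 0x0A, 0x32, 0x20, 0x31, 0x0A, 0x32, 0x35, 0x35, 0x0A
};

/* Reads the first line of an in-memory blob through a FILE * stream.
 * Returns the line length without the newline, or -1 on failure. */
static int first_line(unsigned char *data, size_t len, char *line, size_t cap)
{
    assert(data != NULL && line != NULL && cap > 0);
    FILE *fp = fmemopen(data, len, "r");
    if (fp == NULL) return -1;
    char *got = fgets(line, (int)cap, fp);
    fclose(fp);
    if (got == NULL) return -1;
    line[strcspn(line, "\n")] = '\0';  /* strip the trailing newline */
    return (int)strlen(line);
}
```

Calling `first_line(demo_image, sizeof demo_image, line, sizeof line)` reads the header line through ordinary stdio calls, so code that expects a FILE * does not need to know the data lives in the binary.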

If you're using this with automake don't forget to set BUILT_SOURCES appropriately.

The nice thing about doing it this way is:

  1. You get text out, so it can be kept in version control and patched sensibly
  2. It is portable and well defined on every platform
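If neither ImageMagick nor xxd is available, the same kind of text output can be produced with a few lines of C. The following is a sketch in the spirit of `xxd -i`, not a reimplementation of it; emit_c_array is a hypothetical name. It emits a non-static definition, so the output is meant for a .c file with a matching extern declaration in a header, as one of the comments below suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical generator in the spirit of `xxd -i`: writes `data` as a C
 * array definition named `name`, plus a `<name>_len` variable.
 * Returns 0 on success, -1 if any write failed. */
static int emit_c_array(FILE *out, const char *name,
                        const unsigned char *data, size_t len)
{
    assert(out != NULL && name != NULL && strlen(name) > 0);
    assert(data != NULL && len > 0);  /* an empty initializer is not valid C */
    fprintf(out, "const unsigned char %s[] = {", name);
    for (size_t i = 0; i < len; ++i) {
        /* 12 bytes per line, like the listing above */
        fprintf(out, "%s0x%02X,", (i % 12 == 0) ? "\n    " : " ", data[i]);
    }
    fprintf(out, "\n};\nconst unsigned int %s_len = %zu;\n", name, len);
    return ferror(out) ? -1 : 0;
}
```

A small driver that reads the input file into memory and calls `emit_c_array(stdout, "image_data", buf, n)` can then be run as a build step.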
Flexo
  • 2
    Bleahg! That's the solution I thought of too. Why anybody would ever want to do this is beyond me. Storing pieces of data in a well-defined namespace is what filesystems are for. – Omnifarious Feb 01 '11 at 16:10
  • 39
    Occasionally, you have an executable which runs where there is no filesystem, or even no operating system. Or your algorithm needs some precalculated table for lookups. And I am sure there are a lot more cases when storing data in the program makes a **lot** of sense. – ndim Feb 01 '11 at 16:39
  • 22
    This use of convert is exactly the same as `xxd -i infile.bin outfile.h` – greyfade Feb 01 '11 at 16:53
  • 1
    @greyfade - not quite, it converts it to PNM first, whereas unless I'm mistaken xxd is just a direct re-encoding. – Flexo Feb 01 '11 at 16:58
  • [Accept:] easy to use, easy to understand, easy to debug and just plain simple. In fact exactly what I need, only that I prefer the solution with `xxd -i infile.bin outfile.h`. @awoodland: You're right, that's why I prefer it that way. – Atmocreations Feb 01 '11 at 18:05
  • 6
    One downside to this approach is that some compilers can't handle such enormous static arrays, if your images are particularly big; the way to get around that is, as [ndim](http://stackoverflow.com/questions/4864866/c-c-with-gcc-statically-add-resource-files-to-executable-library/4865249#4865249) suggests, to use `objcopy` to convert the binary data directly to an object file; however this is rarely a concern. – Adam Rosenfield Feb 05 '11 at 23:37
  • @Omnifarious - By including the resource in the executable, it is covered by the signature on the executable. As far as I know, you can't add signatures to other file types, such as PNGs. The format knows nothing about signatures and verification. – jww Nov 18 '12 at 03:45
  • @noloader: That's what a good package management system is for. :-) It should store hashes of all the files in the package along with a list of the files that can change (for configuration purposes for example). Then the package manager can easily verify the integrity of the signature of all of those hashes and the validity of the appropriate hashes. – Omnifarious Nov 18 '12 at 21:03
  • I'm not passing judgement on the appropriateness of the aim either way. There are cases where it's clearly appropriate and cases where it's clearly not appropriate. Technically it's feasible though. – Flexo Nov 18 '12 at 21:06
  • @Omnifarious - I don't disagree with you per se, but that requires a more complex system. It takes us from verifying a single signature on a package to verifying individual signatures multiple times. And on two similar implementations, I believe data (such as a JPG or PNG) is always verified because the signature only covers executable code and select supporting files. Put another way, the extensibility breaks Schneier and Wagner's "Semantic Authentication" principle (in practice). – jww Nov 18 '12 at 23:22
  • 5
    Keep in mind that defining it in a header like this means that each file which includes it will get its own copy. It is better to declare it in the header as extern and then define it in a cpp. [Example here](http://stackoverflow.com/questions/4391467/declare-array-in-c-header-and-define-it-in-cpp-file) – Nicholas Smith Jun 09 '15 at 00:05
  • @qexyn - that's a "feature" of ImageMagick, not a deliberate choice. – Flexo Jun 09 '15 at 06:41
  • A poor man's solution. A real man's solution uses the assembler/linker. – étale-cohomology Jan 14 '21 at 06:54
55

You can embed binary files in an executable using the ld linker. For example, if you have a file foo.bar, you can embed it in the executable by adding the following options to the ld command line:

--format=binary foo.bar --format=default

If you are invoking ld through gcc, you need to pass the options with -Wl:

-Wl,--format=binary -Wl,foo.bar -Wl,--format=default

Here --format=binary tells the linker that the following file is binary, and --format=default switches back to the default input format (this is useful if you specify other input files after foo.bar).

Then you can access the content of your file from code:

extern uint8_t data[]     asm("_binary_foo_bar_start");
extern uint8_t data_end[] asm("_binary_foo_bar_end");

There is also a symbol named "_binary_foo_bar_size". I think it is of type uintptr_t, but I didn't check.

Simon
  • 1
    Nice one! Just one question: why is `data_end` an array, not a pointer? (Or is this idiomatic C?) – xtofl Dec 13 '12 at 10:25
  • 2
    @xtofl, if `data_end` were a pointer, the compiler would think that there is a pointer stored after the file content. Similarly, if you change the type of `data` to a pointer, you will get a pointer consisting of the first bytes of the file instead of a pointer to its beginning. I think so. – Simon Dec 14 '12 at 09:30
  • ... does not compute. I think for the compiler there's no difference at all. And if there were, the array version would be more likely to cause confusion. – xtofl Dec 15 '12 at 08:53
  • 1
    +1: Your answer allows me to embed a Java class loader and a JAR into an exe to build a custom Java launcher – Aubin Feb 17 '13 at 11:04
  • 2
    @xtofl - If you are going to make it a pointer, make it a `const pointer`. The compiler lets you change the value of non-const pointers, it does not let you change the value if it is an array. So it is perhaps less typing to use the array syntax. – Jesse Chisholm Jul 28 '15 at 22:22
  • @JesseChisholm: just wondering if that's a pointer vs. array thing: isn't `*data_end=1` equivalent to just `data_end[0]=1` – xtofl Jul 31 '15 at 12:40
  • @xtofl - those are equivalent, yes, because you used the pointer as if it were const. But if you accidentally typed `data_end = 0;` without the `*`, the compiler now cares whether it is a pointer variable or an array name. Making it a `const char *` makes it exactly equivalent to an array name. – Jesse Chisholm Aug 01 '15 at 13:52
  • hi @aubin is it possible to share your custom solution to load jar files? –  Jan 29 '16 at 18:25
  • @JesseChisholm when you say "const pointer" you mean `char * const foo` right? `const char * foo` is a (non-const) pointer to const char – KevinOrr May 06 '20 at 06:29
42

You can put all your resources into a ZIP file and append that to the end of the executable file:

g++ foo.c -o foo0
zip -r resources.zip resources/
cat foo0 resources.zip >foo

This works because a) most executable image formats don't care if there is extra data after the image, and b) zip stores its central directory (the archive's index) at the end of the zip file. This means that your executable is also a regular zip file afterwards (apart from the executable prepended to it, which zip can handle), and it can be opened and read with libzip.
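To make point b) concrete, here is a sketch of how a reader locates the archive from the end of the file by searching backwards for the end-of-central-directory (EOCD) record. It is illustration only, with find_zip_eocd being a made-up name. For actually reading the resources you would still use libzip or a similar library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimum size of a ZIP end-of-central-directory (EOCD) record. */
#define EOCD_MIN_SIZE 22u

/* Scans backwards through `buf` for an EOCD record whose comment length
 * field makes the record end exactly at the end of the buffer.
 * Returns the record's offset, or -1 if none is found. */
static long find_zip_eocd(const unsigned char *buf, size_t len)
{
    static const unsigned char sig[4] = { 'P', 'K', 0x05, 0x06 };
    assert(buf != NULL || len == 0);
    if (len < EOCD_MIN_SIZE) return -1;
    for (size_t pos = len - EOCD_MIN_SIZE + 1; pos-- > 0; ) {
        if (memcmp(buf + pos, sig, sizeof sig) != 0) continue;
        /* comment length: 16-bit little-endian at offset 20 of the record */
        size_t comment_len = (size_t)buf[pos + 20]
                           | ((size_t)buf[pos + 21] << 8);
        if (pos + EOCD_MIN_SIZE + comment_len == len) return (long)pos;
    }
    return -1;
}
```

Because the search starts at the end, whatever precedes the archive (here, the executable image) does not get in the way.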

Nordic Mainframe
  • 7
    If I want to join foo0 and resources.zip into foo, then I need > if I give both inputs on the command line of cat. (because I don't want to append to what's already in foo) – Nordic Mainframe Feb 01 '11 at 16:43
  • 1
    ah yes, my mistake. I didn't spot the 0 there in the name properly on my first read through – Flexo Feb 01 '11 at 16:59
  • 1
    +1 Wonderful, especially when paired with [miniz](https://code.google.com/p/miniz/) – mvp Jan 07 '14 at 19:11
  • This will produce an invalid binary (at least on Mac and Linux), which cannot be processed by tools like `install_name_tool`. Beside that, the binary still works as executable. – Andy Li Apr 13 '18 at 17:07
41

If you want control over the exact symbol name and placement of resources, you can use (or script) the GNU assembler (not really part of gcc) to import whole binary files. Try this:

Assembly (x86/arm):

thing.s

    .section .rodata

    .global thing
    .type   thing, @object
    .balign 4
thing:
    .incbin "meh.bin"
thing_end:

    .global thing_size
    .type   thing_size, @object
    .balign 4
thing_size:
    .int    thing_end - thing

C:

main.c

#include <stdio.h>

extern const char thing[];
extern const unsigned thing_size;

int main() {
  printf("%p %u\n", thing, thing_size);
  return 0;
}

You can compile this simply with gcc main.c thing.s.

Whatever you use, it's probably best to make a script to generate all the resources, and have nice/uniform symbol names for everything.

Depending on your data and the system specifics, you might need to use different alignment values (preferably with .balign for portability), or integer types of a different size for thing_size, or a different element type for the thing[] array.
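A variant of the same idea keeps the assembly inside the C file as file-scope inline assembly. The sketch below assumes an ELF target and a GNU-compatible assembler (GCC or Clang on Linux, for instance). To stay self-contained it uses `.ascii` where a real build would use `.incbin "meh.bin"`, and the demo_blob names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* File-scope assembly, assuming an ELF target and a GNU-compatible assembler.
 * `.ascii` stands in for `.incbin "meh.bin"` so the sketch needs no data file. */
__asm__(
    ".pushsection .rodata\n"
    ".balign 4\n"
    ".global demo_blob\n"
    "demo_blob:\n"
    ".ascii \"Hello world\\n\"\n"
    ".global demo_blob_end\n"
    "demo_blob_end:\n"
    ".popsection\n"
);

extern const char demo_blob[];
extern const char demo_blob_end[];

/* Size of the embedded data, derived from the two labels. */
static size_t demo_blob_size(void)
{
    assert((uintptr_t)demo_blob_end >= (uintptr_t)demo_blob);
    return (size_t)((uintptr_t)demo_blob_end - (uintptr_t)demo_blob);
}
```

Here the size is computed from the end label instead of being stored in a separate thing_size object. Either approach works, and the end label has the side benefit that alignment padding placed after it is never counted.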

Jack M
John Ripley
  • thanks for sharing! definitely looks interesting, but this time it's not what i'm looking for =) regards – Atmocreations Feb 06 '11 at 09:36
  • 2
    Exactly what I was looking for. Maybe you can verify that it's also OK for files with sizes not divisible by 4. Looks like thing_size will include the extra padding bytes. – Pavel P Jun 07 '12 at 04:24
  • What if I want thing to be a local symbol? I can probably cat the compiler output together with my own assembly but is there a better way? – user877329 Jan 21 '14 at 17:25
  • For the record: My edit addresses the issue of the extra padding bytes @Pavel noted. – ndim Feb 02 '17 at 22:40
38

From http://www.linuxjournal.com/content/embedding-file-executable-aka-hello-world-version-5967:

I recently had the need to embed a file in an executable. Since I'm working at the command line with gcc, et al and not with a fancy RAD tool that makes it all happen magically it wasn't immediately obvious to me how to make this happen. A bit of searching on the net found a hack to essentially cat it onto the end of the executable and then decipher where it was based on a bunch of information I didn't want to know about. Seemed like there ought to be a better way...

And there is, it's objcopy to the rescue. objcopy converts object files or executables from one format to another. One of the formats it understands is "binary", which is basically any file that's not in one of the other formats that it understands. So you've probably envisioned the idea: convert the file that we want to embed into an object file, then it can simply be linked in with the rest of our code.

Let's say we have a file named data.txt that we want to embed in our executable:

# cat data.txt
Hello world

To convert this into an object file that we can link with our program we just use objcopy to produce a ".o" file:

# objcopy --input binary \
--output elf32-i386 \
--binary-architecture i386 data.txt data.o

This tells objcopy that our input file is in the "binary" format, that our output file should be in the "elf32-i386" format (object files on the x86). The --binary-architecture option tells objcopy that the output file is meant to "run" on an x86. This is needed so that ld will accept the file for linking with other files for the x86. One would think that specifying the output format as "elf32-i386" would imply this, but it does not.

Now that we have an object file we only need to include it when we run the linker:

# gcc main.c data.o

When we run the result we get the prayed for output:

# ./a.out
Hello world

Of course, I haven't told the whole story yet, nor shown you main.c. When objcopy does the above conversion it adds some "linker" symbols to the converted object file:

_binary_data_txt_start
_binary_data_txt_end

After linking, these symbols specify the start and end of the embedded file. The symbol names are formed by prepending _binary_ and appending _start or _end to the file name. If the file name contains any characters that would be invalid in a symbol name they are converted to underscores (e.g. data.txt becomes data_txt). If you get unresolved names when linking using these symbols, do a hexdump -C on the object file and look at the end of the dump for the names that objcopy chose.

The code to actually use the embedded file should now be reasonably obvious:

#include <stdio.h>

extern char _binary_data_txt_start;
extern char _binary_data_txt_end;

int main(void)
{
    char *p = &_binary_data_txt_start;

    /* Walk from the start symbol to the end symbol, printing each byte. */
    while (p != &_binary_data_txt_end) putchar(*p++);

    return 0;
}

One important and subtle thing to note is that the symbols added to the object file aren't "variables". They don't contain any data, rather, their address is their value. I declare them as type char because it's convenient for this example: the embedded data is character data. However, you could declare them as anything, as int if the data is an array of integers, or as struct foo_bar_t if the data were an array of foo bars. If the embedded data is not uniform, then char is probably the most convenient: take its address and cast the pointer to the proper type as you traverse the data.
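For non-uniform data, one cautious way to traverse it is to read fields byte by byte rather than casting the pointer, since the alignment of the embedded block may not suit wider types (see the comment on alignment under the question). This is a sketch of that swapped-in technique, not the article's code; blob_read_u32_le is a hypothetical helper, and in real use start and end would be `&_binary_data_txt_start` and `&_binary_data_txt_end` (cast to unsigned char pointers).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reads a 32-bit little-endian value at `offset` within
 * the blob [start, end). Assembling the value byte by byte avoids casting to
 * uint32_t *, so it makes no assumption about the blob's alignment. */
static uint32_t blob_read_u32_le(const unsigned char *start,
                                 const unsigned char *end, size_t offset)
{
    assert(start != NULL && end != NULL && start <= end);
    assert((size_t)(end - start) >= 4 && offset <= (size_t)(end - start) - 4);
    const unsigned char *p = start + offset;
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

The same pattern extends to other field widths and to big-endian data by changing the shifts.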

Martin Scharrer
Hazok
5

Having read all the posts here and elsewhere on the Internet, I concluded that there is no resource tool that is:

1) Easy to use in code.

2) Automated (to be easy included in cmake/make).

3) Cross-platform.

I decided to write the tool myself. The code is available here: https://github.com/orex/cpp_rsc

Using it with CMake is very easy.

Add the following code to your CMakeLists.txt file:

file(DOWNLOAD https://raw.github.com/orex/cpp_rsc/master/cmake/modules/cpp_resource.cmake ${CMAKE_BINARY_DIR}/cmake/modules/cpp_resource.cmake) 

set(CMAKE_MODULE_PATH ${CMAKE_BINARY_DIR}/cmake/modules)

include(cpp_resource)

find_resource_compiler()
add_resource(pt_rsc) #Add target pt_rsc
link_resource_file(pt_rsc FILE <file_name1> VARIABLE <variable_name1> [TEXT]) #Adds resource files
link_resource_file(pt_rsc FILE <file_name2> VARIABLE <variable_name2> [TEXT])

...

# Get the file to link and the "resource.h" folder
# Unfortunately, CMake does not allow adding a custom target to the add_executable file list.
get_property(RSC_CPP_FILE TARGET pt_rsc PROPERTY _AR_SRC_FILE)
get_property(RSC_H_DIR TARGET pt_rsc PROPERTY _AR_H_DIR)

add_executable(<your_executable> <your_source_files> ${RSC_CPP_FILE})

A real example using this approach can be downloaded here: https://bitbucket.org/orex/periodic_table

dns
user2794512
  • 1
    I think your answer needs better explanation to become useful for more people. – kyb Oct 26 '17 at 16:17