89

I'm looking for a way to easily embed any external binary data in a C/C++ application compiled by GCC.

A good example of what I'd like to do is handling shader code - I can just keep it in source files like const char* shader = "source here"; but that's extremely impractical.

I'd like the compiler to do it for me: upon compilation (linking stage), read file "foo.bar" and link its content to my program, so that I'd be able to access the contents as binary data from the code.

Could be useful for small applications which I'd like to distribute as a single .exe file.

Does GCC support something like this?

jww
Kos
  • Possible duplicate of [C/C++ with GCC: Statically add resource files to executable/library](https://stackoverflow.com/questions/4864866/c-c-with-gcc-statically-add-resource-files-to-executable-library) – Ciro Santilli OurBigBook.com Nov 19 '18 at 10:53

6 Answers

89

There are a couple possibilities:


Update: Here's a more complete example of how to use data bound into the executable using ld -r -b binary:

#include <stdio.h>

// a file named foo.bar with some example text is 'imported' into 
// an object file using the following command:
//
//      ld -r -b binary -o foo.bar.o foo.bar
//
// That creates an object file named "foo.bar.o" with the following
// symbols:
//
//      _binary_foo_bar_start
//      _binary_foo_bar_end
//      _binary_foo_bar_size
//
// Note that the symbols are addresses (so for example, to get the 
// size value, you have to get the address of the _binary_foo_bar_size
// symbol).
//
// In my example, foo.bar is a simple text file, and this program will
// dump the contents of that file which has been linked in by specifying
// foo.bar.o as an object file input to the linker when the program is built

extern char _binary_foo_bar_start[];
extern char _binary_foo_bar_end[];

int main(void)
{
    printf( "address of start: %p\n", (void*)_binary_foo_bar_start);
    printf( "address of end: %p\n", (void*)_binary_foo_bar_end);

    for (char* p = _binary_foo_bar_start; p != _binary_foo_bar_end; ++p) {
        putchar( *p);
    }

    return 0;
}

Update 2 - Getting the resource size: I could not read _binary_foo_bar_size correctly. At runtime, gdb shows the right size of the text resource with display (unsigned int)&_binary_foo_bar_size, but assigning that expression to a variable always gave a wrong value. I solved the issue by computing the size from the two address symbols instead:

unsigned int iSize = (unsigned int)(_binary_foo_bar_end - _binary_foo_bar_start);

With the array declarations used above, the symbols are written without &, because pointers to arrays of unknown size cannot be subtracted. It is a workaround, but it works well and is not too ugly.

Michael Burr
  • Shaders are not BLOB. They are normal text. – BЈовић Nov 11 '10 at 20:36
  • 3
    @VJo: then treat the blob as text. You may have to do a bit of work to make sure there's a `'\0'` at the end of the text if you need it terminated like that. Some experimenting might be in order. – Michael Burr Nov 11 '10 at 20:38
  • Thanks, Michael; looks like what I needed, but I'm receiving `objdump: foo.o: File format not recognized` error, and a similar one when trying to link that object with my source. Any hints? I'm on Windows, using tdm-mingw 4.5.1 and my ld -v yields `GNU ld (GNU Binutils) 2.20.51.20100319`. I can fallback to your second suggestion, so it's just my curiosity from now on. :) – Kos Nov 11 '10 at 21:06
  • @Kos: I've posted an example bit of code that compiles and runs on my system. I'm using the MinGW distribution from http://nuwen.net/mingw.html which has `gcc (GCC) 4.5.1`, `GNU ld (GNU Binutils) 2.20.1.20100303`, and `GNU objdump (GNU Binutils) 2.20.1.20100303`. On your system, does `objdump -i` say anything about the `binary` format? – Michael Burr Nov 11 '10 at 23:18
  • Sorry to say, but even after edit, your solution is still not good, because a shader is not a block of binary data, but a text. – BЈовић Nov 12 '10 at 07:17
  • 9
    @VJo: text _is_ binary. _Everything_ on a computer is binary. – MSalters Nov 12 '10 at 09:17
  • @Michael, you can have a look at my environment here: http://nopaste.voric.com/paste.php?f=me4dr3 . If I use the nuwen's version of `ld` to create the `foo.bar.o` file (only that - I can use tdm's build for the rest), then everything works fine. I find it somewhat surprising that we're actually getting different results here. See: http://nopaste.voric.com/paste.php?f=95zizg – Kos Nov 12 '10 at 12:10
  • 2
    @MSalters re: "text is binary". Yes, but, ... in text the EOL may be treated differently on different systems. Explicitly calling it binary prevents such foibles. – Jesse Chisholm May 20 '14 at 18:54
  • Your answer and this [here](http://stackoverflow.com/a/4865249/670521) complement each other, so I'm linking back to it, to help people to have more examples. – DrBeco Jul 11 '15 at 05:37
  • @MSalters Also, binary isn't even binary. Normally, you dedicate a piece of memory as either code or data (see: `VirtualAlloc` and `mmap`); by restricting access, you can protect applications. Also because of this I have my doubts that this solution will work in all cases; basically it compiles a blob as a code block and then the code uses it as a data block... iirc that should only work if the executable/DLL block is marked as `.text` - which basically (always) marks it as a data block. – atlaste Sep 11 '15 at 13:55
  • 3
    @atlaste: What you describe is the distinction between writeable ("data") and executable ("code"). Read-only data needs neither method. – MSalters Sep 11 '15 at 14:20
  • @MSalters Executables are mmapped and then executed. If you look closely, you'll see that EXECUTE, EXECUTE_READ and READONLY are different flags. If a section in an exe/dll is marked as 'code' (EXECUTE), there's no reason to mark it as 'read-only' - which is what's used here (and vice versa). The reason this works is that it's marked as '.text' data, which maps to the correct protection flags. Putting it in a '.code' should give errors. Link for flags: https://msdn.microsoft.com/en-us/library/windows/desktop/aa366786(v=vs.85).aspx . Linux mmap can do similar things with PROT_READ and PROT_EXEC. – atlaste Sep 11 '15 at 15:12
  • 2
    Can you tell `ld` which symbol name to generate for the data? – Calmarius Jul 20 '16 at 10:23
  • Shouldn't that be `&_binary_foo_bar_end - &_binary_foo_bar_start + 1`? The number of elements in the range `[a, b]` is `b - a + 1`. – jww Aug 30 '18 at 17:53
  • 1
    @Calmarius has been asked at: https://stackoverflow.com/questions/19169039/symbol-names-when-embedding-data-in-executable-on-linux it seems you can't, making this approach unusable in many cases. – Ciro Santilli OurBigBook.com Nov 16 '18 at 17:15
  • Doing this with x86_64-w64-mingw32 on a Linux host, &_binary_foo_bar_size was correct when running my executable through wine, but wrong when the exact same executable was run on Windows 7. end - start always works though. – repkap11 Apr 13 '19 at 04:01
  • 1
    @jww no, the end is one past the end, the subtraction works as-is – K. Brafford Sep 26 '20 at 05:13
  • I think size should be `_binary_foo_bar_end - _binary_foo_bar_start`, without `&` – balping Aug 15 '21 at 17:42
  • 1
    @balping, it doesn’t matter, *(a)* and *(&a)* evaluates to the same value in C if *a* is an array. – dened Sep 04 '22 at 12:15
  • @dened I checked it and you are right. To this day, I was convinced that `char * a` and `char a[]` are the same thing, but they are apparently not. Thank you, I learned something today. – balping Sep 05 '22 at 02:35
  • ad Update 2: _binary_foo_bar_size and ASLR somehow don't work. `extern const uint16_t _binary_foo_bar_size[];` and then `uint16_t foo_bar_size = (uint16_t) _binary_foo_bar_size;` did the trick. – Johannes Oct 25 '22 at 14:29
43

In addition to the suggestions already mentioned, on Linux you can use the hex dump tool xxd, which can generate a C header file with its -i option:

xxd -i mybinary > myheader.h
Riot
  • 10
    I think this solution is the best. It is also cross platform and cross compiler support. – Behrouz.M Jul 27 '15 at 02:42
  • 5
    This is true, but it does have one drawback - the resulting header files are **much** larger than the original binary file. This has no impact on the final compiled result, but it can be undesirable as part of the build process. – Riot Jul 28 '15 at 02:35
  • 4
    this problem can be solved by using **precompiled header**. – Behrouz.M Jul 28 '15 at 11:05
24

The GAS .incbin directive can be used for this task. Here is a freely licensed library that wraps it:

https://github.com/graphitemaster/incbin

To recap, the .incbin method works like this: you write an assembly file thing.s and compile it with gcc -c thing.s:

    .section .rodata
    .global thing
    .type   thing, @object
    .align  4
thing:
    .incbin "meh.bin"
    .global thing_end
thing_end:
    .global thing_size
    .type   thing_size, @object
    .align  4
thing_size:
    .int    thing_end - thing

In your C or C++ code you can reference it with:

extern const char thing[];
extern const char thing_end[];  /* a label marking one past the last byte, not a pointer variable */
extern const int thing_size;

You then link the resulting .o with the rest of your compilation units. Credit goes to @John Ripley for his answer here: C/C++ with GCC: Statically add resource files to executable/library

But the above method is not as convenient as what incbin can give you. To accomplish the above with incbin you don't need to write any assembler. Just the following will do:

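#include <stdio.h>
#include "incbin.h"

// With the default prefix "g", this defines gThingData, gThingEnd and gThingSize
INCBIN(Thing, "meh.bin");

int main(int argc, char* argv[])
{
    // Now use thing
    printf("thing=%p\n", (const void*)gThingData);
    printf("thing len=%u\n", gThingSize);
    return 0;
}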
hookenz
15

C23 introduces the preprocessor directive #embed, which achieves exactly what you are looking for without external tools. See 6.10.3.1 of the C23 standard (here is a link to the most recent working draft). Here's a good blog post about the history of #embed by one of the committee members behind this new feature.

Here is a snippet from the draft standard demonstrating its use:

#include <stddef.h>
void have_you_any_wool(const unsigned char*, size_t);

int main (int, char*[]) {
    static const unsigned char baa_baa[] = {
#embed "black_sheep.ico"
    };
    
    have_you_any_wool(baa_baa, sizeof(baa_baa));
    return 0;
}

An equivalent directive for C++ does not exist at this time.

irowe
  • 1
    Surely `have_you_any_wool` and `black_sheep.ico` aren't actually the names used in the C23 standard though? – Miles Rout Mar 01 '23 at 06:50
  • 2
    That snippet is taken verbatim from the standard. See p169 of this pdf: https://open-std.org/JTC1/SC22/WG14/www/docs/n3088.pdf Surely we can allow the committee a sense of humor? – irowe Mar 01 '23 at 13:33
  • A sense of humour? Sure. This isn't humour, though. It's just one whose website's logo is a black sheep trying to self-insert all over the standard. – Miles Rout Mar 03 '23 at 06:38
1

To embed static data into an executable portably, I would package it into a .lib/.a file or into a header file as an array of unsigned chars. I have created a command-line tool that does both, here. All you have to do is list the files and pick option -l64 to output a 64-bit library file along with a header that declares a pointer to each piece of data.

You can explore more options as well. For example, this command:

>BinPack image.png -j -hx

will output the data of image.png into a header file as hexadecimal, with lines justified per the -j option.

const unsigned char BP_icon[] = { 
0x89,0x50,0x4e,0x47,0x0d,0x0a,0x1a,0x0a,0x00,0x00,0x00,0x0d,0x49,0x48,0x44,0x52,
0x00,0x00,0x01,0xed,0x00,0x00,0x01,0xed,0x08,0x06,0x00,0x00,0x00,0x34,0xb4,0x26,
0xfb,0x00,0x00,0x02,0xf1,0x7a,0x54,0x58,0x74,0x52,0x61,0x77,0x20,0x70,0x72,0x6f,
0x66,0x69,0x6c,0x65,0x20,0x74,0x79,0x70,0x65,0x20,0x65,0x78,0x69,0x66,0x00,0x00,
0x78,0xda,0xed,0x96,0x5d,0x92,0xe3,0x2a,0x0c,0x85,0xdf,0x59,0xc5,0x2c,0x01,0x49,
0x08,0x89,0xe5,0x60,0x7e,0xaa,0xee,0x0e,0xee,0xf2,0xef,0x01,0x3b,0x9e,0x4e,0xba,
0xbb,0x6a,0xa6,0x66,0x5e,0x6e,0x55,0x4c,0x8c,0x88,0x0c,0x07,0xd0,0x27,0x93,0x84,
0xf1,0xef,0x3f,0x33,0xfc,0xc0,0x45,0xc5,0x52,0x48,0x6a,0x9e,0x4b,0xce,0x11,0x57,
0x2a,0xa9,0x70,0x45,0xc3,0xe3,0x79,0xd5,0x5d,0x53,0x4c,0xbb,0xde,0xd7,0xe8,0x57,
0x8b,0x9e,0xfd,0xe1,0x7e,0xc0,0xb0,0x02,0x2b,0xe7,0x03,0xcf,0xa7,0xa5,0x87,0xff,
0x1a,0xf0,0xb0,0x54,0xd1,0xd2,0x0f,0x42,0xde,0xae,0x07,0xc7,0xf3,0x83,0x92,0x4e,
0xcb,0xfe,0x22,0xc4,0xa7,0x91,0xb5,0xa2,0xd5,0xee,0x97,0x50,0xb9,0x84,0x84,0xcf,
0x07,0x74,0x09,0xd4,0x73,0x5b,0x31,0x17,0xb7,0x8f,0x5b,0x38,0xc6,0x69,0xaf};
The Oathman
-4

You could do this in a header file:

#ifndef SHADER_SRC_HPP
#define SHADER_SRC_HPP
// C++11 raw string literal; an ordinary string literal cannot span lines
const char shader[] = R"(

//source

)";
#endif

and just include that.

The other way is to read the shader file at runtime.

BЈовић
  • 4
    I think Kos wants to be able to maintain the shader source without having to worry about escaping special characters (among other possible issues). – Michael Burr Nov 11 '10 at 20:41
  • 2
    @VJo: nope - never used a shader. I was approaching the question as embedding arbitrary data residing in external files into the program. I can certainly accept that this might be a much better solution for shaders in particular. – Michael Burr Nov 11 '10 at 21:13
  • 1
    A file which defines (as opposed to declares) a global variable should not be a header file but a source module. And your type is extremely inefficient. Make it `const char shader[] = "source";` instead. – R.. GitHub STOP HELPING ICE Nov 11 '10 at 21:40
  • @R a better way is to declare an external variable in the header, and define in the source file. This is also easy to maintain – BЈовић Nov 11 '10 at 21:46
  • 9
    Also, I believe C++ doesn't allow you to have multi-line string literals in other way than either opening and closing `""` quotes in each line separately or having a backslash at the end of every line. Not to mention the other benefits of having the shader available as a standalone file during development (syntax coloring, at the very least?). – Kos Nov 11 '10 at 22:44
  • @Kos You could put quotes at the beginning and end of each line, but that is hardly easier, but perhaps more clear. At least it is more clear to people familiar with compile time string literal concatenation. – Sqeaky Sep 02 '15 at 07:25
  • 1
    Since C++11 you can use a "raw string literal", it looks like `R"*( ... multiline text ... )*"`. You can use another delimiter instead of *. – Zeno Rogue Mar 16 '21 at 22:34