15

I have a binary file that I want to embed directly into my source code, so that it is compiled into the .exe itself instead of being read from a file, and the data is already in memory when I launch the program.

How do I do this?

The only idea I had was to encode the binary data as base64, put it in a string variable and decode it back to raw binary at run time, but that is a clumsy method that causes pointless memory allocation. I would also like the data stored in the .exe to be as compact as the original data.

Edit: The reason I thought of base64 is that I also want to keep the source code files as small as possible.

Ciro Santilli OurBigBook.com
Rookie
  • 1
    As long as you put this resource in a separate source file I offhand see no reason to have source size be part of the concern. Make it easy to use and obvious what's going on first, and let the compiler worry about reading a few extra characters. – Mark B Apr 19 '11 at 14:56
  • Well, it's just my preference really. Sure, it doesn't matter, but I like compact. – Rookie Apr 19 '11 at 15:01
  • Since you like compact: https://stackoverflow.com/a/52843063/6846474 I wrote a tool that compiles header files with a list of resource paths directly to object files or static libraries. –  Oct 16 '18 at 19:56
  • For GCC: https://stackoverflow.com/questions/4158900/embedding-resources-in-executable-using-gcc – Ciro Santilli OurBigBook.com Feb 08 '19 at 22:25

4 Answers

11

The easiest and most portable way would be to write a small program that converts the data to a C++ source file, then compile that file and link it into your program. The generated file might look something like:

unsigned char rawData[] =
{
    0x12, 0x34, // ...
};
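A minimal sketch of what such a converter could look like, not the answerer's actual tool. The function names `read_bytes` and `to_cpp_array`, the 12-bytes-per-line layout and the `extern const` qualifier are all assumptions made for illustration (`extern` is there because a `const` array at namespace scope would otherwise have internal linkage in C++). It assumes the input is not empty, since an empty initializer list cannot size the array.

```cpp
#include <cstddef>
#include <cstdio>
#include <istream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: reads every byte from a stream.
// Open the file with std::ios::binary so no newline translation happens.
std::vector<unsigned char> read_bytes(std::istream& in)
{
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}

// Hypothetical helper: renders `bytes` as C++ source text that defines
// an array called `name`, 12 bytes per line.
std::string to_cpp_array(const std::vector<unsigned char>& bytes,
                         const std::string& name)
{
    std::string out = "extern const unsigned char " + name + "[] =\n{";
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        char buf[8];
        std::snprintf(buf, sizeof buf, "0x%02x,",
                      static_cast<unsigned>(bytes[i]));
        out += (i % 12 == 0) ? "\n    " : " ";  // start a new line every 12 bytes
        out += buf;
    }
    out += "\n};\n";
    return out;
}
```

A small `main` would open the input with `std::ifstream(path, std::ios::binary)`, pass it to `read_bytes`, and write the result of `to_cpp_array` to a `.cpp` file that is added to the build.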
James Kanze
  • I had to do this for firmware updates on a system that does not support file operations, and we just copied the raw data into an array as in this answer. – dubnde Apr 19 '11 at 14:44
  • 1
    What is the most compact way of doing this in my source code? I could save space by dropping the 0x prefix and using decimal values, but are there other ways? I have seen code like `Y\377\322\217^\377\321\227l\377\340\262\220\377`, but I don't understand how it works, and it causes some compiler warnings for some reason, yet it works. – Rookie Apr 19 '11 at 14:46
  • @Rookie: the \nnn notation uses octal to specify the value of each character. \377 is the same as 0xff. – Ferruccio Apr 19 '11 at 15:03
  • Yes, but what do the odd letters do in that octal data? For example there are `Y`, `^` and `l`; there are many strange characters in there and I don't understand the logic. – Rookie Apr 19 '11 at 15:07
  • @Rookie Presumably, not all of the characters are octal escapes. Personally, I wouldn't worry too much about the size of the source code file; if you run into size problems, it will be because the total table is too big for the compiler, and that will be after tokenization, and won't depend on the size of the input file. – James Kanze Apr 19 '11 at 15:49
  • Do you know of a web page where I can read about that \nnn notation (the one with those odd characters mixed in)? I would like to figure out how to use it, but I don't know what it is called. – Rookie Apr 19 '11 at 16:27
  • It's the basic, standard octal character escapes. I first learned it in K&R 1, back in the days before web pages, but any good C or C++ text should mention it. And there aren't any "weird" characters in it; the octal escape just isn't being used for characters for which it isn't needed. (Actually, you've almost certainly already used it. `'\0'`, for example, is an octal escape sequence.) – James Kanze Apr 19 '11 at 17:24
  • Where is it specified which characters have to be converted to octal escape sequences and which don't? – Rookie Apr 19 '11 at 22:28
  • It's more or less up to you, but I'd guess that values which don't correspond to an encoding of a character in the C++ basic character set get converted to octal, and characters which are in the basic character set don't. – James Kanze Apr 20 '11 at 10:48
  • The binutils utility "objcopy" can transform a file into a .o file that defines such an array and symbols for its start, end and size. Using objcopy you don't need a .c file with a big array declaration in it. This could be an alternate answer but I don't wish to step on toes. – cardiff space man Dec 03 '13 at 20:45
  • @cardiffspaceman If it's binutils, I'm not sure that it's portable, but it certainly sounds like something useful. (In the past, I've blown up the compiler with such arrays. Generating object code directly would have been a good solution.) – James Kanze Dec 04 '13 at 09:23
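To illustrate the octal-escape discussion in the comments above, here is a small sketch using the first few bytes of the quoted data. The names `sample`, `sample_size` and `byte_at` are made up for this example. Printable characters such as `Y` and `^` stand for themselves, while every other byte is written as a `\nnn` octal escape.

```cpp
#include <cstddef>

// First five bytes of the data quoted in the comments:
// 'Y', 0xff, 0xd2, 0x8f, '^'
static const char sample[] = "Y\377\322\217^";

// sizeof also counts the terminating '\0' that a string literal appends,
// so subtract one to get the number of data bytes.
static const std::size_t sample_size = sizeof sample - 1;

// Reads a byte as an unsigned value, independent of whether plain char
// is signed on this platform.
unsigned byte_at(std::size_t i)
{
    return static_cast<unsigned char>(sample[i]);
}
```

One thing a generator has to watch for: an octal escape takes up to three digits, so a short escape followed by a literal digit should be padded to three digits (`"\0001"` is the two bytes 0 and `'1'`, whereas `"\01"` is the single byte 1).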
6

There are tools for this, a typical name is "bin2c". The first search result is this page.

You need to make a char array, and preferably also make it static const.

In C:

Some care might be needed since you can't have a char-typed literal, and also because generally the signedness of C's char datatype is up to the implementation.

You might want to use a format such as

static const unsigned char my_data[] = { (unsigned char) 0xfeu, (unsigned char) 0xabu, /* ... */ };

Note that each literal carries the 'u' suffix, which makes it an unsigned int, and is then cast to unsigned char.

Since this question was for C++, where you can have a char-typed literal, you might consider using a format such as this, instead:

static const char my_data[] = { '\xfe', '\xab', /* ... */ };

Since this is just an array of char, you could just as well use ordinary string literal syntax. Embedding zero-bytes should be fine, as long as you don't try to treat it as a string (note that the literal also appends a terminating '\0' to the array):

static const char my_data[] = "\xfe\xab ...";

This is the most compact solution. In fact, you could probably use that for C, too.
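A short sketch of the string-literal form with two details that are easy to trip over. The byte values and the name `my_data_size` are assumptions for illustration. First, a `\x` escape consumes every hex digit that follows it, so a data byte that happens to be a character such as `c` or `d` must be separated from the preceding escape; adjacent string literals are concatenated after escapes are processed, which makes splitting the literal a simple fix. Second, the array is one byte longer than the data because of the terminating `'\0'`.

```cpp
#include <cstddef>

// Six data bytes: 0xfe, 0xab, 'c', 'd', 0x00, 0x01.
// Writing "\xfe\xabcd..." as one literal would parse \xabcd as a single
// (out-of-range) escape, so the literal is split after \xab.
static const char my_data[] = "\xfe\xab" "cd\0\x01";

// Number of data bytes, excluding the '\0' the literal appends.
static const std::size_t my_data_size = sizeof my_data - 1;
```

Because the data contains an embedded zero byte, functions like `strlen` would stop early; always pass `my_data_size` along with the pointer.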

Eugene
unwind
  • So \xff equals 0xff, which is 255? With commas it's the same size, but decimal values can also be 0,0,0,0 or 11,11,11,11, so decimal is 1 to 2 bytes smaller in some cases, whereas hex is always 4 bytes. I think I'll go with decimal, if those are all the options here? – Rookie Apr 19 '11 at 15:05
  • Could you also explain this data: `Y\377\322\217^\377\321\227l\377\340\262\220\377`? There are `Y`, `^` and `l` in among the octal values; what is the logic behind those? – Rookie Apr 19 '11 at 15:08
  • The point in avoiding a literal like 0 was (for me) to be type-clean; the type of 1 is `int`. I guess the compiler will typically do bounds-checking when initializing, so it should be safe, but still. I'm not sure where the data you quote in the second comment comes from, but probably the generator decided that the byte-value was representable as a printable character and used that for brevity. – unwind Apr 19 '11 at 16:52
4

You can use resource files (.rc). They have their drawbacks, but for a Windows-based application that's the usual way.

Coder
0

Why base64? Just store the file as it is in one char*.

Blindy
  • 1
    I was thinking of using base64 because I also want to reduce the space used in my source code. – Rookie Apr 19 '11 at 14:48
  • 1
    @Rookie, how is tripling the amount of source code "optimizing" it? – Blindy Apr 19 '11 at 14:59
  • 1
    What do you mean by tripling? Base64 packs the data more tightly in the source code than 0xff,0xff,0xff and similar methods. See below: orig: `this is a testing text!!` base64: `dGhpcyBpcyBhIHRlc3RpbmcgdGV4dCEh` hexstr: `7468697320697320612074657374696E6720746578742121` decarr: `116,104,105,115,32,105,115,32,97,32,116,101,115,116,105,110,103,32,116,101,120,116,33,33` – Rookie Apr 19 '11 at 15:13
  • @Rookie, yes, but you don't have to escape printable ASCII characters. You can simply say `char *data="this is a testing text!!";` – Blindy Apr 19 '11 at 15:41
  • 1
    That was just an example of how much space it would take, with the original showing the original data length in plain sight; I can't paste binary data in here... Read the title again. – Rookie Apr 19 '11 at 16:22