55

I am trying to embed binary blobs into an exe file. I am using mingw gcc.

I make the object file like this:

ld -r -b binary -o binary.o input.txt

I then look at the objdump output to get the symbols:

objdump -x binary.o

And it gives symbols named:

_binary_input_txt_start
_binary_input_txt_end
_binary_input_txt_size

I then try to access them in my C program:

#include <stdlib.h>
#include <stdio.h>

extern char _binary_input_txt_start[];

int main (int argc, char *argv[])
{
    char *p;
    p = _binary_input_txt_start;

    return 0;
}

Then I compile like this:

gcc -o test.exe test.c binary.o

But I always get:

undefined reference to _binary_input_txt_start

Does anyone know what I am doing wrong?

myforwik

4 Answers

40

In your C program remove the leading underscore:

#include <stdlib.h>
#include <stdio.h>

extern char binary_input_txt_start[];

int main (int argc, char *argv[])
{
    char *p;
    p = binary_input_txt_start;

    return 0;
}

C compilers often (always?) seem to prepend an underscore to extern names. I'm not entirely sure why that is, but I assume there's some truth to this Wikipedia article's claim that:

It was common practice for C compilers to prepend a leading underscore to all external scope program identifiers to avert clashes with contributions from runtime language support

But it strikes me that if underscores are prepended to all externs, then you're not really partitioning the namespace very much. Anyway, that's a question for another day, and the fact is that the underscores do get added.

Michael Burr
  • Wow... thanks a lot. This was driving me mad. I knew it must have been something simple. I have just debugged it and noticed that it was changing to __binary_input_txt_start – myforwik Apr 13 '10 at 06:11
  • @myforwik: just in case you're interested, I've posted a question asking why C does this: http://stackoverflow.com/questions/2627511/why-do-c-compilers-prepend-underscores-to-external-names – Michael Burr Apr 13 '10 at 06:47
  • @Michael: The article's claim is true. The runtimes were written in assembler, which was free to use names without underscores prepended and could thereby be assured not to clash with any symbols defined in the C code, and conversely the C code had no way to access the symbols from the asm runtime code. – R.. GitHub STOP HELPING ICE Aug 06 '11 at 23:24
  • Does anyone know how much data can be embedded that way? – user877329 Mar 15 '12 at 09:18
  • The underscore marks a reserved name, doesn't it? I assume it's there to avoid clashes with references in hand-written code – Will03uk Apr 10 '12 at 11:25
  • On the contrary, in my case, I _do_ have to add the underscore. Meaning, the initial code in the question works for me. Uname "Linux aditya-lucid 2.6.32-40-generic #87-Ubuntu SMP Mon Mar 5 20:26:31 UTC 2012 i686 GNU/Linux", using "gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5.1)" – aditya Jan 09 '14 at 11:56
  • @aditya: perhaps there's a difference in that detail that depends on the target? Windows toolchains have a tendency to automatically add underscores to external names when targeting Win32 x86. I wouldn't be surprised if that doesn't happen for other targets (even Win32 x64). – Michael Burr Jan 09 '14 at 18:24
  • @MichaelBurr : Hmm...interesting topic anyway, and useful as well...much to learn :) – aditya Jan 10 '14 at 05:11
9

From the ld man page:

--leading-underscore

--no-leading-underscore

For most targets default symbol-prefix is an underscore and is defined in target's description. By this option it is possible to disable/enable the default underscore symbol-prefix.

So

ld -r -b binary -o binary.o input.txt --leading-underscore

should be the solution.

Cristian Ciupitu
Matěj Pokorný
6

I tested it in Linux (Ubuntu 10.10).

  1. Resource file:
    input.txt

  2. gcc (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5 [generates ELF executable, for Linux]
    Generates symbol _binary__input_txt_start.
    Accepts symbol _binary__input_txt_start (with the leading underscore).

  3. i586-mingw32msvc-gcc (GCC) 4.2.1-sjlj (mingw32-2) [generates PE executable, for Windows]
    Generates symbol _binary__input_txt_start.
    Accepts symbol binary__input_txt_start (without the leading underscore).

Brad Gilbert
user1742529
0

Apparently this feature is not present in OSX's ld, so you have to do it totally differently with a custom gcc flag that they added, and you can't reference the data directly, but must do some runtime initialization to get the address.

So it might be more portable to make yourself an assembler source file which includes the binary at build time, a la this answer.

Community
Josh Grams