
I use `ld -r -b binary -o binary.o foo.jpeg` to embed resources in my program, and it works great. I just wonder why the `int _binary_size` symbol never reads correctly: it comes out negative or far too large, yet it stays the same between program runs. I always have to compute `_binary_end - _binary_start` instead, which works flawlessly. It seems the size symbol works for no one... like here .... Why is that?

There is no reason not to use end-start as it replaces the size symbol, but it still leaves me curious.

Edit: code example:

extern const unsigned char _binary_scna4_jpg_start;
extern const unsigned char _binary_scna4_jpg_end;
extern const int _binary_scna4_jpg_size;

int size = &_binary_scna4_jpg_end - &_binary_scna4_jpg_start;
printf("Size is %d vs %d \n", size, _binary_scna4_jpg_size);

This prints:

Size is 1192071 vs -385906356 

The first number is the correct size of the file, and all my images read flawlessly.

Output of nm for good measure:

0000000000123087 D _binary_scna4_jpg_end
0000000000123087 A _binary_scna4_jpg_size
0000000000000000 D _binary_scna4_jpg_start
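A quick way to read this listing: the `A` type marks an absolute symbol, so the value `nm` prints for `_binary_scna4_jpg_size` is the size itself, not the address of an `int` stored somewhere. Since start is listed at 0, end and size carry the same number. The small C sketch below parses the three hex fields copied from the listing; the helper `nm_value` is hypothetical and exists only for illustration.

```c
#include <stdlib.h>

/* Symbol values copied verbatim from the nm listing above. */
static const char *const nm_start = "0000000000000000"; /* D _binary_scna4_jpg_start */
static const char *const nm_end   = "0000000000123087"; /* D _binary_scna4_jpg_end   */
static const char *const nm_size  = "0000000000123087"; /* A _binary_scna4_jpg_size  */

/* Hypothetical helper: convert an nm value field (hexadecimal) to a number. */
static unsigned long long nm_value(const char *hex)
{
    return strtoull(hex, NULL, 16);
}
```

Here `nm_value(nm_size)` is 1192071 (0x123087), exactly the size the program printed from end - start: the linker records the right number, but as the symbol's value (its address) rather than as data in memory.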
FrostKiwi

2 Answers


The problem arises because of Position-Independent Executables (PIE). Executables used to be loaded at the same memory addresses (fixed at compile/link time), which made attacks easier because an attacker knew where specific parts of a program were. Address Space Layout Randomization (ASLR) was introduced to counter this. A side effect is that the size symbols, which are defined as absolute addresses (`_binary_scna4_jpg_size` is not an integer value; it is a "pointer" just like `_start` and `_end`), also get relocated when the program is loaded.

If you compile your code with the -no-pie option, position independence is disabled and the address of `_binary_scna4_jpg_size` is the correct size, since it will not be relocated. Since PIE is on by default these days, the value of that pointer is basically garbage. You could still use it if you knew the base address the program was relocated to, but since you already have `_binary_scna4_jpg_start` and `_binary_scna4_jpg_end`, using them amounts to the same thing.
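A minimal sketch of the end - start approach. To keep it self-contained, the object that `ld -r -b binary` would produce is imitated by a top-level `__asm__` block that places two global labels around a few bytes. The `_binary_demo_txt_*` names and the payload are hypothetical, and the sketch assumes GCC or Clang targeting an ELF platform such as Linux (Mach-O, for instance, would need an extra leading underscore on the assembler names).

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for `ld -r -b binary` output (hypothetical names, ELF assumed):
 * a start label, the raw bytes, and an end label in read-only data. */
__asm__(
    ".pushsection .rodata\n"
    ".globl _binary_demo_txt_start\n"
    "_binary_demo_txt_start:\n"
    ".ascii \"hello, embedded world\"\n" /* 21 bytes, no terminating NUL */
    ".globl _binary_demo_txt_end\n"
    "_binary_demo_txt_end:\n"
    ".popsection\n");

/* Declared as arrays so each name decays to the symbol's address. */
extern const unsigned char _binary_demo_txt_start[];
extern const unsigned char _binary_demo_txt_end[];

/* Both addresses are shifted by the same load offset under PIE/ASLR,
 * so their difference is the byte count either way. Subtracting pointers
 * to distinct objects is formally undefined in ISO C, hence the detour
 * through uintptr_t. */
static size_t embedded_size(void)
{
    return (size_t)((uintptr_t)_binary_demo_txt_end -
                    (uintptr_t)_binary_demo_txt_start);
}

/* Pointer to the first embedded byte. */
static const unsigned char *embedded_data(void)
{
    return _binary_demo_txt_start;
}
```

In a real build you would drop the `__asm__` block and link the object produced by `ld -r -b binary` instead; the C side stays the same apart from the symbol names.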

Sami Kuhmonen

Your _binary_scna4_jpg_size symbol is not an integer. It's an absolute address symbol. In order to get the size, you would need to take the address of it and cast to an appropriate integral type:

printf("The real size is %td\n", (ptrdiff_t) &_binary_scna4_jpg_size);

This, however, only works when PIE is disabled (gcc -fPIC -no-pie) or when linking statically (gcc -static).
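A sketch of the address-of-the-size-symbol approach, guarded so that it still builds as a PIE. The `ld` output is again imitated by a hypothetical top-level `__asm__` block (GCC or Clang on ELF assumed), where `.set` defines `_binary_demo_bin_size` as an absolute symbol whose value is the byte count, roughly what the `A` entry in the question's `nm` listing indicates. The `__PIE__`/`__pie__` macros, which GCC and Clang define when compiling for a position-independent executable, select the end - start fallback. The address is converted through `uintptr_t` to `size_t`, an unsigned type, rather than to the signed `ptrdiff_t`.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for `ld -r -b binary` output (hypothetical names, ELF assumed).
 * `.set` makes _size an absolute symbol whose *value* is the byte count. */
__asm__(
    ".pushsection .rodata\n"
    ".globl _binary_demo_bin_start\n"
    "_binary_demo_bin_start:\n"
    ".byte 1, 2, 3, 4, 5, 6, 7, 8\n"
    ".globl _binary_demo_bin_end\n"
    "_binary_demo_bin_end:\n"
    ".globl _binary_demo_bin_size\n"
    ".set _binary_demo_bin_size, _binary_demo_bin_end - _binary_demo_bin_start\n"
    ".popsection\n");

extern const unsigned char _binary_demo_bin_start[];
extern const unsigned char _binary_demo_bin_end[];
/* Any object type would do: the (non-existent) object is never read,
 * only its address is used. */
extern const unsigned char _binary_demo_bin_size[];

static size_t embedded_size(void)
{
#if defined(__PIE__) || defined(__pie__)
    /* PIE build: the absolute symbol's address may be relocated,
     * so fall back to end - start. */
    return (size_t)((uintptr_t)_binary_demo_bin_end -
                    (uintptr_t)_binary_demo_bin_start);
#else
    /* Non-PIE build: the symbol's address is the size itself. */
    return (size_t)(uintptr_t)_binary_demo_bin_size;
#endif
}
```

Both branches should yield 8 here; the guard only decides which symbol the answer's reasoning says can be trusted in a given build.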

n. m. could be an AI
  • It works, but I don't get it. It seems that C semantics work very differently with this special extern symbol: the `&` operator here doesn't give the address of a variable in memory but the value itself. Also, which declaration should `_binary__size` have? It can be `size_t`, `int`, `size_t*` or `void*`... all of them work. So which declaration is more appropriate? If you could improve the answer with more explanation, I would be very happy. – ceztko Feb 04 '21 at 14:46
  • @ceztko `foo.jpeg` is not a C file, and whatever `ld` makes of it has nothing to do with C. `_binary_scna4_jpg_size` is not a variable that holds the size. It is a symbol which `ld` *places at the address* that represents the size. There is no variable at that address, just the symbol. – n. m. could be an AI Feb 04 '21 at 21:50
  • I was just referring to the `_size` symbol. I was assuming that the `&` operator works only on variables and that its result is always a valid address. In this case the result is a size, which can't be a valid address at all. Never mind, it's probably just strange semantics that apply to such symbols. If you have a suggestion for which type I should declare the symbol with, I would be glad, since it's not a variable and most types make little sense to me (and I can't declare it as just `void`). – ceztko Feb 04 '21 at 22:35
  • @ceztko `_size` symbol is *not* a variable as far as C is concerned. It is not *defined* in any C source file. It is only *declared* as `extern`. It is defined by something that is not C code. The C language does not specify what should happen in such a case. And of course this is not a valid address as far as the C program is concerned, because it is not an address of any object. You can declare it however you want, the type doesn't matter as long as it is an object type (so no `void`). It's OK since you are not accessing the (non-existent) object you declare, you are just taking its address. – n. m. could be an AI Feb 04 '21 at 22:50
  • Ok! This is just a digression, but I would not say "just taking its address". You take the address of something that can be addressed. In my reasoning, the & operator here is simply the only provided way to access that value, even though & is most often used to take the address of variables and other things, such as functions. – ceztko Feb 04 '21 at 23:10
  • `&` takes the address of an object. It doesn't take the value of said object. Suppose the size of the data is 100. The linker creates a `_size` symbol at address 100. There is no object and no value at that address, but the linker tricks the compiler into believing there is. So the compiler creates an instruction to take an address of a non-existing object at address 100. If you like to think of it as the linker placing a magic value that is accessed by the compiler with a magic `&`, you can, but in reality this is just a regular symbol like any other, and a regular `&`. – n. m. could be an AI Feb 04 '21 at 23:24
  • With some effort (because I lack experience with how things are stored at the binary level) I was able to follow you, so thanks for pushing: `_size` is a regular symbol, and the linker writes it into the correct binary object section. `&` "takes the address of the symbol", which ends up being a read of the "address" in the relevant binary section. It doesn't matter that it's not a valid address and that the linker "abused" this section to write a constant value to save memory: it works, and for the C language it's just fine. Other implementations of resource embedding are possibly also valid. – ceztko Feb 05 '21 at 07:19
  • I'm adding a last comment on your answer: you cast the symbol address to `ptrdiff_t`, but `ptrdiff_t` is meant for pointer arithmetic. Its exact width is implementation-defined, and it is a signed type, so on a 32-bit architecture you would not be able to embed files larger than 2 GB, because the size would be read incorrectly. I would rather cast the address to `size_t`, unless you know a reason why this would be incorrect. – ceztko Feb 05 '21 at 07:29