
There are several questions dealing with some aspects of this problem, but none seems to answer it wholly. The whole problem can be summarized as follows:

  • You have an already compiled executable (obviously expecting the use of this technique).
  • You want to add arbitrarily sized binary data to it (not necessarily done by the executable itself, which would be another nasty problem to deal with).
  • You want the already compiled executable to be able to access this added binary data.

My particular use case is an interpreter: I would like to let the user produce a single-file executable out of an interpreter binary and the code they supply (the interpreter binary being the executable which has to be patched with the user-supplied code as binary data).

A similar case is self-extracting archives, where a program (the archiving utility, such as zip) is capable of constructing an executable which contains a pre-built decompressor (the already compiled executable) and user-supplied data (the contents of the archive). Obviously no compiler or linker is involved in this process (thanks, Mathias, for the note and for pointing out 7-zip).

Existing questions suggest one possible path to a solution:

appending data to an exe - This deals with the aspect of adding arbitrary data to arbitrary exes, without covering how to actually access it (basically simple append usually works, also true with Unix's ELF format).
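The builder side of the append approach can be sketched in standard C. This is only an illustration under assumptions: the function name `append_payload`, the 16-byte trailer layout and the magic string `PAYLOAD1` are all hypothetical, not part of any existing tool or format. The trailer (magic plus payload length) at the very end of the file is what lets the executable find its data later without parsing PE or ELF headers.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical trailer layout: 8 magic bytes, then the payload length
   as a 64-bit little-endian integer. This is not a standard format. */
#define TRAILER_MAGIC "PAYLOAD1"
#define TRAILER_SIZE 16

/* Copy every remaining byte of `in` to `out`; store the byte count in *count. */
static int copy_stream(FILE *in, FILE *out, uint64_t *count)
{
    unsigned char buf[4096];
    size_t n;

    *count = 0;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n)
            return -1;
        *count += n;
    }
    return ferror(in) ? -1 : 0;
}

/* Write stub + payload + trailer to out_path. Returns 0 on success. */
int append_payload(const char *stub_path, const char *payload_path,
                   const char *out_path)
{
    FILE *stub = fopen(stub_path, "rb");
    FILE *payload = fopen(payload_path, "rb");
    FILE *out = fopen(out_path, "wb");
    unsigned char trailer[TRAILER_SIZE];
    uint64_t stub_len, payload_len;
    int rc = -1;

    if (stub && payload && out &&
        copy_stream(stub, out, &stub_len) == 0 &&
        copy_stream(payload, out, &payload_len) == 0) {
        memcpy(trailer, TRAILER_MAGIC, 8);
        for (int i = 0; i < 8; i++)  /* encode the length little-endian */
            trailer[8 + i] = (unsigned char)(payload_len >> (8 * i));
        if (fwrite(trailer, 1, sizeof trailer, out) == sizeof trailer)
            rc = 0;
    }
    if (stub) fclose(stub);
    if (payload) fclose(payload);
    if (out && fclose(out) != 0) rc = -1;
    return rc;
}
```

Standard C has no way to set file permissions, so on Unix-like systems the output would still need to be marked executable by other means (for example `chmod +x`).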

Finding current executable's path without /proc/self/exe - Together with the above, this would provide a file name for opening the exe to access the added data. There are many more questions of this kind, but none focuses on getting a path suitable for actually opening the binary as a file (a goal which alone might (?) be easier to accomplish: you don't even need the path, just the binary opened for reading).

There may also be other, probably more elegant ways around this problem than padding the binary and opening the file to read it back in. For example, could the executable be built so that it becomes rather trivial to patch it later with the arbitrarily sized data, so that the data appears "within" it, in some proper data segment? (I couldn't really find anything on this. For fixed-size data it should be trivial, though, unless the executable carries some hash of itself.)
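For the fixed-size case mentioned above, a patcher could look roughly like the following sketch. It assumes the executable was built with a placeholder array that starts with a unique marker string followed by `capacity` reserved bytes; the function name `patch_slot` and the marker convention are hypothetical. It also assumes the file is not signed or hashed, and that the marker occurs only once in the file.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Overwrite the bytes following `marker` in the file at `path` with `data`.
   `capacity` is the slot size the executable was built with (an assumption
   shared by the patcher and the stub). Returns 0 on success, -1 on failure. */
int patch_slot(const char *path, const char *marker,
               const void *data, size_t data_len, size_t capacity)
{
    size_t mlen = strlen(marker), size, i;
    unsigned char *img;
    long end;
    int rc = -1;
    FILE *f = fopen(path, "r+b");

    if (!f || data_len > capacity)
        goto done;
    if (fseek(f, 0, SEEK_END) != 0 || (end = ftell(f)) < 0 ||
        fseek(f, 0, SEEK_SET) != 0)
        goto done;
    size = (size_t)end;
    img = malloc(size ? size : 1);
    if (!img)
        goto done;
    if (fread(img, 1, size, f) == size) {
        /* Linear scan for the marker; the whole slot must fit after it. */
        for (i = 0; i + mlen + capacity <= size; i++) {
            if (memcmp(img + i, marker, mlen) == 0) {
                /* Only the first match is patched. */
                if (fseek(f, (long)(i + mlen), SEEK_SET) == 0 &&
                    fwrite(data, 1, data_len, f) == data_len)
                    rc = 0;
                break;
            }
        }
    }
    free(img);
done:
    if (f && fclose(f) != 0)
        rc = -1;
    return rc;
}
```

On the executable side, the placeholder would probably need to be declared `volatile` (or otherwise hidden from the optimizer), since a compiler is free to fold the contents of a constant array into the code that reads it.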

Can this be done reasonably well with as little deviation from standard C as possible? Even more or less cross-platform (at least from a maintenance standpoint)? Note that it would be preferred if the program which adds the binary data didn't rely on compiler tools to do it (the user might not have them), but solutions requiring those might also be useful.

Note the "already compiled executable" criterion (the first point in the above list), which requires a completely different approach than the solutions described in questions like C/C++ with GCC: Statically add resource files to executable/library or SDL embed image inside program executable, which ask about embedding data at compile time.

Additional notes:

The problems with the obvious approach outlined above and suggested in some comments, which is to just append to the binary and use that, are as follows:

  • Opening the currently running program's binary does not seem trivial: opening the executable for reading is easy, but finding the path to pass to the file open call is not, at least not in a reasonably cross-platform manner.
  • The method of acquiring the path may provide an attack surface which otherwise wouldn't exist. A potential attacker could trick the program into seeing different binary data (supplied by the attacker) than what the executable actually contains, exposing any vulnerability in the parser of that data.
Jubatian
  • dynamic link libraries? – Peter Miehle Sep 07 '15 at 11:42
  • AFAIC zip programs do this all the time when they create self extracting archives. I'd take a look at how 7-zip does it. Cross platform will be a problem, since executables are not shareable between platforms. – Mathias Sep 07 '15 at 11:46
  • In Windows, you could simply set it as Resources: It can be done programmatically, see `BeginUpdateResource` and related functions. – Medinoc Sep 07 '15 at 11:52
  • @Mathias is spot on. The PE executable file header (for Windows only) contains all information to get the actual (logical) length of an executable. Add anything you want after this. Then you can seek to this position and read your private data. Cross platform is not going to work because other platforms don't use PE headers and I don't know if, say, the ELF executable format header contains similar metadata. – Jongware Sep 07 '15 at 11:57
  • @Mathias: Good idea! I forgot that it is even open source and available for Linux. I specifically mentioned "from maintenance standpoint" since the result is obviously not cross platform due to the exe differences, but the source of the thing could be (that is, not using techniques which would only work on one or another). – Jubatian Sep 07 '15 at 12:10
  • @Jongware : I checked ELF, and appending is described to work with it. It is not really necessary to interpret a PE header to achieve reading it: you could use a magic value to detect the start of it, or have a descriptor at the very end of the file containing the size, so it will work with any executable format which can be padded with arbitrary data (I had seen these two techniques described in various blogs). – Jubatian Sep 07 '15 at 12:15
  • possible duplicate of [SDL embed image inside program executable](http://stackoverflow.com/questions/18422123/sdl-embed-image-inside-program-executable) – Cloud Sep 07 '15 at 12:38
  • @Dogbert : That's a completely different problem there: it is a compile time solution. My question assumes an executable which you can't recompile (you may not even have the compiler on the machine which is supposed to do it) when you want to add the binary data to it (Think about Mathias's comment, a self-extracting archive's construction). – Jubatian Sep 07 '15 at 12:43
  • Create a small header that exports one symbol which is a pointer to your data, then use the linker to add it to your object file. – stark Sep 07 '15 at 12:51
  • if the compiled program is not expecting the additional data, then that data will never be accessed. – user3629249 Sep 07 '15 at 14:07
  • @Jubatian How do you plan to use the data from within the program then? This is just simple steganography if you're just trying to store the data in the program without making the program aware of the extra data. – Cloud Sep 07 '15 at 16:16
  • @Dogbert The executable could open and parse itself to find the embedded data, as described in the question. – melpomene Sep 07 '15 at 16:35
  • @melpomene : that, or the executable may be compiled to contain a stub which is later replaced with the intended piece of data (I would prefer the latter, to avoid the likely hassle of getting the exe file opened, which seems to involve querying a path and passing it to a file open function, which is itself a potential attack surface as far as I can see). – Jubatian Sep 07 '15 at 18:35
  • Your perfect conditions are inconsistent. Either you load data when loading your program or you need to open executable later. In the first case you need linker to build executable - potentially very simple linker, but you need to parse and rebuild elf – zch Sep 07 '15 at 20:41
  • @zch : "Your perfect conditions are inconsistent" - I don't understand what this addresses, despite re-reading the whole thing. Yes, the thing may be called a "linker". I pretty much think I will need to roll my own solution (unless 7-zip or some other utility has something of this type). Problems to solve are building the executable to be later patched in such a manner that the stub of the user supplied data in it needs the least complexity to provide (such as being the last symbol, so nothing beyond needs to be relocated etc), so minimizing format dependency. – Jubatian Sep 08 '15 at 19:17
  • I fail to grasp what the heck is so alien in this problem that hardly anybody even manages to see what the problem actually is... (and even voting for closing it as a dupe of something only vaguely related) – Jubatian Sep 08 '15 at 19:24
  • You said that you don't want to depend on compiler tools. I point out that you need a linker. You can write your own elf parser and builder, but that would increase complexity of your solution, especially that you target multiple platforms. – zch Sep 09 '15 at 10:26
  • @zch : It is not necessarily _that_ complex. I already did some preliminary experiments. A simple program of a few objects, one of which was the data which was supposed to be replaced later. I compiled, linked the stuff with a few "larger" sizes (obviously only the data had to be recompiled). The resulting binaries if the data chunk is cut from them are very similar, so some basic delta-compression might cover them all. So a probable, limited, but quite cross platform / cross format solution shows... – Jubatian Sep 09 '15 at 17:39
  • @zch : So essentially the "builder" stores versions of the precompiled executable for several data sizes; thanks to the delta-compression this is just not very noticeable. Not a linker, rather a kind of brute-force approach, but it should work for any executable format as long as the data chunk in it is linear. The drawbacks are of course obvious, however the advantage is a more robust end result (not relying on the ability of getting the exe's path, everything can be sourced from standard C). – Jubatian Sep 09 '15 at 17:44

1 Answer


It depends on how you want other systems to see your binary.

Digitally signed in Windows

The exe format allows for verifying that the file has not been modified since publishing. This would allow you to:

  1. Compile your file
  2. Add your data packet
  3. Sign your file and publish it.

The advantage of following this scheme is that "everybody" agrees your file has not been modified since signing.

The easiest way to achieve this is to use a resource. Windows resources can be added after linking. They are protected by the Authenticode digital signature, and your program can extract the resource data from itself.

It used to be possible to extend the signature section to include additional binary data, and some binaries used data stored there. Unfortunately this was abused maliciously, so it has been banned. Some details are in an MSDN blog post.

Breaking the signature

If re-signing is not an option, then the result would be treated as insecure. It is worth noting that appended data is insecure and can be modified without anyone being able to tell, but in that case so can the code in your binary.

Appending data to a binary breaks the digital signature, which also means the end user can't tell whether the code has been modified.

As a consequence, any self-protection you add to your code to check that the data blob is still intact would not prevent your code from being modified to remove that check.

Running module

On Windows, GetModuleFileName returns the path of the running executable.

Linux offers /proc/self/exe (or /proc/<pid>/exe).

Other Unix systems do not seem to have a reliable method.
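The platform split described above could be wrapped as follows. This is a minimal sketch, not a complete solution: the function name `self_exe_path` is hypothetical, and the `argv[0]` fallback for other systems is an assumption of this example, not something the platforms guarantee to work. `GetModuleFileNameA` and `readlink` are the real Windows and POSIX calls.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for readlink() on POSIX systems */
#endif
#include <string.h>

#if defined(_WIN32)
#include <windows.h>
#elif defined(__linux__)
#include <unistd.h>
#endif

/* Store the best available guess at the running executable's path in buf.
   Returns 0 on success, -1 on failure. argv0 is only a last resort: it can
   be relative, resolved via PATH, or chosen freely by the parent process. */
int self_exe_path(char *buf, size_t size, const char *argv0)
{
    if (size == 0)
        return -1;
#if defined(_WIN32)
    /* A NULL module handle means "the executable of the current process". */
    DWORD n = GetModuleFileNameA(NULL, buf, (DWORD)size);
    if (n > 0 && n < size)
        return 0;
#elif defined(__linux__)
    ssize_t n = readlink("/proc/self/exe", buf, size - 1);
    if (n > 0 && (size_t)n < size - 1) {
        buf[n] = '\0';  /* readlink() does not terminate the string */
        return 0;
    }
#endif
    if (argv0 == NULL || strlen(argv0) >= size)
        return -1;
    strcpy(buf, argv0);
    return 0;
}
```

A caller would pass `argv[0]` from `main` as the third argument and then open the returned path with `fopen(path, "rb")`. The fallback is where the attack surface mentioned in the question is largest, since the parent process controls `argv[0]`.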

Data reading

The approach of the zip format is to have a directory written at the end of the file. The reader finds this directory at the end of the file and then works backwards from it to the start of the data. The advantage is that the data blob is signposted from the end of the file rather than from its natural start, so the reader does not need to know where the executable image ends.
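The same idea, reduced to a single trailer instead of a full zip directory, might look like this in C. It is a sketch under assumptions: the trailer layout (8 magic bytes `PAYLOAD1` followed by a 64-bit little-endian length) and the function name `read_payload` are hypothetical. It also relies on `fseek` with `SEEK_END` working on binary streams, which ISO C does not strictly guarantee, and on file sizes fitting in a `long`.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical trailer: 8 magic bytes + 64-bit little-endian payload length. */
#define TRAILER_MAGIC "PAYLOAD1"
#define TRAILER_SIZE 16

/* Returns a malloc'd copy of the payload and sets *len, or NULL if the
   file has no valid trailer. The caller frees the result. */
unsigned char *read_payload(const char *path, size_t *len)
{
    unsigned char trailer[TRAILER_SIZE];
    unsigned char *data = NULL;
    uint64_t n = 0;
    long end;
    FILE *f = fopen(path, "rb");

    if (!f)
        return NULL;
    /* The trailer occupies the last TRAILER_SIZE bytes of the file. */
    if (fseek(f, 0, SEEK_END) != 0 || (end = ftell(f)) < TRAILER_SIZE ||
        fseek(f, end - TRAILER_SIZE, SEEK_SET) != 0 ||
        fread(trailer, 1, TRAILER_SIZE, f) != TRAILER_SIZE ||
        memcmp(trailer, TRAILER_MAGIC, 8) != 0)
        goto done;
    for (int i = 7; i >= 0; i--)  /* decode the little-endian length */
        n = (n << 8) | trailer[8 + i];
    /* Reject lengths that cannot fit between file start and trailer. */
    if (n > (uint64_t)(end - TRAILER_SIZE))
        goto done;
    /* Work backwards from the trailer to the start of the payload. */
    if (fseek(f, end - TRAILER_SIZE - (long)n, SEEK_SET) != 0)
        goto done;
    data = malloc(n ? (size_t)n : 1);
    if (data && fread(data, 1, (size_t)n, f) != (size_t)n) {
        free(data);
        data = NULL;
    }
    if (data)
        *len = (size_t)n;
done:
    fclose(f);
    return data;
}
```

The length check matters because the payload parser is exposed to whatever file the path resolves to; a corrupt or hostile trailer should fail cleanly and never drive the seek.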

mksteve
  • I up-voted, but still there is a way to go here! For example it is possible to build such a Windows executable which has a modifiable data portion, yet it is signed, here: http://reboot.pro/topic/15889-modify-a-signed-executable-without-invalidating-its-digital-signature/ (originally I found it applied to especially our purpose, I just can't find that article now). A few words on the general data append concept would also have been nice (such as the mention of ELF). I will also explore a different approach, see comments under question. – Jubatian Sep 14 '15 at 15:57
  • Tried to update, and was aware of signed unchecked data change. (Msdn blog). Don't know much of elf innards – mksteve Sep 14 '15 at 19:13