
We have a large set of C++ projects (GCC, Linux, mostly static libraries) with many dependencies between them. Then we compile an executable using these libraries and deploy the binary on the front-end. It would be extremely useful to be able to identify that binary. Ideally what we would like to have is a small script that would retrieve the following information directly from the binary:

$ident binary
$binary : Product=PRODUCT_NAME;Version=0.0.1;Build=xxx;User=xxx...
$  dependency: Product=PRODUCT_NAME1;Version=0.1.1;Build=xxx;User=xxx...
$  dependency: Product=PRODUCT_NAME2;Version=1.0.1;Build=xxx;User=xxx...

So it should display all the information for the binary itself and for all of its dependencies.
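To make the record format above concrete, here is a minimal C++ sketch that splits one `Key=Value;Key=Value` record into its fields. The function name `parse_manifest_record` is hypothetical, and the sketch assumes that neither keys nor values contain `;` or `=`.

```cpp
#include <map>
#include <string>

// Parse one "Key=Value;Key=Value" record into a map.
// Fields without '=' are skipped; whitespace is not trimmed.
std::map<std::string, std::string> parse_manifest_record(const std::string& record)
{
    std::map<std::string, std::string> fields;
    std::string::size_type pos = 0;
    while (pos <= record.size()) {
        // A field runs up to the next ';' or to the end of the record.
        std::string::size_type end = record.find(';', pos);
        if (end == std::string::npos)
            end = record.size();
        const std::string field = record.substr(pos, end - pos);
        const std::string::size_type eq = field.find('=');
        if (eq != std::string::npos)
            fields[field.substr(0, eq)] = field.substr(eq + 1);
        pos = end + 1;
    }
    return fields;
}
```

Calling `parse_manifest_record("Product=PRODUCT_NAME;Version=0.0.1")` would give a map with the keys `Product` and `Version`.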

Currently our approach is:

  1. During compilation, for each product we generate Manifest.h and Manifest.cpp and then inject Manifest.o into the binary.

  2. The ident script parses the target binary, finds the generated data there and prints this information.
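As a rough illustration of these two steps (not our actual generated code), the sketch below shows what a generated Manifest.cpp could contain and how a scanner could find the records again. The marker text, the variable name `product_manifest` and the function `find_manifests` are all hypothetical. The GCC-specific `used` attribute asks the compiler to emit the array even though nothing refers to it; it does not make the linker pull an unreferenced object out of a static archive.

```cpp
#include <string>
#include <vector>

// Hypothetical marker that makes manifest records easy to find in a binary.
#define MANIFEST_MARKER "@(#)MANIFEST: "

// What a generated Manifest.cpp might contain (assumed layout).
// `used` keeps the compiler from discarding the unreferenced array.
__attribute__((used))
static const char product_manifest[] =
    MANIFEST_MARKER "Product=PRODUCT_NAME;Version=0.0.1";

// Return every NUL-terminated record that follows the marker
// in a raw binary image held in memory.
std::vector<std::string> find_manifests(const std::string& image)
{
    const std::string marker = MANIFEST_MARKER;
    std::vector<std::string> records;
    std::string::size_type pos = 0;
    while ((pos = image.find(marker, pos)) != std::string::npos) {
        pos += marker.size();
        // The record ends at the next NUL byte or at the end of the image.
        std::string::size_type end = image.find('\0', pos);
        if (end == std::string::npos)
            end = image.size();
        records.push_back(image.substr(pos, end - pos));
        pos = end;
    }
    return records;
}
```

The scanner takes the whole file contents as a `std::string`, so it is independent of how the file is read and of the object file format.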

However, this approach is not always reliable across different versions of GCC. I would like to ask the SO community: is there a better approach to solve this problem?

Thanks for any advice

nogard
  • We are doing something similar, but we have added some APIs to allow registering the version information at a central place, so that we can retrieve it not only through ident but also through some API calls to show it in the application itself. So, in general I would say your approach is reasonable ;) What are the exact issues you are observing? – Andreas Fester Sep 24 '12 at 09:02
  • @Andreas. Thanks. The problem happens on only one platform (Linux x86, gcc 4.1.2): for some reason the manifest is not present in the compiled binary (maybe optimized out since there are no references, or some tricky mangling). We have a workaround for this (we compile Manifest.o with an ancient compiler), but I have a feeling that we are relying on hacks. – nogard Sep 24 '12 at 09:14
  • Can you just add a command line argument that causes the executable to dump the version information to stdout for the script to grab? – Jonathan Potter Sep 26 '12 at 10:45
  • @JonathanPotter: Thanks, but the main problem is how to put the version information (with all deps) into the executable. How do you propose to handle this? Yes, the 'command line approach' simplifies output of the information, but it works only for executables and requires inserting the same code into each executable, which is more intrusive – nogard Sep 27 '12 at 06:14
  • Presumably the linked-in dependencies are known and don't change that often, so simply have each dependency export a function that provides its version number, and have the exe call all of those and string them together. – Jonathan Potter Sep 27 '12 at 09:02
  • @JonathanPotter: Yes, I know that this way it would work, but it requires code changes in all the components, which is exactly what we try to avoid. The point is how to automate this process, so that version information would be injected automatically. Currently we automate it by using a common makefile that injects Manifest.o, so it doesn't require any code changes. However, it is not always reliable. – nogard Sep 27 '12 at 09:47
  • Have a look at `libbfd` or one of its front-ends in `binutils`. See also http://stackoverflow.com/questions/4864866 and http://stackoverflow.com/questions/1997172 – 0xC0000022L Sep 28 '12 at 07:02

3 Answers

5

One of the catches with storing data in source code (your Manifest.h and .cpp) is the size limit for literal data, which is dependent on the compiler.

My suggestion is to use ld. It allows you to store arbitrary binary data in your ELF file (so does objcopy). If you prefer to write your own solution, have a look at libbfd.

Let us say we have a hello.cpp containing the usual C++ "Hello world" example. Now we have the following make file (GNUmakefile):

hello: hello.o hello.om
    $(LINK.cpp) $^ $(LOADLIBES) $(LDLIBS) -o $@

%.om: %.manifest
    ld -b binary -o $@ $<

%.manifest:
    echo "$@" > $@

What I'm doing here is separating out the linking stage, because I want the manifest (after conversion to ELF object format) linked into the binary as well. Since I am using pattern rules this is one way to go; others are certainly possible, including a better naming scheme for the manifests where they also end up as .o files and GNU make can figure out how to create those. Here I'm being explicit about the recipe. So we have .om files, which are the manifests (arbitrary binary data) converted to ELF objects, created from .manifest files. The recipe for creating the .manifest itself simply writes a string (the file's own name) into the file.

Obviously the tricky part in your case isn't storing the manifest data, but rather generating it. And frankly I know too little about your build system to even attempt to suggest a recipe for the .manifest generation.

Whatever you throw into your .manifest file should probably be some structured text that can be interpreted by the script you mention or that can even be output by the binary itself if you implement a command line switch (and disregard .so files and .so files hacked into behaving like ordinary executables when run from the shell).

The above make file doesn't take into account the dependencies - or rather it doesn't help you create the dependency list in any way. You can probably coerce GNU make into helping you with that if you express your dependencies clearly for each goal (i.e. the static libraries etc). But it may not be worth it to take that route ...

Also look at:


If you want particular names for the symbols generated from the data (in your case the manifest), you need to use a slightly different route and use the method described by John Ripley here.

How to access the symbols? Easy. Declare them as external (C linkage!) data and then use them:

#include <cstddef>
#include <cstdio>

extern "C" char _binary_hello_manifest_start;
extern "C" char _binary_hello_manifest_end;

int main(int argc, char** argv)
{
        // Number of bytes between the two linker-generated symbols.
        const std::ptrdiff_t len = &_binary_hello_manifest_end - &_binary_hello_manifest_start;
        // "%.*s" prints at most len bytes, so no terminating zero is needed.
        printf("Hello world: %.*s\n", (int)len, &_binary_hello_manifest_start);
}

The symbols sit at the exact characters/bytes of the data, which is why the code takes their addresses. You could also declare them as char[], but it would result in problems down the road, e.g. for the printf call.

The reason I am calculating the size myself is that (a) I don't know whether the buffer is guaranteed to be zero-terminated and (b) I didn't find any documentation on interfacing with the *_size variable.

Side-note: the .* in the format string is a precision taken from the argument list: printf reads an int from the next argument, prints at most that many characters, and then takes the following argument as the string to print. A plain * without the dot would only set a minimum field width and would not limit how many bytes are read.
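If the manifest needs to be handled as a string rather than printed directly, a small helper can copy the bytes between the two symbols. This is only a sketch: the name `blob_to_string` is made up for illustration, and stripping one trailing newline is an assumption based on the `echo` recipe in the make file above.

```cpp
#include <string>

// Copy the bytes in [begin, end) into a std::string and drop one
// trailing newline, such as the one `echo` appends.
std::string blob_to_string(const char* begin, const char* end)
{
    std::string text(begin, end);   // length comes from the pointers, no NUL needed
    if (!text.empty() && text[text.size() - 1] == '\n')
        text.erase(text.size() - 1);
    return text;
}
```

With the symbols from the example this would be called as `blob_to_string(&_binary_hello_manifest_start, &_binary_hello_manifest_end)`.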

0xC0000022L
  • Thank you very much for your answer. I will investigate and try this approach, looks very promising. Generating manifest is not a problem, since we already have common makefile that collect all the information – nogard Sep 28 '12 at 08:42
  • I generated executable as you proposed, I can see with objdump that _binary_src_manifest_start and _binary_src_manifest_end are present, but how can I retrieve the string data from it? – nogard Sep 28 '12 at 10:41
  • @nogard: didn't see your second question, sorry. Will answer it now. – 0xC0000022L Oct 02 '12 at 13:13
2

You can insert any data you like into a .comment section in your output binary. You can do this with the linker after the fact, but it's probably easier to place it in your C++ code like this:

 asm  (".section .comment.manifest\n\t"
       ".string \"hello, this is a comment\"\n\t"
       ".section .text");

 int main() {
   ....

The asm statement should go outside any function, in this instance. This should work as long as your compiler puts normal functions in the .text section. If it doesn't then you should make the obvious substitution.

The linker should gather all the .comment.manifest sections into one blob in the final binary. You can extract them from any .o or executable with this:

objdump -j .comment.manifest -s example.o
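Since each `.string` directive emits a NUL-terminated string, the gathered section is a run of such strings. Assuming the raw section bytes have already been read into memory by some means, a sketch for splitting them into individual entries could look like this (the function name `split_nul_terminated` is hypothetical):

```cpp
#include <string>
#include <vector>

// Split a blob of back-to-back NUL-terminated strings into its entries.
std::vector<std::string> split_nul_terminated(const std::string& section)
{
    std::vector<std::string> entries;
    std::string::size_type pos = 0;
    while (pos < section.size()) {
        std::string::size_type end = section.find('\0', pos);
        if (end == std::string::npos)
            end = section.size();
        if (end > pos)   // skip empty strings, e.g. padding bytes
            entries.push_back(section.substr(pos, end - pos));
        pos = end + 1;
    }
    return entries;
}
```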
ams
  • Thanks, very interesting approach, I'll evaluate it! – nogard Sep 28 '12 at 09:37
  • I did very simple test and it works for executable itself, but it seems it does not gather manifest from the static library (I can see .comment.manifest in static library object files, but this info is not present in final executable). Is there some special linker option for that? – nogard Sep 28 '12 at 10:07
  • Hum, I thought it would do that, but I see it doesn't use a wildcard with .comment sections. You can try it again without the `.manifest` suffix, but it'll get mixed up with the comments the compiler gives. Or you can add your own line to the linker script like `.comment.manifest 0 : { *(.comment.manifest) }` – ams Sep 28 '12 at 10:17
  • Without .manifest suffix it's still the same (not gathered from dependent libs). I'll try the trick with linker later.. – nogard Sep 28 '12 at 10:43
  • It always used to be that an executable had a whole collection of repeated comments included in it (one from each object file). Maybe they fixed that when I wasn't looking. The right linker script fu can definitely fix that, but I hoped there would be an existing link rule that could be reused. – ams Sep 28 '12 at 11:13
0

Have you thought about using the standard packaging system of your distro? In our company we have thousands of packages and hundreds of them are automatically deployed every day.

We are using Debian packages that contain all the necessary information:

  • Full changelog that includes:
    • authors;
    • versions;
    • short descriptions and timestamps of changes.
  • Dependency information:
    • a list of all packages that must be installed for the current one to work correctly.
  • Installation scripts that set up the environment for a package.

I think you may not need to create manifests in your own way, since a ready-made solution already exists. You can have a look at the Debian package HowTo here.

Rride.a
  • A package can always contain multiple binaries. Far from perfect. In short: product version != component versions. – 0xC0000022L Sep 28 '12 at 06:50
  • There is no need to create a single package for the whole product. You can create a package for every component of the system, as we do. – Rride.a Sep 28 '12 at 07:06
  • Even if all your components are built through a single **make** in a single directory you can split them into different packages with the use of dh_install – Rride.a Sep 28 '12 at 07:08
  • You *may* indeed choose to split it into various packages, if that is your decision to make. The decision isn't always ours, though. Besides I think you are dodging the OP's question with the answer :) ... a borked system, not having the same package management everywhere, not having package management at all (embedded stuff) all these are valid reasons to store the mentioned information directly in the binary for later retrieval by a tool such as `strings` or something more sophisticated. – 0xC0000022L Sep 28 '12 at 07:30