3

I'm trying to figure out a good way to store and retrieve version information in C / C++ executables and libraries on Linux. I'm using the GCC compiler for my C and C++ programs.

The storage part is pretty straightforward; declaring a variable like this stores it in the .rodata section of the output file:

const char MY_VERSION[] = "some_version_information";

However, I'm having an incredibly difficult time with retrieving the information from an external program. With shared libraries, it is fairly easy to use dlopen and dlsym to load a library and look up a symbol, but this might not be the best way to do it, and it won't work at all for executables. Also, if possible, I would like this to work with executables and libraries built for a different architecture.

I figure that since shared libraries and executables both use the ELF format, it makes sense to use a library that knows how to read ELF files. The two I was able to find are libelf and BFD; I'm struggling to find decent documentation for each. Is there perhaps a better library to use?

This is what I have so far, using BFD:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <bfd.h>

int main(int argc, char* argv[])
{
    const char *filename;
    int i;
    long storage; // bfd_get_symtab_upper_bound returns a long; a negative value signals an error
    bfd *b = NULL;
    asymbol **symbol_table;
    long num_symbols;

    if(argc != 2) return 1; // todo: print a useful message
    else filename = argv[1];

    b = bfd_openr(filename, NULL);

    if(b == NULL){
        fprintf(stderr, "Error: failed to open %s\n", filename);
        return 1;
    }

    // make sure we're opening a file that BFD understands
    if(!bfd_check_format(b, bfd_object)){
        fprintf(stderr, "Error: unrecognized format\n");
        return 1;
    }

    // how much memory is needed to store the symbol table
    storage = bfd_get_symtab_upper_bound(b);

    if(storage < 0){
        fprintf(stderr, "Error: unable to find storage bound of symbol table\n");
        return 1;
    } else if((symbol_table = malloc(storage)) == NULL){
        fprintf(stderr, "Error: failed to allocate memory for symbol table\n");
        return 1;
    } else {
        num_symbols = bfd_canonicalize_symtab(b, symbol_table);
    }

    for(i = 0; i < num_symbols; i++){
        if(strcmp(symbol_table[i]->name, "MY_VERSION") == 0){
            fprintf(stderr, "found MY_VERSION\n");

            // todo: print the string?
        }
    }

    return 0;
}

I realize that printing the string may not be very simple due to the ELF format.

Is there a straightforward way to print a string symbol that is stored in an ELF file?

millinon
  • Have you tried `objdump`? – kichik Sep 18 '17 at 16:53
  • I have looked into `objdump`, `readelf`, and `nm`, but I haven't found a way to have the command print a specific symbol. Can an individual symbol be specified to `objdump`? – millinon Sep 18 '17 at 16:58
  • I doubt it, but you can always `grep` or parse the output. – kichik Sep 18 '17 at 17:01
  • Is there any reason why grepping or parsing the output of objdump would be easier than using a library to perform the equivalent lookup? – millinon Sep 18 '17 at 17:53
  • Yes. These tools already exist and are battle tested. – kichik Sep 18 '17 at 17:57
  • 1
    I see your point, but considering that objdump and friends use BFD internally, and considering that there isn't an obvious way to use these tools to accomplish precisely what I'm trying to do without potentially messy grepping / parsing, I don't think that using something like objdump is necessarily better than writing a program. – millinon Sep 18 '17 at 18:11
  • @millinon if you’re open to it, utilize pyelftools. It’s easy to accomplish very specific ELF parsing/analysis tasks using its readelf script as a reference. It’s on Github – adam Jan 06 '18 at 13:12

3 Answers

4

I figured out that I could use a custom section to store the version information, and then just dump the section to 'extract' the string.

Here's how the version information should be declared:

__attribute__((section("my_custom_version_info"))) const char MY_VERSION[] = "some.version.string";

Then, in the program using BFD, we can get the section a few different ways. We can use bfd_get_section_by_name:

asection *section = bfd_get_section_by_name(b, "my_custom_version_info");

Now that we have a handle to the section, we can load it into memory. I chose to use bfd_malloc_and_get_section (you should make sure section isn't NULL first):

bfd_byte *buf;
if(!bfd_malloc_and_get_section(b, section, &buf)){
    // error: failed to malloc or read the section
}

Now that we have the section loaded into a buffer, we can print its contents:

for(bfd_size_type i = 0; i < section->size && buf[i]; i++){
    putchar(buf[i]);
}
putchar('\n');

Don't forget to free the buffer.
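For comparison, the same section lookup can be sketched without BFD, using only the structures from `<elf.h>`. This is a minimal sketch under loud assumptions: it handles only 64-bit ELF files whose byte order matches the host, it assumes the section header table is suitably aligned in the file, and the names `slurp`, `in_file`, and `read_elf_section` are made up for illustration. It swaps BFD for manual parsing, so it gives up BFD's support for other ELF classes and byte orders.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* The version string, placed in its own section as described above. */
__attribute__((section("my_custom_version_info"), used))
const char MY_VERSION[] = "some.version.string";

/* Read a whole file into a malloc'd buffer; returns NULL on failure. */
static unsigned char *slurp(const char *path, size_t *len)
{
    FILE *f = fopen(path, "rb");
    unsigned char *buf = NULL;
    long size = 0;

    if (f == NULL)
        return NULL;
    if (fseek(f, 0, SEEK_END) == 0 && (size = ftell(f)) > 0) {
        rewind(f);
        buf = malloc((size_t)size);
        if (buf != NULL && fread(buf, 1, (size_t)size, f) != (size_t)size) {
            free(buf);
            buf = NULL;
        }
    }
    fclose(f);
    *len = (buf != NULL) ? (size_t)size : 0;
    return buf;
}

/* True if the byte range [off, off + size) lies inside a file of len bytes. */
static int in_file(size_t len, Elf64_Off off, Elf64_Xword size)
{
    return off <= len && size <= len - off;
}

/* Return a malloc'd, NUL-terminated copy of section `name`, or NULL.
 * Assumption: 64-bit ELF in host byte order. */
char *read_elf_section(const char *path, const char *name)
{
    size_t len = 0;
    char *out = NULL;
    unsigned char *img;

    assert(path != NULL && name != NULL);
    img = slurp(path, &len);
    if (img == NULL)
        return NULL;

    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)img;
    if (len >= sizeof *eh && memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0 &&
        eh->e_ident[EI_CLASS] == ELFCLASS64 &&
        eh->e_shentsize == sizeof(Elf64_Shdr) &&
        eh->e_shstrndx < eh->e_shnum &&
        in_file(len, eh->e_shoff, (Elf64_Xword)eh->e_shnum * sizeof(Elf64_Shdr))) {
        const Elf64_Shdr *sh = (const Elf64_Shdr *)(img + eh->e_shoff);
        const Elf64_Shdr *names = &sh[eh->e_shstrndx];
        /* Require a NUL-terminated name table so strcmp cannot overrun. */
        int names_ok = names->sh_size > 0 &&
                       in_file(len, names->sh_offset, names->sh_size) &&
                       img[names->sh_offset + names->sh_size - 1] == '\0';

        for (size_t i = 0; names_ok && i < eh->e_shnum; i++) {
            const Elf64_Shdr *s = &sh[i];
            if (s->sh_name >= names->sh_size || s->sh_type == SHT_NOBITS ||
                !in_file(len, s->sh_offset, s->sh_size))
                continue;
            if (strcmp((const char *)img + names->sh_offset + s->sh_name, name) != 0)
                continue;
            out = malloc(s->sh_size + 1);
            if (out != NULL) {
                memcpy(out, img + s->sh_offset, s->sh_size);
                out[s->sh_size] = '\0';
            }
            break;
        }
    }
    free(img);
    return out;
}
```

A caller would pass the path of the binary to inspect, for example `read_elf_section(argv[1], "my_custom_version_info")`, print the result, and `free` it. On Linux a program can also inspect itself through `/proc/self/exe`.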

millinon
  • Better use ELF hashing to access symbols – Basile Starynkevitch Sep 18 '17 at 19:44
  • I like this better than the `nm`-based version, as it should still work even if and after the ELF has been `strip`ped. (Theoretically, you should use a section name that does _not_ start with a dot, as those that do are reserved) – Petr Skocik Sep 18 '17 at 20:22
  • 1
    I found that it does not work after the ELF has been stripped - the custom section does get removed. However, `strip` accepts a -K argument that specifies symbols that should not be stripped. I did not know about section names starting with a dot - I will update the answer, thanks! – millinon Sep 18 '17 at 20:25
  • BTW, there's this tool called `elfcat` that can, by name, dump the contents of an ELF section so you don't have to wire up the code in C. With it you should be able to do `elfcat --section-name my_custom_version_info a.out` to get the string. – Petr Skocik Sep 18 '17 at 20:25
  • @millinon I couldn't reproduce that. If I strip the ELF with `strip -s`, the extra section sticks. – Petr Skocik Sep 18 '17 at 20:32
2

From inside your executable, just declare

 extern const char MY_VERSION[];

BTW, in C++ it is better to declare that symbol extern "C" (even in the file defining it), so that its name is not mangled.

Then your issue is how to find the symbol MY_VERSION in some external ELF executable (the easy way could be to popen an nm process, see nm(1)). BTW, the lookup is the same for a function symbol as for a data one. You could use a library such as libelf or libelfin (or the venerable libbfd), or parse the ELF format yourself (be sure to read the Wikipedia page on ELF first).
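The popen-plus-nm route might look roughly like the sketch below. It assumes GNU `nm` is on the PATH and prints its default three-column format (hex value, type letter, name); `parse_nm_line` and `nm_lookup` are hypothetical helper names. Note that this only yields the symbol's address, not the bytes of the string, and that names longer than 255 characters are truncated by the parser.

```c
#define _POSIX_C_SOURCE 200809L /* for popen and pclose */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one line of default nm output: "<hex value> <type> <name>".
 * Returns 1 and stores the value in *addr if the line names `symbol`.
 * Lines for undefined symbols have no value column and never match. */
int parse_nm_line(const char *line, const char *symbol, unsigned long long *addr)
{
    unsigned long long value;
    char type;
    char name[256];

    if (sscanf(line, "%llx %c %255s", &value, &type, name) != 3)
        return 0;
    if (strcmp(name, symbol) != 0)
        return 0;
    *addr = value;
    return 1;
}

/* Run nm on `path` and look for `symbol`; returns 1 if it was found. */
int nm_lookup(const char *path, const char *symbol, unsigned long long *addr)
{
    char cmd[4096];
    char line[512];
    int found = 0;
    FILE *p;

    assert(path != NULL && symbol != NULL && addr != NULL);
    /* Refuse paths that would break the single-quoted shell argument. */
    if (strchr(path, '\'') != NULL)
        return 0;
    if (snprintf(cmd, sizeof cmd, "nm -- '%s' 2>/dev/null", path) >= (int)sizeof cmd)
        return 0;
    p = popen(cmd, "r");
    if (p == NULL)
        return 0;
    while (!found && fgets(line, sizeof line, p) != NULL)
        found = parse_nm_line(line, symbol, addr);
    pclose(p);
    return found;
}
```

Going through the shell keeps the code short, but it ties the tool to whatever `nm` is installed; a cross-architecture setup may need the matching cross-binutils `nm` instead.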

You should learn and understand the ELF format. Read the documentation on ELF and on the x86-64 ABI carefully, and explore existing ELF executables with objdump(1) and readelf(1). Read elf(5) as well. Learn how symbol tables are represented and how their hash codes are computed, and of course read about all the possible relocations in detail. You could read Levine's book Linkers and Loaders and Drepper's paper How to Write Shared Libraries (both explain ELF), as well as the Assembler Language HowTo, Ian Taylor's paper on gold, and ELF: better symbol lookup via DT_GNU_HASH. See also the Solaris documentation, e.g. on the Hash Table Section, and the OSDEV ELF tutorial and ELF pages.

You don't need any specific section (or segment).
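As a rough illustration of looking the symbol up directly, the sketch below walks the symbol tables of a file, finds a defined symbol by name, maps its value back to a file offset through the section that contains it, and copies `st_size` bytes. It is only a sketch: it assumes a 64-bit ELF file in host byte order with aligned tables, it uses a plain linear scan rather than the ELF hash tables, and `slurp`, `in_file`, `copy_symbol`, and `read_elf_string_symbol` are hypothetical names. It finds nothing if the symbol table has been stripped.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* An ordinary global string: no dedicated section is involved. */
const char MY_VERSION[] = "some_version_information";

/* Read a whole file into a malloc'd buffer; returns NULL on failure. */
static unsigned char *slurp(const char *path, size_t *len)
{
    FILE *f = fopen(path, "rb");
    unsigned char *buf = NULL;
    long size = 0;

    if (f == NULL)
        return NULL;
    if (fseek(f, 0, SEEK_END) == 0 && (size = ftell(f)) > 0) {
        rewind(f);
        buf = malloc((size_t)size);
        if (buf != NULL && fread(buf, 1, (size_t)size, f) != (size_t)size) {
            free(buf);
            buf = NULL;
        }
    }
    fclose(f);
    *len = (buf != NULL) ? (size_t)size : 0;
    return buf;
}

/* True if the byte range [off, off + size) lies inside a file of len bytes. */
static int in_file(size_t len, Elf64_Off off, Elf64_Xword size)
{
    return off <= len && size <= len - off;
}

/* Linear scan of .symtab/.dynsym for a defined symbol; returns a malloc'd,
 * NUL-terminated copy of its bytes, or NULL. */
static char *copy_symbol(const unsigned char *img, size_t len, const char *symbol)
{
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)img;
    if (len < sizeof *eh || memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0 ||
        eh->e_ident[EI_CLASS] != ELFCLASS64 ||
        eh->e_shentsize != sizeof(Elf64_Shdr) ||
        !in_file(len, eh->e_shoff, (Elf64_Xword)eh->e_shnum * sizeof(Elf64_Shdr)))
        return NULL;
    const Elf64_Shdr *sh = (const Elf64_Shdr *)(img + eh->e_shoff);

    for (size_t i = 0; i < eh->e_shnum; i++) {
        const Elf64_Shdr *tab = &sh[i];
        if ((tab->sh_type != SHT_SYMTAB && tab->sh_type != SHT_DYNSYM) ||
            tab->sh_link >= eh->e_shnum || !in_file(len, tab->sh_offset, tab->sh_size))
            continue;
        /* sh_link names the string table; require it to be NUL-terminated. */
        const Elf64_Shdr *str = &sh[tab->sh_link];
        if (str->sh_size == 0 || !in_file(len, str->sh_offset, str->sh_size) ||
            img[str->sh_offset + str->sh_size - 1] != '\0')
            continue;
        const Elf64_Sym *syms = (const Elf64_Sym *)(img + tab->sh_offset);
        for (size_t j = 0; j < tab->sh_size / sizeof(Elf64_Sym); j++) {
            const Elf64_Sym *s = &syms[j];
            if (s->st_name >= str->sh_size || s->st_shndx >= eh->e_shnum ||
                strcmp((const char *)img + str->sh_offset + s->st_name, symbol) != 0)
                continue;
            const Elf64_Shdr *home = &sh[s->st_shndx];
            if (s->st_shndx == SHN_UNDEF || home->sh_type == SHT_NOBITS ||
                s->st_value < home->sh_addr ||
                !in_file(len, home->sh_offset, home->sh_size))
                continue;
            /* Offset of the symbol inside its section (sh_addr is 0 in .o files). */
            Elf64_Xword delta = s->st_value - home->sh_addr;
            if (delta > home->sh_size || s->st_size > home->sh_size - delta)
                continue;
            char *out = malloc(s->st_size + 1);
            if (out != NULL) {
                memcpy(out, img + home->sh_offset + delta, s->st_size);
                out[s->st_size] = '\0';
            }
            return out;
        }
    }
    return NULL;
}

char *read_elf_string_symbol(const char *path, const char *symbol)
{
    size_t len = 0;
    unsigned char *img;
    char *out;

    assert(path != NULL && symbol != NULL);
    img = slurp(path, &len);
    if (img == NULL)
        return NULL;
    out = copy_symbol(img, len, symbol);
    free(img);
    return out;
}
```

Calling `read_elf_string_symbol(path, "MY_VERSION")` on an unstripped binary that defines the symbol should return a copy of the string, which the caller frees. Supporting 32-bit or opposite-endian files would need the `Elf32_*` structures and byte swapping, which is where a library such as libelf starts to pay off.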

(I've done that about 20 years ago for Sparc; it is not particularly hard)

You could also look into the Emacs source code; its unexec.c writes an ELF file.

BTW, ELF has some versioning info with symbols, see e.g. dlvsym(3)

You may also want to understand how execve(2) or ld-linux(8) works, what is the virtual address space of a process (see proc(5), try cat /proc/$$/maps)

Basile Starynkevitch
  • Agh! I was hoping that there would exist a solution that doesn't require grokking the ELF format, but I anticipated that this might be the case due to the fact that ELF is non-trivial. Still, I appreciate the specific resources - thank you! – millinon Sep 18 '17 at 19:49
  • Of course you need to understand ELF format. It is not that difficult. Otherwise just `popen` some `nm` command – Basile Starynkevitch Sep 18 '17 at 19:50
  • I don't understand how can you want to parse ELF without understanding the ELF format (unless you use `nm`) – Basile Starynkevitch Sep 18 '17 at 19:57
  • My goal was to find an existing tool (or library) that would accomplish the task such that I could avoid reinventing the wheel. I certainly don't dispute that parsing the ELF format requires understanding the ELF format! – millinon Sep 18 '17 at 20:04
  • `libelfin` looks elegant (but I did not use it). Did you consider using it? The examples are simple and short (e.g. [dump-syms.cc](https://github.com/aclements/libelfin/blob/master/examples/dump-syms.cc) is only 43 lines...) But you do need to understand something about ELF – Basile Starynkevitch Sep 18 '17 at 20:06
  • I had not found libelfin - I will look into it. Thanks again! – millinon Sep 18 '17 at 20:10
  • Feel free to accept my answer if it fits; but you need to spend more time in understanding ELF – Basile Starynkevitch Sep 18 '17 at 20:13
0

The traditional way of doing this is via SCCS what(1) strings. See https://pubs.opengroup.org/onlinepubs/9699919799/utilities/what.html.
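A what(1)-style string is an ordinary string that starts with the marker `@(#)`. The utility scans a file for that marker and prints what follows up to the first `"`, `>`, newline, `\`, or NUL. Because it is a plain byte scan, it does not depend on the symbol table, the architecture, or even the file being ELF. The sketch below shows both sides; the version text and the function name `first_what_string` are made up for illustration, and it reports only the first match where the real utility prints all of them.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Embedding side: the "@(#)" prefix is what makes the string findable.
 * The text after the marker is a hypothetical example. */
const char MY_VERSION[] = "@(#)my_program 1.2.3";

/* Scanning side: copy the text following the first "@(#)" in the stream
 * into out (truncating if needed). Returns 1 if a marker was found. */
int first_what_string(FILE *f, char *out, size_t out_size)
{
    static const char marker[] = "@(#)";
    size_t matched = 0;
    size_t n = 0;
    int c;

    assert(f != NULL && out != NULL && out_size > 0);

    /* Find the marker. '@' occurs only at its start, so after a mismatch
     * the only possible partial match is a single '@'. */
    while (matched < 4 && (c = fgetc(f)) != EOF) {
        if (c == marker[matched])
            matched++;
        else
            matched = (c == '@') ? 1 : 0;
    }
    if (matched < 4)
        return 0;

    /* Copy until one of the terminators listed for what(1), or EOF. */
    while ((c = fgetc(f)) != EOF && c != '\0' && c != '"' && c != '>' &&
           c != '\n' && c != '\\') {
        if (n + 1 < out_size)
            out[n++] = (char)c;
    }
    out[n] = '\0';
    return 1;
}
```

Where the `what` utility itself is installed, running it on the binary does the same scan without any custom code.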

user1254127