
While developing a header-only library, I'd like to make sure that a given string is embedded in all binaries that use my header, even if the compiler is configured to optimize away unused constants, and the binary gets stripped.

The embedding shouldn't have any side-effects (apart from making the resulting binary a little bit bigger).

I don't know how people are going to use the headers, but

  • the headers might get included in multiple compilation units, all linked together into a single binary
  • target platforms are Linux/macOS/Windows
  • compilers will most likely be gcc/clang/MSVC

My trivial attempt amounts to:

static char frobnozzel_version_string[] = "Frobnozzel v0.1; © 2019 ACME; GPLv3";

..., but that gets easily removed during the build (since the string is never actually used, it's easy prey for an optimizing compiler).

So the question is: is it possible to embed a string in any binary that includes a given header, that won't get optimized/stripped away by usual strategies to build "Release" binaries?

I'm aware that anybody who uses the library can just (manually) remove whatever I put in, but let's assume people use the header "as is".


Context: the headers in question are released under the GPL, and I'd like to be able to check whether users actually comply with the license.

umläute
  • Macros can be used for this. – Jazzwave06 Apr 11 '19 at 14:25
  • Use `volatile` qualifier https://en.cppreference.com/w/cpp/language/cv – Nellie Danielyan Apr 11 '19 at 14:26
  • @sturcotte06 how? – umläute Apr 11 '19 at 14:31
  • Usually, projects have a `version.h.in` which is configured from the build system with a bunch of defines such as `#define FROBNOZZEL_VERSION "@PROJECT_VERSION@"` and `#define FROBNOZZEL_LICENSE "@PROJECT_LICENSE@"` – Jazzwave06 Apr 11 '19 at 14:35
  • @NellieDanielyan declaring my char[] as `volatile` works nicely with `gcc`, but clang (7.0.1-8), seems to still remove it. – umläute Apr 11 '19 at 14:37
  • @sturcotte06 the compiler won't even see those strings if the macros are not used (by the users). how are they going to end up in the resulting binaries? – umläute Apr 11 '19 at 14:39
  • You provide a header, not a shared lib. I don't understand why you need that string in the final binary, unless you plan on mmap the lib and read the value directly from the lib's memory space. If the application that include your header does not need the value, I don't see why a compiler should emit symbols for it. – Jazzwave06 Apr 11 '19 at 14:41
  • It seems like you make users of your header pay for what they don't use, which is against the philosophy of C/C++. – Jazzwave06 Apr 11 '19 at 14:43
  • @umläute looks like clang does not consider static variable's initialization as an access. A volatile variable which was not accessed can be optimized out. – Nellie Danielyan Apr 11 '19 at 14:55
  • If you could do this from a *header-only* library in a way that guarantees it could not be optimized away, wouldn't you then be embedding your string in *every* compilation unit that includes your header, potentially resulting in many copies of your string in the resulting binary? – jamesdlin Apr 11 '19 at 15:31
  • @jamesdlin yes. i don't see this as a big problem though. – umläute Apr 11 '19 at 15:36
  • @umläute: "i don't see this as a big problem though" If the source code for the entire project fits on one screen, then it **might** not be a big problem. If 3+ people work on a considerably large code base, it will become a huge problem. – virolino Jul 10 '19 at 12:35
  • @umläute What do you think about my solution below? – klutt Jul 10 '19 at 17:39
  • @umläute And just for clarification. I assume that you mean that you should be able to do something like `grep "" a.out` to find out if the binary contains that string or not? – klutt Jul 10 '19 at 17:44
  • @klutt yes that's the basic idea – umläute Jul 10 '19 at 19:30
  • For MSVC the best way is [The version information editor](https://learn.microsoft.com/en-us/cpp/windows/version-information-editor?view=vs-2019) – Mgetz Jul 11 '19 at 15:48
  • @Mgetz how can i apply whatever the "version information editor" does in a *header-only* library? – umläute Jul 12 '19 at 11:05
  • @umläute you can't directly; it creates a `.rc` file that's embedded into the application – Mgetz Jul 12 '19 at 11:13

3 Answers


You can embed assembly pseudo-ops in your header, and it should stay (although it's never used):

asm(".ascii \"Frobnozzel v0.1; © 2019 ACME; GPLv3\"\n\t");

Note that this is GCC/Clang-specific.

An alternative for MSVC would be using #pragma comment or __asm db:

__asm db "Frobnozzel v0.1; © 2019 ACME; GPLv3"
#pragma comment(user, "Frobnozzel v0.1; © 2019 ACME; GPLv3")

Here's an example:

chronos@localhost ~/Downloads $ cat file.c 
#include <stdio.h>

#include "file.h"

int main(void)
{
        puts("The string is never used.");
}
chronos@localhost ~/Downloads $ cat file.h
#ifndef FILE_H
#define FILE_H 1

#if defined(__GNUC__)
    asm(".ascii \"Frobnozzel v0.1; © 2019 ACME; GPLv3\"\n\t");
#elif defined(_MSC_VER)
/* _WIN32 is also defined on 64-bit targets, so test _WIN64 first */
# if defined(_WIN64)
#  pragma comment(user, "Frobnozzel v0.1; © 2019 ACME; GPLv3")
# else
    __asm db "Frobnozzel v0.1; © 2019 ACME; GPLv3"
# endif
#endif

#endif /* FILE_H */
chronos@localhost ~/Downloads $ gcc file.c
chronos@localhost ~/Downloads $ grep "Frobnozzel v0.1; © 2019 ACME; GPLv3" a.out
Binary file a.out matches
chronos@localhost ~/Downloads $ 

Replace the gcc command with clang and the result is the same.

For 64-bit Windows, this requires either replacing user with the deprecated exestr or creating a resource file that embeds the string in the executable file. As written, the user comment only reaches the object file, and the string is dropped when linking.
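For GCC and Clang, a possible alternative to the assembly directive is the `used` variable attribute, which asks the compiler to emit an object even when nothing references it. The following is only a sketch: the header name and the `FROBNOZZEL_KEEP` macro are made up for illustration, other compilers get no protection here, and whether the string also survives linker-level section garbage collection should be verified with `grep` on the final binary.

```c
/* frobnozzel_tag.h: hypothetical header name, for illustration only */
#ifndef FROBNOZZEL_TAG_H
#define FROBNOZZEL_TAG_H

#if defined(__GNUC__) || defined(__clang__)
/* "used" tells GCC/Clang to emit the object even if it looks unreferenced. */
#  define FROBNOZZEL_KEEP __attribute__((used))
#else
/* Assumption: no equivalent is applied here, so the string may be dropped. */
#  define FROBNOZZEL_KEEP
#endif

/* static: every translation unit gets its own private copy, so including
   the header from several .c files cannot cause duplicate-symbol errors. */
static const char frobnozzel_version_string[] FROBNOZZEL_KEEP =
    "Frobnozzel v0.1; © 2019 ACME; GPLv3";

#endif /* FROBNOZZEL_TAG_H */
```

As with the assembly variant, each compilation unit that includes the header carries its own copy of the string.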

S.S. Anne
  • note the MSVC solution won't work on x64 where `__asm` is banned. They would need to do this via linker directive. – Mgetz Jul 11 '19 at 15:36
  • @Mgetz What's x64? – S.S. Anne Jul 11 '19 at 15:37
  • x86_64 the platform the majority of the world is using for a computer – Mgetz Jul 11 '19 at 15:38
  • @Mgetz Oh. I've never heard of that naming for it before. I usually hear either x86_64 or amd64. – S.S. Anne Jul 11 '19 at 15:39
  • Regardless there are [linker directives for this](https://learn.microsoft.com/en-us/cpp/preprocessor/comment-c-cpp?view=vs-2019) – Mgetz Jul 11 '19 at 15:39
  • Noted, btw: if they really want it to show up they need to use a resource file. The linker directives generally only affect the object file. – Mgetz Jul 11 '19 at 15:43
  • that looks promising; however, i haven't found any reference that using `pragma comment(user` requires a resource file yet. – umläute Jul 12 '19 at 08:54
  • @umläute That wasn't what I was implying. Hopefully the latest edit should clear things up. – S.S. Anne Jul 12 '19 at 15:06

TL;DR;

You might not be able to force a value into the compilation unit, but you can force a symbol by defining a global variable in the header, e.g.: long using_my_library_version_1_2_3;

The symbol will be accessible externally in the final binary file and could be tested against (though, like any solution, it could be circumvented, not to mention that the header itself could be altered).

EDIT: To clarify (due to comment), don't use a static variable.

By using a global variable it will default to extern and will not be optimized away (in case other objects loading the binary use the identifier).

Caveats and Example:

As mentioned in the comments, the global variable's identifier (name) is the string in this approach.

However, when building executables (and kernels), identifiers can be stripped from the final binary, for example by linking with -s. This is often done by embedded-system developers and by people who enjoy making debugging a living hell (even more than it already is).

A quick example:

// main.c
int this_is_example_version_0_0_1; /* variable name will show in the file */

int main(void) {
  /* placed anywhere to avoid the "not used" warning: */
  (void)this_is_example_version_0_0_1;
  return 0;
}

// extra.c
int this_is_example_version_0_0_1; /* repeat line to your heart's content  */
int this_is_example_version_0_0_1; /* (i.e., if header has no include guard) */

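Compile (newer compilers such as GCC 10+ default to -fno-common and reject the repeated definition across main.c and extra.c at link time; add -fcommon there):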

 $ cc -xc -o a -Wall -O2 main.c extra.c

List all identifiers/names (will show global):

 $ nm ./a | grep "this_is_example_version"

Test for string in binary file using:

$ grep -F "this_is_example_version" ./a
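The same check could also be done programmatically. The sketch below is a rough C equivalent of `grep -F` for a single file; `file_contains` is a hypothetical helper (not part of any library), and it assumes the file is small enough to be read into memory in one go.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the bytes of `needle` occur somewhere in the file at `path`,
   0 if they do not (or if `needle` is empty), and -1 on I/O or memory errors.
   Hypothetical helper for illustration. */
static int file_contains(const char *path, const char *needle)
{
    FILE *f = fopen(path, "rb");          /* binary mode: no newline translation */
    if (f == NULL)
        return -1;

    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0)
        size = ftell(f);
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return -1;
    }

    unsigned char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (buf == NULL) {
        fclose(f);
        return -1;
    }
    size_t len = fread(buf, 1, (size_t)size, f);
    fclose(f);

    /* Naive byte-wise search; memcmp is used because binaries contain NULs. */
    size_t n = strlen(needle);
    int found = 0;
    for (size_t i = 0; n > 0 && i + n <= len; i++) {
        if (memcmp(buf + i, needle, n) == 0) {
            found = 1;
            break;
        }
    }
    free(buf);
    return found;
}
```

A call such as `file_contains("./a", "this_is_example_version")` would then play the role of the `grep -F` line above.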

Details:

Funny facts about C that make this solution possible...:

  1. C defines extern as the default for both function and variable declarations in the global scope (6.2.2, subsection 5).

  2. According to section 6.2.2 ("Linkages of identifiers"), "each declaration of a particular identifier with external linkage denotes the same object or function."

    This means that duplicate declarations in the global scope will be collated to a single declaration.

  3. Variable declarations and variable definitions look the same when the variable is placed in the global scope and all of its bits are set to zero.

    This is because global variables are initialized to zero by default. Hence, compilers can't tell if int foo; is a definition (int foo = 0;) or a declaration (extern int foo;). The C standard calls such a declaration a tentative definition.

Because of this "identity" and these rules, compilers convert ambiguous global variable declarations/definitions into "weak" declarations, to be resolved by the linker.

This means that if you define a global variable without the extern keyword and without a value, the ambiguous declaration/definition will force the compiler to emit a weak symbol that will be exposed in the final binary.
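Within a single translation unit, the merging described above can be sketched as follows. The accessor function `example_version_address` is hypothetical and only there to show that all three declarations name one zero-initialized object.

```c
/* Three file-scope declarations of the same identifier. The first two have
   no initializer and no storage-class specifier, so C merges them into a
   single object; the extern line merely redeclares that object. */
int this_is_example_version_0_0_1;
int this_is_example_version_0_0_1;
extern int this_is_example_version_0_0_1;

/* Hypothetical accessor: returns the address of the one and only object. */
int *example_version_address(void)
{
    return &this_is_example_version_0_0_1;
}
```

Repeating the declaration across several translation units is a different matter: whether that links depends on the toolchain's handling of common symbols.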

This symbol could be used to identify the fact that the header was used somewhere in the program.

Myst
  • Ehm. Your TL;DR is basically what op has already said, but without volatile, which makes it worse. – klutt Jul 10 '19 at 17:28
  • @klutt how does `volatile` make it better? Besides, OP used a static variable which would be optimized away. A non-static `extern` (by default) variable, as proposed in my solution, won’t be optimized away. – Myst Jul 10 '19 at 17:32
  • I don't know how in particular, but according to OP, volatile worked for gcc but not clang. – klutt Jul 10 '19 at 17:35
  • Besides, you're using a `long` variable. How does that ensure that the string OP mentions ends up in the binary? – klutt Jul 10 '19 at 17:37
  • @klutt - in this approach, the variable name **is** the string. Instead of testing for a string object, the binary output is tested for the identifier. – Myst Jul 10 '19 at 17:39
  • How would you do this test? – klutt Jul 10 '19 at 17:44
  • @klutt - how do you test what? Test that it works by trying it. Testing for the names is usually platform specific. On Linux you use the `nm` command line. Otherwise you could programmatically test for it. Some binary outputs will have the identifier as a string somewhere in the file. – Myst Jul 10 '19 at 18:43
  • Wow. I actually had no idea that symbol names were preserved in the binary. I learned something new. I read your post, but I did not believe it. Thank you. – klutt Jul 10 '19 at 18:51
  • I think your answer would benefit if you described how you do the test and give a sample output. Also, I think it could be worth mentioning that you can get around this by just using `-s` when compiling. – klutt Jul 10 '19 at 18:55
  • @klutt - true. I’ll edit the answer when I get back to my laptop. Note that the `-s` option is only valid for executables (libraries need to keep symbols to expose their functionality). – Myst Jul 10 '19 at 19:11

I don't know if there is any standard way of doing it, but depending on how your library works, I might have a reasonable solution. Many libraries have init functions that are usually called only once, or at least very rarely, in the code. srand() is one example.

You could require an init function for your library to work, and without specifying its exact purpose, you could just say that the main function needs to have the line initlib(); before any library functions are used. Here is an example:

l.h:

// Macro disguised as a function; the string must match the one in l.c exactly
#define initlib() init("Frobnozzel v0.1; © 2019 ACME; GPLv3")

void init(const char *);
void libfunc(void);

l.c:

#include "l.h"
#include <string.h>
#include <stdlib.h>

int initialized = 0;

void init(const char *str) {
    if(strcmp(str, "Frobnozzel v0.1; © 2019 ACME; GPLv3") == 0)
        initialized = 1;
}

void libfunc(void) {
    if(!initialized)
        exit(EXIT_FAILURE);
    /* Do stuff */
}

Note: I know that you asked for header-only, but the principle is the same. And after all, converting a .h/.c pair to just a .h file is the simplest task in the world.

If you use the library function libfunc before you have used the initialization macro initlib, the program will just exit. The same thing will happen if the copyright string is changed in the header file.

Sure, it's not hard to get around this if you want, but it works.

For testing, I used this code:

#include <stdio.h>
#include "l.h"

int main(void)
{
    initlib();
    libfunc();
    printf("Hello, World!\n");
}

I tried this by compiling l.c into a shared library. Then I compiled a simple main program with both clang and gcc using -O3. The binaries worked as they should and they contained the copyright string.
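A header-only variant of the pair above might look like the sketch below. The `FROBNOZZEL_TAG` macro and the `static inline` layout are assumptions for illustration, not something tested in the setup described here. Two caveats: each translation unit gets its own `initialized` flag, so `initlib()` has to be called in every unit that uses `libfunc`; and because the compiler now sees both the literal and the comparison, an optimizer may fold the `strcmp` away, so whether the string survives should be checked with `grep` on the actual binary.

```c
/* l.h, header-only variant (sketch) */
#ifndef L_H
#define L_H

#include <stdlib.h>
#include <string.h>

/* Hypothetical macro holding the string that should end up in the binary. */
#define FROBNOZZEL_TAG "Frobnozzel v0.1; © 2019 ACME; GPLv3"

/* Macro disguised as a function, as in the two-file version. */
#define initlib() init(FROBNOZZEL_TAG)

/* static: private to each translation unit that includes this header. */
static int initialized = 0;

static inline void init(const char *str)
{
    if (strcmp(str, FROBNOZZEL_TAG) == 0)
        initialized = 1;
}

static inline void libfunc(void)
{
    if (!initialized)
        exit(EXIT_FAILURE);
    /* Do stuff */
}

#endif /* L_H */
```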

klutt