I got some strange glibc-related linker errors for builds with a distributed build cache configured on build nodes running different Linux distributions.

Now I somehow suspect build artifacts from those machines with different glibc versions getting mixed up, but I don't know how to investigate this.

How do I find out what Bazel takes into account when building the hash for a certain build artifact?

I know I can explicitly set environment variables which then will affect the hash. But how can I be sure a given compiler, a certain version of glibc, etc. will lead to different hashes for built artifacts?

And how do I check/compare what's been taken into account?

frans

1 Answer

This is a complex topic and a multi-faceted question. I am going to answer in the following order:

  1. How do I check/compare what's been taken into account?
  2. How to investigate against which glibc a build linked?
  3. How can I be sure a given compiler, a certain version of glibc, etc. will lead to different hashes for built artifacts?

How do I check/compare what's been taken into account?

To answer this, look into the execution log; you can read up on it at https://bazel.build/remote/cache-remote#compare-logs. The *.json execution log should contain everything you need to know (granted, it might be a bit verbose) and is a little easier to process with shell-magic/your editor.
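Comparing two such logs by hand gets tedious, so here is a minimal sketch of a diff helper. It assumes both logs were written with --execution_log_json_file (check that your Bazel version still offers it) and that the file is a stream of concatenated JSON objects with the field names listedOutputs, inputs, path, digest and hash. Those names are my reading of the log format, not something stated above, so verify them against your own log first.

```python
import json


def read_spawns(text):
    """Parse a stream of concatenated JSON objects (one per executed action)."""
    decoder = json.JSONDecoder()
    spawns, pos = [], 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        spawns.append(obj)
    return spawns


def input_digests(spawns):
    """Map each action (keyed by its first listed output) to {input path: hash}."""
    result = {}
    for spawn in spawns:
        # Field names below are assumptions about the JSON log layout.
        outputs = spawn.get("listedOutputs") or ["<no output>"]
        result[outputs[0]] = {
            f["path"]: f.get("digest", {}).get("hash", "")
            for f in spawn.get("inputs", [])
        }
    return result


def diff_inputs(log_a, log_b):
    """Yield (output, input path, hash in A, hash in B) for each differing input."""
    a = input_digests(read_spawns(log_a))
    b = input_digests(read_spawns(log_b))
    for output in sorted(a.keys() & b.keys()):
        for path in sorted(a[output].keys() | b[output].keys()):
            hash_a, hash_b = a[output].get(path), b[output].get(path)
            if hash_a != hash_b:
                yield output, path, hash_a, hash_b
```

Read the two log files into strings and iterate over diff_inputs(log_a, log_b). If an action shows no differences between two machines with different glibc versions, that suggests glibc is not among the declared inputs of that action.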

How to investigate against which glibc a build linked?

From the execution log, you can get all the required hashes to retrieve cached artifacts/binaries from your remote cache. Given these files, you should be able to use standard tools to get to the glibc version (ldd -r -v binary | grep GLIBC).
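If you want to check many downloaded artifacts at once, a rough alternative to calling ldd on each is to scan the file bytes for GLIBC_x.y version strings. This is only a heuristic sketch and not a real ELF parser: it reports the highest version string found anywhere in the file, and the function name is made up for illustration.

```python
import re

# Matches symbol version strings such as GLIBC_2.2.5 or GLIBC_2.34.
_GLIBC_RE = re.compile(rb"GLIBC_(\d+(?:\.\d+)+)")


def required_glibc(data):
    """Return the highest GLIBC_x.y version mentioned in the bytes, or None."""
    versions = {
        tuple(int(part) for part in match.group(1).decode().split("."))
        for match in _GLIBC_RE.finditer(data)
    }
    if not versions:
        return None
    # Compare numerically so that 2.34 sorts above 2.4.
    return ".".join(str(number) for number in max(versions))
```

Typical use would be required_glibc(open(path, "rb").read()) for each retrieved artifact. A binary reporting a higher version than the glibc installed on the machine that links or runs it is a likely culprit.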

How can I be sure a given compiler, a certain version of glibc, etc. will lead to different hashes for built artifacts?

This depends on how you have set up your compilation toolchain. The best case would be a fully hermetic compilation toolchain, where all necessary files are declared using attributes like https://bazel.build/reference/be/c-cpp#cc_toolchain.compiler_files.

But this would also mean locking down the compiler sysroot. This should include all libraries you are linking against if you want full hermeticity. If you want to use some system libraries, you need to tell Bazel where to find them and to factor in their hash: https://stackoverflow.com/a/43419786/20546409 or https://www.stevenengelhardt.com/2021/09/22/practical-bazel-depending-on-a-system-provided-c-cpp-library/

If you use the auto-detected compiler toolchain, some tricks are used to lock down the sysroot paths, but expect some non-hermeticity. https://github.com/limdor/bazel-examples/tree/master/linux_toolchain is a nice write-up of how to move from the auto-detected toolchain to something more hermetic.

The hack

Of course, you can hack around this. Note, this is inherently a bad idea:

  • create a script that inspects the system and determines everything important, like the glibc version and maybe the Linux distribution (flavor)
  • have it create a string describing this variation and hash it
  • use that hash as the instance key/name for your remote cache
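The steps above could be sketched roughly like this. The choice of properties is an illustrative and certainly incomplete assumption, the function names are made up, and printing Bazel's --remote_instance_name flag is just one way to hand the hash over. Whether your cache server actually partitions entries by instance name is something you would need to check.

```python
import hashlib
import platform


def system_properties():
    """Collect a (deliberately small, illustrative) set of host properties."""
    libc_name, libc_version = platform.libc_ver()
    try:
        # Available in Python 3.10+; raises OSError if no os-release file exists.
        os_release = platform.freedesktop_os_release()
        distro = f'{os_release.get("ID", "")}-{os_release.get("VERSION_ID", "")}'
    except (AttributeError, OSError):
        distro = "unknown"
    return {
        "libc": f"{libc_name}-{libc_version}",
        "distro": distro,
        "machine": platform.machine(),
    }


def instance_name(properties):
    """Hash the sorted properties into a short, stable identifier."""
    description = "\n".join(
        f"{key}={value}" for key, value in sorted(properties.items())
    )
    return hashlib.sha256(description.encode()).hexdigest()[:16]


if __name__ == "__main__":
    # A wrapper script would append this to the bazel command line.
    print(f"--remote_instance_name={instance_name(system_properties())}")
```

The script would have to run before every build, since a system upgrade can change glibc at any time.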
lummax
  • Can you elaborate on why "the hack" is a bad idea? Currently to me it looks like while Bazel _could_ find those differences automatically and take them into account, it doesn't. So you could either copy everything that _might_ affect your build and make it part of your toolchain or you create a hash of it and make sure you don't have false cache hits. What might happen to me if I did the latter? – frans Jan 16 '23 at 13:59
  • Yes, it is a bad idea because you introduce a source for cache misses that might be difficult to communicate to your team members. Also it is unclear when you have to inspect your system. Preferably on every build, because a system upgrade could have upgraded the `glibc`. But then you need a wrapper around `bazel` which might confuse some IDE integration or command line completion. – lummax Jan 17 '23 at 07:56
  • Just to make sure I understood your answer correctly: you're suggesting to create a fully hermetic build environment which guarantees cache misses if anything changes. And a sub-optimal way to also cause those cache misses would be to manually create a textual list of properties and use those via `--action_env` to bind them to the target hash. Right? – frans Jan 18 '23 at 07:57
  • I was thinking about the "instance name" as described in https://github.com/buchgr/bazel-remote. And a script that generates a `--remote_cache` flag that contains a hashed instance name. But yes, action envs should work as well. – lummax Jan 18 '23 at 08:37