Remove dead code before linking

Question

In a project that consists of several statically linked object files, I am replacing one of them with a separate implementation. I would like to test my code even before I have implemented every symbol which the replaced object file provided, so I am using -Wl,--unresolved-symbols=ignore-all to have the linker not complain about the missing symbols.

But when I test the code, it simply crashes as soon as it tries to use one of the undefined symbols. Therefore, I am looking for a way to tell the linker “please remove all unreferenced code before linking, and then tell me whether the code reachable from the entry point still refers to undefined symbols”. Is that possible?

The object files are generated from generated LLVM IR (rather than C code), if that makes a difference. — Joachim Breitner, Mar 29 '17 at 20:56
Does passing the IR through opt suffice? Sounds like deadstripping then linking as normal would do the trick — Jon Chesterfield, Mar 30 '17 at 23:11
No, I don’t think `opt` helps, as `opt` looks at one module at a time, right? — Joachim Breitner, Mar 30 '17 at 23:31
I'm thinking of using llvm-link to put the various IR together, then opt (especially -dce, -globaldce) to strip. Convert the IR to object code. The linker then gets a single file and will still warn about whatever symbols were actually accessible. — Jon Chesterfield, Mar 31 '17 at 17:33
Hmm, that might work, but is hard to integrate into the existing build system (which compiles each `.ll` down to `.o` files before eventually linking them). — Joachim Breitner, Mar 31 '17 at 19:42
On the bright side, if you keep the .ll (or .bc) files around, then llvm-link them, you get the option of cross module optimisations as well as dead stripping. But yeah, potentially tough to wire into a build system. — Jon Chesterfield, Apr 01 '17 at 00:10

score 3 · Answer 1 · answered Apr 01 '17 at 00:51

This answer is closely based on the comments. I have since found a use for the approach in my own code base and checked that it works for me. Consider the following demo.c:

int do_things(void);

int application_main(void)
{
  return do_things();
}

int test_main(void)
{
  return 42;
}

int main(void)
{
  return test_main();
}

The layout approximately reflects my use case. A given block of IR may have two entry points, one for running unit tests and one for doing whatever the code is used for. The unit tests need a subset of the symbols needed by the entire module. There is an advantage to being able to build the unit test part without building everything else.

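Dead stripping is a definite improvement over my previous method of passing -Wl,--unresolved-symbols=ignore-all, because the linker still reports any undefined symbol that remains reachable after stripping.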

clang demo.c     # undefined reference to `do_things'
clang -O3 demo.c # undefined reference to `do_things'
clang demo.c -c -emit-llvm -o demo.bc # OK

llvm-nm demo.bc 
---------------- T application_main
                 U do_things
---------------- T main
---------------- T test_main
clang demo.bc    # undefined reference to `do_things'

opt -o stripped.bc -internalize -internalize-public-api-list=main -globaldce demo.bc
llvm-nm stripped.bc 
---------------- T main
---------------- t test_main
clang stripped.bc # OK
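
For the multi-module case discussed in the comments, the same steps can be strung together after llvm-link. The following is a minimal sketch rather than a tested build recipe: the function names run and strip_and_link, the intermediate file name combined.bc, and the DRY_RUN switch are assumptions made for illustration. It uses the legacy opt flag syntax from the transcript above; newer LLVM releases may expect the -passes= form instead.

```shell
#!/bin/sh
# Sketch: merge several bitcode modules, strip everything unreachable from
# one entry point, then link. All names below are hypothetical helpers.

# run: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# strip_and_link ENTRY OUTPUT MODULE.bc...
#   ENTRY  symbol to keep external (for example main)
#   OUTPUT name of the final executable
strip_and_link() {
  entry=$1
  output=$2
  shift 2
  # 1. Combine all modules into one so dead code can be found across files.
  run llvm-link "$@" -o combined.bc &&
  # 2. Make everything except ENTRY internal, then drop unreachable globals.
  run opt -internalize "-internalize-public-api-list=$entry" -globaldce combined.bc -o stripped.bc &&
  # 3. Link; the linker still reports undefined symbols that remain reachable.
  run clang stripped.bc -o "$output"
}
```

With DRY_RUN=1 the function only prints the three commands, which makes it easy to inspect what would run before wiring it into a build system.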

The list of public symbols can be derived from the IR files (in my case at least), so the opt invocation is actually:

 opt -internalize -internalize-public-api-list="$(llvm-nm -extern-only -defined-only -just-symbol-name some-file.bc | paste -sd, -)" -globaldce -O2
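
Since -internalize-public-api-list expects a comma-separated list while llvm-nm prints one name per line, the joining step can be isolated in a small helper and checked without LLVM installed. This is a sketch under assumptions: the function name join_symbols and the file names tests.bc and all.bc in the usage comment are hypothetical.

```shell
#!/bin/sh
# join_symbols: read symbol names (one per line) on stdin and print them
# as a single comma-separated line, skipping blank lines.
join_symbols() {
  grep -v '^[[:space:]]*$' | paste -sd, -
}

# Hypothetical usage, reusing the llvm-nm flags from the answer:
#   api=$(llvm-nm -extern-only -defined-only -just-symbol-name tests.bc | join_symbols)
#   opt -internalize "-internalize-public-api-list=$api" -globaldce all.bc -o stripped.bc
```

Quoting the whole option keeps the shell from splitting the list into separate arguments, which would otherwise make opt treat the second symbol name as an input file.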
