
Just a simple question, but I couldn’t find the answer anywhere. When all the object files are put in an archive, how do I instruct clang++ to take only the required object files from it for linking, in order to avoid undefined-symbol errors caused by symbols in the archive that aren’t required?

ZachB
user2284570
  • see also : https://groups.google.com/forum/#!topic/pdfium/7N3WYn1zz3o – user2284570 Jan 07 '18 at 00:31
  • Your belief that an archive is linked like a single object file is false. By default the linker extracts from an archive just the object files that define unresolved references already accrued by the linkage and links *them* into the output file, ignoring other members of the archive - just as you want. Your undefined reference errors have a different cause but without seeing your failing linkage command and its output we're in the dark. Better post that information. – Mike Kinghan Jan 07 '18 at 17:01
  • @MikeKinghan : how, then, do you explain that the linker complains about missing V8 symbols when I only want to run the image-processing parts? The command itself is simple: `afl-clang-fast++ -L. -llvmfuzzer -lpdfium pdf_codec_png_fuzzer.cc -lpng16 -o fuzz_png.o`, where libpdfium.a contains all the compiled source files of pdfium (excluding the `third_party` folder). Hence the https://groups.google.com/forum/#!topic/pdfium/7N3WYn1zz3o question about which exact object files are required, since I don’t know how to make the selection automatic. – user2284570 Jan 07 '18 at 17:52
  • To help someone explain your failing linkage you need to show them, at least, the failing linkage command *and its output*, verbatim. This information should be given in the body of your question, not in comments. – Mike Kinghan Jan 07 '18 at 18:20
  • @MikeKinghan it’s not possible: the log of undefined-reference errors is 10 MB. – user2284570 Jan 07 '18 at 18:54
  • In that case you need to provide a minimal, complete, and verifiable example (MCVE) in your question, producing one or a few representative undefined V8 reference errors. – Mike Kinghan Jan 07 '18 at 19:01
  • @MikeKinghan : It now complains about missing symbols present in the archive which can be downloaded [here](https://filebin.net/sftt4qz6co4eqrkn/libk.a.xz). – user2284570 Jan 08 '18 at 12:49
  • Possible duplicate of [Why does the order in which libraries are linked sometimes cause errors in GCC?](https://stackoverflow.com/questions/45135/why-does-the-order-in-which-libraries-are-linked-sometimes-cause-errors-in-gcc) You are specifying the linker input incorrectly. You **must** specify `.o/.c/.cc` files **first** and `.a` archives after that. – n. m. could be an AI Jan 08 '18 at 12:55
  • @n.m. : in that case I’m getting even more undefined-reference errors. But look! It’s complaining about symbols that the libFuzzer test cases don’t require. Hence it’s not a duplicate. My problem is that I need clang to take only the required objects from the archive. – user2284570 Jan 08 '18 at 13:03
  • **This** question is a duplicate. The question in which you specify the order correctly may not be a duplicate, but you have to ask it first. – n. m. could be an AI Jan 08 '18 at 13:04
  • @n.m. : [done](https://stackoverflow.com/revisions/48132989/6)! And again, it shouldn’t complain about those symbols. – user2284570 Jan 08 '18 at 13:09
  • Now this one has a very simple answer. clang++ uses the standard linker, and the standard linker already takes only required objects. You don't need to specify anything in particular. But you have already been told that. – n. m. could be an AI Jan 08 '18 at 13:12
  • @n.m. : then how do you explain that it complains about such undefined symbols if they aren’t needed by the dependency object chain? (I know some files in the archive require such symbols, but I don’t need them.) Otherwise, please retract your close vote. – user2284570 Jan 08 '18 at 13:14
  • "if they aren’t needed by the dependency object chain" You have to demonstrate that. As a general rule, if a dependency is pulled in, then it is needed by a dependency chain (otherwise why would it be pulled in?). You may not be aware of one or more links in the chain, that’s all. I have already retracted the close vote. – n. m. could be an AI Jan 08 '18 at 13:49



You won't have been able to find the answer you're seeking because what you want to make the linker do is what it does by default. Here's a demonstration. (It's in C rather than C++ merely to spare us the obfuscation of C++ name-mangling).

Three source files:

alice.c

#include <stdio.h>

void alice(void)
{
    puts("alice");
}

bob.c

#include <stdio.h>

void bob(void)
{
    puts("bob");
}

mary.c

#include <stdio.h>

void mary(void)
{
    puts("mary");
}

Compile them and put the object files in an archive:

$ clang -Wall -c alice.c
$ clang -Wall -c bob.c
$ clang -Wall -c mary.c
$ ar rc libabm.a alice.o bob.o mary.o

Here's the member list of the archive:

$ ar -t libabm.a
alice.o
bob.o
mary.o

And here are the symbol tables of those members:

$ nm libabm.a

alice.o:
0000000000000000 T alice
                 U puts

bob.o:
0000000000000000 T bob
                 U puts

mary.o:
0000000000000000 T mary
                 U puts

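where T denotes a symbol defined in the text (code) section, here a function, and U an undefined symbol, i.e. one that the object file references but does not define. puts is defined in the standard C library, which will be linked by default.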

Now here's a program that calls alice externally, and so is dependent on alice.o:

sayalice.c

extern void alice(void);

int main(void)
{
    alice();
    return 0;
}

And here's another program that calls alice and bob externally, thus being dependent on alice.o and bob.o.

sayalice_n_bob.c

extern void alice(void);
extern void bob(void);

int main(void)
{
    alice();
    bob();
    return 0;
}

Compile both those sources as well:

$ clang -Wall -c sayalice.c
$ clang -Wall -c sayalice_n_bob.c

The linker option -trace (which GNU ld also accepts as -t or --trace) instructs the linker to report the object files and DSOs that are linked. We'll pass it through clang with -Wl, and use it now to link program sayalice from sayalice.o and libabm.a:

$ clang -o sayalice sayalice.o -L. -labm -Wl,-trace
/usr/bin/ld: mode elf_x86_64
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crt1.o
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crti.o
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/crtbegin.o
sayalice.o
(./libabm.a)alice.o
libgcc_s.so.1 (/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/libgcc_s.so.1)
/lib/x86_64-linux-gnu/libc.so.6
(/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
libgcc_s.so.1 (/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/libgcc_s.so.1)
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/crtend.o
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crtn.o

We see all the boilerplate C libraries and runtimes are linked. And of the object files that we created, just two are linked:

sayalice.o
(./libabm.a)alice.o

The two members of libabm.a that our program does not depend on:

(./libabm.a)bob.o
(./libabm.a)mary.o

were not linked.

Running the program:

$ ./sayalice
alice

it says "alice".

Then for comparison we'll link program sayalice_n_bob, again with -trace:

$ clang -o sayalice_n_bob sayalice_n_bob.o -L. -labm -Wl,-trace
/usr/bin/ld: mode elf_x86_64
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crt1.o
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crti.o
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/crtbegin.o
sayalice_n_bob.o
(./libabm.a)alice.o
(./libabm.a)bob.o
libgcc_s.so.1 (/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/libgcc_s.so.1)
/lib/x86_64-linux-gnu/libc.so.6
(/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
libgcc_s.so.1 (/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/libgcc_s.so.1)
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/crtend.o
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crtn.o

This time, three of our object files were linked:

sayalice_n_bob.o
(./libabm.a)alice.o
(./libabm.a)bob.o

And the only member of libabm.a that the program does not depend on:

(./libabm.a)mary.o

was not linked.

This program runs like:

$ ./sayalice_n_bob
alice
bob

Here's the global symbol table of the program:

$ nm -g sayalice_n_bob
0000000000400520 T alice
0000000000400540 T bob
0000000000601030 B __bss_start
0000000000601020 D __data_start
0000000000601020 W data_start
0000000000601028 D __dso_handle
0000000000601030 D _edata
0000000000601038 B _end
00000000004005d4 T _fini
                 w __gmon_start__
00000000004003d0 T _init
00000000004005e0 R _IO_stdin_used
00000000004005d0 T __libc_csu_fini
0000000000400560 T __libc_csu_init
                 U __libc_start_main@@GLIBC_2.2.5
00000000004004f0 T main
                 U puts@@GLIBC_2.2.5
0000000000400410 T _start
0000000000601030 D __TMC_END__

with alice and bob, but not mary.

So, as you see, the linker's default behaviour is exactly the behaviour you are asking how to get. If instead you want the linker to link every member of an archive, not just the members that resolve references already in the linkage, you have to tell it so expressly, by placing the archive within the scope of a --whole-archive option on the linkage command line, and closing that scope with --no-whole-archive so that archives later in the linkage are not affected:

$ clang -o sayalice_n_bob sayalice_n_bob.o -L. -Wl,--whole-archive -labm -Wl,--no-whole-archive -Wl,-trace
/usr/bin/ld: mode elf_x86_64
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crt1.o
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crti.o
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/crtbegin.o
sayalice_n_bob.o
(./libabm.a)alice.o
(./libabm.a)bob.o
(./libabm.a)mary.o
libgcc_s.so.1 (/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/libgcc_s.so.1)
/lib/x86_64-linux-gnu/libc.so.6
(/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
libgcc_s.so.1 (/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/libgcc_s.so.1)
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/crtend.o
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crtn.o

There you see that all the archive members are linked:

(./libabm.a)alice.o
(./libabm.a)bob.o
(./libabm.a)mary.o

And the program now defines all of alice, bob and mary:

$ nm -g sayalice_n_bob
0000000000400520 T alice
0000000000400540 T bob
0000000000601030 B __bss_start
0000000000601020 D __data_start
0000000000601020 W data_start
0000000000601028 D __dso_handle
0000000000601030 D _edata
0000000000601038 B _end
00000000004005f4 T _fini
                 w __gmon_start__
00000000004003d0 T _init
0000000000400600 R _IO_stdin_used
00000000004005f0 T __libc_csu_fini
0000000000400580 T __libc_csu_init
                 U __libc_start_main@@GLIBC_2.2.5
00000000004004f0 T main
0000000000400560 T mary
                 U puts@@GLIBC_2.2.5
0000000000400410 T _start
0000000000601030 D __TMC_END__

although it never calls mary.

And a step back

You've asked this question because you believe that if you can link from an archive only those object files that define symbols already referenced in the linkage then the linkage cannot fail with undefined references to symbols that the program never uses. But that isn't true, and here is a demonstration that it isn't.

Another source file:

alice2.c

#include <stdio.h>

extern void david(void);

void alice(void)
{
    puts("alice");
}

void dave(void)
{
    david();
}

Compile that:

$ clang -Wall -c alice2.c

Replace alice.o with alice2.o in libabm.a:

$ ar d libabm.a alice.o
$ ar r libabm.a alice2.o

Then try to link program sayalice as before:

$ clang -o sayalice sayalice.o -L. -labm -Wl,-trace
/usr/bin/ld: mode elf_x86_64
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crt1.o
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crti.o
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/crtbegin.o
sayalice.o
(./libabm.a)alice2.o
libgcc_s.so.1 (/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/libgcc_s.so.1)
/lib/x86_64-linux-gnu/libc.so.6
(/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
libgcc_s.so.1 (/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/libgcc_s.so.1)
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/crtend.o
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crtn.o
./libabm.a(alice2.o): In function `dave':
alice2.c:(.text+0x25): undefined reference to `david'
/usr/bin/ld: link errors found, deleting executable `sayalice'
clang: error: linker command failed with exit code 1 (use -v to see invocation)

This time, the only archive member that gets linked is:

(./libabm.a)alice2.o

because only alice is called in sayalice.o. Nevertheless the linkage fails with an undefined reference to function david, which the program never calls. david is called only in the definition of function dave, and dave is never called.

Although dave is never called, its definition is linked because it lies in an object file, alice2.o, that is linked to provide a definition of function alice - which is called. And with the definition of dave in the linkage, the call to david becomes an unresolved reference for which the linkage by default must find a definition, or fail. So it fails.

You see then that the failure of a linkage through undefined reference to a symbol that the program never uses is consistent with the fact that the linker does not link unreferenced object files from an archive.

How to survive undefined references to symbols you don't use

If you face this sort of linkage failure, you can avoid it by directing the linker to tolerate undefined references. You can direct it simply to ignore all undefined references, like:

$ clang -o sayalice sayalice.o -L. -labm -Wl,--unresolved-symbols=ignore-all
$ ./sayalice
alice

Or, more prudently, you can direct it just to give warnings, rather than fail, for undefined references, like:

$ clang -o sayalice sayalice.o -L. -labm -Wl,--warn-unresolved-symbols
./libabm.a(alice2.o): In function `dave':
alice2.c:(.text+0x25): warning: undefined reference to `david'
$ ./sayalice
alice

This way, you can check in the diagnostics that the only undefined symbols are the ones you are expecting.

Mike Kinghan