Here is an implementation of your scenario:
a.c
#include <stdio.h>
void xx(void)
{
puts(__func__);
}
void yy(void)
{
puts(__func__);
}
b.c
#include <stdio.h>
void nn(void)
{
puts(__func__);
}
void mm(void)
{
puts(__func__);
}
c.c
#include <stdio.h>
void qq(void)
{
puts(__func__);
}
void rr(void)
{
puts(__func__);
}
test.c
extern void xx(void);
int main(void)
{
xx();
return 0;
}
Compile all the *.c files to *.o files:
$ gcc -Wall -c a.c b.c c.c test.c
Make a static library stat.a, containing a.o, b.o, c.o:
$ ar rcs stat.a a.o b.o c.o
Link program test, inputting test.o and stat.a:
$ gcc -o test test.o stat.a
Run:
$ ./test
xx
Let's see the symbol tables of the object files in stat.a:
$ nm stat.a
a.o:
0000000000000000 r __func__.2250
0000000000000003 r __func__.2254
U _GLOBAL_OFFSET_TABLE_
U puts
0000000000000000 T xx
0000000000000013 T yy
b.o:
0000000000000000 r __func__.2250
0000000000000003 r __func__.2254
U _GLOBAL_OFFSET_TABLE_
0000000000000013 T mm
0000000000000000 T nn
U puts
c.o:
0000000000000000 r __func__.2250
0000000000000003 r __func__.2254
U _GLOBAL_OFFSET_TABLE_
U puts
0000000000000000 T qq
0000000000000013 T rr
The definitions (T) of xx, yy are in member stat.a(a.o). Definitions of nn, mm are in stat.a(b.o). Definitions of qq, rr are in stat.a(c.o).
Let's see which of those symbols are also defined in the symbol table of the program test:
$ nm test | egrep 'T (xx|yy|qq|rr|nn|mm)'
000000000000064a T xx
000000000000065d T yy
xx, which is called in the program, is defined. yy, which is not called, is also
defined. nn, mm, qq and rr, none of which are called, are all absent.
That's what you've observed.
I would like to know why the symbols qq and rr do not get exported?
What is a static library, such as stat.a, and what is its special role in a linkage?
It is an ar archive that conventionally - but not necessarily - contains nothing
but object files. You can offer such an archive to the linker from which to select the
object files it needs, if any, to carry on the linkage. The linker needs those object
files in the archive that provide definitions for symbols that have been
referenced, but not yet defined, in input files it has already linked. The
linker extracts the needed object files from the archive and inputs them to the
linkage, exactly as if they were individually named input files and the static library
was not mentioned at all.
So what the linker does with an input static library is different from what it does
with an input object file. Any input object file is linked into the output file unconditionally
(whether it is needed or not).
In this light, let's redo the linkage of test with some diagnostics (-trace) to show what
files are actually linked:
$ gcc -o test test.o stat.a -Wl,--trace
/usr/bin/x86_64-linux-gnu-ld: mode elf_x86_64
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crti.o
/usr/lib/gcc/x86_64-linux-gnu/7/crtbeginS.o
test.o
(stat.a)a.o
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/lib/x86_64-linux-gnu/libc.so.6
(/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/usr/lib/gcc/x86_64-linux-gnu/7/crtendS.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crtn.o
Apart from all the boiler-plate files for a C program linkage that gcc adds by
default, the only files of ours in the linkage are the two object files:
test.o
(stat.a)a.o
The linkage:
$ gcc -o test test.o stat.a
is exactly the same as the linkage:
$ gcc -o test test.o a.o
Let's think that through.
test.o was the first linker input. This object file was linked unconditionally into the program.
test.o contains a reference (specifically, a function call) to xx but no definition of the function xx.
- So the linker now needs to find a definition of xx to complete the linkage.
- The next linker input is the static library stat.a.
- The linker searches stat.a for an object file that contains a definition of xx.
- It finds a.o. It extracts a.o from the archive and links it into the program.
- There are no other unresolved symbol references in the linkage for which the
linker can find definitions in stat.a(b.o) or stat.a(c.o). So neither of those
object files is extracted and linked.
By extracting and linking (just) stat.a(a.o) the linker has got a definition
of xx that it needed to resolve the function call in test.o. But a.o also contains
the definition of yy. So that definition is also linked into the program.
nn, mm, qq and rr are not defined in the program because none of them
are defined in the object files that were linked into the program.
That's the answer to your first question. Your second is:
Is there any method to prevent any other symbols than xx being loaded?
There are at least two ways.
One is simply to define each of xx, yy, nn, mm, qq, rr in a source
file by itself. Then compile object files xx.o, yy.o, nn.o, mm.o, qq.o, rr.o
and archive all of them in stat.a. Then, if the linker ever needs to find an
object file in stat.a that defines xx, it will find xx.o, extract and link it,
and the definition of xx alone will be added to the linkage.
There's another way that does not require you to code just one function in each source
file. This way depends on the fact that an ELF object file, as produced by the
compiler, is composed of various sections and these sections are in fact the
units that the linker distinguishes and merges together into the output file. By
default, there is a standard ELF section for each kind of symbol. The
compiler places all of the function definitions in one code section and
all data definitions in an appropriate data section. The reason that your
linkage of program test contains the definitions of both xx and yy is that
the compiler has placed both of these definitions in the single code section of a.o,
so the linker can either merge that code section into the program, or not: it can
only link the definitions of both xx and yy, or neither of them, so it is obliged
to link both, even though only xx is needed. Let's see the disassembly of the code section of a.o. By default the
code section is called .text:
$ objdump -d a.o
a.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <xx>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # b <xx+0xb>
b: e8 00 00 00 00 callq 10 <xx+0x10>
10: 90 nop
11: 5d pop %rbp
12: c3 retq
0000000000000013 <yy>:
13: 55 push %rbp
14: 48 89 e5 mov %rsp,%rbp
17: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # 1e <yy+0xb>
1e: e8 00 00 00 00 callq 23 <yy+0x10>
23: 90 nop
24: 5d pop %rbp
25: c3 retq
There you see the definitions of xx and yy, both in the .text section.
But you can ask the compiler to place the definition of each global symbol
in its own section in the object file. Then the linker can separate the code
section for any function definition from any other, and you can ask the linker
to throw away any sections that aren't used in the output file. Let's try that.
Compile all the source files again, this time asking for a separate section per symbol:
$ gcc -Wall -ffunction-sections -fdata-sections -c a.c b.c c.c test.c
Now look again at the disassembly of a.o:
$ objdump -d a.o
a.o: file format elf64-x86-64
Disassembly of section .text.xx:
0000000000000000 <xx>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # b <xx+0xb>
b: e8 00 00 00 00 callq 10 <xx+0x10>
10: 90 nop
11: 5d pop %rbp
12: c3 retq
Disassembly of section .text.yy:
0000000000000000 <yy>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # b <yy+0xb>
b: e8 00 00 00 00 callq 10 <yy+0x10>
10: 90 nop
11: 5d pop %rbp
12: c3 retq
Now we've got two code sections in a.o: .text.xx, containing only the definition of xx,
and .text.yy, containing only the definition of yy. The linker can merge either of
these sections into a program and not merge the other.
Rebuild stat.a:
$ rm stat.a
$ ar rcs stat.a a.o b.o c.o
Relink the program, this time asking the linker to discard unused input sections
(-gc-sections). We'll also ask it to trace the files it loads (-trace)
and to print a mapfile for us (-Map=mapfile):
$ gcc -o test test.o stat.a -Wl,-gc-sections,-trace,-Map=mapfile
/usr/bin/x86_64-linux-gnu-ld: mode elf_x86_64
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crti.o
/usr/lib/gcc/x86_64-linux-gnu/7/crtbeginS.o
test.o
(stat.a)a.o
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/lib/x86_64-linux-gnu/libc.so.6
(/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/usr/lib/gcc/x86_64-linux-gnu/7/crtendS.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crtn.o
The -trace output is exactly the same as before. But check again which of our
symbols are defined in the program:
$ nm test | egrep 'T (xx|yy|qq|rr|nn|mm)'
000000000000064a T xx
Only xx, which is what you want.
The output of the program is the same as before:
$ ./test
xx
Finally look at the mapfile. Near the top you see:
mapfile
...
Discarded input sections
...
...
.text.yy 0x0000000000000000 0x13 stat.a(a.o)
...
...
The linker was able to throw away the redundant code section .text.yy from
the input file stat.a(a.o). That's why the redundant definition of yy is
no longer in the program.