I am noticing duplicate symbols present in .so file
Like this?
$ cat foo.c
int foo(void)
{
    return 42;
}
Compile:
$ gcc -Wall -fPIC -c foo.c
Check symbols in the object file for foo:
$ readelf -s foo.o | grep foo
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS foo.c
8: 0000000000000000 11 FUNC GLOBAL DEFAULT 1 foo
One hit.
Make a shared library:
$ gcc -Wall -shared -o libfoo.so foo.o
Check symbols in the shared library for foo:
$ readelf -s libfoo.so | grep foo
5: 000000000000057a 11 FUNC GLOBAL DEFAULT 9 foo
29: 0000000000000000 0 FILE LOCAL DEFAULT ABS foo.c
44: 000000000000057a 11 FUNC GLOBAL DEFAULT 9 foo
Now two hits.
Nothing is wrong here. See some more of the picture:
$ readelf -s foo.o | egrep '(foo|Symbol table|Ndx)'
Symbol table '.symtab' contains 9 entries:
Num: Value Size Type Bind Vis Ndx Name
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS foo.c
8: 0000000000000000 11 FUNC GLOBAL DEFAULT 1 foo
An object file has one symbol table, its static symbol table .symtab, which is used by the linker for link-time symbol resolution. But:
$ readelf -s libfoo.so | egrep '(foo|Symbol table|Ndx)'
Symbol table '.dynsym' contains 11 entries:
Num: Value Size Type Bind Vis Ndx Name
5: 000000000000057a 11 FUNC GLOBAL DEFAULT 9 foo
Symbol table '.symtab' contains 48 entries:
Num: Value Size Type Bind Vis Ndx Name
29: 0000000000000000 0 FILE LOCAL DEFAULT ABS foo.c
44: 000000000000057a 11 FUNC GLOBAL DEFAULT 9 foo
a shared library has two symbol tables: a static symbol table, .symtab, like an object file, plus a dynamic symbol table, .dynsym, used by the loader for run-time symbol resolution.
When you link object files into a shared library, the linker by default transcribes the GLOBAL symbols from their .symtabs into the .symtab and the .dynsym of the shared library, except for those symbols that have HIDDEN visibility in the object files (which they get from being defined with the attribute of hidden visibility at compilation).
Any GLOBAL symbols with HIDDEN visibility in the object files are transcribed as LOCAL symbols with DEFAULT visibility into the .symtab of the shared library and are not transcribed into the .dynsym of the shared library at all. So when the shared library is linked with anything else, neither the linker nor the loader can see the global symbols that were HIDDEN at compilation.
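To make the hidden-visibility case concrete, here is a minimal sketch, assuming GCC or Clang on an ELF target. The file name and the functions helper_answer and public_answer are hypothetical illustrations, not anything from the question. The visibility attribute is a compiler extension rather than standard C++.

```cpp
// visibility_demo.cpp (hypothetical file name)
// Assumes GCC or Clang targeting ELF; the attribute is a compiler extension.

// GLOBAL binding but HIDDEN visibility in the object file. After linking into
// a shared library it is expected to appear as LOCAL in .symtab and to be
// absent from .dynsym.
extern "C" __attribute__((visibility("hidden"))) int helper_answer()
{
    return 42;
}

// GLOBAL binding with DEFAULT visibility: transcribed into both .symtab
// and .dynsym of the shared library.
extern "C" int public_answer()
{
    return helper_answer();
}
```

If you compile this with -fPIC, link it with -shared, and run readelf -s on the result as above, you should find public_answer in both tables and helper_answer only in .symtab.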
But apart from hidden symbols, of which there are often none, the same global symbols will appear in the .symtab and the .dynsym tables of a shared library. Each defined symbol that appears in both tables addresses the same definition.
Later, the OP comments:
I took the symbol table by running objdump -T command, which should ideally list symbols present only in dynamic symbol table.
This steers us to a different explanation, because objdump -T does indeed report only the dynamic symbol table (like readelf --dyn-syms).
Notice that the symbol reported twice:
...
00aab4d0 w DF .text 0000003c Base boost::asio::error::detail::misc_category::~misc_category()
...
00aab50c w DF .text 0000003c Base boost::asio::error::detail::misc_category::~misc_category()
...
is classified w in column 2 (as are all the other symbols in your snippet). What objdump means by that is that the symbol is weak.
Let's reproduce the observation:
foo.hpp
#pragma once
#include <iostream>
struct foo
{
    explicit foo(int i)
    : _i{i}
    {
        std::cout << __PRETTY_FUNCTION__ << std::endl;
    }
    ~foo()
    {
        std::cout << __PRETTY_FUNCTION__ << std::endl;
    }
    int _i = 0;
};
bar.cpp
#include "foo.hpp"
foo bar()
{
    return foo(2);
}
gum.cpp
#include "foo.hpp"
foo gum()
{
    return foo(1);
}
Compile and make a shared library:
$ g++ -Wall -Wextra -c -fPIC bar.cpp gum.cpp
$ g++ -shared -o libbargum.so bar.o gum.o
See what dynamic symbols objdump reports from struct foo:
$ objdump -CT libbargum.so | grep 'foo::'
00000000000009bc w DF .text 0000000000000046 Base foo::foo(int)
00000000000009bc w DF .text 0000000000000046 Base foo::foo(int)
Duplicate weak exports of the constructor foo::foo(int). Just like what you noticed.
Hang on a tick though. foo::foo(int) is a C++ method signature, but not actually a symbol that the linker can recognise. Let's do that again, this time without demangling:
$ objdump -T libbargum.so | grep 'foo'
00000000000009bc w DF .text 0000000000000046 Base _ZN3fooC1Ei
00000000000009bc w DF .text 0000000000000046 Base _ZN3fooC2Ei
Now we see the symbols the linker sees, and the duplication is no longer to be seen: _ZN3fooC1Ei != _ZN3fooC2Ei, although both symbols have the same address and
$ c++filt _ZN3fooC1Ei
foo::foo(int)
$ c++filt _ZN3fooC2Ei
foo::foo(int)
they both demangle to the same thing, foo::foo(int). There are in fact 5 distinct symbols - _ZN3fooCNEi, for 1 <= N <= 5 - that demangle to foo::foo(int).
(And g++ actually uses _ZN3fooC1Ei, _ZN3fooC2Ei and _ZN3fooC5Ei in the object files bar.o and gum.o).
So in reality, there are no duplicated symbols in the dynamic symbol table: the
sneaky many-to-one nature of the name-demangling mapping just makes it look that way.
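The many-to-one mapping can also be checked from code. The sketch below wraps abi::__cxa_demangle from <cxxabi.h>, which is available with GCC's libstdc++ and Clang's libc++abi. The helper name demangle is a hypothetical convenience, not part of any library.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC libstdc++ / Clang libc++abi)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI symbol,
// or the input unchanged if demangling fails.
std::string demangle(const char* mangled)
{
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with malloc.
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);   // caller owns the malloc'd buffer
    return result;
}
```

Calling demangle("_ZN3fooC1Ei") and demangle("_ZN3fooC2Ei") gives the same string, foo::foo(int), for two different inputs, matching the c++filt output above.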
But why?
I'm afraid the answer to that is too long and complicated for here.
Executive Summary
The GCC C++ compiler employs the two weak symbols that demangle identically to refer to a global inline class-method in different ways, as part of its stock formula for enabling the successful linkage of global inline class-methods in Position Independent Code.
This is a non-negligible problem for any compiler, and the GCC formula for it is not the only possible one. Clang has a different solution that does not involve the use of synonymous but distinct symbols, and so doesn't give rise to the illusory "duplication" of symbols that you've seen.