I am new to C++ symbol tables and libraries and want to understand how a symbol table behaves. We have an Android application with native support. While analyzing the symbol tables of its shared libraries, I noticed duplicate symbols in a .so file. Here is a sample from the symbol table:

0162502c  w   DO .data  00000004  Base        boost::asio::error::get_addrinfo_category()::instance

00aaa4f4  w   DF .text  0000009c  Base        boost::asio::error::get_misc_category()

01626334  w   DO .bss   00000004  Base        guard variable for boost::asio::error::get_misc_category()::instance

00aab4d0  w   DF .text  0000003c  Base        boost::asio::error::detail::misc_category::~misc_category()

00aab368  w   DF .text  0000003c  Base        boost::asio::error::detail::addrinfo_category::~addrinfo_category()

00aab3a4  w   DF .text  00000034  Base        boost::asio::error::detail::addrinfo_category::name() const

00aab3d8  w   DF .text  000000f8  Base        boost::asio::error::detail::addrinfo_category::message(int) const

00aab50c  w   DF .text  0000003c  Base        boost::asio::error::detail::misc_category::~misc_category()

Notice that the symbol "boost::asio::error::detail::misc_category::~misc_category()" appears twice.

I want to understand why we are getting duplicate symbols in the symbol table. I would also like to know why my app runs fine when there are duplicate symbols (I would expect the linker to report a duplicate-symbol error), and whether duplicate symbols in the symbol table increase the size of the .so and therefore the size of the app.

If they do, how can I ensure that the symbol table contains only unique entries? Note: we are using clang.

Teja

1 Answer

I am noticing duplicate symbols present in .so file

Like this?

$ cat foo.c
int foo(void)
{
    return 42;
}

Compile:

$ gcc -Wall -fPIC -c foo.c

Check symbols in the object file for foo:

$ readelf -s foo.o | grep foo
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS foo.c
     8: 0000000000000000    11 FUNC    GLOBAL DEFAULT    1 foo

One hit.

Make a shared library:

$ gcc -Wall -shared -o libfoo.so foo.o

Check symbols in the shared library for foo:

$ readelf -s libfoo.so | grep foo
     5: 000000000000057a    11 FUNC    GLOBAL DEFAULT    9 foo
    29: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS foo.c
    44: 000000000000057a    11 FUNC    GLOBAL DEFAULT    9 foo

Now two hits.

Nothing is wrong here. See some more of the picture:

$ readelf -s foo.o | egrep '(foo|Symbol table|Ndx)' 
Symbol table '.symtab' contains 9 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS foo.c
     8: 0000000000000000    11 FUNC    GLOBAL DEFAULT    1 foo

An object file has one symbol table, its static symbol table .symtab, that is used by the linker for link-time symbol resolution. But:

$ readelf -s libfoo.so | egrep '(foo|Symbol table|Ndx)' 
Symbol table '.dynsym' contains 11 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     5: 000000000000057a    11 FUNC    GLOBAL DEFAULT    9 foo
Symbol table '.symtab' contains 48 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
    29: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS foo.c
    44: 000000000000057a    11 FUNC    GLOBAL DEFAULT    9 foo

a shared library has two symbol tables: a static symbol table .symtab, like an object file, plus a dynamic symbol table, .dynsym, used by the loader for run-time symbol resolution.

When you link object files into a shared library, the linker by default transcribes the GLOBAL symbols from their .symtabs into the .symtab and the .dynsym of the shared library, except for those symbols that have HIDDEN visibility in the object files (which they get from being defined with the attribute of hidden visibility at compilation).

Any GLOBAL symbols with HIDDEN visibility in the object files are transcribed as LOCAL symbols with DEFAULT visibility into the .symtab of the shared library and are not transcribed into the .dynsym of the shared library at all. So when the shared library is linked with anything else, neither the linker nor the loader can see the global symbols that were HIDDEN at compilation.

But apart from hidden symbols, of which there are often none, the same global symbols will appear in the .symtab and the .dynsym tables of a shared library. Each defined symbol that appears in both tables addresses the same definition.
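To make the visibility rule concrete, here is a minimal sketch, assuming GCC or Clang on an ELF target. The function names `exported_api` and `internal_helper` are hypothetical and exist only for illustration; the `visibility` attribute is the compiler feature being shown.

```cpp
#include <cassert>

// Hypothetical names, for illustration only.

// HIDDEN visibility: other object files linked into the same shared library
// can still call this, but the linker demotes it to a LOCAL symbol in the
// library's .symtab and leaves it out of .dynsym.
__attribute__((visibility("hidden"))) int internal_helper(int x)
{
    return x * 2;
}

// DEFAULT visibility (spelled out here, although it is the default):
// a GLOBAL symbol that is transcribed into both .symtab and .dynsym.
__attribute__((visibility("default"))) int exported_api(int x)
{
    int doubled = internal_helper(x);
    assert(doubled == 2 * x);
    return doubled + 1;
}
```

If this file were compiled with -fPIC and linked with -shared, one would expect readelf --dyn-syms to list the mangled name of exported_api but not that of internal_helper, while readelf -s would show both, the latter as LOCAL.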

Later, OP comments

I took the symbol table by running objdump -T command, which should ideally list symbols present only in dynamic symbol table.

This steers us to a different explanation, because objdump -T does indeed report only the dynamic symbol table (like readelf --dyn-syms).

Notice that the symbol reported twice:

...
00aab4d0  w   DF .text  0000003c  Base        boost::asio::error::detail::misc_category::~misc_category()
...
00aab50c  w   DF .text  0000003c  Base        boost::asio::error::detail::misc_category::~misc_category()
...

is classified w in column 2 (as are all the other symbols in your snippet). What objdump means by that is that the symbol is weak.
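As a rough illustration of where weak symbols come from, the sketch below assumes GCC or Clang on an ELF target. The names `counter`, `default_limit` and `count_up_to_limit` are hypothetical. A member function defined inside its class body is implicitly inline, and these compilers typically emit such a function as a weak definition in every object file that uses it, which is why several object files can carry the same definition without a multiple-definition error at link time.

```cpp
#include <cassert>

// Hypothetical header-style code, for illustration only.
struct counter
{
    // Defined in the class body, hence implicitly inline. Each object file
    // that uses it typically gets its own weak definition; the linker keeps one.
    int next() { return ++_n; }
    int _n = 0;
};

// An explicitly weak definition (GCC/Clang attribute). A non-weak definition
// of the same function in another object file would take precedence over it.
__attribute__((weak)) int default_limit()
{
    return 10;
}

int count_up_to_limit()
{
    counter c;
    int last = 0;
    while (last < default_limit())
        last = c.next();
    assert(last == default_limit());
    return last;
}
```

Running objdump -t on the resulting object file should show the weak definitions flagged with w, in the same column as in the listing above.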

Let's reproduce the observation:

foo.hpp

#pragma once
#include <iostream>

struct foo
{
    explicit foo(int i)
    : _i{i}
    {
        std::cout << __PRETTY_FUNCTION__ << std::endl;
    }
    ~foo()
    {
        std::cout << __PRETTY_FUNCTION__ << std::endl;
    }
    int _i = 0;
};

bar.cpp

#include "foo.hpp"

foo bar()
{
    return foo(2);
}

gum.cpp

#include "foo.hpp"

foo gum()
{
    return foo(1);
}

Compile and make a shared library:

$ g++ -Wall -Wextra -c -fPIC bar.cpp gum.cpp
$ g++ -shared -o libbargum.so bar.o gum.o

See what dynamic symbols objdump reports from struct foo:

$ objdump -CT libbargum.so | grep 'foo::'
00000000000009bc  w   DF .text  0000000000000046  Base        foo::foo(int)
00000000000009bc  w   DF .text  0000000000000046  Base        foo::foo(int)

Duplicate weak exports of the constructor foo::foo(int). Just like what you noticed.

Hang on a tick though. foo::foo(int) is a C++ method signature, but not actually a symbol that the linker can recognise. Let's do that again, this time without demangling:

$ objdump -T libbargum.so | grep 'foo'
00000000000009bc  w   DF .text  0000000000000046  Base        _ZN3fooC1Ei
00000000000009bc  w   DF .text  0000000000000046  Base        _ZN3fooC2Ei

Now we see the symbols the linker sees, and the duplication is no longer to be seen: _ZN3fooC1Ei != _ZN3fooC2Ei, although both symbols have the same address and

$ c++filt _ZN3fooC1Ei
foo::foo(int)
$ c++filt _ZN3fooC2Ei
foo::foo(int)

they both demangle to the same thing, foo::foo(int). There are in fact 5 distinct symbols - _ZN3fooCNEi, for 1 <= N <= 5 - that demangle to foo::foo(int). (And g++ actually uses _ZN3fooC1Ei, _ZN3fooC2Ei and _ZN3fooC5Ei in the object files bar.o and gum.o).

So in reality, there are no duplicated symbols in the dynamic symbol table: the sneaky many-to-one nature of the name-demangling mapping just makes it look that way.
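The many-to-one mapping can also be checked programmatically. This is a minimal sketch, assuming a toolchain that provides `abi::__cxa_demangle` in `<cxxabi.h>` (GCC's libstdc++ and Clang's libc++abi both do). The helper names `demangle` and `same_after_demangling` are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Demangle an Itanium-ABI symbol name. Returns the input unchanged if the
// name cannot be demangled.
std::string demangle(const char* mangled)
{
    assert(mangled != nullptr);
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with malloc.
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr)
        return mangled;
    std::string result(out);
    std::free(out);
    return result;
}

// True when two distinct linker symbols become identical after demangling,
// which is what makes a demangled listing look as if it had duplicates.
bool same_after_demangling(const char* a, const char* b)
{
    return std::string(a) != b && demangle(a) == demangle(b);
}
```

Calling same_after_demangling("_ZN3fooC1Ei", "_ZN3fooC2Ei") should return true: the two names differ as far as the linker is concerned, yet both demangle to foo::foo(int), matching the c++filt output above.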

But why?

I'm afraid the answer to that is too long and complicated for here.

Executive Summary

The GCC C++ compiler employs the two weak symbols that demangle identically to refer to a global inline class-method in different ways, as part of its stock formula for enabling the successful linkage of global inline class-methods in Position Independent Code. This is a non-negligible problem for any compiler, and the GCC formula for it is not the only possible one. Clang has a different solution, one that does not involve the use of synonymous but distinct symbols and so doesn't give rise to the illusory "duplication" of symbols that you've seen.

Mike Kinghan
  • So from above explanation, is it always expected to have duplicate symbols in symbol table ? – Teja Feb 19 '19 at 03:57
  • @Teja As explained, a shared library has *two* symbol tables: the symbols in its dynamic symbol table are a subset of the symbols in its static symbol table. – Mike Kinghan Feb 19 '19 at 07:02
  • Thanks for your patience!! I took the symbol table by running objdump -T command, which should ideally list symbols present only in dynamic symbol table. Despite using "-T" flag, I am noticing duplicate symbols, is this expected ? if yes, why is it expected ? – Teja Feb 19 '19 at 09:05
  • @Teja Updated answer maybe helps – Mike Kinghan Feb 19 '19 at 20:14
  • Thanks for the detailed analysis, I now got the point why I am seeing duplicate symbols [ which is effect of demangling] , so ideally in your example if have one more method named "Test" and returns foo(3), then one more entry would be getting added to symbol table. is that true ? Is this expected even if we use Clang [ Please note we are using clang] . – Teja Feb 20 '19 at 04:08
  • @Teja Sorry but I have to call time on this question now. – Mike Kinghan Feb 20 '19 at 07:32