
I just read this excellent article: http://neugierig.org/software/chromium/notes/2011/08/static-initializers.html and then I tried: https://gcc.gnu.org/onlinedocs/gccint/Initialization.html

What it says about finding initializers does not work for me, though. The .ctors section is not available, but I could find .init_array (see also "Can't find .dtors and .ctors in binary"). But how do I interpret the output? I mean, summing up the size of the pages can also be handled by the size command and its .bss column - or am I missing something?

Furthermore, nm does not report any *_GLOBAL__I_* symbols, only *_GLOBAL__N_* functions and, more interestingly, _GLOBAL__sub_I_somefile.cpp entries. The latter probably indicate files with global initialization. But can I somehow get a list of the constructors that are being run? Ideally, a tool would give me a list like

Foo::Foo in file1.cpp:12
Bar::Bar in file2.cpp:45
...

(assuming I have debug symbols available). Is there such a tool? If not, how could one write it? Does the .init_array section contain pointers to code which could be translated via some DWARF magic to the above?

milianw
    I found `nm -SlC --size-sort | grep -F ' b '` to be quite helpful already. It found lots of global `string`s, `map`s and such abominations. – Trass3r Sep 13 '22 at 22:08

2 Answers


As you already observed, the implementation details of constructors/initialization functions are highly compiler (version) dependent. While I am not aware of a tool for this, what current GCC/clang versions do is simple enough to let a small script do the job: .init_array is just a list of entry points. objdump -s can be used to load the list, and nm to look up the symbol names. Here's a Python script that does that. It should work for any binary that was generated by the said compilers:

#!/usr/bin/env python
import os
import sys

# Load .init_array section
objdump_output = os.popen("objdump -s '%s' -j .init_array" % (sys.argv[1].replace("'", "'\\''"),)).read()
is_64bit = "elf64" in objdump_output  # objdump reports e.g. "file format elf64-x86-64"
init_array = objdump_output[objdump_output.find("Contents of section .init_array:") + 33:]
initializers = []
for line in init_array.split("\n"):
    parts = line.split()
    if not parts:
        continue
    parts.pop(0)  # Remove offset
    parts.pop(-1) # Remove ascii representation

    if is_64bit:
        # 64bit pointers are 8 bytes long
        parts = [ "".join(parts[i:i+2]) for i in range(0, len(parts), 2) ]

    # Fix endianness: the section dump is little-endian, nm prints big-endian hex
    parts = [ "".join(reversed([ x[i:i+2] for i in range(0, len(x), 2) ])) for x in parts ]

    initializers += parts

# Load disassembly for c++ constructors
dis_output = os.popen("objdump -d '%s' | c++filt" % (sys.argv[1].replace("'", "'\\''"), )).read()
def find_associated_constructor(disassembly, symbol):
    # Find associated __static_initialization function
    loc = disassembly.find("<%s>" % symbol)
    if loc < 0:
        return False
    loc = disassembly.find(" <", loc)
    if loc < 0:
        return False
    symbol = disassembly[loc+2:disassembly.find("\n", loc)][:-1]
    if symbol[:23] != "__static_initialization":
        return False
    address = disassembly[disassembly.rfind(" ", 0, loc)+1:loc]
    loc = disassembly.find("%s <%s>" % (address, symbol))
    if loc < 0:
        return False
    # Find all callq's in that function
    end_of_function = disassembly.find("\n\n", loc)
    symbols = []
    while loc < end_of_function:
        loc = disassembly.find("callq", loc)
        if loc < 0 or loc > end_of_function:
            break
        loc = disassembly.find("<", loc)
        symbols.append(disassembly[loc+1:disassembly.find("\n", loc)][:-1])
    return symbols

# Load symbol names, if available
nm_output = os.popen("nm '%s'" % (sys.argv[1].replace("'", "'\\''"), )).read()
nm_symbols = {}
for line in nm_output.split("\n"):
    parts = line.split()
    if not parts:
        continue
    nm_symbols[parts[0]] = parts[-1]

# Output a list of initializers
print("Initializers:")
for initializer in initializers:
    symbol = nm_symbols[initializer] if initializer in nm_symbols else "???"
    constructor = find_associated_constructor(dis_output, symbol)
    if constructor:
        for function in constructor:
            print("%s %s -> %s" % (initializer, symbol, function))
    else:
        print("%s %s" % (initializer, symbol))

C++ static initializers are not called directly, but through two generated functions, _GLOBAL__sub_I_.. and __static_initialization... The script uses the disassembly of those functions to get the name of the actual constructor. You'll need the c++filt tool to demangle the names, or you can remove the call from the script to see the raw symbol names.
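If debug info is available, addresses can also be mapped to a file and line with addr2line. Here is a rough sketch of that step. The helper names parse_addr2line and resolve are made up for illustration, and which addresses to feed in is an assumption: the call sites inside the generated function are probably more informative than its entry point.

```python
import subprocess


def parse_addr2line(output):
    """Turn `addr2line -f` output (two lines per address) into tuples.

    Each result is (function, file, line); line is None when unknown.
    """
    lines = output.splitlines()
    result = []
    for func, location in zip(lines[0::2], lines[1::2]):
        # Some versions append " (discriminator N)" to the location.
        location = location.split(" (discriminator")[0]
        path, _, line = location.rpartition(":")
        result.append((func, path, int(line) if line.isdigit() else None))
    return result


def resolve(binary, addresses):
    """Ask addr2line for the function and file:line of each address."""
    cmd = ["addr2line", "-f", "-C", "-e", binary]
    cmd += ["0x%x" % address for address in addresses]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return parse_addr2line(out)
```

Unresolvable addresses come back from addr2line as `??` and `??:0`, so they are easy to filter out afterwards.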

Shared libraries can have their own initializer lists, which would not be displayed by this script. The situation is slightly more complicated there: For non-static initializers, the .init_array gets an all-zero entry that is overwritten with the final address of the initializer when loading the library. So this script would output an address with all zeros.
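The hex shuffling in the script amounts to reading the section as an array of pointers. Below is a minimal sketch of that step on raw section bytes, assuming they were dumped beforehand (for instance with objcopy -O binary --only-section=.init_array). The names decode_init_array and describe are made up for illustration, and the little-endian default is an assumption that holds for x86.

```python
import struct


def decode_init_array(data, pointer_size=8, little_endian=True):
    """Split raw .init_array bytes into a list of integer addresses."""
    if pointer_size not in (4, 8):
        raise ValueError("pointer_size must be 4 or 8")
    if len(data) % pointer_size:
        raise ValueError("section size is not a multiple of the pointer size")
    fmt = ("<" if little_endian else ">") + ("Q" if pointer_size == 8 else "I")
    return [address for (address,) in struct.iter_unpack(fmt, data)]


def describe(addresses, pointer_size=8):
    """Format addresses like nm does and flag all-zero entries."""
    width = pointer_size * 2
    lines = []
    for address in addresses:
        text = "%0*x" % (width, address)
        if address == 0:
            # Per the paragraph above: such entries get their value at load time.
            text += " (zero on disk, filled in at load time)"
        lines.append(text)
    return lines


if __name__ == "__main__":
    raw = bytes.fromhex("d006400000000000" "0000000000000000")
    for line in describe(decode_init_array(raw)):
        print(line)
```

The zero-padded hex strings can be used directly as keys into the nm_symbols dictionary from the script.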

Phillip
  • Thanks! But where is the advantage of doing this compared to `nm -a binaryOrSharedLibrary | grep GLOBAL__`? It's far simpler and generates the same output for me. The symbol is not so hard to get. Rather, I'd like to know the file/line to find what I need to edit. – milianw Jan 26 '15 at 10:15
  • Also, in both cases, I only get `_GLOBAL__sub_I_filename.cpp` instead of "proper" symbols for ctors that are run in this file. – milianw Jan 26 '15 at 10:25
  • Symbols can be, and in non-debug builds often are, stripped. This removes the `GLOBAL__` symbols as well. The `.init_array`, on the other hand, is usually present. The second reason why I'd prefer to use `.init_array` is that it is more general: C initializer functions (functions having the `__attribute__((constructor))`) also use this mechanism, but there is no separate `GLOBAL__` symbol generated for them. As for the symbol name, there's a generated intermediate function. Search for `__static_initialization_and_destruction_` in `objdump -d`. – Phillip Jan 26 '15 at 10:39
  • Great, the tip of looking at the disassembled output of `__static_initialization_and_destruction_` really helps. With that, I could write a script that maps the `_GLOBAL__sub_I_filename.cpp` to identifiers I get from the `callq` instructions in the `__static_initialization_and_destruction_`! Of course, this will only work for a non-stripped, full-debug, no-inline build of the shared library. But good enough for my use case! – milianw Jan 26 '15 at 10:52
  • I've updated the script to also follow the generated c++ code to the actual constructor. – Phillip Jan 26 '15 at 10:54
  • You are quick :) But the code only shows at most one ctor, right? Often, there are more. Take this code example: https://paste.kde.org/pn4pbkl5h Your tool only finds `std::ios_base::Init::Init()@plt`, but not `std::allocator::allocator()` nor `std::vector >::vector(std::initializer_list, std::allocator const&)`. Could you extend it? Then I'll accept your answer :) – milianw Jan 26 '15 at 11:06
  • Indeed. Apparently all constructors are collected in a single function. I've updated the script to output all `callq`'s in the function. – Phillip Jan 26 '15 at 11:52
  • Great - thanks! Now I only need to adapt it to work with multiple files linked together into a shared library (currently, you seem to always use the very first `__static_initialization_and_destruction?`, i.e. you do a symbol-name lookup instead of using the address from the `callq` in the `_GLOBAL__sub_I_file.cpp` function). Anyhow, this shows that it's doable and I'll accept your answer. If you could extend it to work with the .so - even better :) Otherwise I'll take that as an exercise for me. – milianw Jan 26 '15 at 11:57
  • Interesting. I was quite sure that a different name would be used for the second object. I've changed the script to include the address in the search. – Phillip Jan 26 '15 at 12:29
  • It works like a charm: https://paste.kde.org/pdojyvaqg - many thanks! I accepted your answer and will award you the bounty when I can do so (2h left). Thanks! – milianw Jan 26 '15 at 13:09
  • Note that (a) `__static_initialization...` may very well get inlined into the `_GLOBAL` function and (b) clang for example uses different names for the individual ctor wrappers. – Trass3r Sep 13 '22 at 22:52

Several things are executed when an ELF object is loaded, not just the entries in .init_array. To get an overview, I suggest looking at the sources of glibc's dynamic loader, especially _dl_init() and call_init().
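To see which of these mechanisms a given binary uses, the dynamic section can be inspected. The sketch below parses `readelf -d` output and reports the initialization-related tags in the order I understand the loader to handle them: DT_PREINIT_ARRAY (main executable only), then DT_INIT, then DT_INIT_ARRAY. The helper names are made up for illustration, and the ordering is my reading of the ELF specification, so check it against the loader sources.

```python
import re
import subprocess

# Tags in the order they are assumed to be processed at load time.
INIT_TAGS = ("PREINIT_ARRAY", "INIT", "INIT_ARRAY")


def parse_dynamic_init_tags(readelf_output):
    """Extract initialization-related entries from `readelf -d` output."""
    found = {}
    for line in readelf_output.splitlines():
        # Lines look like: " 0x0000000000000019 (INIT_ARRAY)   0x3db8"
        match = re.match(r"\s*0x[0-9a-fA-F]+\s+\((\w+)\)\s+(\S+)", line)
        if match and match.group(1) in INIT_TAGS:
            found[match.group(1)] = match.group(2)
    return [(tag, found[tag]) for tag in INIT_TAGS if tag in found]


def init_tags(binary):
    """Run readelf on a binary and return its initialization tags."""
    out = subprocess.run(["readelf", "-d", binary], check=True,
                         capture_output=True, text=True).stdout
    return parse_dynamic_init_tags(out)
```

A binary that lists INIT in addition to INIT_ARRAY has an .init function that runs before the array entries, which the script above would not show.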

Thomas McGuire