
A shared object, such as glibc, when compiled appropriately, defines many symbols, such as main_arena, that are not normally used by other programs (although they can be seen in objdump and gdb), but are defined, with their addresses, as local symbols:

 $ objdump -t ../.glibc/glibc_2.30_no-tcache/libc.so.6 | grep main_arena
 00000000003b4b60 l     O .data 0000000000000898      main_arena

Yet, when I reference one of these in C (via extern), and attempt to link, the linker can't find it:

$ gcc -g -Og -no-pie -Wl,-rpath ../.glibc/glibc_2.30_no-tcache/ -Wl,--dynamic-linker=../.glibc/glibc_2.30_no-tcache/ld.so.2  s1.c -o s1
/usr/bin/ld: /tmp/ccjKyCNh.o: in function `printf':
/usr/include/x86_64-linux-gnu/bits/stdio2.h:112: undefined reference to `main_arena'
/usr/bin/ld: /usr/include/x86_64-linux-gnu/bits/stdio2.h:112: undefined reference to `main_arena'
collect2: error: ld returned 1 exit status

Note: I've updated this question with extensive research:

This is by design:

Nonetheless, for debugging, exploration, and reverse engineering, it's sometimes desirable to reference, from another program, a local symbol defined in a shared object. All the information is there, as evidenced by gdb's ability to display it; it's simply a flag that tells ld not to resolve symbols to it.

Given such, is it possible to tell ld to ignore the local flag, and resolve to the symbol anyway?

For example:

$ objdump -t ../.glibc/glibc_2.30_no-tcache/libc.so.6 | grep -E ' malloc$| main_arena$'
00000000003b4b60 l     O .data  0000000000000898              main_arena
0000000000083500 g     F .text  0000000000000213              malloc

$ man objdump 2>/dev/null | grep -A10 'flag characters'
           The flag characters are divided into 7 groups as follows:

           "l"
           "g"
           "u"
           "!" The symbol is a local (l), global (g), unique global (u), neither global nor local (a space) or both global and
               local (!). ...
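The first flag group is derived from the binding stored in the high four bits of each symbol's st_info byte. A minimal sketch of that mapping, using the ELF64_ST_BIND macro and STB_* constants from glibc's <elf.h>; binding_flag is a hypothetical name, and it only approximates objdump, which works from BFD's own flags (a weak symbol, for example, shows a space in this group and a w in the next one):

```c
#include <elf.h>

/* Approximate objdump's first flag character from an ELF symbol's st_info.
 * Sketch only: objdump derives it from BFD flags, and its '!' (both local
 * and global) case is not produced here. */
static char binding_flag(unsigned char st_info)
{
    switch (ELF64_ST_BIND(st_info)) {   /* binding = st_info >> 4 */
    case STB_LOCAL:      return 'l';    /* e.g. main_arena */
    case STB_GLOBAL:     return 'g';    /* e.g. malloc */
    case STB_GNU_UNIQUE: return 'u';    /* GNU extension */
    default:             return ' ';    /* STB_WEAK and anything else */
    }
}
```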

I'd like to be able to write code that, for debugging and reverse engineering, references the symbol main_arena regardless. How can I do this?


Update

I've read Employed Russian's excellent posts on related topics, and seen his reference to the XY Problem. With that in mind, let me ask my question X:

For exploratory purposes, I'd like to be able to look at the behavior of things like main_arena, and other malloc internals, as I use malloc and free. I can do this with gdb, but I'd like to do it programmatically, in C. One way to do this might have been to actually link to these symbols (question Y), but there's no reason to assume that's the best way, the only way, or even a viable way. Given that:

How can I inspect the value of local symbols in a shared library from within a different program, without having to drop to gdb?

SRobertJames

1 Answer


Given such, is it possible to tell ld to ignore the local flag, and resolve to the symbol anyway?

No.

All the information is there, as evidenced by gdb's ability to display it; it's simply a flag that tells ld to not resolve symbols to it.

You are mistaken. While the symbol is present in the static symbol table (the .symtab section), it is not present in the dynamic symbol table (the .dynsym section). It is not just a matter of a flag: fundamental parts needed to perform dynamic linking at runtime are missing.

  1. You can confirm this by running readelf --dyn-syms .../libc.so.6 | grep main_arena: the symbol will not be there.
  2. You could binary-patch the "flag", changing the STB_LOCAL binding of the symbol in .symtab to STB_GLOBAL. After you do that, the symbol will show as g in the objdump output, but the linker will still not be able to use it.
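The same distinction readelf shows can be checked from C by searching one kind of symbol table at a time: sections of type SHT_SYMTAB hold the static table, SHT_DYNSYM the dynamic one. A minimal sketch, assuming a 64-bit ELF file; has_symbol is a hypothetical helper, not a library function. Against the question's libc.so.6 one would expect main_arena to be found with SHT_SYMTAB but not with SHT_DYNSYM, while malloc is found with both; patching the binding in .symtab as in item 2 would not change the SHT_DYNSYM result, because no entry is added to .dynsym.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if NAME appears in a symbol table section of type SECT_TYPE
 * (SHT_SYMTAB for .symtab, SHT_DYNSYM for .dynsym) in the 64-bit ELF file at
 * PATH, 0 if it does not, and -1 if the file cannot be read as 64-bit ELF. */
static int has_symbol(const char *path, Elf64_Word sect_type, const char *name)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0)
        size = ftell(f);
    unsigned char *buf = NULL;
    if (size >= (long)sizeof(Elf64_Ehdr) && fseek(f, 0, SEEK_SET) == 0)
        buf = malloc((size_t)size);
    if (!buf || fread(buf, 1, (size_t)size, f) != (size_t)size) {
        free(buf);
        fclose(f);
        return -1;
    }
    fclose(f);

    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)buf;
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0 || eh->e_ident[EI_CLASS] != ELFCLASS64) {
        free(buf);
        return -1;
    }

    int found = 0;
    const Elf64_Shdr *sh = (const Elf64_Shdr *)(buf + eh->e_shoff);
    for (int i = 0; i < eh->e_shnum && !found; i++) {
        if (sh[i].sh_type != sect_type)
            continue;
        const Elf64_Sym *syms = (const Elf64_Sym *)(buf + sh[i].sh_offset);
        /* sh_link of a symbol table section indexes its string table */
        const char *strtab = (const char *)(buf + sh[sh[i].sh_link].sh_offset);
        size_t count = sh[i].sh_size / sizeof(Elf64_Sym);
        for (size_t j = 0; j < count && !found; j++)
            found = strcmp(strtab + syms[j].st_name, name) == 0;
    }
    free(buf);
    return found;
}
```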

P.S. You should never use objdump to examine ELF binaries; it's highly deficient for that purpose. Use readelf instead.

Update:

How does GDB find ...

By reading the .symtab section.
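GDB's approach can be reproduced in an ordinary C program, which is one way to inspect such symbols without dropping to gdb: take st_value for the symbol from .symtab in the file on disk, then add the load bias that dl_iterate_phdr reports (dlpi_addr) for the loaded copy of that file. A minimal sketch, assuming 64-bit ELF on Linux with glibc and an unstripped file; symtab_value, find_base, local_symbol_address and struct base_query are hypothetical names, not an existing API. Distribution builds of libc.so.6 often ship without .symtab, so this would only be expected to work against a build like the one in the question.

```c
#define _GNU_SOURCE
#include <elf.h>
#include <fcntl.h>
#include <link.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Find NAME in the static symbol table (.symtab) of the 64-bit ELF file at
 * PATH. On success store its st_value in *VALUE and return 1; else return 0.
 * Local (STB_LOCAL) symbols are included. */
static int symtab_value(const char *path, const char *name, uint64_t *value)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return 0;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size < (off_t)sizeof(Elf64_Ehdr)) {
        close(fd);
        return 0;
    }
    const unsigned char *map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after close */
    if (map == MAP_FAILED)
        return 0;

    int found = 0;
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)map;
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0 && eh->e_ident[EI_CLASS] == ELFCLASS64) {
        const Elf64_Shdr *sh = (const Elf64_Shdr *)(map + eh->e_shoff);
        for (int i = 0; i < eh->e_shnum && !found; i++) {
            if (sh[i].sh_type != SHT_SYMTAB)
                continue;
            const Elf64_Sym *syms = (const Elf64_Sym *)(map + sh[i].sh_offset);
            const char *strtab = (const char *)(map + sh[sh[i].sh_link].sh_offset);
            size_t count = sh[i].sh_size / sizeof(Elf64_Sym);
            for (size_t j = 0; j < count; j++) {
                if (strcmp(strtab + syms[j].st_name, name) == 0) {
                    *value = syms[j].st_value;
                    found = 1;
                    break;
                }
            }
        }
    }
    munmap((void *)map, (size_t)st.st_size);
    return found;
}

struct base_query {
    const char *needle; /* substring of the loaded object's path */
    uintptr_t base;     /* load bias (dlpi_addr) once found */
    int found;
};

/* dl_iterate_phdr callback. glibc reports the main program first, with an
 * empty name, so the needle "" selects the executable itself. */
static int find_base(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    struct base_query *q = data;
    const char *objname = info->dlpi_name ? info->dlpi_name : "";
    if (strstr(objname, q->needle) == NULL)
        return 0; /* keep iterating */
    q->base = (uintptr_t)info->dlpi_addr;
    q->found = 1;
    return 1; /* stop iterating */
}

/* Runtime address of NAME: load bias of the loaded object whose path contains
 * NEEDLE, plus st_value of NAME in the on-disk file PATH (which must be the
 * same file that was loaded). Returns NULL if either lookup fails. */
static void *local_symbol_address(const char *needle, const char *path, const char *name)
{
    struct base_query q = { needle, 0, 0 };
    uint64_t value;
    dl_iterate_phdr(find_base, &q);
    if (!q.found || !symtab_value(path, name, &value))
        return NULL;
    return (void *)(q.base + (uintptr_t)value);
}
```

For the question's setup, a call such as local_symbol_address("libc.so", "../.glibc/glibc_2.30_no-tcache/libc.so.6", "main_arena") would be expected to yield the runtime address of main_arena. Interpreting the bytes there still requires a struct malloc_state definition copied from the matching glibc source, since that type is not in any public header.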

Is there a way I can tell ld to do something similar?

No. The linker could easily read the .symtab section as well, and could then link a binary that imports the main_arena symbol in the same way it imports e.g. stdout.

But such a binary will not run.

At runtime, as soon as the binary is loaded, the loader (ld.so) will need to resolve the reference to main_arena. And since the symbol is not present in the dynamic symbol table (which is the only symbol table ld.so can use), the symbol resolution will fail and ld.so will exit with a fatal error.

This is precisely the same thing as linking a.out against foo.so with int foo defined, and then running that a.out against a different version of foo.so, one without foo in it.

Update 2:

Is that simply a feature that ld lacks (because it's not needed outside of reverse engineering and other nonstandard use cases), or is it inherently not possible?

It's a feature that both ld (the static linker) and ld.so (the dynamic loader) lack.

It's possible to do (GDB can resolve these symbols after all), but a lot of work, for very little gain.

Could one possibly augment ld to use the regular .symtab (I understand it would be slower due to lack of hashes)?

Like I said, you would need to modify both ld and ld.so. The latter is part of GLIBC, and modifying GLIBC has complications: any mistake in the process can easily render your system unbootable.

And if you are going to modify GLIBC anyway, it would likely be much simpler to expose all the symbols you want (make them non-local). That way you only need to change GLIBC, and can use standard ld and the rest of standard symbol resolution mechanisms.

Employed Russian
  • Thanks. How does gdb find the symbol's address? Is there a way I can tell ld to do something similar? – SRobertJames Nov 25 '21 at 06:38
  • @SRobertJames I've updated the answer. – Employed Russian Nov 25 '21 at 06:48
  • "the symbol is not present in the dynamic symbol table (which is the only symbol table ld.so can use)" - please clarify: Is that simply a feature that ld lacks (because it's not needed outside of reverse engineering and other nonstandard use cases), or is it inherently not possible? Could one possibly augment ld to use the regular `.symtab` (I understand it would be slower due to lack of hashes)? – SRobertJames Nov 25 '21 at 17:41
  • Also, I've read your other posts, and updated the question accordingly. – SRobertJames Nov 25 '21 at 17:41
  • @SRobertJames I've updated the answer. – Employed Russian Nov 25 '21 at 17:56