
I have a Solaris process, which is a C++ application that is loaded by ld with a few .so libraries. This application has a function that gets a return address in the calling function and then tries to determine the name of the said calling function.

If I use dladdr(3) for that, it does not always put what I expect to see in Dl_info::dli_sname. It looks like it returns a name of a function that is not nearest below or at the pointer value. If I take the pointer value and look at the output of nm, I can match the value to the exact function I expect it to be.

I am wondering if there is a way to retrieve a symbol map for a process and search it for the function name without using dladdr(3). I am especially interested in getting a symbol map not just for the executable itself but also for all the .so libraries it has loaded.

I'm running on Solaris 10/SPARC and I'm using gcc 4.2.x.

Thank you!

evolvah
  • Is it scanning the debug information? I remember XTank did something like that but it was a long time ago when I looked at the code. – Martin York Aug 11 '11 at 18:29
  • Thank you for the comment, Martin. What I am talking about is more of an introspection by a process into its own symbol table. I know I can use `nm`, `objdump`, and even `pstack` to look at this information, but I am talking about the process retrieving this information itself. – evolvah Aug 11 '11 at 20:08

2 Answers


I have tried a simple test using dladdr() on Solaris 10/SPARC (but caveats: GCC 3.4, straight C), and this works fine for me:

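#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>   /* exit() */

void print_name(const char *name, void *addr);
void print_name_by_dladdr(void *addr);

int main(void)
{
    print_name("main", (void *)&main);
    print_name("print_name", (void *)&print_name);
    print_name("printf", (void *)&printf);
    return 0;
}

void print_name(const char *name, void *addr)
{
    /* Cast to unsigned long: passing a pointer for %x is undefined. */
    (void)printf("Getting name of function %s() at 0x%lx\n", name,
                 (unsigned long)addr);
    print_name_by_dladdr(addr);
}

void print_name_by_dladdr(void *addr)
{
    Dl_info dli;
    /* dladdr() returns 0 on failure; it is not documented to set errno. */
    if (!dladdr(addr, &dli)) {
        (void)fprintf(stderr, "dladdr(): no mapped object contains the address\n");
        exit(1);
    }
    /* dli_sname can be NULL when no symbol matches. */
    (void)printf("  %s\n", dli.dli_sname ? dli.dli_sname : "(no symbol)");
}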

Output:

Getting name of function main() at 0x10714
  main
Getting name of function print_name() at 0x10778
  print_name
Getting name of function printf() at 0x209b8
  _PROCEDURE_LINKAGE_TABLE_

This also works correctly if I write (for example)

    print_name("main", (char *)&main + 4);
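To see how far an address sits from the symbol that dladdr() picked, the lookup can also report the offset from dli_saddr. The following is only a sketch: describe_address() is a hypothetical helper name, the output format is arbitrary, and the _GNU_SOURCE define is there on the assumption that the code may also be built against glibc (it should be harmless on Solaris).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* assumption: needed for dladdr() on glibc only */
#endif
#include <dlfcn.h>
#include <stdio.h>

/*
 * Hypothetical helper: writes "symbol+0xoffset (object)" for addr into buf.
 * Returns 1 if dladdr() matched addr to a mapped object, 0 otherwise.
 */
int describe_address(void *addr, char *buf, size_t len)
{
    Dl_info dli;
    const char *object;

    if (!dladdr(addr, &dli)) {
        (void)snprintf(buf, len, "<no mapped object contains %p>", addr);
        return 0;
    }

    object = dli.dli_fname ? dli.dli_fname : "?";

    if (dli.dli_sname == NULL) {
        /* The object was found, but no symbol matched the address. */
        (void)snprintf(buf, len, "<unknown> (%s)", object);
        return 1;
    }

    /* Offset of addr from the start of the symbol dladdr() chose. */
    (void)snprintf(buf, len, "%s+0x%lx (%s)", dli.dli_sname,
                   (unsigned long)((char *)addr - (char *)dli.dli_saddr),
                   object);
    return 1;
}
```

An unexpectedly large offset is a hint that the reported symbol is merely the nearest one dladdr() could see, not the function that actually contains the address.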

You say you can resolve correctly against the output of nm, so the possibilities seem limited... Are you certain that the return address is being derived or passed correctly to your resolver function? I guess you are using the GCC builtins for this? I have tested __builtin_return_address(0) and this also works fine for me. If you are using the GCC builtins, did you call __builtin_extract_return_address() (see the GCC documentation on the return address builtins, which mentions SPARC explicitly)? Can you post your code?

Can you stretch slightly to "process re-reading its own binary/shared object files"? If so, then libelf may be a way forward. This is exactly what some of the utilities you mention use, e.g. nm: http://cr.opensolaris.org/~devnull/6515400/usr/src/cmd/sgs/nm/common/nm.c.html

This introductory article from sun.com might be of use (warning: article is 10 years old).

This isn't as nice as doing native introspection and it's odd that dladdr(3C) doesn't work :(

An intermediate alternative: have you tried the RTLD_DL_SYMENT flag to dladdr1(3C) (and then perhaps borrowing from nm.c as above to interpret the returned ELF symbol)?

Martin Carpenter
  • Thank you again for the detailed answer! I am using `__builtin_return_address(1)` to extract the return address in the caller. However, if I try to adjust the returned address via `__builtin_extract_return_address()`, I get a compilation error. GCC complains about `__builtin_extract_return_address` not being declared in the scope. At the same time, it happily compiles the sample app with `__builtin_return_address(1)`. I am confused now, how come it knows about one built-in and does not know about the other. – evolvah Aug 12 '11 at 15:32
  • And I think you are completely right, I'm having this problem because of not using `__builtin_extract_return_address`. But I don't understand how GCC could be not aware of its own built-in... – evolvah Aug 12 '11 at 15:39
  • According to comment in http://www.newae.com/6lowpan/fip/a00105.html `__builtin_extract_return_address` needs at least gcc 4.5. Time to upgrade... – Martin Carpenter Aug 12 '11 at 15:57
  • Thanks again! I will need to explore more alternatives. Compiler upgrade in my environment is not a very viable option. I'm trying to re-invent a light-weight alternative to Dmalloc with a number of customizations to suit the type of projects I'm dealing with the most. – evolvah Aug 12 '11 at 16:23

A little late, but maybe still of help: ELF object files generally contain two symbol tables, .symtab and .dynsym. nm reads .symtab by default; use nm -D to read the .dynsym table. dladdr (as well as the dynamic loader) uses the .dynsym table, whereas the .symtab table is more complete. You can force all global symbols into the .dynsym table as well with the -rdynamic linker flag. However, this slows down linking significantly (e.g. by approx. 200 ms in my current project).

(NB: the above refers to Linux, but symbol handling works in principle the same way on SunOS. Command-line options might differ.)
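If dladdr does search only a smaller table, as this answer suggests, the mismatch with nm is easy to reproduce in miniature. The sketch below is not how the runtime linker is implemented; struct sym_entry and nearest_symbol() are hypothetical names, and the lookup is simply "greatest symbol address at or below the pointer" over a table sorted by address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory symbol entry; not a real ELF structure. */
struct sym_entry {
    uintptr_t   addr;   /* start address of the symbol */
    const char *name;   /* symbol name */
};

/*
 * Return the name of the symbol with the greatest start address that is
 * less than or equal to addr, or NULL if addr lies below every symbol.
 * The table must be sorted by ascending address.
 */
const char *nearest_symbol(const struct sym_entry *tab, size_t n,
                           uintptr_t addr)
{
    size_t lo = 0, hi = n;              /* search window is [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].addr <= addr)
            lo = mid + 1;               /* candidate; try a later symbol */
        else
            hi = mid;
    }
    return lo == 0 ? NULL : tab[lo - 1].name;
}
```

With a full table holding main at 0x10714, a static function helper at 0x10750 and print_name at 0x10778 (helper and its address are made up for illustration), looking up 0x10760 yields helper. Drop helper from the table, as would happen if it were absent from .dynsym, and the same lookup yields main, which is the kind of wrong-looking answer described in the question.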

frank
  • Thank you for the answer. I hope I will get time to play with the ELF-file format in the next few weeks. – evolvah Sep 23 '11 at 16:32