I'm trying to track down an issue where Valgrind can't resolve symbols of functions that pass through certain libraries. I get output like this:

==83597== 920 bytes in 1 blocks are possibly lost in loss record 750 of 864
==83597==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==83597==    by 0x548EF93: myproject_malloc (mysourcefile.c:48)
==83597==    by 0x4F13FD5: ??? (in /path/to/project/library-version.so)
==83597==    by 0x54542FF: ??? (in /path/to/project/library-version.so)
==83597==    by 0x4F536CA: ??? (in /path/to/project/library-version.so)
==83597==    by 0x54542FF: ??? (in /path/to/project/library-version.so)

One of the functions inside library-version.so is do_init(). library-version.so is loaded via LD_PRELOAD. When I run my program under gdb and try to set a breakpoint at do_init before the program has started running, gdb complains that it can't find the symbol. If I instead set a breakpoint at main and wait until that is hit, setting the breakpoint at do_init works.

For example:

(gdb) break do_init
Function "do_init" not defined.
Make breakpoint pending on future shared library load? (y or [n]) n
(gdb) break main
Breakpoint 1 at 0x400b50: file runner.c, line 13.
(gdb) run
... a bunch of output from the stuff in LD_PRELOAD ...
Breakpoint 1, main (argc=1, argv=0x7fffffffe028) at myprogram.c:13
13      return do_some_stuff();
(gdb) break do_init
Breakpoint 2 at 0x7ffff7658de0: file my/library/initializer.c, line 25.

So this leads me to two questions:

  1. It seems that do_init is getting pulled in by the dynamic linker. How can I find out at what step of the initialization process that happens? There are many libraries used in this project that define functions with __attribute__((constructor)), and they get glued together with a linker script.

  2. Why doesn't Valgrind see the symbols loaded by the dynamic linker like GDB does? I'm 99% sure that nothing is being dlclose'd, and I thought that anything under LD_PRELOAD would remain visible to Valgrind no matter what.
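
For context on the constructor mechanism mentioned in question 1, here is a minimal sketch of how functions marked `__attribute__((constructor))` run before main. All names in it (`early_init`, `late_init`, `report_init_order`) are hypothetical and not taken from the project, and the explicit priorities are an assumption made only so that the order is deterministic (GCC and Clang run lower priorities first; values up to 100 are reserved).

```c
#include <stdio.h>

/* All names here are hypothetical; none come from the project above. */
int init_calls = 0;   /* bumped once by each constructor */
int early_order = 0;  /* position in which early_init ran */
int late_order = 0;   /* position in which late_init ran */

/* Runs before main; a lower priority number means an earlier call. */
__attribute__((constructor(101)))
void early_init(void) { early_order = ++init_calls; }

/* Also runs before main, after early_init because of its priority. */
__attribute__((constructor(102)))
void late_init(void) { late_order = ++init_calls; }

/* Prints the order recorded by the two constructors. */
void report_init_order(void)
{
    printf("early_init ran #%d, late_init ran #%d\n", early_order, late_order);
}
```

Calling `report_init_order()` as the first statement of main shows that both constructors have already run by then. For a library in LD_PRELOAD, the dynamic linker has to map the library before it can call its constructors, which is why its symbols exist by the time main is reached.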

Patrick Collins
  • @iharob Of which? The shared library or the binary? – Patrick Collins Dec 22 '15 at 01:51
  • How are you building `library-version.so`? Does it have debug symbols? – Iharob Al Asimi Dec 22 '15 at 01:51
  • All of them, is there a Makefile? – Iharob Al Asimi Dec 22 '15 at 01:52
  • @iharob Yes, 100% confirmed, `objdump -S library-version.so` shows code inline with the disassembly. – Patrick Collins Dec 22 '15 at 01:52
  • @iharob Building this part of the project passes through ~10 makefiles and I'm not sure if I'm allowed to post any verbatim code online. – Patrick Collins Dec 22 '15 at 01:54
  • Not that much code. You can post it, but it won't help. Then can you check that all `CFLAGS` have `-O0 -g3`? – Iharob Al Asimi Dec 22 '15 at 01:55
  • @iharob I'm pretty sure. I can see that `do_init` has an entry in `objdump -g shared-library.so` and that the entry contains the correct info --- that's enough to confirm that at least that symbol should come through, right? – Patrick Collins Dec 22 '15 at 02:02
  • I don't know! Then how can you explain that valgrind can't see it? It normally happens when there are no debug symbols, obviously. Make sure that the libraries being loaded are the ones with the symbols then. – Iharob Al Asimi Dec 22 '15 at 02:04
  • @iharob My suspicion is that those symbols are being loaded too early or too late for valgrind to see them, hence the question about why GDB can't see it until after it hits `main` (since its debug info is in `library-version.so` and `library-version.so` is in `LD_PRELOAD`) --- I'm wondering if the same mechanism can explain both. – Patrick Collins Dec 22 '15 at 02:29
  • That's the normal behavior for *gdb* when the functions are from shared libraries AFAIK. But valgrind MUST be able to detect the problems with line number and everything if the binaries were properly compiled. – Iharob Al Asimi Dec 22 '15 at 02:31
  • @iharob okay, well that's partially an answer to my question about why GDB can't see it until I hit main. After GDB does hit main, I can see all of the usual debug info, so I'm not convinced that it could have been compiled without it --- how else could gdb see it? --- but if you want to post that as an answer I'll accept if nothing more thorough is posted. – Patrick Collins Dec 22 '15 at 02:53
  • Let's wait to see if someone with more experience can answer it better. – Iharob Al Asimi Dec 22 '15 at 02:56
  • Gdb can't see shared library symbols, until these libraries are loaded, which they are once program is started. You can break even on _start, and libraries will be loaded when you hit it. What does "info shared" show? – dbrank0 Dec 22 '15 at 12:55
  • @dbrank0 "info shared" says "Yes" under "Syms read" for every library. There's an asterisk next to the "Yes" on the .so for my unit testing framework, and it explains that that library is missing debugging info, but that's not the one I'm having an issue with. – Patrick Collins Dec 22 '15 at 17:43
  • I'm not too familiar with valgrind, but I know that for gdb the source code must be visible (as in the same directory) from where `gdb` is executing, AND every file must be both compiled AND linked with the `-g` parameter (if using gcc as the compiler/linker, it is even better to use `-ggdb`, as that maximizes the info for gdb). I suggest posting all the CFLAGS macros, the LFLAGS macros, the compile rules and the link rules. Without those details we are only guessing. – user3629249 Dec 25 '15 at 07:23
  • Try `(gdb) catch load library-version.so` to find out where in the initialization process it's pulled in. – Mark Plotnick Dec 27 '15 at 00:29

1 Answer

It turned out that, because several unusual config options happened to be enabled at the same time, the program header of library-version.so for the segment containing the .text section was flagged with rwx permissions rather than r-x permissions. Valgrind assumes that the segment holding .text cannot have rwx permissions on amd64 machines, so it ignores such segments when it tries to load debug symbols.

I believe this is a bug in Valgrind because .text sections with rwx permissions are perfectly valid according to the relevant standards; it turns out a report has already been filed here, which I've expanded upon.
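
To check whether a library has this problem, the segment flags can be read straight from the ELF program headers. The following is a minimal sketch, assuming a 64-bit ELF file in the host's byte order and Linux's `<elf.h>`. The function names `flags_are_rwx` and `count_rwx_load_segments` are hypothetical, and this is not how Valgrind itself inspects mappings.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* True when a program header's p_flags grant read, write and execute together. */
int flags_are_rwx(unsigned int p_flags)
{
    const unsigned int rwx = PF_R | PF_W | PF_X;
    return (p_flags & rwx) == rwx;
}

/*
 * Count the PT_LOAD segments flagged rwx in a 64-bit ELF file.
 * Returns -1 if the file cannot be read or is not ELF64.
 * Assumes the file uses the host's byte order.
 */
int count_rwx_load_segments(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    Elf64_Ehdr eh;
    int count = -1;
    if (fread(&eh, sizeof eh, 1, f) == 1 &&
        memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
        eh.e_ident[EI_CLASS] == ELFCLASS64 &&
        eh.e_phentsize == sizeof(Elf64_Phdr)) {
        count = 0;
        for (unsigned i = 0; i < eh.e_phnum; i++) {
            Elf64_Phdr ph;
            long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
            /* Read the i-th program header; give up on any I/O error. */
            if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1) {
                count = -1;
                break;
            }
            if (ph.p_type == PT_LOAD && flags_are_rwx(ph.p_flags))
                count++;
        }
    }
    fclose(f);
    return count;
}
```

A nonzero result from `count_rwx_load_segments` on the library's path would point to the situation described above. The same flags also appear in the `Flg` column of `readelf -lW library-version.so`, where an affected segment shows `RWE` instead of `R E`.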

Patrick Collins