2

I have a gdb backtrace of a crashed process, but I can't see the specific line in which the crash occurred because the source code was not available at that moment. I don't understand some of the information in the backtrace.

The backtrace is made of lines like the following one:

<path_to_binary_file>(_Z12someFunction+0x18)[0x804a378]

Notice that _Z12someFunction is the mangled name of int someFunction(double).

My questions are:

Does the +0x18 indicate the offset, starting at _Z12someFunction address, of the assembly instruction that produced the crash?

If the previous question is affirmative, and taking into account that I am working with a 32-bit architecture, does the +0x18 indicate 0x18 * 4 bytes?

If the above is affirmative, I assume that the address 0x804a378 is the _Z12someFunction plus 0x18, am I right?

EDIT:

The error occurred on a production machine (no cores enabled), and it seems to be a timing-dependent bug, so it is not easy to reproduce. That is why the information I am asking for is important to me on this occasion.

Dan
  • Just build the binary with debug information. – user7860670 Jan 26 '18 at 13:50
  • You are right in everything except that the offset is in bytes and not in 32-bit words. – Johan Jan 26 '18 at 13:53
  • Can we see your compilation commands? I suspect you built it with a too low debug mode. – Benjamin Barrois Jan 26 '18 at 13:59
  • Yes, the compilation was with optimizations and without debug symbols, but I cannot solve my problem just compiling with debug information since I can't reproduce the error easily. – Dan Jan 26 '18 at 14:06
  • But this is not gdb backtrace, it's [glibc backtrace](http://man7.org/linux/man-pages/man3/backtrace.3.html). – ks1322 Jan 26 '18 at 14:06

3 Answers

4

Most of your assumptions are correct. The +0x18 is indeed an offset, measured in bytes regardless of architecture, from the start of the named function (here _Z12someFunction).

0x804a378 is the actual address at which the error occurred.
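The arithmetic can be checked mechanically. The following is a minimal sketch, assuming the frame has the `binary(symbol+0xOFF)[0xADDR]` layout shown in the question; `Frame`, `parseFrame` and `functionStart` are hypothetical names introduced only for illustration. It splits one line into its parts and subtracts the byte offset from the absolute address to recover the address at which the function starts.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// One parsed frame of a backtrace line such as
//   <path_to_binary_file>(_Z12someFunction+0x18)[0x804a378]
struct Frame {
    std::string binary;     // text before '('
    std::string symbol;     // mangled symbol name (may be empty)
    std::uint64_t offset;   // byte offset from the start of the symbol
    std::uint64_t address;  // absolute address recorded for the frame
};

// Hypothetical helper: fills `out` and returns true if `line` matches the
// "binary(symbol+0xOFF)[0xADDR]" layout, otherwise returns false.
bool parseFrame(const std::string& line, Frame& out) {
    const std::string::size_type npos = std::string::npos;
    const std::string::size_type lbracket = line.rfind('[');
    const std::string::size_type rbracket = line.rfind(']');
    if (lbracket == npos || rbracket == npos || rbracket < lbracket) return false;
    const std::string::size_type close = line.rfind(')', lbracket);
    if (close == npos) return false;
    const std::string::size_type plus = line.rfind('+', close);
    if (plus == npos) return false;
    const std::string::size_type open = line.rfind('(', plus);
    if (open == npos) return false;
    try {
        Frame f;
        f.binary = line.substr(0, open);
        f.symbol = line.substr(open + 1, plus - open - 1);
        // Base 16 accepts an optional "0x" prefix.
        f.offset = std::stoull(line.substr(plus + 1, close - plus - 1), nullptr, 16);
        f.address = std::stoull(line.substr(lbracket + 1, rbracket - lbracket - 1), nullptr, 16);
        out = f;
        return true;
    } catch (const std::exception&) {
        return false;  // a numeric field was missing or malformed
    }
}

// Address of the first byte of the function: absolute address minus byte offset.
std::uint64_t functionStart(const Frame& frame) {
    assert(frame.offset <= frame.address);
    return frame.address - frame.offset;
}
```

For the line in the question this gives 0x804a378 - 0x18 = 0x804a360 as the start of _Z12someFunction, which is the value to compare against the symbol address reported by a tool such as nm.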

With that said, it is important to understand what you can do about it.

First of all, compiling with -g produces debug symbols. You rightly strip those from your production build, but all is not lost. If you take your original executable (i.e. before you stripped it), you can run: addr2line -e executable

You can then feed the addresses from the backtrace (0x804a378) into stdin, and addr2line will give you the precise file and line to which each address refers.

If you have a core file, you can also load it together with the unstripped executable and get full debug info. It will still be somewhat mangled, as you're probably building with optimizations, but some variables should still be accessible.

Building with debug symbols and stripping before shipping is the best option. Even if you did not, however, if you build the same sources again with the same build tools, in the same environment and with the same build options, you should get the same binary with the same symbol locations. If the bug is really difficult to reproduce, it might be worthwhile to try.

EDITED to add

Two more tools are worth knowing. The first is c++filt. You feed it a mangled symbol, and it produces the C++ name of the actual source symbol. It works as a filter, so you can just copy the backtrace and paste it into c++filt, and it will give you the same backtrace, only more readable.
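The same demangling can be done from inside a program. This is a minimal sketch, assuming a GCC or Clang toolchain that provides `abi::__cxa_demangle` in `<cxxabi.h>`; the wrapper name `demangle` is hypothetical. It also assumes that the full Itanium-ABI mangled name of `someFunction(double)` is `_Z12someFunctiond`, with a trailing `d` encoding the `double` parameter, which differs slightly from the shortened symbol quoted in the question.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical wrapper: returns the demangled form of an Itanium-ABI mangled
// name, or the input unchanged if it cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // Passing null buffer and length asks the runtime to malloc the result.
    char* result = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0) {
        return mangled;  // not a valid mangled name (or allocation failed)
    }
    assert(result != nullptr);
    std::string readable(result);
    std::free(result);  // the buffer was allocated with malloc
    return readable;
}
```

Calling `demangle("_Z12someFunctiond")` should yield `someFunction(double)`. Note that the return type is not part of the mangled name of an ordinary function, so it does not appear in the demangled output.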

The second tool is gdb remote debugging. This allows you to run gdb on a machine that has the executable with debug symbols, but run the actual code on the production machine. This allows live debugging in production (including attaching to already running processes).

Shachar Shemesh
  • I have tried to use `addr2line`, but it finds neither the address I have nor the function address (which I extracted with `nm -D`). I mean, addr2line shows: `??:0`. I have also tried subtracting 5 from the function address, because I have read that this is the length of the `CALL` instruction. – Dan Jan 26 '18 at 14:27
  • There are certain cases where GDB can detect the line number but addr2line can't. I have never been able to figure out what causes that. – Shachar Shemesh Jan 26 '18 at 14:35
  • I'm assuming you made sure you had debug symbols in the executable you loaded with addr2line, of course. You can try to load this executable in GDB, start the program, and then manually set your EIP to that address and see where gdb says you are. If that doesn't work, you're out of luck. – Shachar Shemesh Jan 26 '18 at 14:37
  • I have done it with gdb and, in fact, I was able to see the function name by doing `info symbol`. It was in the `.text` section. I tried `addr2line` with `-j .text`, but I got the same result. – Dan Jan 26 '18 at 15:41
2

You are confused. What you are seeing is backtrace output from glibc's backtrace function, not gdb's backtrace.

but I can't see the specific line in which the crash occurred because the source code was not in that moment

Now you can load the executable in gdb and examine the address 0x804a378 to get line numbers. You can use list *0x804a378 or info symbol 0x804a378. See Convert a libc backtrace to a source line number and How to use addr2line command in linux.
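For context, lines like the one in the question are typically produced by the program itself, for example from a signal handler or an error path. The following is a minimal sketch, assuming glibc's `backtrace` and `backtrace_symbols` from `<execinfo.h>`; the helper name `captureBacktrace` is hypothetical and is not taken from the asker's code.

```cpp
#include <cassert>
#include <cstdlib>
#include <execinfo.h>
#include <string>
#include <vector>

// Hypothetical helper: returns the current call stack as text lines in the
// format produced by glibc's backtrace_symbols().
std::vector<std::string> captureBacktrace(int maxFrames = 64) {
    std::vector<std::string> lines;
    if (maxFrames <= 0) {
        return lines;
    }
    std::vector<void*> addresses(maxFrames);
    // backtrace() fills the buffer with return addresses and reports how many.
    const int count = backtrace(addresses.data(), maxFrames);
    assert(count <= maxFrames);
    // backtrace_symbols() returns one malloc'ed array of strings.
    char** symbols = backtrace_symbols(addresses.data(), count);
    if (symbols == nullptr) {
        return lines;
    }
    for (int i = 0; i < count; ++i) {
        lines.emplace_back(symbols[i]);
    }
    std::free(symbols);  // only the array itself must be freed
    return lines;
}
```

Each returned line carries the bracketed address that can be passed to `list *ADDRESS` or `info symbol ADDRESS` in gdb. According to the backtrace(3) man page, function names from the executable itself only appear if it was linked with `-rdynamic`; otherwise only addresses are shown.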

ks1322
0

Run man gcc; there you will find the -g option, which adds debug information to the binary object file. Then, when a crash happens and a core is dumped, gdb can show the exact lines where the crash happened and help explain why. Alternatively, you can run the process under gdb, or attach to it, and see the trace directly without searching for the core file.

  • Yes, of course, but the error has occurred once, on a remote machine without dumps enabled, and it cannot be reproduced easily (it seems to be a timing-dependent bug). – Dan Jan 26 '18 at 14:04
  • In that case I would suggest either finding an easy way to reproduce it or following the answer above from Shachar, which I believe covers everything based on what you provided. – Irakli Darbuashvili Jan 26 '18 at 14:13