
I'm trying to diagnose a customer's problem. Our software crashes randomly with SIGBUS (the signal itself is quite a surprise to me). As always, we cannot reproduce the problem locally. We've got several core dumps, but all of them are useless. While loading symbols, gdb (7.3.1) says:

warning: Could not load shared library symbols for ˙˙lď˙˙î˙˙kî˙˙ßî˙˙^ď˙˙öí˙˙]î˙˙Ńî˙˙Pď˙˙čí˙˙Oî˙˙Ăî˙˙Bď˙˙Úí˙˙Aî˙˙ľî˙˙4ď˙˙Ěí˙˙3î˙˙˙˙&ď˙˙ží˙˙%î˙˙î˙˙ď˙˙°í˙˙î˙˙î˙˙
ď˙˙ clock cycles.

and it stops loading any further symbols (the rubbish above is exactly what gdb prints). The call stack is useless even though we provided non-stripped binaries. We get something like this:

#0  0x059c712f in ?? ()
#1  0x0446f70c in ?? () from /home/build/patches/bogdans/06.rtm/build/bin/Linux/libabc.so
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

or this:

#0  0x00bdc430 in __kernel_vsyscall ()
#1  0x00abf7c4 in ?? ()
#2  0x00000000 in ?? ()

Why are all these core dumps so useless? Is this a system problem (and if so, how can I handle it)? Or is it because the crash was so severe that the stack itself was corrupted (and can I do anything about that)? I would appreciate any ideas, thanks!

Bogdan
  • Are you debugging the dump on a different machine? – Paolo M Oct 15 '15 at 13:20
  • @PaoloM Yes. The customer sent us the dumps. – Bogdan Oct 15 '15 at 13:24
  • I have only once seen a crash with this signal, and it was a non-aligned memory access. In my case it was fixed by changing build options that I can't remember or find right now (this was about 3 years ago). – rkachach Oct 15 '15 at 13:29
  • @redobot `-fno-strict-aliasing`, all our binaries have it. – Bogdan Oct 15 '15 at 13:32
  • Most likely you are troubleshooting the core on a different configuration. You do not have to use the same physical machine, but the configuration should be identical for gdb to do anything meaningful. – SergeyA Oct 15 '15 at 13:34
  • The reason is most likely unaligned access, but you need a working core to figure out where. Any chance of actually asking the customer to run gdb and collect stacks? This is the single most important thing in this core. – SergeyA Oct 15 '15 at 13:36
  • @SergeyA What do you mean by configuration? Please explain it a bit more. Yes, we asked for gdb call stacks, but unfortunately they just sent dumps. – Bogdan Oct 15 '15 at 13:36
  • Ideally, you need to have exactly the same everything :) At a bare minimum, same OS (version, patch, etc), same third-party libraries (version, location). – SergeyA Oct 15 '15 at 13:40
  • @SergeyA It can be done, I will try it. On the other hand, which elements are the most likely culprits? And why are dumps so sensitive? I would have thought a dump is just a kind of standardized database that should be readable anywhere. – Bogdan Oct 15 '15 at 13:43
  • A core dump is nothing more than a memory dump. It has addresses in it. In order to read a core file in a meaningful way, gdb needs to load the executable and read all the shared libraries this executable had loaded. If the version of a shared library is different, the addresses become invalid and gdb gets confused. As for elements: like I said, it is misaligned access. Something aligned as, say, char is accessed as int, and ints should be aligned on a 4-byte boundary. But without the stack you won't know where to look for it. – SergeyA Oct 15 '15 at 13:50
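
To illustrate the misaligned access described in the last comment, here is a minimal C sketch. It is not taken from the crashing code, and the dumps alone do not confirm that this is the cause. The helper names (`load_u32`, `is_aligned_u32`, `demo_unaligned_read`) are hypothetical, and `_Alignof` assumes a C11 compiler. The sketch reads a 32-bit value from an odd address by copying the bytes with `memcpy` rather than dereferencing a cast `uint32_t *`, which is the pattern that can raise SIGBUS on platforms that enforce alignment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if p is suitably aligned for uint32_t. */
static int is_aligned_u32(const void *p) {
    return ((uintptr_t)p % _Alignof(uint32_t)) == 0;
}

/* Hypothetical helper: read a 32-bit value from any address.
 * memcpy copies byte by byte as far as the language is concerned,
 * so it is valid even when p is not aligned for uint32_t. */
static uint32_t load_u32(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Stores a value at an odd offset in a buffer and reads it back safely. */
static uint32_t demo_unaligned_read(void) {
    /* The union forces the byte array to start on a uint32_t boundary,
     * so bytes + 1 is guaranteed to be misaligned for uint32_t. */
    union {
        uint32_t force_align;
        unsigned char bytes[8];
    } buf;
    const uint32_t value = 0xDEADBEEFu;

    memset(buf.bytes, 0, sizeof buf.bytes);
    memcpy(buf.bytes + 1, &value, sizeof value);

    assert(!is_aligned_u32(buf.bytes + 1));

    /* The risky alternative would be *(const uint32_t *)(buf.bytes + 1). */
    return load_u32(buf.bytes + 1);
}
```

Calling `demo_unaligned_read()` from a `main` returns the stored value unchanged. In a real code base, a check like `is_aligned_u32` could be added as an assertion around suspicious pointer casts to narrow down where a misaligned pointer comes from.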
