1

I have a Linux service (C++, with lots of loadable modules, basically .so files picked up at runtime) which crashes from time to time. I would like to get to the bottom of this crash, but at the moment I have no clue how to proceed. So I'd like to ask the following:

  1. If a Linux service crashes, where is the "core" file created? I have set ulimit -c 102400, which should be enough, but I cannot find the core files anywhere :(.
  2. Are there any Linux logs that track services? The service's own log obviously does not tell me that it is about to crash...
  3. It might be that one of the modules is crashing, but I cannot tell which one. I cannot even tell which modules are loaded. Do you know how to show which modules a service is using on Linux?
  4. Any other hints you might have for debugging a Linux service?

Thanks f-

Ferenc Deak
  • Have a look at strace. It will be able to tell you which modules are being loaded. The simplest way is to integrate it into the init script and redirect its output to a log file – sehe Oct 18 '11 at 12:48

3 Answers

2

0) Get a staging environment which mimics production as closely as possible. Reproduce the problem there.

1) You can attach to a running process using gdb -p <pid> (a debug build helps, but is not strictly required)

2) Make sure the ulimit is what you think it is (output ulimit -c to a file from the shell script which runs your service, right before starting it). Usually you need to set ulimit in the /etc/profile file; use ulimit -c unlimited to remove the size limit (ulimit -c 0 disables core dumps entirely)

3) Find the core file using find / -name \*core\* -print or similar

4) gdb will give you the list of loaded shared objects (.so) once you are attached to the process (info sharedlibrary).

5) Add more logging to your service
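As a rough illustration of steps 2, 4 and 5, the service could log its own core-file limit and its loaded shared objects at startup, instead of relying on what the shell reports. This is only a sketch for Linux with glibc: the function names `core_limit_description` and `loaded_shared_objects` are made up for the example, and it uses `getrlimit` and `dl_iterate_phdr` rather than gdb. Note that `getrlimit` reports bytes, while the shell's `ulimit -c` typically counts blocks.

```cpp
// Sketch: report the effective core-file limit and the loaded shared objects.
// Linux/glibc only; function names are illustrative, not from the answer.
#include <link.h>          // dl_iterate_phdr, dl_phdr_info
#include <sys/resource.h>  // getrlimit, RLIMIT_CORE, RLIM_INFINITY

#include <cstddef>
#include <string>
#include <vector>

// Returns the soft RLIMIT_CORE as text: "unlimited", a byte count, or "unknown".
// A result of "0 bytes" means core dumps are disabled for this process.
std::string core_limit_description() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        return "unknown";
    }
    if (rl.rlim_cur == RLIM_INFINITY) {
        return "unlimited";
    }
    return std::to_string(static_cast<unsigned long long>(rl.rlim_cur)) + " bytes";
}

// Callback for dl_iterate_phdr: records the path of each loaded object.
// The main executable is reported with an empty name and is skipped here.
static int collect_object_name(struct dl_phdr_info* info, std::size_t, void* data) {
    auto* names = static_cast<std::vector<std::string>*>(data);
    if (info->dlpi_name != nullptr && info->dlpi_name[0] != '\0') {
        names->emplace_back(info->dlpi_name);
    }
    return 0;  // zero tells dl_iterate_phdr to keep iterating
}

// Returns the paths of the shared objects currently mapped into this process,
// including modules that were loaded at runtime with dlopen.
std::vector<std::string> loaded_shared_objects() {
    std::vector<std::string> names;
    dl_iterate_phdr(collect_object_name, &names);
    return names;
}
```

Calling both functions once at startup, and `loaded_shared_objects()` again after the modules have been loaded, and writing the results to the service log would show which .so files were present before a crash.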

Good luck!

MK.
  • thanks :) I don't think I can attach the debugger to it. The service automatically restarts if it goes down, and I will have to fight a bit to be allowed to debug on that machine (i.e. keep the service down). I'll update the ulimit. :) – Ferenc Deak Oct 18 '11 at 12:46
  • you don't actually need a debug build. also `gdb --pid` is nice – sehe Oct 18 '11 at 12:47
  • @sehe well, yeah, you can attach w/o debug build and even get a stack trace if it crashes. – MK. Oct 18 '11 at 12:56
  • @MK: or... see what modules are loaded (`info shared`, IIRC) – sehe Oct 18 '11 at 12:57
2

Under Linux, processes which switch user ID get their core dumps disabled for security reasons. This is because they often do things like reading privileged files (think /etc/shadow), and a core file could contain sensitive information.

To enable core dumping on processes which have switched user ID, you can use prctl with PR_SET_DUMPABLE.
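A minimal sketch of that call, assuming it is made inside the service after it has switched user ID (changing the effective UID resets the flag, so calling it earlier has no lasting effect). The function name `enable_core_dumps` is hypothetical.

```cpp
// Sketch: re-enable core dumps for the calling process on Linux.
// The function name is illustrative; only prctl and its constants are real.
#include <sys/prctl.h>  // prctl, PR_SET_DUMPABLE, PR_GET_DUMPABLE

// Sets the "dumpable" flag to 1 and reads it back to confirm.
// Returns true if the process is dumpable afterwards.
bool enable_core_dumps() {
    if (prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) != 0) {
        return false;  // errno describes the failure
    }
    return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0) == 1;
}
```

Logging the return value makes it visible in the service log whether the flag was actually set.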

Core files are normally dumped in the process's current working directory. If that directory is not writable by the user the process is running as, the dump will fail, so make sure the working directory is writable.
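To check this from inside the service, something like the following could be logged at startup. It is a sketch under a few assumptions: the struct and function names are invented for the example, and it also reads /proc/sys/kernel/core_pattern, a kernel setting not mentioned above that can redirect core files to another path or to a helper program.

```cpp
// Sketch: describe where a core file would be written for this process.
// Linux only; CoreDestination and describe_core_destination are illustrative names.
#include <unistd.h>  // access, getcwd, W_OK

#include <climits>   // PATH_MAX
#include <fstream>
#include <string>

struct CoreDestination {
    std::string cwd;           // current working directory, empty if unknown
    bool cwd_writable;         // result of access(".", W_OK)
    std::string core_pattern;  // kernel core_pattern, empty if unreadable
};

CoreDestination describe_core_destination() {
    CoreDestination d{"", false, ""};

    char buf[PATH_MAX];
    if (getcwd(buf, sizeof buf) != nullptr) {
        d.cwd = buf;
    }

    // Caveat: access() checks against the real user ID, which can differ
    // from the effective user ID in a process that has switched users.
    d.cwd_writable = (access(".", W_OK) == 0);

    // If the pattern is an absolute path or starts with '|', the core file
    // does not go to the working directory at all.
    std::ifstream in("/proc/sys/kernel/core_pattern");
    std::getline(in, d.core_pattern);

    return d;
}
```

If `cwd_writable` is false and `core_pattern` is a plain relative name such as `core`, that alone would explain why no core file shows up.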

MarkR
0

Your first order of business should be getting a core file. See if this answer applies.

Second, you should run your server under Valgrind, and fix any errors it finds.

Reproducing the crash when running under GDB (as MK suggested) is possible, but somewhat unlikely: bugs tend to hide when you are looking for them, and the debugger may affect timing (especially if your server is multi-threaded).

Employed Russian