I have a monitor script that checks a specified process; if it crashes, the script relaunches it without waiting for the core dump to finish writing. Does this cause any problems? Will it affect the core dump file or the relaunched process?
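For reference, a minimal sketch of such a monitor loop in C, assuming a hypothetical ./myprog binary (my actual script does the same thing in shell):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Hypothetical supervisor: relaunch ./myprog whenever it dies.
     * WCOREDUMP is a common BSD/Linux extension, not plain POSIX.
     * As far as I understand, on Linux waitpid() only returns once
     * the child is fully dead, i.e. after any core file has been
     * written; a monitor that polls by name or by a failing health
     * check could fire while the dump is still in progress. */
    int main(void) {
        for (;;) {
            pid_t pid = fork();
            if (pid < 0) { perror("fork"); return EXIT_FAILURE; }
            if (pid == 0) {                     /* child: run the program */
                execl("./myprog", "myprog", (char *)NULL);
                perror("execl");                /* reached only if exec fails */
                _exit(127);
            }
            int status;                         /* parent: wait, then relaunch */
            if (waitpid(pid, &status, 0) < 0) { perror("waitpid"); return EXIT_FAILURE; }
            if (WIFSIGNALED(status) && WCOREDUMP(status))
                fprintf(stderr, "pid %d dumped core; relaunching\n", (int)pid);
        }
    }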
- Each process is separate, so in general, if one crashes the others won't care. The core dump does not matter either, nor does the fact that you relaunch the crashed code (though you should fix the bug). Finally, this question is off-topic here. – Marcin Orlowski Mar 31 '18 at 03:03
- Please explain what the actual process and program that dumps core are. – Basile Starynkevitch Apr 01 '18 at 11:40
2 Answers
Yes, you can. A process is a different thing than a program. Just as you can have several instances of the ls command running in parallel on Unix, there's nothing to prevent you from relaunching the same program (as a different, new process) while the old instance is still saving its core file. The only difference from a normal process writing a file is that the process writing a core does it in kernel mode. Nothing else.
The core dump is performed by the killed process itself, executing in kernel mode, as its final task before dying. For the purposes of process state, the process is in the exiting state, and nothing can affect it until the core dump is finished (it can only be interrupted by a write error on the dump file, or perhaps this is an interruptible state).
The only problem you can have is that the next instance you launch, if it tries to write the same core file name, will have to wait for the previous one to finish (I think the inode is locked on a per-write basis only, not for the whole file), and you get a bunch of processes dying and writing to the same core file. That's not the case if each core goes to a new, different file (the old file is unlinked before the new one is created), but that depends on the implementation.

A possible exploit would be a denial-of-service attack that generates cores at a high pace, to make the writing of core files queue up a lot of processes in uninterruptible state. But I think this is difficult to achieve... most probably you'll just get high load from many processes writing different core files that are erased soon after (as a consequence of the unlink system call made by the next core-generating task).
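A possible mitigation on Linux, assuming you may tune sysctls (an assumption about your system): putting specifiers such as %e, %p and %t into kernel.core_pattern (documented in core(5)) makes every crash write a distinct file, so successive crashes never contend on the same name. A minimal sketch that merely inspects the current pattern:

    #include <stdio.h>

    /* Print the current Linux core file name pattern. A pattern such
     * as "core.%e.%p.%t" (executable, PID, timestamp) gives each
     * crash its own file; changing it needs root, e.g. via sysctl. */
    int main(void) {
        char pattern[256];
        FILE *f = fopen("/proc/sys/kernel/core_pattern", "r");
        if (!f) { perror("fopen"); return 1; }
        if (fgets(pattern, sizeof pattern, f))
            printf("core_pattern: %s", pattern);
        fclose(f);
        return 0;
    }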

A core(5) dump is very bad, and you should fix its root cause. It is generally the result of some unexpected and unhandled signal(7) (perhaps some memory corruption giving a SIGSEGV, etc.; read also about undefined behavior and be very scared of UB).
if it crashes, the script relaunches it without waiting for the core dump to finish writing.
So your approach is flawed, except as a temporary measure. BTW, in many cases, the virtual address space of the faulty process is small enough for the core to be dumped in a small fraction of a second. In some cases, dumping the core might take many minutes (think of a big HPC process dealing with hundreds of gigabytes of data on a supercomputer). It is rumored that, in the previous century, some huge core files took half an hour to be dumped on Cray supercomputers.
You really should fix your program to avoid dumping core.
We don't know at all what your buggy program which dumps core is. But if it has some persistent state (e.g. in some database or some file) which you care about, your approach is very wrong: the core dump might perhaps happen in the code which produces that state, and then, if you restart the same program, it could reuse that faulty state.
Does this cause any problems?
Yes in general. Perhaps not in your specific case (but we don't know what your program is doing).
So, you had better understand why that core dump is happening. In general, you would compile your program with all warnings and debug info (so gcc -Wall -Wextra -g with GCC) and use gdb to analyze the core dump post-mortem (see this).
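As a concrete illustration (file and program names are hypothetical), a one-line bug like this reliably dumps core:

    /* buggy.c - dereferences a null pointer (undefined behavior),
     * so the process is typically killed by SIGSEGV and dumps core. */
    int main(void) {
        int *p = 0;
        return *p;      /* crash here */
    }

Build it with gcc -Wall -Wextra -g buggy.c -o buggy, allow the dump with ulimit -c unlimited, run ./buggy, and then gdb ./buggy core followed by the bt command shows the exact line of the crash.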
You really should not write programs which dump core (even if that happens to all of us; it is a serious bug that should be fixed ASAP). And you should not accept core dumps as acceptable behavior of your programs. Core dumps are there to help the developer fix some serious problem. Read also about the Unix philosophy. It is socially unacceptable to consider a core dump as "normal"; it is definitely abnormal program behavior.
(There are several ways to avoid core dumps, one of which is sketched below; but that makes for a different question, and you need to explain what kind of programs you are writing and monitoring, and why and how they dump core.)
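For instance, one such way is to set the core file size limit to zero from within the process itself, the programmatic equivalent of ulimit -c 0 in the shell; a minimal sketch using POSIX setrlimit(2):

    #include <stdio.h>
    #include <sys/resource.h>

    /* Disable core dumps for this process (and children it spawns)
     * by setting the core file size limit to zero. */
    int main(void) {
        struct rlimit rl = { .rlim_cur = 0, .rlim_max = 0 };
        if (setrlimit(RLIMIT_CORE, &rl) != 0) {
            perror("setrlimit");
            return 1;
        }
        /* a crash after this point kills the process but writes no core */
        return 0;
    }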
