How to determine why valgrind/callgrind kills process

Question

I have written a multithreaded stress test for a database infrastructure I am working with, and I am trying to profile it using callgrind. The program executes perfectly outside of valgrind and provides expected results.

However, when I run it under valgrind --tool=callgrind, the program executes for a short time and then stops, with valgrind reporting Killed as its last output to stdout.

Is there a way for me to determine why valgrind killed my task?

After following phd's advice: it does get killed with valgrind --tool=none as well. However, I'm not entirely sure how to analyse the messages I've been given; there seem to be a lot of sigvgkill signals in my threads. The first instance of this is here:

--13713:1:syswrap- run_a_thread_NORETURN(tid=104): pre-thread_wrapper
 --> [pre-success] Success(0x0:0x365c)--13713:1:syswrap- thread_wrapper(tid=104): entry
 SYSCALL[13713,104](311) sys_set_robust_list ( 0x4f213be0, 12 )[sync] --> Success(0x0:0x0)
SYSCALL[13713,104](240) sys_futex ( 0xbeaf348, 128, 2, 0x0, 0x0 ) --> [async] ...
--13713-- async signal handler: signal=13, tid=32, si_code=0
--13713-- interrupted_syscall: tid=32, ip=0x380b197c, restart=False, sres.isErr=True, sres.val=32
--13713--   completed, but uncommitted: committing
--13713:1:gdbsrv   VG core calling VG_(gdbserver_report_signal) vki_nr 13 SIGPIPE gdb_nr 13 SIGPIPE tid 32
--13713:1:gdbsrv   not connected => pass
--13713-- delivering signal 13 (SIGPIPE):0 to thread 32
--13713-- delivering 13 (code 0) to default handler; action: terminate
==13713==

Are you sure that it was originated from the valgrind? Or are you running out of memory and the kernel is killing the process? — pah, Jul 26 '16 at 12:49
@threadp Does callgrind add a significant memory overhead? I am not allocating much memory in my application, and it has never run out of memory before when running normally outside valgrind. How would I determine this? — Thomas Russell, Jul 26 '16 at 12:52
Check your `dmesg` output after the kill occurs. This is unlikely to be the issue, but it is a possibility. — pah, Jul 26 '16 at 12:57
There's nothing in the `dmesg` output relating to my process or valgrind — Thomas Russell, Jul 26 '16 at 12:59

score 3 · Answer 1 · edited Jul 27 '16 at 22:21

To my knowledge, valgrind does not kill a program with as little output as 'Killed'. This looks more like a kill from another process.

Nonetheless, you can try several things to investigate why your program behaves differently under valgrind rather than natively:

- First, run it under valgrind --tool=none. This is the fastest tool (it does no analysis). You can then see whether your program behaves as expected. If not, run it with additional valgrind internal tracing, e.g.
```
--tool=none -v -v -v -d -d -d --trace-syscalls=yes --trace-signals=yes
```
  The trace might then give a clue about why it aborts or is killed.
- Next, run it under --tool=memcheck and --tool=helgrind (and similarly, if it crashes, run it with more tracing).
- Finally, run it under --tool=callgrind with more tracing, if the steps above did not clarify the problem.

Thanks for your advice, I've updated my question accordingly! — Thomas Russell, Jul 27 '16 at 08:48
According to the trace, your process receives signal 13 (SIGPIPE). By default, this signal kills your process. — phd, Jul 28 '16 at 20:31

score 2 · Answer 2 · edited Jun 20 '20 at 09:12

This is a bit of an old question, but what's happening is that you are receiving the SIGPIPE signal (broken pipe: writing to a pipe that has nothing listening on the other end).

Valgrind takes note of it ("hey, I'm seeing a SIGPIPE that's meant for your program"), and continues to deliver it to your program (since it was meant for it, after all).

Since you likely haven't specified what should happen when you receive SIGPIPE, the default action is executed, which is to terminate your program. See "Why does SIGPIPE exist?". Remember that programs run much slower under Valgrind, so behaviour may differ due to timing: a program can work natively and fail under Valgrind, and vice versa.

If you are expecting SIGPIPE during regular use and want to ignore it (so that it doesn't kill your program), do so by calling

#include <signal.h>
// ...
signal(SIGPIPE, SIG_IGN); // ignore broken pipe signal

You might want to do the same for other signals that you may expect and that would otherwise be fatal for your process (SIGHUP, ...).

So to sum up, Valgrind didn't kill your process, but instead has given you a hint as to why your process is dying. There are only a few cases where I've seen Valgrind kill my process (which of course were my own fault); usually it doesn't. Even when you read from or write to memory addresses you don't own, Valgrind won't kill your process. It will complain, sure, but it will execute the instruction, and what actually kills your process is the SIGSEGV that arrives right after you tried to access the memory.

This is what it looks like when Valgrind kills your process:

It happens so rarely, I actually screenshotted it. ;)

I'm amazed this had 0 up-votes. You're right on the part about why you screenshotted it. The rest of your post is good too, so have a +1. — Pryftan, Mar 12 '20 at 13:56
I assume it's simply because the question was from 2016, and I answered in 2017. Few look at old questions, and that's a pretty low profile one ;) — Aaa, Mar 12 '20 at 18:24
Probably so. But still if I encountered it many others surely do. Who can tell? Ah well, doesn't really matter I guess - I just thought it was worthy of a vote so there you go. — Pryftan, Mar 12 '20 at 22:34
