I have a multi-threaded program which, when run through strace, shows this:
read(10, "lorem ipsum...", 100) = 100
read(10, 0x2ae9ebcb5000, 8191) = ? ERESTARTSYS (To be restarted)
--- SIGTERM ... ---
Whenever the ERESTARTSYS occurs, the program ends up hanging on the read. When the ERESTARTSYS does not occur, the program exits successfully and I get:
read(10, "lorem ipsum...", 100) = 100
read(10, "", 8191) = 0
...
exit_group(0)
Looking at the strace manpage (for a version of strace that isn't mine) and SO questions like this and this, it seems that the read is being interrupted by some signal. I could be misunderstanding the doc, but I don't see any signal other than SIGTERM, which I'm assuming comes from me killing the program.
I've determined that the two reads come from a std::getline invocation, which reads a second time when the delimiter isn't found. The delimiter isn't found because it is incorrect and appears nowhere in the string, but I can't fix that because it's in a library I have no control over. Adding the delimiter to the string seems to prevent the second read, and the code then runs without a problem.
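To illustrate the std::getline behaviour described here, this is a minimal sketch that uses an in-memory std::istringstream in place of the real file descriptor. The names ReadResult and read_record are hypothetical and are not part of the library in question. When the delimiter is missing, std::getline keeps consuming until the stream reports EOF; on a descriptor-backed stream, I assume that is what triggers the second read().

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper type: the extracted record plus whether the
// stream had to run into EOF while searching for the delimiter.
struct ReadResult {
    std::string record;
    bool hit_eof;
};

// Hypothetical helper: read one record from `in`, terminated by `delim`.
ReadResult read_record(std::istream& in, char delim) {
    ReadResult r;
    // If `delim` never appears, getline consumes everything that is
    // available and then asks the underlying buffer for more data.
    std::getline(in, r.record, delim);
    r.hit_eof = in.eof();
    return r;
}
```

With the input "lorem ipsum" and delimiter '\n', hit_eof is set, which corresponds to the second read returning 0 in the successful trace. With "lorem ipsum\nrest", getline stops at the delimiter and never needs more data.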
I'm also positive that there's some race condition in the code, because this error does not occur when I turn off the parallelism. One of my wild guesses is that the read is being interrupted during a thread context switch, but that's only a guess and nothing in the strace indicates that it is true. Additionally, I'm not sure why the read wouldn't simply restart after the thread is switched back in. I can't find the race condition, though, and I was hoping that understanding the strace output and the ERESTARTSYS could help me figure out where the bug is.
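For context on the restart question: as far as I understand it, ERESTARTSYS is a kernel-internal code that strace shows when a signal arrives during a blocking system call. User space never sees it. The call is either restarted transparently (when the handler was installed with SA_RESTART) or it returns -1 with errno set to EINTR. Below is a minimal sketch of the usual retry pattern around read(). The name read_retrying is hypothetical, and I don't know whether the library I'm using does anything like this.

```cpp
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper: call read() again whenever it was interrupted by
// a signal before any data was transferred (errno == EINTR).
// Returns what read() returns: bytes read, 0 on EOF, or -1 on error.
ssize_t read_retrying(int fd, void* buf, std::size_t count) {
    ssize_t n;
    do {
        n = ::read(fd, buf, count);
    } while (n == -1 && errno == EINTR);
    return n;
}
```

Note that this only covers interruption by a signal. If the descriptor simply has no more data and no EOF yet, the call blocks either way.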
If it helps, I'm running on RHEL5 and compiling using gcc 4.7.2.