
I'm running a script (script.sh) in a loop. The script contains a GNU Parallel command that runs wget in parallel. I'm getting the following error:

Signal SIGCHLD received, but no signal handler set.

The loop looks like this:

for i in {1..5}; do /script.sh; done

And the line that is causing the error looks like this (omitting options and settings):

cat file.txt | parallel -j15 wget
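One way to work toward a minimal reproducible example is a small harness that reruns a command many times and counts how often the message shows up on stderr. `count_stderr_matches` is a hypothetical helper, not part of GNU Parallel. Swapping wget for `true` in the example is an assumption made so the test runs offline, and it may not exercise the same code path as the real script.

```shell
#!/usr/bin/env bash
# count_stderr_matches RUNS PATTERN CMD [ARGS...]: hypothetical helper.
# Runs CMD RUNS times, appends its stderr to a temporary log, discards its
# stdout, and prints how many logged stderr lines matched PATTERN in total.
count_stderr_matches() {
  local runs=$1 pattern=$2 log i
  shift 2
  log=$(mktemp) || return 1
  for ((i = 0; i < runs; i++)); do
    "$@" 2>>"$log" >/dev/null
  done
  grep -c -- "$pattern" "$log"   # prints 0 when nothing matched
  rm -f -- "$log"
}

# Example (needs GNU Parallel installed; `true` stands in for wget):
#   count_stderr_matches 50 'Signal SIGCHLD received' \
#     sh -c 'seq 1 15 | parallel -j15 true'
```

A non-zero count with `true` would point at parallel itself rather than at wget or the network.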

Research:

I'm not an expert with GNU Parallel, but the script works fine most of the time; the error above only shows up occasionally. While looking up SIGCHLD, I learned that running processes in parallel can leave "zombie processes" behind, and that these sometimes need to be "reaped". I also found that processes sometimes have to be killed explicitly, because they can use up all the available connections.
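For context on the terms above: a parent process "reaps" a finished child by waiting for it, which also retrieves the child's exit status; until that happens, the exited child lingers as a zombie. In bash this is done with the `wait` builtin. The sketch uses a hypothetical helper, `run_and_reap`, purely to illustrate the idea; it is not how GNU Parallel is implemented internally.

```shell
#!/usr/bin/env bash
# run_and_reap N CMD [ARGS...]: hypothetical helper.
# Starts N copies of CMD in the background, then reaps each one with `wait`
# and prints how many of them exited with a non-zero status.
run_and_reap() {
  local n=$1 pids=() failures=0 pid i
  shift
  for ((i = 0; i < n; i++)); do
    "$@" &            # fork a background child
    pids+=("$!")      # remember its PID
  done
  for pid in "${pids[@]}"; do
    # `wait PID` blocks until that child has exited, reaps it (so it is no
    # longer a zombie) and returns its exit status.
    wait "$pid" || failures=$((failures + 1))
  done
  echo "$failures"
}

run_and_reap 5 true    # prints 0
run_and_reap 3 false   # prints 3
```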

Trying To Understand:

However, I don't know what is causing the issue in the first place. Is it how I'm using parallel? Am I not "reaping" processes? Should I be killing processes explicitly? Is it because I am running a script that uses parallel in a loop?

My question:

How can I solve the SIGCHLD error?

If you have any experience with this, your insight is greatly appreciated.

DomainsFeatured
  • What version of parallel are you using? – ccarton Sep 28 '16 at 19:10
  • GNU parallel 20160822 (the latest one I believe) – DomainsFeatured Sep 28 '16 at 19:32
  • I think this might be a bug in parallel. I'm looking through the code and the author is deleting the sigchld handler at one point. Maybe in some environments that has the same effect as ignoring the signal but the perl documentation says that you ignore the signal by setting the handler to "IGNORE". It is silent on what happens if you delete the handler. If you can, try reverting to version 20150222. – ccarton Sep 28 '16 at 19:56
  • Interesting. Will give that a try in a little while. If that works, you are my hero of the week! – DomainsFeatured Sep 28 '16 at 19:58
  • Hey @ccarton, thanks for catching this. It's working with the older version. Please post an answer so I can accept it and upvote it. Thanks again :-) – DomainsFeatured Sep 30 '16 at 05:17

2 Answers


I think this might be a bug in parallel. At one point the code deletes the SIGCHLD handler, presumably as a way of ignoring the signal. The Perl documentation is silent on what deleting the handler does, which suggests to me that the result will be platform- or implementation-dependent and unreliable. The documented way to ignore a signal is to set its handler to "IGNORE". I suggest trying version 20150222, an older version that does not have this questionable code.

ccarton

(This is just a comment, but it is too long for a comment.)

So far no one has been able to reproduce the bug reliably. See if you can make an MCVE: https://stackoverflow.com/help/mcve. If the IGNORE suggestion solves the issue, then the fix should be to change line 4361 of parallel from:

    delete $SIG{CHLD};

into:

    $SIG{CHLD} = 'IGNORE';

Let us know whether that helps, so the fix can go into the next version if it works.
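To apply this change without editing the file by hand, a bash sketch along these lines could work. `patch_sigchld` is a hypothetical helper, not part of GNU Parallel. It matches the statement text instead of line 4361, on the assumption that the line number refers to the release in the question (20160822) and may differ in others. It defaults to "DEFAULT" rather than "IGNORE", following the later comment in this thread reporting that "DEFAULT" worked where "IGNORE" did not, and it keeps the unmodified file as a backup.

```shell
#!/usr/bin/env bash
# patch_sigchld FILE [DISPOSITION]: hypothetical helper.
# Replaces the statement `delete $SIG{CHLD};` in FILE with
# `$SIG{CHLD} = 'DISPOSITION';` (DISPOSITION defaults to DEFAULT).
# The unmodified original is kept as FILE.bak.
patch_sigchld() {
  local file=$1 disp=${2:-DEFAULT}
  cp -p -- "$file" "$file.bak" || return 1
  # Match the statement text rather than a fixed line number; write the result
  # back over FILE with a redirect, which keeps FILE's permissions (no sed -i).
  sed "s/delete \\\$SIG{CHLD};/\$SIG{CHLD} = '$disp';/" "$file.bak" > "$file"
}

# Example (may need sudo if parallel is installed system-wide):
#   patch_sigchld "$(command -v parallel)"          # uses 'DEFAULT'
#   patch_sigchld "$(command -v parallel)" IGNORE   # the suggestion in this answer
```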

Ole Tange
  • Hey @OleTange, I feel pretty special to have the author helping me out. Thank you so much. I actually reverted back to Parallel-20150222 yesterday before this post and have run the script a few times with no errors. I will continue to test it out to be certain. Would this be enough information to compare the "IGNORE" setting? Otherwise, if it helps you, I could reinstall the latest version and try to test it out. Although, I'm not quite sure where to put `$SIG{CHLD} = 'IGNORE';` if it goes in the terminal, at the top of the script or somewhere in the package. Thanks again :-) – DomainsFeatured Sep 29 '16 at 20:10
  • Change line 4361 in the script parallel. – Ole Tange Sep 30 '16 at 02:09
  • And please see if you can make an MCVE. That will still be very helpful. – Ole Tange Sep 30 '16 at 02:10
  • Hey @OleTange, it's working with the older 20150222 version if that has the ignore option. If not, I will do another reinstall this weekend and try to create an example that duplicates the issue. Seems like we are on different time zones but I'll do my best to test it out this weekend so you can get feedback. In the meantime, I'm really happy the old version works. – DomainsFeatured Sep 30 '16 at 05:15
  • I've made a few attempts myself to re-create this with no luck. However, I don't know the code well enough to craft a test that will be certain to trigger this code path. – ccarton Sep 30 '16 at 09:14
  • Line 4361 above is run at least every second. – Ole Tange Sep 30 '16 at 19:46
  • $SIG{CHLD}="DEFAULT"; worked for me. $SIG{CHLD} = 'IGNORE'; did not. See https://lists.gnu.org/archive/html/bug-parallel/2016-10/msg00000.html – pbot Feb 21 '19 at 18:29