Is it possible to call gnu parallel from within multiple runs of a script that are in turn spawned by gnu parallel?

I have a Python script that runs for hundreds of sequential iterations. Somewhere within each iteration, 4 values are computed in parallel (using gnu parallel). Now I want to spawn multiple such scripts at the same time, again using gnu parallel. Is this possible? Will gnu parallel take care of good utilization of the available cores?

For example, suppose that in the inner loop 2 of the 4 values have been computed and 2 are still running, so a single script cannot proceed to the next iteration until all 4 are done. Will the two free cores automatically be used to compute results for a different run of the script? How can I specify the total number of cores available: in the inner call to parallel or in the outer call?

This question shows it is possible to nest calls to parallel, but I'm not sure if this changes when I'm calling the nested parallel from inside a script.

PS: Thrashing is not a concern, I can use a LOT of cores from a large cluster.

PS2: gnu-parallel is an AWESOME tool... thanks! : )

Neha Karanjkar

1 Answer

Yes. GNU Parallel is designed (and tested heavily) to be able to be called from GNU Parallel - either directly or through a script.

If called directly, you are likely to need to change the replacement string with -I. Otherwise it is unclear what the second {} means here (does it refer to the first or the second parallel?):

seq 10 | parallel 'seq {} | parallel echo {}'

Here it is very clear, because the inner parallel uses // as its replacement string while {} belongs to the outer one:

seq 10 | parallel 'seq {} | parallel -I // echo //'
seq 10 | parallel 'seq {} | parallel -I // echo {} //'
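
The "through a script" case can be sketched as follows. This is only an illustration: the script name inner.sh and the temporary directory are made up for the example, and it assumes GNU Parallel is installed as `parallel`. Because the inner call lives in its own file, the outer parallel never sees the inner {}, so -I is not needed.

```shell
#!/usr/bin/env bash
# Sketch: nested GNU Parallel where the inner call sits in a separate script.
# inner.sh is a hypothetical name used only for this example.
workdir=$(mktemp -d)

# The quoted 'EOF' keeps $1 and {} literal while the file is written.
cat > "$workdir/inner.sh" <<'EOF'
#!/usr/bin/env bash
# $1 is the value handed over by the outer parallel;
# {} is replaced by this inner parallel only.
seq "$1" | parallel -k echo "outer=$1 inner={}"
EOF

# Outer call: start one inner.sh per input value.
# -k keeps the output in input order so the result is easy to read.
nested_output=$(seq 3 | parallel -k bash "$workdir/inner.sh" {})
printf '%s\n' "$nested_output"

rm -rf "$workdir"
```

With both calls using -k, this should print six lines, from `outer=1 inner=1` to `outer=3 inner=3`.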

PS Good to hear you find GNU Parallel awesome. If you like GNU Parallel:

  • Walk through the tutorial (http://www.gnu.org/software/parallel/parallel_tutorial.html)
  • Give a demo at your local user group/team/colleagues
  • Post the intro videos and tutorial on Reddit/Diaspora*/forums/blogs/Identi.ca/Google+/Twitter/Facebook/Linkedin/mailing lists
  • Request or write a review for your favourite blog or magazine
  • Invite me for your next conference

If you use GNU Parallel for research:

  • Please cite GNU Parallel in your publications (use --bibtex)

If GNU Parallel saves you money:

Ole Tange
  • Is there documentation available to support what the order of steps and resource consumption are for nested parallelism via GNU parallel? I worry that some readers may see this and try multiple nested parallel calls without really thinking the process through. – Jon Jul 14 '17 at 17:35
  • GNU Parallel normally eats 16 MB RAM. Off the top of my head, only these situations will change that: Multiple input sources that force generation of all combinations (e.g. `parallel echo :::: file1 file2`) - all combinations will be in RAM. `--bar/--eta` usage of `total_jobs()`, which will generate all args in memory (around 400 bytes/job). `--linebuffer` will read a full line per job - so a very long line (say 1 GB) will increase RAM usage. `--sql*` will buffer output of a single job in memory when the job is done. But maybe this should make it into `man parallel_design`? – Ole Tange Jul 15 '17 at 15:57
  • Is there a way to use positional arguments like `{1}` and `{2}` when changing the `replace-str` with `-I`? `parallel -N 2 --dry-run 'echo {2}' ::: 1 2` works but `parallel -N 2 --dry-run -I // 'echo /2/' ::: 1 2` doesn't. – dosentmatter Jul 04 '22 at 21:12
  • I found a workaround: `seq 3 | parallel 'r=}; seq {} | parallel -N 2 "echo {2$r"'`. It assigns `}` to a variable that is expanded later, so it doesn't get replaced by the first `parallel`. I chose to assign `r=}` instead of `l={`, because using `l={` would require `echo $l2}`, which would try to expand the variable `l2` instead of `l`. You could use parameter expansion `echo ${l}2}` or just temporarily unquote `echo "$l"2$r`, but they are both messy. Parameter expansion is also confusing because it uses curly braces, but parallel happens to not interpret the curlies when a letter is in between. – dosentmatter Jul 04 '22 at 21:44
  • This method can be extended to 3 levels of nesting but it gets confusing. In the third `parallel`, you have to escape the nested double quotes, `\"` and you have to escape the `\$` so that it gets expanded in the third `parallel` and not the second. `parallel -N 2 'r=}; echo {2}; parallel -N 2 "r=}; echo {2$r; parallel -N 2 \"echo {2\$r\" ::: 3 c" ::: 2 b' ::: 1 a` – dosentmatter Jul 04 '22 at 22:10
  • @dosentmatter Personally, I would never do that. Instead I would define a shell function and call that. I think it is easier to read and for others to maintain. – Ole Tange Jul 06 '22 at 07:16
  • @OleTange, agreed. It is a mess. I was just noting a one-liner workaround. I'd only use it for quick throw-away one-liners when I'm too lazy to create scripts. I haven't ever had a chance to use it so far. I've only needed positional arguments in one level of nesting so I just used `-I` for the other levels. – dosentmatter Jul 06 '22 at 11:16
  • This does NOT work at all when the inner `parallel` is called from inside a script (which is what OP asked). The inner and outer `parallel` invocations don't know anything about each other, so if I start both of them with, say, `-j 20`, it will happily launch 400 concurrent invocations of the inner `parallel`'s command. – marc.guenther Oct 28 '22 at 15:08
  • @marc.guenther That is the goal, yes. – Ole Tange Oct 29 '22 at 20:11
  • Sorry, I was referring to your answer, not this discussion above. "GNU Parallel is designed (and tested heavily) to be able to be called from GNU Parallel - either directly or through a script." Apparently that is not the case. As I wrote, the inner parallel invocations do not know anything about the outer invocations or even each other, and so will completely trash the CPU. – marc.guenther Oct 31 '22 at 09:51
  • @marc.guenther Sorry if it was unclear: Being the author I can assure you that "GNU Parallel is designed (and tested heavily) to be able to be called from GNU Parallel - either directly or through a script." but these instances do not communicate. Hope that made it clear. – Ole Tange Oct 31 '22 at 09:55
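
Since the instances do not communicate, as the last comments point out, one way to stay within a core budget is to split it by hand between the outer and inner `-j` values. The sketch below only does the arithmetic; the budget of 20 cores and the names my_script.py and compute_value are assumptions made for illustration, while the 4 inner jobs come from the question.

```shell
#!/usr/bin/env bash
# Sketch: nested instances do not share job slots, so the worst case is
# (outer -j) x (inner -j) jobs running at once.
total_cores=20    # assumed core budget for the whole run
inner_jobs=4      # each iteration computes 4 values in parallel (from the question)
outer_jobs=$(( total_cores / inner_jobs ))   # script instances to run at a time

echo "outer -j $outer_jobs x inner -j $inner_jobs = $(( outer_jobs * inner_jobs )) concurrent jobs at most"

# Outer call (my_script.py is a hypothetical name):
#   parallel -j "$outer_jobs" python3 my_script.py ::: run1 run2 run3
# Inner call made from within each script (compute_value is hypothetical):
#   parallel -j "$inner_jobs" compute_value ::: a b c d
```

With a fixed split like this, slots left idle by one script while it waits for its last values are not handed to another script. If thrashing is not a concern, as in the question, a larger outer `-j` is one way to keep more cores busy.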