GNU Parallel: how to prevent specific jobs from being processed in parallel

Question

GIVEN:

A set of jobs to be run in parallel: { app0, app1, app2, .... }

QUESTION:

How can GNU Parallel be invoked so that all jobs run in parallel, except that certain specific jobs are prevented from running concurrently with each other?

EXAMPLE:

If appX and appY rely on the same resources, how can one specify that appX may run in parallel with app0, app1, ... but never with appY?

EXAMPLE 2:

appX and appY may run in parallel, but neither of them shall be running concurrently with appZ.

If the rules are not too complicated, remove `appX` and `appY` from the list and replace with `appZ = { appX ; appY; }` — Mark Setchell, Apr 30 '19 at 10:53
Not a solution for GNU parallel, but somewhat related and interesting to read: [Bash complex pipeline dependencies](https://stackoverflow.com/q/48834884/6770384) — Socowi, Apr 30 '19 at 11:07
If Mark Setchell's solution is not an option because you want non-deterministic behavior allowing for `appX; ...; appY` as well as `appY; ...; appX` then you can use locking mechanisms in `appX` and `appY`, see [ensure only one instance of a shell script is running at a time](https://stackoverflow.com/q/185451/6770384). — Socowi, Apr 30 '19 at 11:19
@MarkSetchell: I like the elegance of your solution. Could you make this an answer, so I could mark it as accepted? However, your solution seems a little intrusive with respect to the time sequencing: GNU parallel could no longer choose to run `appX`, wait some time, and then run `appY`. — Frank-Rene Schäfer, Apr 30 '19 at 11:25
Let's rather wait till Ole Tange, the author of GNU Parallel, logs in. He always has great ideas for applying `parallel` to problems. — Mark Setchell, Apr 30 '19 at 11:28
If the processes do not depend on each other and only need to run one at a time, in any order, you should just implement a lock between them. Presumably they share some resource; access to that resource should be protected by a lock. — KamilCuk, Apr 30 '19 at 12:31
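
The lock-based approach suggested in the comments above can be sketched with `flock` from util-linux. This is only an illustration under assumptions and is not part of GNU Parallel: `run_locked`, the temporary lock and log files, and the `sleep` stand-ins for appX, appY, and app0 are all hypothetical.

```shell
# Hypothetical stand-ins: "sleep" replaces the real appX / appY / app0.
lock=$(mktemp)   # lock file shared by the jobs that must not overlap
log=$(mktemp)    # records start/end events so the ordering can be inspected

# run_locked NAME: hold the lock for the whole time NAME "runs".
run_locked() {
  flock "$lock" sh -c \
    'echo "start $1" >>"$2"; sleep 0.2; echo "end $1" >>"$2"' _ "$1" "$log"
}

run_locked appX &   # appX and appY contend for the same lock,
run_locked appY &   # so whichever comes second waits for the first
sleep 0.1 &         # app0 takes no lock and runs concurrently with both
wait

cat "$log"
```

Which of appX and appY runs first is not fixed, but the log should never show one of them starting before the other has ended.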

score 2 · Accepted Answer · edited Apr 30 '19 at 14:56

It is not 100% clear to me what you want. Maybe replace appX and appY with:

sem --id appXY --fg appX
sem --id appXY --fg appY

This replacement can be done on the fly like this:

... | parallel eval '{= s/(app(X|Y))/sem --id appXY --fg $1/ =}'

This should ensure that only one appX or appY runs at any time, while still allowing plenty of appZs to run in parallel.

The content of `{= =}` is interpreted as Perl code.

`s/(app(X|Y))/sem --id appXY --fg $1/` replaces appX or appY with `sem --id appXY --fg` followed by either appX or appY, depending on what was matched. If nothing matches, the value is left unchanged.

(echo appX; echo appX; echo appX;
 echo appY; echo appX; echo appV;
 echo appX; echo appZ) |
  parallel eval '{= s/(app(X|Y))/sem --id appXY --fg $1/ =}'

If that is not what you mean, please edit the question.

answered Apr 30 '19 at 13:11 by Ole Tange · edited Apr 30 '19 at 14:56 by Mark Setchell

Yes, the semaphore seems to be the solution. Could you, please, elaborate on the '{= ... =}' construct? How would the complete command line look, if there were, for example three more apps, A, B, and C? – Frank-Rene Schäfer Apr 30 '19 at 13:49
Nice solution! Quick question though, if appX is running, and it goes to start appY that will block and thereby hog a job-slot until appX finishes - won't it? – Mark Setchell Apr 30 '19 at 15:19
Yes, there is a risk that all but one of the job slots end up waiting for that single running job. – Ole Tange Apr 30 '19 at 23:04
