GIVEN:

A set of jobs to be run in parallel: { app0, app1, app2, .... }

QUESTION:

How can GNU parallel be invoked so that all jobs run in parallel, except that certain specific jobs are prevented from running concurrently with each other?

EXAMPLE:

If appX and appY rely on the same resources, how can one specify that appX may run in parallel with app0, app1, ... but never with appY?

EXAMPLE 2:

appX and appY may run in parallel, but neither of them shall be running concurrently with appZ.

Mark Setchell
Frank-Rene Schäfer
  • If the rules are not too complicated, remove `appX` and `appY` from the list and replace with `appZ = { appX ; appY; }` – Mark Setchell Apr 30 '19 at 10:53
  • Not a solution for GNU parallel, but somewhat related and interesting to read: [Bash complex pipeline dependencies](https://stackoverflow.com/q/48834884/6770384) – Socowi Apr 30 '19 at 11:07
  • If Mark Setchell's solution is not an option because you want non-deterministic behavior allowing for `appX; ...; appY` as well as `appY; ...; appX`, then you can use locking mechanisms in `appX` and `appY`; see [ensure only one instance of a shell script is running at a time](https://stackoverflow.com/q/185451/6770384). – Socowi Apr 30 '19 at 11:19
  • @MarkSetchell: I like the elegance of your solution. Could you make this an answer, so I could mark it as the accepted response? However, your solution seems a little intrusive with respect to the time sequencing: GNU parallel could not choose to run `appX`, wait some time, and then run `appY`. – Frank-Rene Schäfer Apr 30 '19 at 11:25
  • Let's rather wait till Ole Tange, the author of GNU Parallel, logs in. He always has great ideas for applying `parallel` to problems. – Mark Setchell Apr 30 '19 at 11:28
  • If the processes do not depend on each other but must run one at a time, in any order, you should just implement a lock between them. Presumably they share some resource; access to that resource should be protected by a lock. – KamilCuk Apr 30 '19 at 12:31
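
The locking approach suggested in the comments can be sketched in plain shell. This is only a minimal illustration and not part of GNU parallel: `with_lock` and `fake_app` are hypothetical names, the lock is a directory created with `mkdir` (which is atomic), and the stand-in jobs merely sleep and write to a log file.

```shell
#!/bin/sh
# Hypothetical helper: run a command while holding a directory-based lock.
# mkdir either creates the directory or fails, atomically, so only one
# process at a time can hold the lock.
with_lock() {
  lockdir=$1; shift
  until mkdir "$lockdir" 2>/dev/null; do
    sleep 0.1            # lock is held by someone else; poll until released
  done
  "$@"
  status=$?
  rmdir "$lockdir"       # release the lock
  return "$status"
}

# Stand-in for appX / appY (assumption: the real apps share one resource).
fake_app() {
  echo "start $1" >> "$LOG"
  sleep 0.3
  echo "end $1" >> "$LOG"
}

LOG=$(mktemp)
LOCK="$LOG.lock"

# Both jobs are started at once, but the lock serializes them in either order.
with_lock "$LOCK" fake_app appX &
with_lock "$LOCK" fake_app appY &
wait
cat "$LOG"
```

Whichever job obtains the lock first runs to completion before the other starts, so the log always shows a `start`/`end` pair for one app followed by the pair for the other. A job that is killed while holding the lock leaves the directory behind; a production version would need a cleanup `trap` or a tool such as `flock`.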

1 Answer

It is not 100% clear to me what you want. Maybe replace appX and appY with:

sem --id appXY --fg appX
sem --id appXY --fg appY

Which can be done like this:

... | parallel eval '{= s/(app(X|Y))/sem --id appXY --fg $1/ =}'

This should ensure that at most one appX or appY is running at any time, while still letting plenty of appZs run in parallel.

The text between {= and =} is interpreted as Perl code that is applied to each input line.

s/(app(X|Y))/sem --id appXY --fg $1/ replaces appX or appY with sem --id appXY --fg followed by either appX or appY, depending on what was matched. If nothing matches, the value is left unchanged.

(echo appX; echo appX; echo appX; 
 echo appY; echo appX; echo appV;
 echo appX; echo appZ) |
  parallel eval '{= s/(app(X|Y))/sem --id appXY --fg $1/ =}'

If that is not what you mean, please edit the question.

Mark Setchell
Ole Tange
  • Yes, the semaphore seems to be the solution. Could you, please, elaborate on the '{= ... =}' construct? How would the complete command line look, if there were, for example three more apps, A, B, and C? – Frank-Rene Schäfer Apr 30 '19 at 13:49
  • Nice solution! Quick question though: if appX is running and parallel goes to start appY, that will block and thereby hog a job slot until appX finishes, won't it? – Mark Setchell Apr 30 '19 at 15:19
  • Yes, there is a risk that all but one of the job slots end up waiting for that one. – Ole Tange Apr 30 '19 at 23:04