I have a bunch of shell scripts that need to be executed. Currently they run serially on a single machine, which takes a long time.

There are no dependencies among the scripts. I want to execute them in parallel on a cluster. Within each node of a cluster, I'd like to execute multiple scripts in parallel, leveraging multiple cores.

What is the best way to do this in Python? I'd like to write a Python program that runs on one machine and spawns tasks on other machines in the cluster.

If Python is not the best way to go about it, I'd like to know about other solutions as well.

nish
  • @MarkSetchell: Will I be able to track the status or get sysout from the jobs offloaded to other nodes using this? – nish Nov 05 '15 at 19:34
  • [this](http://stackoverflow.com/questions/636561/how-can-i-run-an-external-command-asynchronously-from-python) and [this](http://stackoverflow.com/questions/3777301/how-to-call-a-shell-script-from-python-code) will probably help you out a bunch – R Nar Nov 05 '15 at 19:41
  • If you want to use local machines, take a look at the [multiprocessing](https://docs.python.org/2/library/multiprocessing.html) library. If you want to do it with remote systems, you might want to use [RPyC](http://rpyc.readthedocs.org/en/latest/). If you are looking for an orchestration solution in Python, maybe [Fabric](http://www.fabfile.org/) or [Ansible](http://docs.ansible.com/ansible/), but they rely on an SSH server on the target system. – memoselyk Nov 05 '15 at 19:56
  • If you just want to execute a script on several nodes, you can install the `c3` tools and then use `cexec` in the shell (you need password-less ssh access to all nodes, of course). – Azad Nov 05 '15 at 20:08
  • There are several solutions for this out there. All of them are out of the scope of SO. – Klaus D. Nov 05 '15 at 20:20
  • What sort of cluster do you have available? It may already have a job control / resource allocation system, something like Grid Engine or HTCondor. – Hugh Bothwell Nov 05 '15 at 20:37

1 Answer

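I am not really sure what kind of progress information you want. Try these GNU Parallel commands and explain how they do not fit your purpose. `--results outputdir` saves each job's stdout and stderr under `outputdir`; `--progress` and `--joblog` report job status.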

    parallel --results outputdir --progress -S server1,server2 ::: script1.sh script2.sh script*.sh
    parallel --results outputdir --joblog - -S server1,server2 ::: script1.sh script2.sh script*.sh
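
Since the question asks for a Python approach, here is a minimal sketch of the same idea using only the standard library. It is not part of GNU Parallel and makes several assumptions: password-less ssh to every host, each script present at the same path on every host, and `sh` available everywhere. The function names (`build_command`, `run_script`, `run_all`) are hypothetical helpers made up for illustration, and the host names in the usage comment are the placeholders from the commands above.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(script, host=None):
    """Return the argv list that runs `script` locally or on `host` via ssh."""
    if host is None:
        return ["sh", script]
    # Assumption: password-less ssh works and the script exists on the host.
    # The remote command is passed as one quoted string so paths with spaces survive.
    return ["ssh", host, "sh " + shlex.quote(script)]


def run_script(script, host=None):
    """Run one script and collect its exit status and output."""
    proc = subprocess.run(
        build_command(script, host), capture_output=True, text=True
    )
    return {
        "script": script,
        "host": host,
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


def run_all(scripts, hosts=(None,), max_workers=4):
    """Run scripts concurrently, assigning them to hosts round-robin.

    `max_workers` caps the total number of scripts running at once
    across all hosts; a host of None means the local machine.
    """
    assignments = [(s, hosts[i % len(hosts)]) for i, s in enumerate(scripts)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as `assignments`.
        return list(pool.map(lambda a: run_script(*a), assignments))


# Example usage (hypothetical host names):
#   results = run_all(["script1.sh", "script2.sh"],
#                     hosts=("server1", "server2"), max_workers=8)
#   for r in results:
#       print(r["host"], r["script"], r["returncode"])
```

Threads are sufficient here because each worker only waits on a child process; the scripts themselves run as separate processes and can use multiple cores. Unlike GNU Parallel, this sketch does not balance load by free slots per host, retry failed jobs, or transfer files.
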
Ole Tange