I have Python code that takes a bunch of tasks and distributes them either to different threads or to different nodes on a cluster. I always end up writing a main driver script, driver.py, that takes two command line arguments: --run-all and --run-task. The first is just a wrapper that iterates through all tasks and then calls driver.py --run-task with each task passed as an argument. Example:
== driver.py ==
import os
from optparse import OptionParser

# Determine the absolute path of the current script
DRIVER = os.path.abspath(__file__)

parser = OptionParser()
parser.add_option("--run-all", dest="run_all",
                  help="comma-separated list of tasks to run")
parser.add_option("--run-task", dest="run_task",
                  help="a single task to run")
(opts, args) = parser.parse_args()

if opts.run_all is not None:
    # Run all tasks
    for task in opts.run_all.split(","):
        # Call driver.py again with a specific task
        cmd = "python %s --run-task %s" % (DRIVER, task)
        # Execute on the system
        distribute_cmd(cmd)
elif opts.run_task is not None:
    # Run an individual task
    # code here for processing a task...
The user would then call:
$ driver.py --run-all task1,task2,task3,task4
And each task would get distributed.
The function distribute_cmd takes an executable shell command and sends it, in a system-specific way, to either a node or a thread. The reason driver.py has to find its own name and call itself is that distribute_cmd needs an executable shell command; it cannot take, for example, a function name.
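For illustration, a minimal thread-backed sketch of what distribute_cmd might look like (the subprocess/threading details here are my assumptions; the real implementation is system-specific):

import shlex
import subprocess
import threading

def distribute_cmd(cmd):
    # Hypothetical sketch: run `cmd` on a worker thread. A real
    # distribute_cmd might instead submit `cmd` to a cluster scheduler;
    # the only contract is that it accepts a shell command as a string.
    def worker():
        subprocess.call(shlex.split(cmd))
    t = threading.Thread(target=worker)
    t.start()
    return t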
This consideration led me to this design: a driver script with two modes that has to call itself. That has two complications: (1) the script has to find out its own path via __file__, and (2) when making this into a Python package, it's unclear where driver.py should go. It's meant to be an executable script, but if I put it in setup.py's scripts=, then I will have to find out where the scripts live (see "correct way to find scripts directory from setup.py in Python distutils?"). This does not seem like a good solution.
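For context, the packaging approach in question would look roughly like this (the package and path names are hypothetical):

# setup.py (sketch)
from distutils.core import setup

setup(
    name="mypackage",
    version="0.1",
    packages=["mypackage"],
    # driver.py gets installed as an executable script, but its
    # installed location is not easily discoverable at run time,
    # which is the problem described above
    scripts=["scripts/driver.py"],
)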
What's an alternative design to this? Keep in mind that the distribution of tasks has to result in an executable command that can be passed as a string to distribute_cmd. Thanks.