
So, I have a Python program I wrote that runs as desired on my Linux OS. I've started porting it to Windows, but there is one part I cannot figure out how to make work.

The program is structured to be highly parallel: a parent process coordinates lots of child processes, and I launch the whole thing with a bash script:

e.g.

# set up some necessary folder stuff
... 

python -u parent.py --exp_name $arg --args_file $fname > ./../results_$arg/logs/master.log 2>&1 &

sleep 1

for((i=0; i<10; i++)); do
    OMP_NUM_THREADS=1 python -u child.py --id $i --exp_name $arg --args_file $fname > ./../results_$arg/logs/w$i.log 2>&1 &
    sleep 1
done

wait

Now for the weird part. Over time, the parallel child.py processes consume quite a lot of the system's resources, so I have things set up such that every so often the parent signals the child processes (via marker files on disk) to die and restart. I do this in the following way:

while not done:
    try:  # this try/except block handles a manual Ctrl-C kill

        while not child.hasTask():
            # While waiting for a task: if the parent has removed our
            # alive flag, or has dropped a '.cycle' marker telling us
            # to recycle ourselves, stop and shut down.
            if not os.path.exists(child.alive) or os.path.exists(child.alive + '.cycle'):
                done = True
                break
...

if os.path.exists(child.alive + '.done'):
    print("task completely finished")

else:
    # a '.cycle' marker means "restart yourself": consume the marker,
    # launch a fresh copy of this worker, then fall off the end and die
    if os.path.exists(child.alive + '.cycle'):
        os.remove(child.alive + '.cycle')
    print(f"refreshing worker {child.id}")
    os.system(f'bash refreshWorker.sh {line_args.exp_name} {line_args.args_file} {line_args.id}')

# END OF SCRIPT
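
For context, the parent's side of this handshake is plain file manipulation. Roughly (a simplified sketch, not my exact parent code; alive_path is the worker's flag-file path checked above):

import os

def request_cycle(alive_path):
    # drop a '.cycle' marker next to the flag: the worker will
    # consume it, relaunch itself, and exit
    open(alive_path + '.cycle', 'w').close()

def request_shutdown(alive_path):
    # remove the flag entirely: the worker dies for good
    if os.path.exists(alive_path):
        os.remove(alive_path)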

The refreshWorker bash script simply launches a new python process with the same parameters as the one that just finished:

OMP_NUM_THREADS=1 python -u child.py --id $i --exp_name $arg --args_file $fname > ./../results_$arg/logs/w$i.log 2>&1 &

This all works.


I've been playing around with Windows and am finding that I cannot replicate this structure easily. For example, simply translating the bash scripts into command scripts, launching each worker with

START /B "" python -u child.py {...insert args}

and then having the child call os.system("cmd refreshWorkers.cmd {...insert args}"), is not working in the small test cases I've tried (i.e. foo.cmd --> bar.py (die) --> baz.cmd (get revived) --> bar.py (die for real)).
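
To make that test concrete, the minimal repro looks roughly like this (a simplified sketch; the real scripts also pass arguments, and baz.cmd stands in for refreshWorkers.cmd):

foo.cmd:

START /B "" python -u bar.py

bar.py:

import os
import sys

print("worker running", sys.argv[1:])
if "--revived" not in sys.argv:
    # first run: ask to be revived via the cmd script, then fall off the end and die
    os.system("cmd baz.cmd")
# revived run: do nothing and die for real

baz.cmd:

START /B "" python -u bar.py --revived

The intent is that baz.cmd backgrounds a fresh bar.py before the old one exits, mirroring the bash chain on Linux, but this chain is what fails on Windows.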


If the actual files would be helpful:

This is the launch point.

This is the child/worker program where it calls the refresh script and then dies.

This is the bash script that launches a just-killed worker script.

  • Just a thought: you're using bash and cmd when it may be more convenient to do this in Python itself. Try the `subprocess` module [and check other answers like this](https://stackoverflow.com/questions/43088987/python-parent-process-is-not-catching-sigterm-sigint-signals-when-launching-subp) – viilpe Jan 16 '21 at 17:11
  • That was my first method of doing this. Using `subprocess` holds the calling program until the subprocess finishes, whereupon control returns to the caller to finish up. I need the process to send a command that starts another copy of itself and then immediately die; hence bash --> python --> bash --> python. The python program gets control back when the bash script finishes and therefore dies, but the bash script has already backgrounded a new worker. This allows the system resources to be released. – aadharna Jan 16 '21 at 22:14
  • `Popen` is non-blocking, see [here](https://stackoverflow.com/questions/16071866/non-blocking-subprocess-call). And you can always start a new thread with two lines of code. – viilpe Jan 16 '21 at 22:44
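
For what it's worth, the non-blocking respawn viilpe describes would look roughly like this (a minimal sketch; the file names and arguments are illustrative, not from the question):

import subprocess
import sys

# Popen returns as soon as the replacement is spawned; since we never
# wait() on it, the new worker keeps running after this process exits.
with open("w0.log", "w") as log:  # illustrative log path
    subprocess.Popen(
        [sys.executable, "-u", "child.py", "--id", "0"],  # illustrative args
        stdout=log,
        stderr=subprocess.STDOUT,
    )
sys.exit(0)  # old worker dies; the replacement lives on

On Windows it may additionally need creationflags=subprocess.DETACHED_PROCESS (Python 3.7+) so the replacement is not tied to the dying worker's console.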

0 Answers