
I have a Python script that runs many executables written and compiled in C. There is no issue with the executables themselves. However, since I have to run them in a for loop, I tried to parallelize the loop.

Note: prog1, prog2, prog3 must run in order.
This is a simplified example; in my real code
prog2 depends on the output of prog1, prog3
depends on the output of prog2, and so on.
I have seven executables in a for loop of 20 iterations,
and it takes more than 2 hours to complete the process.
If I could parallelize the code, it would save a lot of time.
Help would be greatly appreciated!

In my code, example 1 runs fine but example 2 does not. The full code is presented below:

#!/usr/bin/python

from multiprocessing import Pool
import os, sys, subprocess, math, re, shutil, copy

################################################################################
# function to run a program and write output to the shell
################################################################################
def run_process(name, args):
    print "--------------------------------------------------------------------"
    print "Running: %s" % name
    print "Command:"
    for arg in args:
        print arg,
    print ""
    print "--------------------------------------------------------------------"
    process = subprocess.Popen(args)
    process.communicate()
    if process.returncode != 0:
        print "Error: %s did not terminate correctly. Return code: %i." % (name, process.returncode)
        sys.exit(1)  # this will exit the code in case of error
################################################################################
# example 1
#run_process("prog1.c", ['./prog1'])
#run_process("prog2.c", ['./prog2'])        
#run_process("prog3.c", ['./prog3', 'first argument'])


# example 2 (parallizing)
commands = []
for x in range(0,20):
    commands.extend(("prog1.c",['./prog1']))
    commands.extend(("prog2.c",['./prog2']))
    commands.extend(("prog3.c",['./prog3', 'first argument']))


p = Pool()
p.map(run_process, commands)

Here, if I run example 1 it runs flawlessly. But when I try to run example 2, it gives the following error:

    TypeError: run_process() takes exactly 2 arguments (1 given)

Further note:
To create the executables prog1, prog2, and prog3, I wrote C code
that looks like this:

// to compile: gcc -o prog1 prog1.c
// to run:     ./prog1
#include <stdio.h>

int main(void) {
    printf("This is program 1\n");
    return 0;
}

prog2 looks exactly the same, and prog3 looks like this:

// to compile: gcc -o prog3 prog3.c
// to run:     ./prog3 'argument1'
#include <stdio.h>

int main(int argc, char **argv) {
    printf("This is program 3\n");
    if (argc > 1)  /* guard against running without an argument */
        printf("The argument is = %s\n", argv[1]);
    return 0;
}

Now, there are 20 iterations inside the for loop.
In the first iteration it runs the executables prog1, prog2, ..., prog7
and finally produces output1.fits.
In the second iteration it again runs the seven executables in order and produces output2.fits.
At the end it has created 20 fits files. What I can do is make four functions:
func1 for loop 0 to 5
func2 for loop 5 to 10
func3 for loop 10 to 15
func4 for loop 15 to 20
Then I want to run these four functions as parallel processes.
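The chunking plan above can be sketched with `multiprocessing.Pool`; here `run_iteration` is a hypothetical stand-in for the real prog1..prog7 sequence, which must stay sequential inside each worker:

```python
from multiprocessing import Pool

def run_iteration(i):
    # Placeholder for the real work: run prog1..prog7 in order for
    # iteration i and produce output<i+1>.fits; here we just return the name.
    return "output%d.fits" % (i + 1)

def run_chunk(indices):
    # Each worker runs its block of iterations strictly in order.
    return [run_iteration(i) for i in indices]

if __name__ == "__main__":
    chunks = [range(0, 5), range(5, 10), range(10, 15), range(15, 20)]
    pool = Pool(processes=4)
    results = pool.map(run_chunk, chunks)  # the four chunks run in parallel
    pool.close()
    pool.join()
    outputs = [name for chunk in results for name in chunk]
    print("%d outputs, first %s, last %s" % (len(outputs), outputs[0], outputs[-1]))
```

This only helps if iteration N never reads the outputs of iteration N+1; within one chunk the order is preserved.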
My question is: how can I run example 2 without any error?

BhishanPoudel
  • [This](http://stackoverflow.com/questions/7207309/python-how-can-i-run-python-functions-in-parallel) might be what you are looking for – SirParselot Nov 17 '15 at 19:22
  • You'll normally use a process pool and multiprocessing.map to achieve parallelism while not bringing the system down.. – thebjorn Nov 17 '15 at 19:47
  • Do these things you are running depend on each other? Is this `run_process` function running subprocesses? – tdelaney Nov 18 '15 at 15:26
  • If you read the Process documentation, you'd see that you need to call start on the process objects. – noxdafox Nov 20 '15 at 12:18
  • @tdelaney the processes depend on each other, and run_process is a function defined above in the edited question. – BhishanPoudel Dec 09 '15 at 02:24
  • @thebjorn can you give a look at the code and find the source of the error, that will be helpful. – BhishanPoudel Dec 09 '15 at 02:25

3 Answers


Python has a Pool of processes built exactly for this purpose.

Given that you need to run the same sequence of commands X times, and supposing the separate runs are independent, the Nth run can execute together with the (N+1)th without any interference.

from multiprocessing import Pool
import subprocess

commands = (("prog1.c", ['./prog1']), ...)

def run_processes(execution_index):
    print("Running sequence for the %d time." % execution_index)

    for name, args in commands:
        process = subprocess.Popen(args)
        ...

p = Pool()
p.map(run_processes, range(20))

On Python 3 you can use concurrent.futures.ProcessPoolExecutor.
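A minimal Python 3 sketch of the same pattern with `concurrent.futures.ProcessPoolExecutor`; the `sys.executable` commands are illustrative stand-ins for the real `./prog1` ... `./prog7` executables:

```python
import subprocess
import sys
from concurrent.futures import ProcessPoolExecutor

def run_sequence(index):
    # Each worker runs its whole command sequence in order; only the
    # sequences themselves run in parallel.  sys.executable stands in
    # for the real compiled executables.
    commands = [
        [sys.executable, "-c", "print('step 1 of run %d')" % index],
        [sys.executable, "-c", "print('step 2 of run %d')" % index],
    ]
    for command in commands:
        subprocess.check_call(command)  # raises CalledProcessError on failure
    return index

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as executor:
        finished = list(executor.map(run_sequence, range(4)))
    print(finished)
```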

Whenever you want to run something concurrently, you first need to understand the execution boundaries. If two lines of execution are interdependent, you either set up communication between the two (for example with a pipe) or avoid running them concurrently.

In your case, the commands within one sequence are interdependent, so running them concurrently is problematic. But the separate sequences are not interdependent, so you can run those in parallel.
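To make the question's example 2 run, two changes are needed: build `commands` with `append` so each `(name, args)` pair stays intact (`extend` flattens it into a bare string and a bare list, which is what causes the `TypeError`), and give `Pool.map` a one-argument wrapper that unpacks the pair. A sketch keeping the question's `run_process` signature, with `sys.executable` as a stand-in for the real executables:

```python
import subprocess
import sys
from multiprocessing import Pool

def run_process(name, args):
    process = subprocess.Popen(args)
    process.communicate()
    if process.returncode != 0:
        sys.exit("Error: %s failed with return code %d" % (name, process.returncode))

def run_process_star(name_and_args):
    # Pool.map passes each item as a single argument; unpack it here.
    return run_process(*name_and_args)

if __name__ == "__main__":
    commands = []
    for x in range(20):
        # append keeps each (name, args) pair intact; extend would
        # flatten the pair into separate elements.
        commands.append(("run%d" % x,
                         [sys.executable, "-c", "print('run %d')" % x]))
    pool = Pool()
    pool.map(run_process_star, commands)
    pool.close()
    pool.join()
```

Note this runs all 20 entries in parallel with no ordering guarantee, so it only fits the parts of the workflow that are truly independent.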

noxdafox
  • I tried this with a simple executable as mentioned above, and it did not work! Do you have any idea? – BhishanPoudel Nov 19 '15 at 20:39
  • 1
    I don't understand. Are you using the `multiprocessing.Pool.map` or the `multiprocessing.Process`? – noxdafox Nov 20 '15 at 12:19
  • I tried this, but it does not work. It says it needs two arguments for run_process. My function run_process is given above in the question: 'from multiprocessing import Pool commands = [] for x in range(20): commands.extend(("prog1",['./prog1'])) p = Pool() p.map(run_process, commands)' – BhishanPoudel Dec 05 '15 at 03:00
  • The [map](https://docs.python.org/2/library/functions.html#map) function passes only one argument to the given function. In your case it is a tuple with your two arguments. You can either expand it with the `*` operator or just treat it as a list. – noxdafox Dec 05 '15 at 10:10
  • You didn't get what I mean. The `run_process` will receive only one argument, containing a tuple with the two entries you specified. I'll edit the answer to better show what I mean. – noxdafox Dec 07 '15 at 06:42
  • Did you manage to get it working? If not, could you please edit your question showing the code and the exception you get when running it? – noxdafox Dec 08 '15 at 08:52
  • I still get the error. My full code is given in the updated question. my error looks like this: TypeError: run_process() takes exactly 2 arguments (1 given) – BhishanPoudel Dec 09 '15 at 00:18
  • The solution is already in the post. Check the run_process function signature. – noxdafox Dec 13 '15 at 21:11
  • The edited run_process function provided by you does not work for executables that need arguments (e.g. prog3 in my question). Also, `commands=[]`, `commands.extend(...)`, then `p=Pool(); p.map(run_process, commands)` gives a garbage answer; the output was like: Running: p, command: r – BhishanPoudel Dec 16 '15 at 20:01
  • Just saw your new note. You cannot run a sequential computation in parallel. Your issue is not on the Python side but in your programs. If prog2 must be started after prog1 has ended, how could you make them run in parallel? Concurrency only works with [embarrassingly parallel](https://en.wikipedia.org/wiki/Embarrassingly_parallel) types of problems. Therefore prog1 and prog2 need to be able to run in any order for you to run them in parallel. – noxdafox Dec 17 '15 at 11:13
  • i have to run prog1,prog2,prog3,...,prog7 20 times inside a for loop, first iteration gives output oputput1.fits, iteration2 results output2.fits and .... output21.fits at end of each iteration. can i make four for-loops such as 0 to 5, 5 to 10, 10 to 15, and 15 to 20 and define four separate functions, then parallize the code? – BhishanPoudel Dec 17 '15 at 20:34
  • Another problem: even if i just want to parallize there programs the above function to run executables doesnot work for ./prog3 "any argument", There is no argument[2] option, how can we solve that issue? – BhishanPoudel Dec 17 '15 at 21:40
  • My bad. I misunderstood. Editing once again. – noxdafox Jan 05 '16 at 09:27
import multiprocessing

for x in range(0, 20):
    p = multiprocessing.Process(target=run_process,
                                args=("colour.c", ['./cl', "color.txt", str(x)]))
    p.start()  # the process does nothing until start() is called
    ...

not really sure what else I could add ...

Joran Beasley

Have a look at what the group functions of Celery's canvas do. They allow you to call functions at the same time, each with a different set of arguments. Say you want to process a total of 1000 elements in your for loop. Doing that sequentially is highly unoptimized. A simple solution is to call the same function with two sets of arguments; even this simple hack will cut your processing time in half. That is what Canvas and Celery are about.
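The halving idea can be illustrated without Celery (which needs a message broker); a sketch with plain multiprocessing, splitting 1000 elements into two halves processed in parallel, where `process_element` is a hypothetical stand-in for the real per-element work:

```python
from multiprocessing import Pool

def process_element(x):
    # Stand-in for the real per-element work.
    return x * x

def process_chunk(chunk):
    return [process_element(x) for x in chunk]

if __name__ == "__main__":
    elements = list(range(1000))
    halves = [elements[:500], elements[500:]]        # two sets of arguments
    pool = Pool(processes=2)
    first, second = pool.map(process_chunk, halves)  # both halves in parallel
    pool.close()
    pool.join()
    results = first + second
    print(len(results))
```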

user2058724