I have several Unix servers, each running an application. I need to grep for a pattern in the application logs on every server and put the grep results from all the servers into a single consolidated file.

This is how I'm currently doing it:

import subprocess

def run_command(command):
    # Run the shell command and capture its stdout/stderr.
    ps = subprocess.Popen(command, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, shell=True)
    out, err = ps.communicate()
    if err != "":
        return err
    else:
        return out



Server_List = [['ServerA', 'BecomeAccountA'],
               ['ServerB', 'BecomeAccountB'],
               ['ServerC', 'BecomeAccountC'],
               ['ServerD', 'BecomeAccountD']]
Final_Result = ""
path = "some/path/"
pattern = "FindMe"
for server, becomeaccount in Server_List:  # unpack directly; avoids shadowing the built-in 'list'
    command = "ssh -oConnectTimeout=5 -oBatchMode=yes -l %s %s 'grep %s %s'" % (
        becomeaccount, server, pattern, path)
    result = run_command(command)
    Final_Result += result

with open("/some/path/output", 'w') as f:
    f.write(Final_Result)

My output file now contains the following:

14012015.1449.30 [INFO] something FindMe something
14012015.1449.40 [INFO] something FindMe something
14012015.1450.13 [INFO] something FindMe something
14012015.1450.48 [INFO] something FindMe something
14012015.1451.04 [INFO] something FindMe something
14012015.1451.19 [INFO] something FindMe something
14012015.1451.77 [INFO] something FindMe something
14012015.1452.09 [INFO] something FindMe something

To produce this output file, I have to make the ssh connections to all the servers one after the other, which takes some time. I need to reduce the time my code takes to produce the final output. I was wondering: can I do this with multithreading, i.e., make multiple ssh connections at a time? I have never tried multithreading.

Note: the order of the lines in the output file is not important, so the order of the ssh connections doesn't matter either; I can always sort the lines in the output file by time, since each line starts with a timestamp.

– Bad_Coder
  • It sounds like what you're doing might be io-bound, so multi-threading sounds like it could be helpful. However, all it might accomplish is allow you to wait for all the servers in parallel. – martineau Jan 14 '15 at 10:06
  • unrelated: you could use the exit status `ps.returncode != 0` as an error indicator. If you want to check whether a string `err` is not empty then use `if err` instead of `if err != ""` (the latter fails on Python 3 where `bytes` and `str` are different types) and it is not idiomatic on Python 2 too. – jfs Jan 14 '15 at 11:51
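
For illustration only, here is jfs's returncode suggestion applied to the question's run_command() helper; this variant is a sketch based on the comment, not part of the original post:

def run_command(command):
    ps = subprocess.Popen(command, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, shell=True)
    out, err = ps.communicate()
    if ps.returncode != 0:  # the exit status, not stderr text, signals failure
        return err
    return out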

1 Answer

I think ps.communicate() will block until all the output of the process has been read, i.e. until the subprocess finishes. This makes your program sequential.

As you mentioned, it may be better to spawn threads, where each thread invokes one subprocess and handles reading that process's output/error.

When collecting the outputs, you need to put them in a queue or list that allows safe concurrent access; see, for example, the Queue module.

At the end you also need to "join" the threads, i.e. wait for all of them to terminate; a sketch follows below.
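
A minimal sketch of this approach, reusing run_command(), Server_List, pattern, and path from the question. It targets Python 2 to match the Queue module named above (on Python 3 the module is spelled queue); treat it as an outline under those assumptions, not a drop-in solution:

import threading
import Queue  # spelled "queue" on Python 3

results = Queue.Queue()  # thread-safe container for the per-server output

def fetch(server, becomeaccount):
    # Each thread makes one ssh connection and stores its grep output.
    command = "ssh -oConnectTimeout=5 -oBatchMode=yes -l %s %s 'grep %s %s'" % (
        becomeaccount, server, pattern, path)
    results.put(run_command(command))

threads = [threading.Thread(target=fetch, args=(server, account))
           for server, account in Server_List]
for t in threads:
    t.start()   # all ssh connections now run in parallel
for t in threads:
    t.join()    # wait for every thread to terminate

with open("/some/path/output", 'w') as f:
    while not results.empty():  # safe here: all producer threads have finished
        f.write(results.get())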

– blue0cean