
I have a number of instances of a class, and I'd like to modify each of them by calling a method. The method will call a system command, however, that takes a while to return, so I'd like to do several in parallel. I thought this would be a very simple thing to do, but I am stumped. Here is an example that is analogous to what I want to accomplish:

import os

class SquareMe():
    def __init__(self, x):
        self.val = x
    def square(self):
        os.system('sleep 10')  # I'll call a slow program
        self.val = self.val **2

if __name__ == '__main__': 
    objs = [SquareMe(x) for x in range(4)]
    for obj in objs:
        obj.square()  # this is where I want to parallelize it
    print([obj.val for obj in objs])

This code works (it prints [0, 1, 4, 9]), but it takes 40 seconds to run. I'm trying to get that down to roughly 10 seconds.

I've looked at the subprocess and multiprocessing modules but, as I understand it, they will pickle the object before evaluating square(), meaning the copy of each object will be modified, and the originals will remain untouched. I thought the solution would be to return self and overwrite the original, but that is far from straightforward, if it is possible. I looked into threading and got the impression that I would run into the same problem there.
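To illustrate the pickling behavior I mean, here is a minimal demonstration (the worker function `square_in_worker` exists only for this example):

```python
import multiprocessing

class SquareMe():
    def __init__(self, x):
        self.val = x

def square_in_worker(obj):
    obj.val = obj.val ** 2  # modifies the worker's unpickled copy, not the original

if __name__ == '__main__':
    objs = [SquareMe(x) for x in range(4)]
    pool = multiprocessing.Pool(4)
    pool.map(square_in_worker, objs)  # each obj is pickled into a worker process
    pool.close()
    pool.join()
    print([obj.val for obj in objs])  # originals untouched: [0, 1, 2, 3]
```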

I also looked into using async (I'm using Python 3.5). From what I can tell, system calls (including 'sleep 10') are blocking, meaning that async will not speed it up.

Is there a simple, elegant, pythonic way to do this?

will.r
  • I'd expect you could find a viable solution with twisted (https://twistedmatrix.com/trac/) but am not sure so am not posting it as an answer. – Frank V Apr 14 '17 at 20:35
  • I will definitely be following this question because I am often stymied by the GIL, the fact that running Python on Windows doesn't allow fork, along with of course the fact that the objects are serialized in multi-processing. Don't get me started on the problems with multi-processing and shared numpy arrays. With computationally intensive stuff I often have to go to Cython, which can multi-process through OpenMP. – Trekkie Apr 14 '17 at 21:14
  • Can you divide up the object modification into two parts, one of which is expensive but doesn't need to be tightly tied to the object (e.g. the call to outside code), and another which is cheap and which does the actual object changes? If so, you could probably parallelize the expensive part with `multiprocessing` and keep only the cheaper part in the main process. If there's no easy way to separate out the parts (e.g. because updating the object's data or extracting the arguments to the outside code is the expensive part), I don't think there's a good solution. – Blckknght Apr 14 '17 at 23:00
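A minimal sketch of the split Blckknght describes, with the expensive work farmed out via `multiprocessing` and the originals updated in the main process (the helper name `expensive_part` is illustrative, standing in for the slow external call):

```python
import multiprocessing

class SquareMe():
    def __init__(self, x):
        self.val = x

def expensive_part(x):
    # stand-in for the slow external call; runs in a worker process
    return x ** 2

if __name__ == '__main__':
    objs = [SquareMe(x) for x in range(4)]
    pool = multiprocessing.Pool(4)
    results = pool.map(expensive_part, [obj.val for obj in objs])
    pool.close()
    pool.join()
    # cheap update of the *original* objects, done in the main process
    for obj, result in zip(objs, results):
        obj.val = result
    print([obj.val for obj in objs])  # [0, 1, 4, 9]
```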

1 Answer


Turns out it is pretty straightforward to use multiprocessing and 'return self', once you find the right code. Here's what I ended up with:

import os
import multiprocessing

class SquareMe():
    def __init__(self, x):
        self.val = x
    def square(self):
        os.system('sleep 10')  # I'll call another program, which takes a while
        self.val = self.val **2
        return self

if __name__ == '__main__':
    objs = [SquareMe(x) for x in range(4)]
    pool = multiprocessing.Pool(4)
    objs = pool.map(SquareMe.square, objs)
    pool.close()
    pool.join()
    print([obj.val for obj in objs])
will.r
  • Your code gives `_pickle.PicklingError: Can't pickle : attribute lookup square on __main__ failed` – stovfl Apr 20 '17 at 14:47
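The error in the comment above is typical of Python 2, where unbound methods can't be pickled (Python 3 pickles `SquareMe.square` by qualified name). Wrapping the call in a module-level function sidesteps it on both versions; a sketch, where the wrapper name `call_square` is my own:

```python
import multiprocessing

class SquareMe(object):
    def __init__(self, x):
        self.val = x
    def square(self):
        self.val = self.val ** 2
        return self

def call_square(obj):
    # module-level functions pickle by name on both Python 2 and 3
    return obj.square()

if __name__ == '__main__':
    objs = [SquareMe(x) for x in range(4)]
    pool = multiprocessing.Pool(4)
    objs = pool.map(call_square, objs)  # workers return modified copies
    pool.close()
    pool.join()
    print([obj.val for obj in objs])  # [0, 1, 4, 9]
```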