I am trying to use Python's pathos to distribute computations across separate processes so that they run faster on a multicore processor. My code is organized like this:

class:
    def foo(self, name):
        ...
        setattr(self, name, something)
        ...
    def boo(self):
        for name in list:
            self.foo(name)

Since I had pickling problems with multiprocessing.Pool, I decided to try pathos. As suggested in previous topics, I tried:

import pathos.multiprocessing

but it resulted in an error: No module multiprocessing. I can't find that module in the latest pathos version.

Then I tried modifying the boo method:

def boo(self):
    import pathos
    pathos.pp_map.pp_map(self.foo, list)

Now no error is thrown, but foo does not work: the instance of my class has no new attributes. Please help; after a day spent on this, I have no idea where to go next.

Mike McKerns
user3708829

2 Answers

I'm the pathos author. I'm not sure from your code above what you want to do, but maybe I can shed some light. Here's some similar code:

>>> from pathos.multiprocessing import ProcessingPool
>>> class Bar:
...   def foo(self, name):
...     return len(str(name))
...   def boo(self, things):
...     for thing in things:
...       self.sum += self.foo(thing)
...     return self.sum
...   sum = 0
... 
>>> b = Bar()
>>> results = ProcessingPool().map(b.boo, [[12,3,456],[8,9,10],['a','b','cde']])
>>> results
[6, 4, 5]
>>> b.sum
0

What happens above is that the boo method of the Bar instance b is passed to new python processes and evaluated there once for each of the nested lists. You can see that the results are correct: len("12")+len("3")+len("456") is 6, and so on.

However, you can also see that b.sum is mysteriously still 0. Why? What multiprocessing (and thus also pathos.multiprocessing) does is make a COPY of whatever you pass through the map to the other python process. The copied instance is then called (in parallel) and returns whatever the invoked method returns. Note that you have to RETURN results, or print them, log them, send them to a file, or otherwise get them out. They can't go back to the original instance as you might expect, because it's not the original instance that is sent over to the other processes. Copies of the instance are created, then disposed of. Each of them had its sum attribute increased, but the original b.sum is untouched.
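
The copy semantics can be seen without starting any processes. The following is a minimal sketch, not pathos code: `copy.deepcopy` stands in (as an assumption) for the serialize-and-rebuild step that happens when `b` is shipped to a worker, and `run_in_worker` is a hypothetical helper name.

```python
import copy


class Bar:
    def __init__(self):
        self.sum = 0

    def foo(self, name):
        return len(str(name))

    def boo(self, things):
        for thing in things:
            self.sum += self.foo(thing)
        return self.sum


def run_in_worker(obj, things):
    """Hypothetical stand-in for one worker call: operate on a COPY of obj."""
    worker_copy = copy.deepcopy(obj)  # assumption: mimics the copy a worker receives
    return worker_copy.boo(things)    # only the return value travels back


b = Bar()
chunks = [[12, 3, 456], [8, 9, 10], ['a', 'b', 'cde']]
results = [run_in_worker(b, chunk) for chunk in chunks]
print(results)  # [6, 4, 5]
print(b.sum)    # 0, the original was never touched

# The parent has to fold the returned values into the original itself.
b.sum += sum(results)
print(b.sum)    # 15
```

The last two lines are the part that a real pool would also need: whatever should end up on the original object has to be returned and applied in the parent process.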

There are, however, plans within pathos to make something like the above work as you might expect, where the original object IS updated, but it doesn't work like that yet.
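
Until something like that exists, one possible workaround for the setattr pattern in the question is to have the parallel function return `(name, value)` pairs and let the parent process set the attributes. This is a minimal sketch under assumptions: `compute`, `Holder`, and `map_func` are hypothetical names, and `map_func` defaults to the built-in `map` so that the example runs serially.

```python
def compute(name):
    """Hypothetical per-name computation; returns a pair instead of mutating an object."""
    return name, len(str(name))


class Holder:
    def boo(self, names, map_func=map):
        # map_func is an assumption: any map-like callable, e.g. a pool's map method.
        for name, value in map_func(compute, names):
            # This loop runs in the parent process, so the original instance is updated.
            setattr(self, name, value)


h = Holder()
h.boo(['alpha', 'be', 'c'])
print(h.alpha, h.be, h.c)  # 5 2 1
```

With pathos installed, passing `map_func=ProcessingPool().map` (from `pathos.multiprocessing`, as in the session above) should run `compute` in worker processes while the attribute updates still happen on the original object.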

EDIT: If you are installing with pip, note that the latest released version of pathos is several years old, and may not install correctly, or may not install all of the submodules. A new pathos release is pending, but until then, it's better to get the latest version of the code from github, and install from there. The trunk is for the most part stable under development. I think your issue may have been that not all packages were installed, due to a "new" pip -- "old" pathos incompatibility in the install. If pathos.multiprocessing is missing, this is the most likely culprit.

Get pathos from github here: https://github.com/uqfoundation/pathos

Mike McKerns
  • I have the same problem as OP here. I can do `import pathos`, but `import pathos.multiprocessing` gives me a module not found error. What might be the reason for that? – sashkello Nov 04 '14 at 22:33
  • The problem is, I don't understand the OP's question… due to some barrier in the english and also the minimal code samples. Maybe I can try another approach. Maybe all of the dependencies were not installed. Can you `import processing`? How about `from processing.pool import Pool`? How about `from pathos.helpers import mp_helper` or `from pathos.helpers import ProcessPool`? What about `import pp` and `from pathos.helpers import pp_helper`? – Mike McKerns Nov 05 '14 at 01:06
  • `from pathos.helpers import *` gives "No module named helpers". Somehow not all of pathos is available for me and it seems OP. I installed it from pip, it is the latest version. – sashkello Nov 05 '14 at 02:09
  • In the package, I have core, hosts, Launcher, LauncherSCP, LauncherSSH, pp_map, Server, Tunnel, util, XMLRPCRequestHandler, XMLRPCServer. That's it, no helpers, no multiprocessing. – sashkello Nov 05 '14 at 02:15
  • I don't see multiprocessing in the tgz file I downloaded from official site, but it is present in git. I'll try reinstalling it from there... – sashkello Nov 05 '14 at 02:21
  • Yes, now it works. For some reason the official tgz is missing some of the submodules. Installing from git worked for me. – sashkello Nov 05 '14 at 02:23
  • Same, had to install from the git repo and not the tgz file. – Brideau Jan 23 '15 at 20:03
  • @Brideau: I'm in the midst of splintering `pathos` into a few more packages (basically all the nonstandard dependencies), in order to make sure everything is `pip` installable. New releases should be coming soon. – Mike McKerns Jan 24 '15 at 15:47
  • update: everything is `pip` installable and has been for a while. – Mike McKerns Mar 14 '17 at 08:34
  • Hi @MikeMcKerns, I wonder what is the difference between ParallelPool and ProcessingPool? If they are different, in what occasions I should use any one of them? Thanks in advance. – LifeWorks Jul 27 '17 at 01:37
  • The difference is that `ProcessingPool` uses `multiprocess` and `ParallelPool` uses `ppft`. The former is for process-parallel computing, while the latter can be used across a network connection. – Mike McKerns Jul 27 '17 at 01:46

Here's how I go about this: I put the function to be run in parallel outside the class and pass the object as an argument when calling pool.map. Then I return the object so it can be reassigned.

from pathos.multiprocessing import ProcessingPool


def boo(args):
    b, things = args
    for thing in things:
        b.sum += b.foo(thing)
    return [b, b.sum]

class Bar:
    def __init__(self):
        self.sum = 0
    def foo(self, name):
        return len(str(name))

pool = ProcessingPool(2)
b1 = Bar()
b2 = Bar()
print(b1, b2)

results = pool.map(boo, [[b1, [12,3,456]],[b2, ['a','b','cde']]])

b1, b1s = results[0]
b2, b2s = results[1]
print(b1,b1s,b1.sum)
print(b2, b2s, b2.sum)

Output:

(<__main__.Bar instance at 0x10b341518>, <__main__.Bar instance at 0x10b341560>)
(<__main__.Bar instance at 0x10b3504d0>, 6, 6)
(<__main__.Bar instance at 0x10b350560>, 5, 5)

Note that b1 and b2 are no longer the same objects they were before calling map, because copies of them were made to be passed, as described by @Mike McKerns. However, the values of all their attributes are intact because they were passed, returned, and reassigned.
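
If other parts of the program still hold references to the original objects, rebinding the names `b1` and `b2` does not update those references. One option is to copy the returned state back into the originals. A minimal sketch, assuming plain instances whose state lives in `__dict__`; `copy.deepcopy` stands in for the round trip through a worker process, and `merge_state` is a hypothetical helper.

```python
import copy


class Bar:
    def __init__(self):
        self.sum = 0

    def foo(self, name):
        return len(str(name))


def boo(args):
    b, things = args
    for thing in things:
        b.sum += b.foo(thing)
    return b


def merge_state(original, returned):
    """Hypothetical helper: overwrite the original's attributes with the returned copy's."""
    vars(original).update(vars(returned))


b1 = Bar()
alias = b1  # some other reference to the same object

# assumption: deepcopy mimics the copy that a worker process would receive
returned = boo((copy.deepcopy(b1), [12, 3, 456]))
merge_state(b1, returned)

print(returned is b1)  # False
print(alias.sum)       # 6
```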

madvn