I want to fit many distributions with scipy and would like to use some sort of multiprocessing for this. Something like this:

import scipy.stats as ss
from pathos.multiprocessing import ProcessingPool
from multiprocessing import Pool

mp = Pool()
pp = ProcessingPool()

l = [0,1,2,3,4,6,7,8,9]
print map(ss.lognorm.fit,l) #method 0
print mp.map(ss.lognorm.fit,l) #method 1
print pp.map(ss.lognorm.fit,l) #method 2

Method 0 is of course not multiprocessing, but it works. Methods 1 and 2 both fail with long tracebacks. Does anybody have a workaround for this?

Method 1 error:

Process PoolWorker-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
Process PoolWorker-2:
    task = get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get
Process PoolWorker-4:
    return recv()
Traceback (most recent call last):
AttributeError: ("'lognorm_gen' object has no attribute '_parse_args'", <built-in function getattr>, (<scipy.stats._continuous_distns.lognorm_gen object at 0x7fb15349ddd0>, '_parse_args'))
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
    task = get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get
    return recv()
AttributeError: ("'lognorm_gen' object has no attribute '_parse_args'", <built-in function getattr>, (<scipy.stats._continuous_distns.lognorm_gen object at 0x7fb15349ddd0>, '_parse_args'))
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
    task = get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get
    return recv()
AttributeError: ("'lognorm_gen' object has no attribute '_parse_args'", <built-in function getattr>, (<scipy.stats._continuous_distns.lognorm_gen object at 0x7fb15349ddd0>, '_parse_args'))
Process PoolWorker-3:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
    task = get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get
    return recv()
AttributeError: ("'lognorm_gen' object has no attribute '_parse_args'", <built-in function getattr>, (<scipy.stats._continuous_distns.lognorm_gen object at 0x7fb15349ddd0>, '_parse_args'))
Process PoolWorker-5:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
    task = get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get
    return recv()
AttributeError: ("'lognorm_gen' object has no attribute '_parse_args'", <built-in function getattr>, (<scipy.stats._continuous_distns.lognorm_gen object at 0x7fb15349ddd0>, '_parse_args'))
Process PoolWorker-6:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
    task = get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get
    return recv()
AttributeError: ("'lognorm_gen' object has no attribute '_parse_args'", <built-in function getattr>, (<scipy.stats._continuous_distns.lognorm_gen object at 0x7fb15349ddd0>, '_parse_args'))
Process PoolWorker-7:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
    task = get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get
    return recv()
AttributeError: ("'lognorm_gen' object has no attribute '_parse_args'", <built-in function getattr>, (<scipy.stats._continuous_distns.lognorm_gen object at 0x7fb15349ddd0>, '_parse_args'))
Process PoolWorker-8:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
    task = get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get
    return recv()
AttributeError: ("'lognorm_gen' object has no attribute '_parse_args'", <built-in function getattr>, (<scipy.stats._continuous_distns.lognorm_gen object at 0x7fb15349ddd0>, '_parse_args'))
Process PoolWorker-9:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
    task = get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get
    return recv()
AttributeError: ("'lognorm_gen' object has no attribute '_parse_args'", <built-in function getattr>, (<scipy.stats._continuous_distns.lognorm_gen object at 0x7fb15349ddd0>, '_parse_args'))

Method 2 error:

Exception in thread Thread-4:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/lib/python2.7/dist-packages/processing/pool.py", line 207, in _handleTasks
    put(task)
  File "/usr/local/lib/python2.7/dist-packages/dill-0.2.2-py2.7.egg/dill/dill.py", line 192, in dumps
    dump(obj, file, protocol, byref, fmode)#, strictio)
  File "/usr/local/lib/python2.7/dist-packages/dill-0.2.2-py2.7.egg/dill/dill.py", line 182, in dump
    pik.dump(obj)
  File "/usr/lib/python2.7/pickle.py", line 224, in dump
    self.save(obj)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 562, in save_tuple
    save(element)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 548, in save_tuple
    save(element)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 548, in save_tuple
    save(element)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python2.7/dist-packages/dill-0.2.2-py2.7.egg/dill/dill.py", line 626, in save_function
    obj.__dict__), obj=obj)
  File "/usr/lib/python2.7/pickle.py", line 401, in save_reduce
    save(args)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 562, in save_tuple
    save(element)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 548, in save_tuple
    save(element)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python2.7/dist-packages/dill-0.2.2-py2.7.egg/dill/dill.py", line 826, in save_cell
    pickler.save_reduce(_create_cell, (obj.cell_contents,), obj=obj)
  File "/usr/lib/python2.7/pickle.py", line 401, in save_reduce
    save(args)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 548, in save_tuple
    save(element)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python2.7/dist-packages/dill-0.2.2-py2.7.egg/dill/dill.py", line 794, in save_instancemethod0
    obj.im_class), obj=obj)
  File "/usr/lib/python2.7/pickle.py", line 401, in save_reduce
    save(args)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 548, in save_tuple
    save(element)
  File "/usr/lib/python2.7/pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/python2.7/pickle.py", line 419, in save_reduce
    save(state)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python2.7/dist-packages/dill-0.2.2-py2.7.egg/dill/dill.py", line 658, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/lib/python2.7/pickle.py", line 649, in save_dict
    self._batch_setitems(obj.iteritems())
  File "/usr/lib/python2.7/pickle.py", line 681, in _batch_setitems
    save(v)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python2.7/dist-packages/dill-0.2.2-py2.7.egg/dill/dill.py", line 794, in save_instancemethod0
    obj.im_class), obj=obj)
  File "/usr/lib/python2.7/pickle.py", line 401, in save_reduce
    save(args)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 548, in save_tuple
    save(element)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python2.7/dist-packages/dill-0.2.2-py2.7.egg/dill/dill.py", line 615, in save_function
    if not _locate_function(obj): #, pickler._session):
  File "/usr/local/lib/python2.7/dist-packages/dill-0.2.2-py2.7.egg/dill/dill.py", line 604, in _locate_function
    found = _import_module(obj.__module__ + '.' + obj.__name__, safe=True)
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
fbence
  • Are you sure your problem isn't coming from [here](http://stackoverflow.com/questions/1816958/cant-pickle-type-instancemethod-when-using-pythons-multiprocessing-pool-ma) - using bound instance methods, which cannot be pickled? – fnl Jan 16 '15 at 23:38
  • I don't think so, or at least not for method #2. The whole point of spending at least an hour trying to install pathos.multiprocessing was that it doesn't use pickle to serialize and supposedly doesn't have problems like that. – fbence Jan 17 '15 at 00:01

1 Answer

Method 1 doesn't work because pickle can't serialize bound instance methods. Method 2 doesn't work because scipy.stats is doing something "tricky" that the dill and pathos author (me) can't pin down without investigating first.

You can see that the issue is not that scipy.stats uses a bound method (that's not a problem for dill or pathos), but that it does some renaming magic. That's why the traceback from your pathos call shows _locate_function failing: obj.__module__ is None, so building the import path raises the TypeError. This is the actual reason Method 2 doesn't work.

>>> import scipy.stats as ss
>>>        
>>> ss.lognorm
<scipy.stats._continuous_distns.lognorm_gen object at 0x10932d6d0>
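A quick way to check up front whether a callable is likely to survive the trip to a worker process is to round-trip it through pickle yourself. The helper below is only a sketch: the name pickle_roundtrip_error is made up for illustration, and it tests in-process serialization, which is a necessary condition rather than a guarantee that a worker can rebuild the object.

```python
import pickle


def pickle_roundtrip_error(obj):
    """Return None if obj survives pickle.dumps/pickle.loads, else the exception raised.

    Hypothetical diagnostic helper, not part of pickle, dill, or pathos.
    """
    try:
        pickle.loads(pickle.dumps(obj))
    except Exception as exc:  # broad on purpose: we only want to report the failure
        return exc
    return None


# A builtin function is pickled by reference and round-trips cleanly.
print(pickle_roundtrip_error(len))

# A lambda has no importable name, so the standard pickler rejects it.
print(pickle_roundtrip_error(lambda x: x))
```

Passing ss.lognorm.fit to the same helper shows whether your Python and SciPy versions hit the pickling problem at all; results may differ between Python 2 and Python 3.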

The workaround is simple: make the method easier to find by wrapping it in a function that knows where it is.

>>> import pathos.multiprocessing as mp
>>> p = mp.ProcessingPool()
>>>        
>>> def doit(x):
...   return ss.lognorm.fit(x)
... 
>>> p.map(doit, range(5))
[(1.0, 0.0, 1.0), (1.0, 1.0, 1.0), (1.0, 2.0, 1.0), (1.0, 3.0, 1.0), (1.0, 4.0, 1.0)]
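The same wrapper idea should also carry over to the standard library's multiprocessing.Pool, because a module-level function is pickled by name and each worker then looks up ss.lognorm on its own. The following is a sketch under assumptions rather than a tested recipe for your setup: it assumes Python 3 and a reasonably recent SciPy, and fit_lognorm, make_samples, and the sample data are illustrative names and values, not anything from the question.

```python
from multiprocessing import Pool

import scipy.stats as ss


def fit_lognorm(sample):
    """Module-level wrapper; only its name is pickled, not the bound method."""
    return ss.lognorm.fit(sample)


def make_samples(n_samples=4, size=200):
    """Draw a few lognormal samples to fit (illustrative data only)."""
    return [
        ss.lognorm.rvs(0.5, size=size, random_state=seed)
        for seed in range(n_samples)
    ]


if __name__ == "__main__":
    samples = make_samples()
    pool = Pool(2)
    try:
        # Each worker imports this module and calls fit_lognorm on one sample.
        fits = pool.map(fit_lognorm, samples)
    finally:
        pool.close()
        pool.join()
    for shape, loc, scale in fits:
        print(shape, loc, scale)
```

The `if __name__ == "__main__":` guard matters on platforms that start workers by re-importing the main module; without it, each worker would try to create its own pool.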
Mike McKerns