I'm trying to use multiprocessing to speed up reading Excel files with pandas. However, when I use multiprocessing I get the error cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

when I try to run the following:

import glob

import dill
import pandas as pd
from pathos.multiprocessing import ProcessPool

class A(object):
    def __init__(self):
        self.files = glob.glob('*')

    def read_file(self, filename):
        return pd.read_excel(filename)

    def file_data(self):
        pool = ProcessPool(9)
        file_list = [filename for filename in self.files]
        df_list = pool.map(A().read_file, file_list)
        combined_df = pd.concat(df_list, ignore_index=True)
        return combined_df

Isn't pathos.multiprocessing designed to fix this issue? Am I overlooking something here?

Edit: The full traceback ends with:

  File "c:\users\zky3sse\appdata\local\continuum\anaconda2\lib\site-packages\pathos-0.2.0-py2.7.egg\pathos\multiprocessing.py", line 136, in map
    return _pool.map(star(f), zip(*args)) # chunksize
  File "C:\Users\ZKY3SSE\AppData\Local\Continuum\Anaconda2\lib\multiprocessing\pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "C:\Users\ZKY3SSE\AppData\Local\Continuum\Anaconda2\lib\multiprocessing\pool.py", line 567, in get
    raise self._value
boson

It looks like you are on Windows, and if you didn't use freeze_support, you can get this cryptic error. If you can post some test code that demonstrates your error and that other people can run easily, you might get a more complete answer with working code. – Mike McKerns Nov 17 '16 at 10:41
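
To illustrate the comment above, here is a minimal sketch of the Windows-safe layout. It uses the standard-library multiprocessing module rather than pathos, and read_one is a hypothetical stand-in for a function that would call pd.read_excel; both choices are assumptions made so the example runs without extra packages.

```python
import multiprocessing as mp


def read_one(path):
    """Hypothetical stand-in for pd.read_excel(path); returns picklable data."""
    return (path, len(path))


def read_all(paths, processes=2):
    # read_one lives at module level, so plain pickle can send it to the
    # workers by name; a method or lambda would need extra help.
    with mp.Pool(processes) as pool:
        return pool.map(read_one, paths)


if __name__ == "__main__":
    # On Windows, worker processes re-import this module, so pool creation
    # must sit behind this guard. freeze_support() only matters when the
    # script is frozen into a Windows executable; otherwise it does nothing.
    mp.freeze_support()
    print(read_all(["a.xlsx", "bb.xlsx"]))
```

With pandas installed, the body of read_one could be replaced by a call to pd.read_excel(path) and the results passed to pd.concat.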

1 Answer

It is possible that pandas uses SWIG as a wrapper for C code. If that is the case, dill may not work properly, and pathos would then fall back to pickle. There are workarounds, as shown here: How to make my SWIG extension module work with Pickle?
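
As a rough sketch of that kind of workaround: an object holding state that pickle cannot handle can tell pickle how to rebuild it from its constructor arguments. NativeHandle below is a hypothetical class, not anything from pandas or SWIG, and a threading.Lock stands in for the unpicklable native state.

```python
import pickle
import threading


class NativeHandle:
    """Hypothetical stand-in for a wrapper object with unpicklable state."""

    def __init__(self, path):
        self.path = path
        # A lock cannot be pickled, much like a raw pointer into C code.
        self._resource = threading.Lock()

    def __reduce__(self):
        # Ask pickle to rebuild the object by calling NativeHandle(path)
        # instead of trying to serialize the instance attributes.
        return (NativeHandle, (self.path,))


original = NativeHandle("data.xlsx")
clone = pickle.loads(pickle.dumps(original))
```

The clone gets a fresh resource of its own, so this only fits cases where the native state can be recreated from the constructor arguments.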

MauricioRoman