```python
import fnmatch
import logging
import os

import pandas
import pathos.multiprocessing as mp
from sqlalchemy import create_engine

import constants  # project-specific module (not shown) that defines anly_dir


class Model_Output_File():
    """
    Class to read Model Output files
    """
    def __init__(self, ftype = ''):
        """
        Constructor
        """
        # Create a sqlite database in the analysis directory
        self.db_name = 'sqlite:///' + constants.anly_dir + os.sep + ftype + '_' + '.db'
        self.engine  = create_engine(self.db_name)
        self.ftype   = ftype

    def parse_DGN(self, fl):
        df = pandas.read_csv(...)
        # to_sql expects a table name first, not the database URL;
        # using the file type as the table name is an assumption
        df.to_sql(self.ftype, self.engine, if_exists='append')

    def collect_model_output(self, fls):
        # Parse the files in parallel with 4 worker processes
        pool = mp.ProcessingPool(4)
        if self.ftype == 'DGN':
            pool.map(self.parse_DGN, fls)
        else:
            logging.info('Wrong file type')

if __name__ == '__main__':
    list_fls = fnmatch.filter(...)
    obj = Model_Output_File(ftype = 'DGN')
    obj.collect_model_output(list_fls)
```

In the code above, I am using the `pathos` multiprocessing library to avoid the standard `multiprocessing` module's problems with pickling methods of class instances. However, I am getting a pickling error:

```
  pool.map(self.parse_DGN, fls)
  File "C:\Anaconda64\lib\site-packages\pathos-0.2a1.dev0-py2.7.egg\pathos\multiprocessing.py", line 131, in map
    return _pool.map(star(f), zip(*args)) # chunksize
  File "C:\Anaconda64\lib\multiprocessing\pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "C:\Anaconda64\lib\multiprocessing\pool.py", line 567, in get
    raise self._value
cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
```

How do I fix this?

user308827
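For context, a minimal sketch (Python 3, standard library only) of why the standard library chokes here: `pickle` serializes a function by reference, recording its module plus its name, so a function created inside another function cannot be found again by attribute lookup. The traceback shows `pathos` handing `star(f)` to the standard `multiprocessing` pool, and a wrapper of that shape fails the same way (the exact message differs between Python 2's `cPickle` and Python 3). `make_star` below is a hypothetical stand-in, not the actual `pathos` code.

```python
import pickle


def make_star(f):
    """Hypothetical stand-in for a wrapper like pathos's star(f): unpacks an argument tuple."""
    def starred(args):
        return f(*args)
    return starred


def stdlib_can_pickle(obj):
    """Return True if the standard-library pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # Local (nested) functions fail the by-reference lookup; depending on
        # the Python version this surfaces as PicklingError or AttributeError.
        return False


print(stdlib_can_pickle(abs))             # True: a builtin is found by name
print(stdlib_can_pickle(make_star(abs)))  # False: the nested function is not
```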
  • I'm the `pathos` author. You are getting a `cPickle.PicklingError`… which you should not get with `pathos`. Do you have `multiprocess` installed, and if you do, do you have a C compiler? You can check for pickling errors by importing `dill`, and doing a `dill.copy` on the object. If that works, then you probably have some installation issue, where `pathos` is finding the standard library version of `multiprocessing` and not the fork that provides better serialization. – Mike McKerns Oct 01 '15 at 01:10
  • Thanks @MikeMcKerns, I do have multiprocessing installed. What exactly should I call `dill.copy` on? – user308827 Oct 01 '15 at 01:25
  • From the error message, I seem to be ending up in "C:\Anaconda64\lib\multiprocessing\pool.py", which seems to indicate what you are saying about pathos finding the standard library version of multiprocessing. – user308827 Oct 01 '15 at 01:32
  • You'd `dill.copy(self.parse_DGN)`. So if you are finding the python standard library `multiprocessing`, then you probably need to install a compiler… like Microsoft Visual Studio Community. See: https://github.com/mmckerns/tuthpc – Mike McKerns Oct 01 '15 at 01:54
  • Hmm, `dill.copy(...)` works, but the error still persists after installing MS Visual Studio Express... – user308827 Oct 01 '15 at 04:12
  • Did you then rebuild `multiprocess` after installing the MS compiler? – Mike McKerns Oct 01 '15 at 06:56
  • Ah, I did not. OK, I will try that. – user308827 Oct 01 '15 at 13:52
  • @MikeMcKerns, I did a `pip uninstall multiprocess` and then a `pip install multiprocess`, but that does not seem to help. Maybe I should build from source, but I cannot find the `multiprocess` source code. Do you know where it is? – user308827 Oct 01 '15 at 14:00
  • OK, I did `pip install git+https://github.com/uqfoundation/multiprocess.git`. Still getting the same issue :( – user308827 Oct 01 '15 at 14:29
  • OK, that seems odd. `multiprocess` should just `pip install` with no problems. Maybe you can post this as a ticket on the `multiprocess` GitHub page? That way I can better diagnose your traceback. The only other thing I can think of without seeing tracebacks or build output is that you have an unusual PYTHONPATH. – Mike McKerns Oct 01 '15 at 14:44
  • Thank you so much @MikeMcKerns, it turns out I was wrong about the C compiler. I just tried building `multiprocess` from source (`setup.py build`), and it complains that no C compiler is installed. I did install Visual Studio Community yesterday, so maybe it is a case of some path being wrong, as you said. – user308827 Oct 01 '15 at 14:47
  • Cool. Enjoy! If you find you have further issues, feel free to submit a ticket. – Mike McKerns Oct 01 '15 at 15:22
  • @MikeMcKerns, if you can write your comment as an answer, I will be happy to accept it. – user308827 Oct 01 '15 at 20:30
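The comment thread above hinges on which copy of multiprocessing actually gets imported: the `multiprocess` fork that provides better serialization, or the standard library's `multiprocessing`. A quick way to check, sketched for Python 3 with only the standard library: `importlib.util.find_spec` reports where a module would be loaded from without importing it. If `multiprocess` is missing, or resolves somewhere unexpected (for example because of an unusual PYTHONPATH), that points at the installation. The helper name `module_origin` is illustrative.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if it is not importable."""
    # find_spec locates the module without importing it; it returns None
    # for a top-level name that cannot be found on sys.path.
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


for name in ("multiprocess", "multiprocessing", "dill", "pathos"):
    print(name, "->", module_origin(name))
```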

2 Answers


I'm the `pathos` author. You are getting a `cPickle.PicklingError`… which you should not get with `pathos`. Make sure you have `multiprocess` installed, and if you do, that you have a C compiler. You can check for pickling errors by importing `dill` and calling `dill.copy(self.parse_DGN)` inside your class, or externally on an instance of the class (e.g. `dill.copy(obj.parse_DGN)`). If that works, then you probably have an installation issue, where `pathos` is finding the Python standard library `multiprocessing` instead of the `multiprocess` fork. If so, then you probably need to install a compiler… like Microsoft Visual Studio Community. See: github.com/mmckerns/tuthpc. Make sure to rebuild `multiprocess` after installing the MS compiler.

Mike McKerns
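The `dill.copy` check from the answer above can be wrapped in a small helper. This is a sketch under a few assumptions: Python 3; `dill` may not be installed, in which case the helper returns `None`; and `Parser` is a hypothetical stand-in for `Model_Output_File`, whose real constructor needs a database and input files. Following the answer's reasoning, if the copy succeeds, the object itself serializes fine and the problem is more likely the installation.

```python
def dill_can_copy(obj):
    """Return True/False for whether dill can round-trip obj, or None if dill is not installed."""
    try:
        import dill
    except ImportError:
        return None  # cannot run the check without dill
    try:
        dill.copy(obj)  # serialize and deserialize in one step
        return True
    except Exception:
        return False


class Parser:
    """Hypothetical stand-in for Model_Output_File with a bound method to test."""
    def __init__(self, ftype):
        self.ftype = ftype

    def parse(self, fl):
        return (self.ftype, fl)

    def check_pickling(self):
        # Checking from inside the class
        return dill_can_copy(self.parse)


obj = Parser("DGN")
# Checking externally, on an instance of the class
print(dill_can_copy(obj.parse))
print(obj.check_pickling())
```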

I encountered the same problem. The mystery is that the identical code works on one Windows 7 machine but not on another. Then I checked the versions, and it turned out `dill` and `multiprocess` were one version newer on the failing machine. I downgraded `dill` and `multiprocess` to 0.2.5 and 0.70.4 respectively, and that solved the problem! Hope that helps.

realpy
  • This seems a bit odd. When you downgraded, did you also have an older version of Python? `multiprocess` is usually kept in step with the latest releases of Python, as I essentially fork `multiprocessing` with each new release. If there's some incompatibility across minor versions of Python, I'd be interested to hear about it. – Mike McKerns Feb 27 '17 at 20:14
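To compare two machines the way the answer above did, and to answer the follow-up question about the Python version, one option is to print the relevant versions on each machine and diff the output. This is a sketch assuming Python 3.8+ (for `importlib.metadata`); the package list is just the three discussed in this thread, and `installed_versions` is an illustrative name.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version string, or None if it is not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


print("python:", sys.version.split()[0])
for name, ver in installed_versions(["pathos", "dill", "multiprocess"]).items():
    print(f"{name}: {ver}")
```

If the outputs differ, pinning the versions with pip (for example `pip install dill==0.2.5 multiprocess==0.70.4`, the versions reported working above) reproduces that answer's downgrade.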