While using Python 3 on Windows 7 to process some large CSV files, I have run into an issue with the program not running fast enough. The original working version of the code is similar to the code below, except that both process calls were threads. After adding the multiprocessing library and changing tdg.Thread to mp.Process as shown below, I get this pickling error:

line 70, in <module>
    proc1.start()
  File "C:\Python34\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Python34\lib\multiprocessing\context.py", line 212, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Python34\lib\multiprocessing\context.py", line 313, in _Popen
    return Popen(process_obj)
  File "C:\Python34\lib\multiprocessing\popen_spawn_win32.py", line 66, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Python34\lib\multiprocessing\reduction.py", line 59, in dump
    ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class '_thread.lock'>: attribute lookup lock on _thread failed

Code:

import multiprocessing as mp
import threading as tdg
import queue as q

def my_p1func1(data, Q):
    #performs LDAP for data set 1
    print("p1f1:",data)


    Q.put(data)

def my_p1func2(data, Q):
    #performs LDAP for data set2
    print("p1f2:",data)

    Q.put(data)

def my_proc1(data, Q):

    f1_Q = q.Queue()
    f2_Q = q.Queue()

    f1 = tdg.Thread(target=my_p1func1, args=(data['1'], f1_Q))
    f2 = tdg.Thread(target=my_p1func2, args=(data['2'], f2_Q))

    f1.start()
    f2.start()

    f1.join()
    f2.join()

    f1_out=f1_Q.get()
    f2_out=f2_Q.get()

    Q.put({'f1':f1_out,'f2':f2_out})

def my_p2func1(data, Q):
    #perform gethostbyaddr() for data set 1
    print("p2f1:",data)

    Q.put(data)

def my_p2func2(data, Q):
    #perform gethostbyaddr() for data set 2
    print("p2f2:",data)

    Q.put(data)

def my_proc2(data, Q):

    f1_Q = q.Queue()
    f2_Q = q.Queue()

    f1 = tdg.Thread(target=my_p2func1, args=(data['1'], f1_Q))
    f2 = tdg.Thread(target=my_p2func2, args=(data['2'], f2_Q))

    f1.start()
    f2.start()

    f1.join()
    f2.join()

    f1_out=f1_Q.get()
    f2_out=f2_Q.get()

    Q.put({'f1':f1_out,'f2':f2_out})

dataIn = {'1': [1,2,3], '2': ['a','b','c']}
pq1 = q.Queue()
pq2 = q.Queue()

proc1 = mp.Process(target=my_proc1, args=(dataIn, pq1))
proc2 = mp.Process(target=my_proc2, args=(dataIn,pq2))

proc1.start()
proc2.start()

proc1.join()
proc2.join()

p1 = pq1.get()
p2 = pq2.get()

print(p1)
print(p2)

I thought the issue was being caused by the Locks I had around my print statements, but even after removing them the code continues to throw the same pickling error.

I am in over my head with this and would appreciate any help understanding why it is attempting to pickle something that is not in use, and how I can get this running more efficiently.

Olsonm76
  • http://stackoverflow.com/questions/7865430/multiprocessing-pool-picklingerror-cant-pickle-type-thread-lock-attribu – Vor Aug 21 '14 at 21:11
  • It actually turned out to be http://stackoverflow.com/questions/3217002; however, fixing that in my real code led to a broken pipe error. So now I am off to research how to fix that. – Olsonm76 Aug 21 '14 at 21:20

1 Answer

You can't use a regular queue.Queue object with multiprocessing. You have to use a multiprocessing.Queue. The standard queue.Queue won't be shared between the processes, even if you were to make it picklable. It's an easy fix, though:

if __name__ == "__main__":
    dataIn = {'1': [1,2,3], '2': ['a','b','c']}
    pq1 = mp.Queue()
    pq2 = mp.Queue()

    proc1 = mp.Process(target=my_proc1, args=(dataIn, pq1))
    proc2 = mp.Process(target=my_proc2, args=(dataIn, pq2))

    proc1.start()
    proc2.start()

    proc1.join()
    proc2.join()

    p1 = pq1.get()
    p2 = pq2.get()
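
To see where the `_thread.lock` in the error comes from even with the print Locks removed, it may help to try pickling the queue directly. The traceback shows that on Windows the Process object, including its args, is pickled to send it to the child (`reduction.dump(process_obj, to_child)`), and a queue.Queue carries a thread lock internally. The sketch below is only an illustration; `try_pickle` is a hypothetical helper name, not part of any library.

```python
import pickle
import queue


def try_pickle(obj):
    """Return None if obj pickles cleanly, otherwise a description of the error."""
    try:
        pickle.dumps(obj)
        return None
    except Exception as exc:
        # Python 3.4 reports a PicklingError here; newer versions
        # typically report a TypeError for the same lock object.
        return "{}: {}".format(type(exc).__name__, exc)


if __name__ == "__main__":
    # A thread queue owns a _thread.lock internally, so pickling it fails.
    print(try_pickle(queue.Queue()))
    # Plain data such as the dataIn dictionary pickles without trouble.
    print(try_pickle({'1': [1, 2, 3], '2': ['a', 'b', 'c']}))
```

The first call should report an error that mentions the lock, while the second returns None. That is consistent with the queue passed in `args`, rather than the print Locks, being what triggers the error.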

dano
  • While this stopped the pickling error, there is no output from the `print` statements. Why would it not print if it moved into the processes appropriately? – Olsonm76 Aug 21 '14 at 21:30
  • @Olsonm76 Just noticed you're on Windows, which means you need the `if __name__ == "__main__":` guard. I added it into my answer. Not sure if that is the only issue remaining or not, though. I'll keep looking. – dano Aug 21 '14 at 21:32
  • I added the guard, but there was no change. I did, however, discover that once this is run I never return to an idle state, despite getting my cursor back. – Olsonm76 Aug 21 '14 at 22:13
  • @Olsonm76 Are you running this from IDLE? If so, that's probably why you don't see output from the print statements. IDLE doesn't work properly with `multiprocessing`. Try running your script directly from the CLI. – dano Aug 21 '14 at 22:30
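
Pulling the suggestions in this thread together, a complete script might look like the sketch below, saved to a file and run from the command line rather than from IDLE. The names `lookup`, `worker` and `run` are hypothetical stand-ins for the original functions, and `lookup` only echoes its input instead of performing the real LDAP or gethostbyaddr() work. One assumption beyond the answer: the results are read from each multiprocessing.Queue before the processes are joined, because the multiprocessing documentation warns that joining a process that still has items buffered on a queue can deadlock.

```python
import multiprocessing as mp
import queue
import threading


def lookup(data, out_q):
    # Hypothetical placeholder for the real per-data-set work.
    out_q.put(data)


def worker(data, result_q):
    # Threads share memory within one process, so queue.Queue is fine here.
    q1 = queue.Queue()
    q2 = queue.Queue()
    t1 = threading.Thread(target=lookup, args=(data['1'], q1))
    t2 = threading.Thread(target=lookup, args=(data['2'], q2))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    result_q.put({'f1': q1.get(), 'f2': q2.get()})


def run(data):
    # Queues that cross a process boundary must be multiprocessing.Queue.
    pq1 = mp.Queue()
    pq2 = mp.Queue()
    procs = [mp.Process(target=worker, args=(data, pq)) for pq in (pq1, pq2)]
    for proc in procs:
        proc.start()
    # Drain the queues before joining to avoid blocking on buffered items.
    results = [pq1.get(), pq2.get()]
    for proc in procs:
        proc.join()
    return results


if __name__ == "__main__":
    # The guard is required on Windows, where children re-import this module.
    for result in run({'1': [1, 2, 3], '2': ['a', 'b', 'c']}):
        print(result)
```

Printing the results in the parent, as done here, keeps the output in one place; prints issued inside the child processes may still not appear when the script is launched from IDLE.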