I am using Python's `multiprocessing` module to process large numpy arrays in parallel. The arrays are memory-mapped using `numpy.load(mmap_mode='r')` in the master process. After that, `multiprocessing.Pool()` forks the process (I presume).
Everything seems to work fine, except I am getting lines like:
AttributeError("'NoneType' object has no attribute 'tell'",)
in `<bound method memmap.__del__ of
memmap([ 0.57735026, 0.57735026, 0.57735026, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ], dtype=float32)>`
ignored
in the unittest logs. The tests pass fine, nevertheless.
Any idea what's going on there?
Using Python 2.7.2, OS X, NumPy 1.6.1.
UPDATE:
After some debugging, I tracked the cause down to a code path that was using a (small slice of) this memory-mapped numpy array as input to a `Pool.imap` call.
Apparently the "issue" is with the way `multiprocessing.Pool.imap` passes its input to the new processes: it uses pickle. This doesn't work with memory-mapped numpy arrays, and something inside breaks, which leads to the error.
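To illustrate what I think is going on, here is a minimal sketch that round-trips a memmap slice through pickle by hand, the way `Pool.imap` would for each input item. The file name and array contents are made up for the example, and `_mmap` is a private NumPy attribute that I only inspect here for diagnosis, so treat this as an assumption-laden probe rather than a description of NumPy's internals.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical stand-in for the real data file.
path = os.path.join(tempfile.mkdtemp(), 'data.npy')
np.save(path, np.arange(12, dtype=np.float32))

arr = np.load(path, mmap_mode='r')  # memory-mapped, as in the master process
chunk = arr[:4]                     # small slice, as passed to Pool.imap

# Pool.imap pickles each input item before sending it to a worker;
# this does the same round trip in a single process.
restored = pickle.loads(pickle.dumps(chunk))

# The values survive, but the data was serialized by value.
values_match = bool(np.array_equal(restored, chunk))
shares_memory = bool(np.may_share_memory(restored, chunk))

# Private attribute, checked only for diagnosis: is there still a live mapping?
has_live_mmap = getattr(restored, '_mmap', None) is not None

print(values_match, shares_memory, has_live_mmap)
```

If the restored object holds the same values but is no longer backed by the file mapping, that would fit both the extra memory copies and a `__del__` that trips over a missing mmap object.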
I found this reply by Robert Kern which seems to address the same issue. He suggests creating a special code path for when the `imap` input comes from a memory-mapped array: memory-mapping the same array manually in the spawned process.
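As I understand the suggestion, it would look roughly like the sketch below: instead of array slices, the `imap` input carries only a file path and slice bounds, and each worker maps the file itself. The helper names `make_tasks` and `process_chunk`, the file name, and the `sum()` placeholder are all hypothetical, and this is my reading of the idea rather than Robert Kern's actual code.

```python
import os
import tempfile
from multiprocessing import Pool

import numpy as np


def make_tasks(path, n_rows, chunk_size):
    """Describe each chunk as (path, start, stop) instead of passing array data."""
    return [(path, start, min(start + chunk_size, n_rows))
            for start in range(0, n_rows, chunk_size)]


def process_chunk(task):
    """Worker: re-open the memory map locally and process one slice."""
    path, start, stop = task
    arr = np.load(path, mmap_mode='r')   # maps the file; does not read it all
    return float(arr[start:stop].sum())  # placeholder for the real computation


if __name__ == '__main__':
    # Hypothetical stand-in for the real data file.
    path = os.path.join(tempfile.mkdtemp(), 'data.npy')
    np.save(path, np.arange(12, dtype=np.float32))

    pool = Pool(2)
    try:
        # Only small tuples are pickled, never the array data itself.
        results = list(pool.imap(process_chunk, make_tasks(path, 12, 4)))
    finally:
        pool.close()
        pool.join()
    print(results)
```

Only the small tuples get pickled, so no memmap object ever crosses the process boundary, but every function that feeds `imap` has to know where its array came from.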
This would be so complicated and ugly that I'd rather live with the error and the extra memory copies. Is there any other way that would require fewer changes to existing code?