
I'm trying to use the code from https://stackoverflow.com/a/15390953/378594 to convert a NumPy array into a shared memory array and back. I run the following code:

shared_array = shmarray.ndarray_to_shm(my_numpy_array)

and then pass shared_array in the list of arguments for a multiprocessing pool:

pool.map(my_function, list_of_args_arrays)

where list_of_args_arrays contains my shared array and other arguments.

This results in the following error:

PicklingError: Can't pickle <class 'multiprocessing.sharedctypes.c_double_Array_<array size>'>: attribute lookup multiprocessing.sharedctypes.c_double_Array_<array size> failed

where <array size> is the linear size of my NumPy array.

I guess something has changed in NumPy or ctypes since that answer was written?

Further details:

I only need access to shared information. No editing will be done by the processes.

The function that calls the pool lies within a class. The class is instantiated, and the function is called, from a main.py file.

– Uri

2 Answers


Apparently, when using multiprocessing.Pool, all arguments are pickled, so there was no point in passing a multiprocessing.Array through pool.map. Changing the code so that it uses an array of multiprocessing.Process objects, which inherit the shared array instead of pickling it, did the trick.
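A minimal sketch of that approach (illustrative names, not the original code; it assumes the fork start method, i.e. Linux/macOS): the shared buffer is created before the workers start, so each multiprocessing.Process inherits it at fork time and nothing needs to be pickled.

```python
import multiprocessing as mp
import numpy as np

# Inheritance of shared memory requires the fork start method (Linux/macOS).
ctx = mp.get_context('fork')

# A lock-free shared buffer of 4 doubles, created BEFORE any workers start.
shared_arr = ctx.Array('d', 4, lock=False)

def worker(i, q):
    # The child sees shared_arr through fork-time inheritance, not pickling.
    q.put((i, shared_arr[i]))

def run_demo():
    # A NumPy view onto the shared buffer -- no copy is made.
    np.frombuffer(shared_arr)[:] = [0.0, 1.5, 3.0, 4.5]
    q = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(i, q)) for i in range(4)]
    for p in procs:
        p.start()
    results = dict(q.get() for _ in procs)
    for p in procs:
        p.join()
    return results

if __name__ == '__main__':
    print(run_demo())
```

The key point is ordering: create the shared array first, then start the processes, so the children receive the mapping for free.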

– Uri

I think you are overcomplicating things: there is no need to pickle arrays (especially if they are read-only). You just need to keep them accessible through some global variable (known to work on Linux; it may not work on Windows, I don't know):

import multiprocessing as mp
import numpy as np

class si:
    # module-level holder for the shared (read-only) arrays
    arrs = None

def summer(i):
    # workers inherit si.arrs from the parent process at fork time
    return si.arrs[i].sum()

def main():
    si.arrs = [np.zeros(100) for _ in range(1000)]
    pool = mp.Pool(16)
    res = pool.map(summer, range(1000))
    print(res)

if __name__ == '__main__':
    main()

If your arrays need to be both read and written, you need the approach described here: Is shared readonly data copied to different processes for Python multiprocessing?
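The linked question has the details; as a hedged sketch (my illustrative names, again assuming the fork start method on Linux), read-write sharing can use multiprocessing.Array, whose built-in lock serializes the read-modify-write:

```python
import multiprocessing as mp
import numpy as np

ctx = mp.get_context('fork')  # inheritance requires fork (Linux/macOS)

# A locked shared array of 8 doubles: writes from any process are
# visible to all of them (lock=True is the default).
shared = ctx.Array('d', 8)

def add_one(_):
    with shared.get_lock():  # serialize the read-modify-write
        arr = np.frombuffer(shared.get_obj())
        arr += 1.0  # in-place update of the shared buffer

def run():
    np.frombuffer(shared.get_obj())[:] = 0.0  # reset for repeatability
    with ctx.Pool(4) as pool:
        pool.map(add_one, range(10))  # 10 increments of every element
    return list(shared)

if __name__ == '__main__':
    print(run())
```

Without the lock, concurrent `arr += 1.0` updates could interleave and lose increments; with it, every element ends at exactly 10.0.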

– sega_sai
  • This looks good. Can you add how it's supposed to look if the file is imported and there is no `if __name__ == '__main__':` and `main()`? What is the important element here: that the global variable and the function `summer` are in the same scope? Or maybe the definition of the pool and the global variables? – Uri Apr 30 '13 at 16:13
  • The important thing is to have the global resource initialized before mp.Pool(). E.g. main() and `'__main__'` could be in another file (say sumfile.py); just in main(), instead of si.arr it will be sumfile.si.arr=[] and instead of pool.map(summer, ...) it'll be pool.map(sumfile.summer, ...) – sega_sai Apr 30 '13 at 16:33
  • Strange, I'm updating the global variables stored in class, and this update is done before calling Pool(), but still the processes spawned behave as though I did not change the global variables at all - they see the default value (i.e. `None`) despite the value is indeed changed in the main program process... – Uri Apr 30 '13 at 22:25
  • Also take a look here http://docs.python.org/2/library/multiprocessing.html#windows. It seems to contradict your example – Uri Apr 30 '13 at 22:27
  • I guess if you are using windows that may not work (their process spawning is different), but I never had windows to try it. – sega_sai May 01 '13 at 00:25