4

I have code that, given two parameters (k, m), returns a 4-D numpy array. I need to compute this array for all values of (k, m) with k, m < N and add the results up. This is slow in serial, so I am trying to learn Python's multiprocessing module to do it: https://docs.python.org/2/library/multiprocessing.html
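For concreteness, a minimal sketch of the serial version; `f`, `N`, and `SHAPE` here are placeholders standing in for the real code:

```python
import numpy as np

N = 20
SHAPE = (10, 10, 50, 50)   # hypothetical; the real arrays are ~100 MB each

def f(k, m):
    """Placeholder for the real function returning a 4-D array."""
    return np.full(SHAPE, float(k * m))

# The serial computation: one big accumulation over all (k, m) pairs.
result = np.zeros(SHAPE)
for k in range(N):
    for m in range(N):
        result += f(k, m)
```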

Essentially I want to use my 8 cores to compute these 4-D arrays in parallel and add them all up. Now the question is how to design this. Each array can be around 100 MB and N is around 20, so storing 20**2 * 100 MB (about 40 GB) of results in a queue is not possible. The solution would be a shared memory object, a result array that each process keeps adding its results into.

multiprocessing has two means of doing this: shared memory or a server process. Neither of them seems to support multidimensional arrays. Can anyone suggest a way to implement my program? Thanks in advance.

user231042
  • 63
  • 2
  • 1
    [`numpy.memmap()`](http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.memmap.html) might help you (or not). Have a look here http://stackoverflow.com/questions/17785275/share-large-read-only-numpy-array-between-multiprocessing-processes for pointers. – Ilja Everilä Jun 06 '16 at 19:07
  • Possible duplicate of [Use numpy array in shared memory for multiprocessing](http://stackoverflow.com/questions/7894791/use-numpy-array-in-shared-memory-for-multiprocessing) – pppery Jun 06 '16 at 19:41

1 Answer

1

One approach would be to create memory-mapped arrays in the parent process and pass them to the children to fill. Additionally, you should probably have a multiprocessing.Event for every mapped array, so the child process can signal to the parent that its array is done.
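A minimal sketch of that approach, assuming eight workers and using a placeholder `compute(k, m)` and a hypothetical 4-D `SHAPE`. Each worker gets its own memmap-backed accumulator, created by the parent and passed by file name, plus an Event to signal completion; the parent then only has to sum the eight partial results instead of all N**2 arrays:

```python
import numpy as np
import multiprocessing as mp

N = 20                      # k, m < N, as in the question
SHAPE = (10, 10, 50, 50)    # hypothetical 4-D shape; stands in for the real one

def compute(k, m):
    """Placeholder for the real function returning a 4-D array."""
    return np.full(SHAPE, float(k * m))

def worker(filename, pairs, done):
    # Attach to the accumulator the parent created and add into it.
    out = np.memmap(filename, dtype=np.float64, mode='r+', shape=SHAPE)
    for k, m in pairs:
        out += compute(k, m)
    out.flush()
    done.set()              # tell the parent this accumulator is finished

if __name__ == '__main__':
    nworkers = 8
    names = ['partial_%d.dat' % i for i in range(nworkers)]

    # Parent creates one zero-filled memory-mapped accumulator per worker.
    for name in names:
        np.memmap(name, dtype=np.float64, mode='w+', shape=SHAPE).flush()

    pairs = [(k, m) for k in range(N) for m in range(N)]
    chunks = [pairs[i::nworkers] for i in range(nworkers)]
    events = [mp.Event() for _ in range(nworkers)]

    procs = [mp.Process(target=worker, args=args)
             for args in zip(names, chunks, events)]
    for p in procs:
        p.start()
    for e in events:
        e.wait()            # block until every worker has signalled
    for p in procs:
        p.join()

    # Final reduction: only nworkers partial arrays are summed, never N**2.
    total = sum(np.memmap(name, dtype=np.float64, mode='r', shape=SHAPE)
                for name in names)
```

Giving each worker its own file avoids any locking on the shared accumulator; the trade-off is the final reduction pass over the eight partial sums, which is cheap compared to the computation itself.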

Roland Smith
  • 42,427
  • 3
  • 64
  • 94