while True:
    Number = len(SomeList)
    OtherList = array([None] * Number)
    for i in xrange(Number):
        # NumPy array calculation that uses only the i-th elements of
        # Array_1, Array_2, and Array_3
        OtherList[i] = ...

Each of the 'Number' elements of OtherList (and of the other arrays) can be calculated independently. However, because the program is time-dependent, we cannot proceed to the next step until all 'Number' elements have been computed.

Would multiprocessing be a good solution for this operation? I need to speed this process up as much as possible. If multiprocessing is the better approach, please suggest example code.

user989746
  • Do you have working code that is too slow? Have you profiled it? – Steven Rumbalski Oct 28 '11 at 19:13
  • `len(Number)` will fail because you cannot take the length of a number. `array([None]*Number)` seems a slow way of creating an array because it creates an equal size Python list first. Perhaps you should create your array with `empty` or `zeros`. – Steven Rumbalski Oct 28 '11 at 19:16
  • Also, how complicated is a single calculation? – Steven Rumbalski Oct 28 '11 at 19:17
  • Multi-processing would be a good solution if the time to perform a single calculation exceeds the time needed to spawn subprocesses, distribute data to them, and collect the results. And it assumes that multiple processors are available to operate in parallel - there's no point in running multiple processes on a single processor. Unless the calculation time significantly exceeds the per-process overhead, multiprocessing won't help much. – Dave Oct 28 '11 at 19:47
  • There's always converting to a string first, THEN taking the length. `len(str(Number))` – John Doe Oct 28 '11 at 20:17
  • Another thing here is that many numpy operations drop the Python Global Interpreter Lock (GIL), so you can use multithreading instead of multiprocessing for many things in numpy. I find this faster, but thread safety can be an issue. – Brian Larsen Oct 28 '11 at 20:30
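Following up on the comment above about `empty`/`zeros`, here is a minimal sketch of preallocating the output array directly with NumPy instead of building a Python list first. The size and dtype below are placeholders, not values from the question:

import numpy as np

Number = 1000  # stands in for len(SomeList) from the question

# Preallocate the output array in one step; no intermediate Python list.
# dtype=float is an assumption -- use whatever type the calculation produces.
OtherList = np.empty(Number, dtype=float)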

2 Answers


It is possible to use numpy arrays with multiprocessing, but you shouldn't do it yet.

Read A beginner's guide to using Python for performance computing and its Cython version, Speeding up Python (NumPy, Cython, and Weave).

Without knowing the specific calculations or the sizes of the arrays, here are some generic guidelines, in no particular order:

  • measure the performance of your code and find the hot spots; loading the input data might take longer than all the calculations. Set your goal and define what trade-offs are acceptable
  • check with automated tests that you get expected results
  • check whether you could use optimized libraries to solve your problem
  • make sure the algorithm has adequate time complexity; an O(n) algorithm in pure Python can be faster than an O(n**2) algorithm in C for large n
  • use slicing and vectorized (automatic looping) calculations that replace the explicit loops in the Python-only solution (a short sketch follows this list).
  • rewrite the places that need optimization using weave, f2py, cython, or similar. Provide type information. Explore compiler options. Decide whether the speedup is worth keeping the C extensions.
  • minimize allocation and data copying. Make it cache friendly.
  • explore whether multiple threads might be useful in your case, e.g., `cython.parallel.prange()`; release the GIL.
  • compare with the multiprocessing approach. The link above contains an example of how to compute different slices of an array in parallel.
  • Iterate
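To illustrate the slicing/vectorization point above, here is a minimal sketch that replaces the question's explicit loop with a single whole-array expression. The actual per-element calculation is not given in the question, so `Array_1 * Array_2 + Array_3` is only a stand-in:

import numpy as np

# Stand-in inputs; in the question these are Array_1, Array_2, and Array_3.
Array_1 = np.random.rand(1000)
Array_2 = np.random.rand(1000)
Array_3 = np.random.rand(1000)

# Loop version, element by element (what the question does):
OtherList = np.empty_like(Array_1)
for i in range(len(Array_1)):
    OtherList[i] = Array_1[i] * Array_2[i] + Array_3[i]

# Vectorized version: one array expression, no Python-level loop.
OtherList = Array_1 * Array_2 + Array_3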
jfs
  • Good advice! Actually, you can use numpy arrays quite nicely with multiprocessing. I do it all the time. You just have to be aware that the easiest way of doing it isn't using shared memory. `multiprocessing` will let you pickle them and send them back like any other Python object. Obviously, this has a high overhead and will copy each "chunk", so the trick is (counterintuitively) to work in relatively large chunks. It depends on the problem, but you can get very nice speedups in certain cases with a fairly straightforward `multiprocessing.pool` + `numpy` approach. – Joe Kington Oct 28 '11 at 21:32
  • All that having been said, I agree, `multiprocessing` is probably not a good fit for the OP's problem, and they're better off optimizing in other ways. – Joe Kington Oct 28 '11 at 21:40
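To make the chunked `multiprocessing.Pool` + `numpy` pattern from the comment above concrete, here is a minimal, hedged sketch; the worker function, the per-element calculation, and the chunk count are illustrative assumptions, not code from the answer or comments:

import numpy as np
from multiprocessing import Pool

def process_chunk(chunk):
    # Stand-in for the real per-element calculation.
    return chunk * 2.0

if __name__ == '__main__':
    data = np.random.rand(1000000)
    # Each chunk is pickled and shipped to a worker process, so a handful of
    # large chunks usually beats many tiny ones.
    chunks = np.array_split(data, 8)
    pool = Pool()
    result = np.concatenate(pool.map(process_chunk, chunks))
    pool.close()
    pool.join()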

Since you have a while True clause there, I will assume you will run a lot of iterations, so the potential gains will eventually outweigh the slowdown from spawning the multiprocessing pool. I will also assume you have more than one logical core on your machine, for obvious reasons. The question then becomes whether the cost of serializing the inputs and deserializing the results is offset by the gains.

The best way to know whether there is anything to be gained, in my experience, is to try it out. I would suggest that:

  • You pass any constant inputs at start time. Thus, if any of Array_1, Array_2, and Array_3 never changes, pass it in args when calling Process(). This way you reduce the amount of data that needs to be pickled and passed on via IPC (which is what multiprocessing does).
  • You use a work queue and add tasks to it as soon as they are available. This way, you can make sure there is always more work waiting when a process is done with a task (see the sketch below).
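A minimal sketch of both suggestions, assuming Array_1, Array_2, and Array_3 are the constant inputs; the worker function, the per-element calculation, and the queue handling are illustrative assumptions, not code from the answer:

import numpy as np
from multiprocessing import Process, Queue

def worker(Array_1, Array_2, Array_3, tasks, results):
    # The constant arrays arrive once, at start-up, via Process(args=...).
    for i in iter(tasks.get, None):  # None is the shutdown sentinel
        # Stand-in for the real per-element calculation.
        results.put((i, Array_1[i] * Array_2[i] + Array_3[i]))

if __name__ == '__main__':
    Number = 1000
    Array_1 = np.random.rand(Number)
    Array_2 = np.random.rand(Number)
    Array_3 = np.random.rand(Number)

    tasks, results = Queue(), Queue()
    procs = [Process(target=worker,
                     args=(Array_1, Array_2, Array_3, tasks, results))
             for _ in range(4)]
    for p in procs:
        p.start()

    # Feed the work queue as soon as tasks are available, then send sentinels.
    for i in range(Number):
        tasks.put(i)
    for _ in procs:
        tasks.put(None)

    # Collect the results; order is restored via the returned index.
    OtherList = np.empty(Number)
    for _ in range(Number):
        i, value = results.get()
        OtherList[i] = value

    for p in procs:
        p.join()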
ktdrv