
I have a simple problem. A function receives an array [a, b] of two numbers and returns another array [a*a, a*b]. The sample code is

import numpy as np

def func(array_1):
    array_2 = np.zeros_like(array_1)
  
    array_2[0] = array_1[0]*array_1[0]
    array_2[1] = array_1[0]*array_1[1]
    
    return array_2

array_1 = np.array([3., 4.])                 # sample test array [a, b]

print(array_1)                               # prints this test array [a, b]           
print(func(  array_1  ) )                    # prints [a*a, a*b]

The two lines inside the function func

    array_2[0] = array_1[0]*array_1[0]
    array_2[1] = array_1[0]*array_1[1]

are independent and I want to parallelize them.

Please tell me

  1. how to parallelize this (without Numba)?
  2. how to parallelize this (with Numba)?

1 Answer


It does not make sense to parallelise this code using multiple threads/processes because the arrays are far too small for such parallelisation to be useful. Indeed, creating a thread typically takes about 1-100 microseconds on a mainstream machine, while this code should take well under a microsecond in Numba. In fact, the two computing lines should take less than 0.01 microseconds. Thus, creating threads would make the execution far slower.

Assuming your array were much bigger, the typical way to parallelize a pure-Python script is to use multiprocessing (which creates processes). For Numba code, it is prange combined with parallel=True (which creates threads).

If you execute a jitted Numba function, the code already runs partly in parallel: modern mainstream processors execute instructions in parallel on their own. This is called instruction-level parallelism. More specifically, modern processors pipeline instructions and execute several of them at once thanks to superscalar, out-of-order execution. All of this is completely automatic; you just need to avoid dependencies between the executed instructions.

Finally, if you want to speed up this function, you need to use Numba in the caller function, because a function call from CPython is far more expensive than computing and storing two floats. Note also that allocating an array is fairly expensive, so it is better to reuse buffers at this granularity.

Jérôme Richard
  • Yes, I intend to use this function with much larger arrays. Can you please give me a functional code? – Sashwat Tanay Feb 09 '23 at 02:57
  • For `prange`, [the documentation gives an example](https://numba.pydata.org/numba-doc/latest/user/parallel.html#explicit-parallel-loops). You can check the many existing questions about prange on this website. The same is true for multiprocessing, but there are multiple ways to share Numpy arrays. Sharing array efficiently tends not to be so simple. Consider reading this [post](https://stackoverflow.com/questions/7894791/use-numpy-array-in-shared-memory-for-multiprocessing) for more information about this (or other similar one). – Jérôme Richard Feb 09 '23 at 18:58