
I'm using griddata to build an array with a large number of slices, and I would like to know if I can calculate the functions (one per slice) on each of my 4 cores in order to speed up the process.

import numpy as np

size = 8
Y = np.arange(2000, dtype=float)  # floats avoid integer overflow at high powers
X = np.arange(2000, dtype=float)
(xx, yy) = np.meshgrid(X, Y)

array = np.zeros((Y.shape[0], X.shape[0], size))

array[:,:,0] = 0
array[:,:,1] = X+Y
array[:,:,2] = X**2+Y**2+X+Y
array[:,:,3] = X**3+Y**3+X**2+Y**2+X+Y
array[:,:,4] = X**4+Y**4+X**3+Y**3+X**2+Y**2+X+Y
array[:,:,5] = X**5+Y**5+X**4+Y**4+X**3+Y**3+X**2+Y**2+X+Y
array[:,:,6] = X**6+Y**6+X**5+Y**5+X**4+Y**4+X**3+Y**3+X**2+Y**2+X+Y
array[:,:,7] = X**7+Y**7+X**6+Y**6+X**5+Y**5+X**4+Y**4+X**3+Y**3+X**2+Y**2+X+Y

So here I would like to calculate array[:,:,0] and array[:,:,1] with the first core, then array[:,:,2] and array[:,:,3] with the second core, and so on. Is that possible?

----EDIT LATER---

There is no link between the different "slices"; my different functions are independent:

array[:,:,0] = 0
array[:,:,1] = X+Y
array[:,:,2] = X*np.cos(X)+Y*np.sin(Y)
array[:,:,3] = X**3+np.sin(X)+X**2+Y**2+np.sin(Y)
...
user3601754

3 Answers


You can try with multiprocessing.Pool:

from multiprocessing import Pool
import numpy as np

size = 8
Y = np.arange(2000, dtype=float)  # floats avoid integer overflow at high powers
X = np.arange(2000, dtype=float)
(xx, yy) = np.meshgrid(X, Y)

array = np.zeros((Y.shape[0], X.shape[0], size))

def func(i): # you need to call a function with Pool
    # slice i is the sum of X**j + Y**j for j = 1..i
    array_ = np.zeros((Y.shape[0], X.shape[0]))
    for j in range(1, i + 1):
        array_ += X**j + Y**j
    return array_

if __name__ == '__main__':
    p = Pool(4) # if you have 4 cores in your processor
    result = p.map(func, range(1, size))
    for i in range(1, size):
        array[:,:,i] = result[i-1]

Keep in mind that multiprocessing in Python does not share memory; that is why each worker builds its own `array_` and the results are copied back into `array` in the loop at the end of the code. As your application (with these dimensions) doesn't need much computing time, it is possible that this method will actually be slower. You will also create multiple copies of all your variables, which may exhaust memory. You should also double-check the `func` I wrote, as I didn't completely verify that it does what it is supposed to do :)

CoMartel
  • Sorry, in my case there isn't a link between my functions... so I can't write: "for j in range(1,i): array_+=X**j+Y**j" – user3601754 Apr 21 '15 at 10:27
  • Then you can use `multiprocessing.Process` for each function: it will start an independent (still no shared memory) process for each dimension of your array, and you will have to rebuild your array (as I did) by collecting the various returns, with a Queue for example. If you use the same functions a lot for different arrays, it could be worth the work. If you think it could work, I can help you with the code for the multiprocessing part. – CoMartel Apr 21 '15 at 11:03
  • Thanks for your help, but when I measure the time with and without multiprocessing... I find that without multiprocessing I'm faster :s – user3601754 Apr 22 '15 at 11:34
  • In your case, yes, that's highly probable. Multiprocessing in Python is very useful for long tasks that should run simultaneously and for repetitive tasks, not for small tasks. The only way now to improve your execution time is by simplifying the math and grouping your calculations, as @Mr E said. – CoMartel Apr 22 '15 at 12:25
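For completeness, the per-function `multiprocessing.Process` approach suggested in the comments could be sketched roughly like this. The per-slice functions here are illustrative stand-ins (not the asker's real functions), and collecting results through a `Queue` is just one of several possible patterns:

```python
from multiprocessing import Process, Queue
import numpy as np

X = np.arange(2000, dtype=float)
Y = np.arange(2000, dtype=float)

# Hypothetical independent per-slice functions, keyed by slice index.
funcs = {
    1: lambda: X + Y,
    2: lambda: X * np.cos(X) + Y * np.sin(Y),
    3: lambda: X**3 + np.sin(X) + X**2 + Y**2 + np.sin(Y),
}

def worker(i, q):
    # Each process computes one slice and sends (index, result) back.
    q.put((i, funcs[i]()))

def build_array():
    q = Queue()
    procs = [Process(target=worker, args=(i, q)) for i in funcs]
    for p in procs:
        p.start()
    array = np.zeros((Y.shape[0], X.shape[0], len(funcs) + 1))
    # Drain the queue before joining, so large results cannot block the pipe.
    for _ in procs:
        i, res = q.get()
        array[:, :, i] = res  # the 1-D result broadcasts across rows
    for p in procs:
        p.join()
    return array

if __name__ == '__main__':
    array = build_array()
```

Each slice still gets serialized back to the parent, so the IPC caveats from the answer above apply unchanged.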

If you want to apply a single function over an array of data, then using e.g. a multiprocessing.Pool is a good solution, provided that both the input and output of the calculation are relatively small.

You want to do many different calculations on two input arrays, each of which returns an array.

Since separate processes do not share memory, the X and Y arrays have to be transported to each worker process when it is started. And the result of each calculation (which is also a numpy array the same size as X and Y) has to be returned to the parent process.

Depending on e.g. the size of the arrays and the number of cores, the overhead of transferring all those arrays between the worker processes and the parent process via interprocess communication ("IPC") will cost time, reducing the advantage of using multiple cores.

Keep in mind that the parent process has to listen for and handle IPC requests from all the worker processes. So you've shifted the bottleneck from calculation to communication.

So it is not a given that multiprocessing will actually improve performance in this case. It depends on the details of the actual problem (number of cores, array size, amount of physical memory et cetera).

You will have to do some careful performance measurements using e.g. Pool or Process with realistic array sizes.
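A minimal harness for such a measurement might look like the following. The `slice_calc` function is only a stand-in for one of the real per-slice calculations, and the actual timings depend entirely on your machine and workload:

```python
import time
import numpy as np
from multiprocessing import Pool

N = 2000
X = np.arange(N, dtype=float)  # floats avoid integer overflow at high powers
Y = np.arange(N, dtype=float)

def slice_calc(i):
    # Stand-in for one independent per-slice calculation.
    return X**i + Y**i

if __name__ == '__main__':
    # Serial baseline.
    t0 = time.perf_counter()
    serial = [slice_calc(i) for i in range(1, 8)]
    t_serial = time.perf_counter() - t0

    # Pool of 4 workers; this timing includes the IPC cost of
    # shipping every result array back to the parent process.
    t0 = time.perf_counter()
    with Pool(4) as pool:
        parallel = pool.map(slice_calc, range(1, 8))
    t_pool = time.perf_counter() - t0

    print(f"serial: {t_serial:.4f}s  pool: {t_pool:.4f}s")
```

For calculations this cheap, the pool timing is often dominated by process startup and result transfer rather than the arithmetic itself.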

Roland Smith

Three things:

  1. The most important question is: why are you doing this?
  2. Your NumPy build may already be making use of multiple cores. I am not sure off the top of my head how to check; see the existing questions on NumPy and multithreaded BLAS, or if absolutely necessary take a look at the Numexpr library: https://github.com/pydata/numexpr
  3. About the "Y" in your likely XY problem - you are re-calculating data that you can instead re-use:


import numpy as np

size = 8
Y = np.arange(2000, dtype=float)  # floats avoid integer overflow at high powers
X = np.arange(2000, dtype=float)
(xx, yy) = np.meshgrid(X, Y)

array = np.zeros((Y.shape[0], X.shape[0], size))

array[..., 0] = 0
for i in range(1, size):
    array[..., i] = X ** i + Y ** i + array[..., i - 1]
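A quick sanity check (a sketch, with smaller arrays to keep it fast) that the running-sum version above reproduces the direct formulas from the question:

```python
import numpy as np

size = 8
X = np.arange(500, dtype=float)
Y = np.arange(500, dtype=float)

# Running-sum version: each slice reuses the previous one.
array = np.zeros((Y.shape[0], X.shape[0], size))
for i in range(1, size):
    array[..., i] = X**i + Y**i + array[..., i - 1]

# Direct version of slice 3, as written in the question.
direct3 = np.zeros((Y.shape[0], X.shape[0]))
direct3[:, :] = X**3 + Y**3 + X**2 + Y**2 + X + Y

print(np.allclose(array[..., 3], direct3))  # expect True
```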
YXD
  • For the third point, it's just an example; I can't re-use the calculation. – user3601754 Apr 21 '15 at 10:19
  • In that case please ask about your actual problem – YXD Apr 21 '15 at 10:20
  • I mean there is no link between my different functions. – user3601754 Apr 21 '15 at 10:28
  • You should check the 2nd point though. It is true that NumPy can make heavy use of multiple cores. Try calculating what you need using NumPy functions. On Ubuntu you can watch your processors' activity with htop, on Windows with the Task Manager. – CoMartel Apr 21 '15 at 10:53
  • @MrE Depending on the BLAS library used, things like `numpy.dot` might be parallelized. If `import numpy.core._dotblas` succeeds, then you have a fast BLAS implementation. – Roland Smith May 14 '15 at 22:59