
Hi, I need to speed up this code:

import numpy as np

def func(matrix, x, y):
    return matrix*x + y

matrix3d = np.empty([10, 10, 1000])
matrix3d[:] = np.random.randint(10)
matrix3d_1 = np.empty([10, 10, 1000])
x = 10
y = 1
for z in range(1000):
    matrix3d_1[:, :, z] = func(matrix3d[:, :, z], x, y)

I have tried multiprocessing with `Pool.map()`, but it did not work.

from functools import partial
import multiprocessing as mp

pool=mp.Pool(processes=2)
args=partial(func,x,y)
matrix3d_2=np.empty([10,10,1000])
matrix3d_2=pool.map(args,matrix3d)
pool.close()

If I compare the two matrices with `matrix3d_1 == matrix3d_2`, the result is false.

How can this be fixed?
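For reference, iterating over a 3D NumPy array (which is what `pool.map` does with `matrix3d`) yields 2D slices along axis 0, not along the `z` axis used in the loop, and `partial(func, x, y)` binds `x` and `y` to `func`'s *first* two parameters. A quick check of the iteration behaviour (plain NumPy, no multiprocessing needed):

```python
import numpy as np

# pool.map(f, arr) iterates over arr; iterating a 3D array yields
# 2D slices along axis 0 -- not along the z axis used in the loop
arr = np.arange(2 * 3 * 4).reshape(2, 3, 4)
slices = list(arr)          # what pool.map would actually receive
print(len(slices))          # 2 (axis 0), not 4 (axis 2)
print(slices[0].shape)      # (3, 4)

# partial(func, x, y) additionally binds x and y to func's first two
# parameters, so func(matrix, x, y) is effectively called as func(x, y, slice)
```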

Lorenzo Bottaccioli
    Is `matrix*x+y` the actual implementation for `func`? – Divakar Jul 27 '17 at 13:30
  • No, the `func` is just an example. In fact I'm using this [function](https://stackoverflow.com/questions/26912112/resample-2d-numpy-array-to-arbitrary-dimensions) from the answer there. But I believe there is no big difference. – Lorenzo Bottaccioli Jul 27 '17 at 14:01
  • The code you want to use contains nested for loops (4). These are extremely slow in pure Python. By just using a JIT compiler you can get over 100x speedup here. Depending on your platform (Python 2.7 and Unix, or Python 3), parallelization is in general also very easy to accomplish. This is much more than you can get from parallelizing pure Python code with multiprocessing. Take a look at this for example https://stackoverflow.com/a/45337873/4045774 . Is your main concern really to get Python multiprocessing to work? – max9111 Jul 27 '17 at 17:31
  • @max9111 My point was not to parallelize the function internally. I wanted to parallelize the use of it. Because the function takes a 2D array and I have a 3D array, I need to loop over the z axis of the matrix. Since the depth (z) is very long I need to speed it up, but my concern is to keep the same order as the starting matrix. So I need to parallelize just one loop, the one over the z axis. – Lorenzo Bottaccioli Jul 27 '17 at 18:34
  • But your concern is the performance of computing the whole task, and not a working example of "how to parallelize a function" in general, right? I can show you how to simply parallelize that loop, but this will only give a speedup of up to 1.99 on two cores (depending on the overhead of array copying and pickling). If that is enough, it isn't a problem at all. But you can get a lot more here! – max9111 Jul 27 '17 at 18:45
  • @max9111 My concern now is to parallelize the loop that I have posted in the question, not the function that I have linked. I can map it onto up to 32 cores, but the problem is that I don't get the same matrices even with the dummy function that I have posted. Can you help me with that? – Lorenzo Bottaccioli Jul 27 '17 at 19:34
  • I am working on it. – max9111 Jul 27 '17 at 19:35

1 Answer


Parallel processing of a 3D matrix

The Python `map` function, as well as the `pool.map` method, can only take one input object. See for example https://stackoverflow.com/a/10973817/4045774

To reduce the inputs to one, we can use for example `functools.partial`. The input which remains free has to be in the last position.

from functools import partial
import numpy as np
import multiprocessing as mp

def func(x, y, matrix):
    # the free argument (matrix) comes last, so partial can bind x and y
    return matrix*x + y

def main():
    matrix3d = np.empty([10, 10, 1000])
    matrix3d[:] = np.random.randint(10)
    matrix3d_1 = np.empty([10, 10, 1000])
    x = 10
    y = 1

    pool = mp.Pool(processes=4)
    func_p = partial(func, x, y)

    # parallel map returns a list of 2D results, in input order
    res = pool.map(func_p, (matrix3d[:, :, z] for z in range(matrix3d.shape[2])))
    pool.close()
    pool.join()

    # copy the data back into the 3D array
    for i in range(matrix3d.shape[2]):
        matrix3d_1[:, :, i] = res[i]

if __name__ == "__main__":
    main()
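Note that for the dummy `func` in the question specifically, NumPy broadcasting removes the loop entirely and will beat any multiprocessing approach; this only applies when the per-slice operation is itself vectorizable. A minimal sketch:

```python
import numpy as np

matrix3d = np.random.randint(0, 10, size=(10, 10, 1000)).astype(float)
x, y = 10, 1

# broadcasting applies the scalar operation to every element at once,
# equivalent to calling func on every z slice
vectorized = matrix3d * x + y

# serial reference loop for comparison
looped = np.empty_like(matrix3d)
for z in range(matrix3d.shape[2]):
    looped[:, :, z] = matrix3d[:, :, z] * x + y

print(np.array_equal(vectorized, looped))  # True
```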

Parallel version using numba

This version will scale well over all cores and is at least 200 times faster than the simple multiprocessing shown above. I have modified the code you linked a bit, to get rid of any dependencies other than NumPy.

import numpy
from numba import njit, prange

def meanInterp(data, m, n):
    newData = numpy.zeros((m, n), dtype=numpy.float32)
    mOrig, nOrig = data.shape

    hBoundariesOrig, vBoundariesOrig = numpy.linspace(0, 1, mOrig+1), numpy.linspace(0, 1, nOrig+1)
    hBoundaries, vBoundaries = numpy.linspace(0, 1, m+1), numpy.linspace(0, 1, n+1)

    for iOrig in range(mOrig):
        for jOrig in range(nOrig):
            for i in range(m):
                if hBoundaries[i+1] <= hBoundariesOrig[iOrig]: continue
                if hBoundaries[i] >= hBoundariesOrig[iOrig+1]: break
                for j in range(n):
                    if vBoundaries[j+1] <= vBoundariesOrig[jOrig]: continue
                    if vBoundaries[j] >= vBoundariesOrig[jOrig+1]: break

                    #boxCoords = ((hBoundaries[i], vBoundaries[j]),(hBoundaries[i+1], vBoundaries[j+1]))
                    #origBoxCoords = ((hBoundariesOrig[iOrig], vBoundariesOrig[jOrig]),(hBoundariesOrig[iOrig+1], vBoundariesOrig[jOrig+1]))
                    #area = overlap(boxCoords, origBoxCoords)

                    #hopefully this is equivalent (not tested)-----
                    T_x = (hBoundaries[i], hBoundaries[i+1], hBoundariesOrig[iOrig], hBoundariesOrig[iOrig+1])
                    T_y = (vBoundaries[j], vBoundaries[j+1], vBoundariesOrig[jOrig], vBoundariesOrig[jOrig+1])

                    tx = (T_x[1]-T_x[0]+T_x[3]-T_x[2]) - (max(T_x)-min(T_x))
                    ty = (T_y[1]-T_y[0]+T_y[3]-T_y[2]) - (max(T_y)-min(T_y))

                    area = tx*ty
                    #------------------------
                    newData[i][j] += area * data[iOrig][jOrig] / (hBoundaries[1] * vBoundaries[1])

    return newData

# meanInterp must be compiled before resample_3d, which calls it
nb_meanInterp = njit("float32[:,:](float32[:,:],int64,int64)")(meanInterp)

def resample_3d(matrix_3d, x, y):
    matrix3d_res = numpy.empty((x, y, matrix_3d.shape[2]), dtype=numpy.float32)
    for z in prange(0, matrix_3d.shape[2]):
        matrix3d_res[:, :, z] = nb_meanInterp(matrix_3d[:, :, z], x, y)

    return matrix3d_res

resample_3d_nb = njit("float32[:,:,:](float32[:,:,:],int64,int64)", parallel=True)(resample_3d)
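The "hopefully this is equivalent" note can be checked in isolation: for two 1D intervals that intersect, the `tx`/`ty` expression reduces to the overlap length. A small sketch (the function names are illustrative, not from the original code):

```python
def tx_formula(a, b, c, d):
    # mirrors the tx/ty expression from meanInterp:
    # (sum of the two interval lengths) - (span of their union)
    T = (a, b, c, d)
    return (T[1]-T[0] + T[3]-T[2]) - (max(T) - min(T))

def overlap_1d(a, b, c, d):
    # direct definition of 1D overlap length between [a,b] and [c,d]
    return max(0.0, min(b, d) - max(a, c))

# agrees (up to float rounding) whenever the intervals intersect,
# which the continue/break guards in meanInterp guarantee
print(abs(tx_formula(0.0, 0.5, 0.3, 1.0) - overlap_1d(0.0, 0.5, 0.3, 1.0)) < 1e-12)  # True
```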
max9111