
I have a large NumPy array of shape (8, 512, 512, 50, 3) that is filled in a for loop running 8 times, each iteration calling a function on some images. Can I use multiprocessing / concurrent features to fill the NumPy array in less time?

import numpy as np

def myfun(arr):
    # some computation on one input row
    return out  # out has shape (512, 512, 50, 3)

X = np.empty((8, 512, 512, 50, 3))
inp = np.ones((8, 1000))

for i in range(8):
    X[i] = myfun(inp[i])
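
For example, would a process pool work here? A minimal sketch of what I mean (the stub myfun stands in for the real function, which would need to be defined at module level so it can be pickled; note that each (512, 512, 50, 3) float64 result is roughly 300 MB, and the pool has to pickle all of them back to the parent):

import numpy as np
from multiprocessing import Pool

def myfun(arr):
    # stand-in for the real computation
    return np.zeros((512, 512, 50, 3))

if __name__ == "__main__":
    inp = np.ones((8, 1000))
    with Pool(processes=8) as pool:
        # map sends one row of inp to each worker and
        # collects the results in submission order
        results = pool.map(myfun, list(inp))
    X = np.stack(results)  # shape (8, 512, 512, 50, 3)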

Update: I tried to use multiprocessing this way, but it was slower than the sequential version.

import multiprocessing

def myfun_mp(inp, return_list):
    return_list.append(myfun(inp))

manager = multiprocessing.Manager()
return_list = manager.list()
jobs = []

for i in range(8):
    p = multiprocessing.Process(target=myfun_mp, args=(inp[i], return_list))
    jobs.append(p)
    p.start()

for p in jobs:
    p.join()

X = np.array(return_list)  # this conversion takes most of the time
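
I suspect the manager list is the bottleneck: every result is pickled over to the manager process and then copied again by np.array. A sketch of what I think a copy-free version would look like, where each child writes its slice directly into a shared buffer via multiprocessing.shared_memory (Python 3.8+; the stub myfun stands in for the real function, which would need to be importable by the workers):

import numpy as np
from multiprocessing import Process, shared_memory

SHAPE = (8, 512, 512, 50, 3)

def myfun(arr):
    # stand-in for the real computation
    return np.zeros((512, 512, 50, 3))

def worker(shm_name, i, inp_row):
    # attach to the existing shared block and write the result in place
    shm = shared_memory.SharedMemory(name=shm_name)
    X = np.ndarray(SHAPE, dtype=np.float64, buffer=shm.buf)
    X[i] = myfun(inp_row)
    shm.close()

if __name__ == "__main__":
    inp = np.ones((8, 1000))
    shm = shared_memory.SharedMemory(create=True,
                                     size=int(np.prod(SHAPE)) * 8)
    X = np.ndarray(SHAPE, dtype=np.float64, buffer=shm.buf)

    jobs = [Process(target=worker, args=(shm.name, i, inp[i]))
            for i in range(8)]
    for p in jobs:
        p.start()
    for p in jobs:
        p.join()

    result = X.copy()  # copy out before releasing the shared block
    shm.close()
    shm.unlink()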
  • It depends on the nature of the computation in `myfun(arr)`. If the computations are independent of each other then there should be no problem, i.e. if you split your computation into 8 blocks, the data contained in each block should be sufficient to fill your array. – DrBwts Jul 17 '19 at 17:58
  • The cost of exchanging data between processes is high, so I'd discourage it. Most NumPy functions release the global interpreter lock (GIL) when entering C functions, which lets you take advantage of multithreading (see the thread-based sketch after these comments). – tstanisl Jul 17 '19 at 18:00
  • @DrBwts The computations are independent of each other. I was having difficulty because changes made to X in a child process are not reflected in the parent process. – Srikar Ym Jul 17 '19 at 19:24
  • @SrikarYm can you add the multithreaded code you have tried? – DrBwts Jul 18 '19 at 12:17
  • @DrBwts Updated the post with the multiprocessing code I tried. – Srikar Ym Jul 18 '19 at 13:43
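
Following tstanisl's suggestion, a minimal thread-based sketch (the stub myfun stands in for the real computation; this only helps if the real myfun spends most of its time in NumPy/C routines that release the GIL):

import numpy as np
from concurrent.futures import ThreadPoolExecutor

def myfun(arr):
    # stand-in for the real computation
    return np.zeros((512, 512, 50, 3))

X = np.empty((8, 512, 512, 50, 3))
inp = np.ones((8, 1000))

def fill_row(i):
    # threads share the interpreter's memory, so writing into X
    # directly needs no copying; each thread touches a distinct slice
    X[i] = myfun(inp[i])

with ThreadPoolExecutor(max_workers=8) as ex:
    # list() forces completion and surfaces any worker exceptions
    list(ex.map(fill_row, range(8)))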

0 Answers