
Let's say I have an images array that holds 100,000 images with 3 channels.

images = np.random.randint(0,255,(100000,32,32,3))

And I have a function foo which accepts an image and performs some operation on it.

def foo(img):
    # some operation on the image, say histogram equalization
    ...

How do I apply foo to all 100,000 images in parallel? I expected numpy to have a function for this purpose, but I could not find one. I found numpy.apply_along_axis, but from what I have read it iterates over the array rather than running in parallel. What should I do?

kmario23
happy_sisyphus
  • https://stackoverflow.com/a/2562799/612192 Does this help? – Evan Jan 22 '18 at 02:36
  • Look up *joblib*, which is heavily used by sklearn, and make sure you understand the basics of parallel programming. Depending on the operations, some parts might already be parallel (some numpy core functions are, but I'm not sure whether resize and similar functions are too). – sascha Jan 22 '18 at 02:36
  • Seems like you're preprocessing the images before feeding into some CNNs. As @sascha mentioned, `joblib` should do the job for you. I have worked on such a project before. Go for it – kmario23 Jan 22 '18 at 03:23

1 Answer


Here is an example using joblib that performs histogram equalization on the images in parallel, with n_jobs equal to nprocs (10 processes here, but you can change this as needed):

# imports
import numpy as np
from skimage import exposure
from joblib import Parallel, delayed

# number of processes
nprocs = 10

# batched image array
img_arr = np.random.randint(0, 255, (1000, 32, 32, 3))

# function to be applied on all images
def process_image(img):
    img_eq = exposure.equalize_hist(img)
    return img_eq

# run `process_image()` on every image in parallel; the result is a list of arrays
result = Parallel(n_jobs=nprocs)(delayed(process_image)(img) for img in img_arr)
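
With 100,000 small 32×32 images, one task per image means a lot of scheduling relative to the work done per task. A possible variation is to split the array into a few chunks and give each worker a whole chunk. The sketch below is only an illustration under stated assumptions: `process_batch`, `n_batches`, and the simple rescaling used in place of histogram equalization are hypothetical, chosen so the example runs without skimage. joblib already groups fast tasks on its own (`batch_size='auto'` is the default), so it is worth timing both versions before adopting this.

```python
import numpy as np
from joblib import Parallel, delayed

def process_image(img):
    # Hypothetical stand-in for the real per-image operation:
    # rescale pixel values to the [0, 1] range.
    return img.astype(np.float32) / 255.0

def process_batch(batch):
    # Apply the per-image function to every image in one chunk
    # and return the results as a single array.
    return np.stack([process_image(img) for img in batch])

# batched image array (high is exclusive, so 256 includes the value 255)
img_arr = np.random.randint(0, 256, (1000, 32, 32, 3))

n_jobs = 2      # assumed number of worker processes
n_batches = 8   # assumed number of chunks; tune for your machine

# split along the first axis, process each chunk in a worker, then reassemble
batches = np.array_split(img_arr, n_batches)
results = Parallel(n_jobs=n_jobs)(delayed(process_batch)(b) for b in batches)
out = np.concatenate(results)
```

Here `out` has the same shape as `img_arr`, and the original image order is preserved because `Parallel` returns results in the order the tasks were submitted.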
kmario23
  • Does the worker share memory space with the main thread/process? If so, you can use `img[...] = equalize(img)` in the worker, which will update each image in place – Eric Jan 22 '18 at 06:45
  • @Eric thanks a lot for your suggestion! Yes, it does share the memory, but there are a couple of issues with updating the array in place. Since histogram equalization shifts the pixel values to the 0-1 range, writing the result back into the original array turns all elements to zero, because the original array has an integer dtype. So we'd have to create the original array with dtype float32, and then the in-place update can be done without any issues – kmario23 Jan 22 '18 at 16:24
  • @Eric Also, please see this question: https://stackoverflow.com/questions/48387466/updating-batch-image-array-in-place-when-using-joblib – kmario23 Jan 23 '18 at 02:18
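
The dtype issue described in the comments above can be reproduced without any parallelism. The following is a minimal sketch, assuming a random float array in the 0-1 range as a stand-in for the output of histogram equalization; the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for an equalized image: floats in the half-open range [0, 1).
equalized = rng.random((32, 32, 3))

# Integer-dtype image, as created by np.random.randint in the question.
int_img = rng.integers(0, 256, (32, 32, 3))
# In-place assignment casts the floats to the integer dtype,
# truncating every value in [0, 1) to 0.
int_img[...] = equalized
all_zero = bool((int_img == 0).all())

# The same in-place assignment into a float32 array keeps the values.
float_img = rng.integers(0, 256, (32, 32, 3)).astype(np.float32)
float_img[...] = equalized
preserved = bool(np.allclose(float_img, equalized))

print(all_zero, preserved)
```

This only illustrates the casting behaviour. Whether an in-place write made inside a worker is visible to the parent process depends on the joblib backend in use, which the linked follow-up question discusses.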