
I have an application where I'll be repeating the following set of operations many times:

Operations:
-> Read N images (all have the same dimension (H,W))
-> Normalize each image to (0,1)
-> Store these N images in a single numpy array
-> Return the array (of shape (N, H, W))

Translating this into code, it would be something like:

import cv2
import numpy as np

def load_block(im_paths, H, W):
    N = len(im_paths)
    im_block = np.empty((N, H, W), dtype=np.float32)

    for i, im_path in enumerate(im_paths):
        # flag 0 -> read as grayscale, shape (H, W)
        image = cv2.imread(im_path, 0)
        # min-max normalize to (0, 1)
        im_block[i, :, :] = (image - image.min()) / (image.max() - image.min())
    return im_block

So I want to speed up this process. My initial go-to would be numba, but I'm not sure it will be of use here since I'm doing I/O operations.

    Seems relevant - [Fastest approach to read thousands of images into one big numpy array](https://stackoverflow.com/questions/44078327/fastest-approach-to-read-thousands-of-images-into-one-big-numpy-array). – Divakar Oct 27 '20 at 18:23
  • How big is `N`? Do you really need them all in one Numpy array? What sort of CPU and disk subsystem do you have? – Mark Setchell Oct 27 '20 at 18:45
  • @Divakar seems like the accepted answer is doing exactly what I'm doing thus far. Still thanks though, it is relevant indeed. – Mercury Oct 27 '20 at 21:30
  • @MarkSetchell N is variable, with an upper bound at 800, and I really do need them in a single array. I'm thinking of a system independent, general solution; may have to dig into multiprocessing and shared arrays. – Mercury Oct 27 '20 at 21:30

1 Answer


I don't think numba can help with the image loading, but it could possibly help with the normalization. Perhaps the following gains you something; it's certainly worth a try.

from numba import njit

images = [cv2.imread(im_path, 0) for im_path in im_paths]  # I/O stays in plain Python

@njit
def load_block(images, H, W):
    im_block = np.empty((len(images), H, W), dtype=np.float32)
    for i, image in enumerate(images):  # loop over images rather than image paths
        im_block[i, :, :] = (image - image.min()) / (image.max() - image.min())
    return im_block
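One caveat: passing a plain Python list into an @njit function goes through numba's reflected lists, which emit a deprecation warning on recent versions; wrapping the list in numba.typed.List avoids that.

For the reads themselves, which numba can't touch, a thread pool may be worth trying: cv2.imread spends most of its time in C++ code that releases the GIL, so threads can overlap the disk and decode work. A minimal sketch, not benchmarked — load_block_threaded and the default pool size are my own choices, not something from the question:

from concurrent.futures import ThreadPoolExecutor

import cv2
import numpy as np

def load_block_threaded(im_paths, H, W):
    N = len(im_paths)
    im_block = np.empty((N, H, W), dtype=np.float32)

    def load_one(i):
        # grayscale read; each thread writes its own slice, so no locking needed
        image = cv2.imread(im_paths[i], 0)
        im_block[i, :, :] = (image - image.min()) / (image.max() - image.min())

    with ThreadPoolExecutor() as pool:
        list(pool.map(load_one, range(N)))  # drain the iterator so exceptions surface
    return im_block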
Frank Yellin