
I'm trying to find the fastest approach to read a bunch of images from a directory into a numpy array. My end goal is to compute statistics such as the max, min, and nth percentile of the pixels from all these images. This is straightforward and fast when the pixels from all the images are in one big numpy array, since I can use the inbuilt array methods such as .max and .min, and the np.percentile function.

Below are a few example timings with 25 TIFF images (512x512 pixels). These benchmarks are from using %%timeit in a Jupyter notebook. The differences are too small to have any practical implications for just 25 images, but I am intending to read thousands of images in the future.

# Imports
import os
import skimage.io as io
import numpy as np
  1. Appending to a list

    %%timeit
    imgs = []    
    img_path = '/path/to/imgs/'
    for img in os.listdir(img_path):    
        imgs.append(io.imread(os.path.join(img_path, img)))    
    ## 32.2 ms ± 355 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
    
  2. Using a dictionary

    %%timeit    
    imgs = {}    
    img_path = '/path/to/imgs/'    
    for num, img in enumerate(os.listdir(img_path)):
        imgs[num] = io.imread(os.path.join(img_path, img))
    ## 33.3 ms ± 402 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
    

For the list and dictionary approaches above, I tried replacing the loop with the respective comprehension, with similar results time-wise. I also tried preallocating the dictionary keys with no significant difference in the time taken. To get the images from a list to a big array, I would use np.concatenate(imgs), which only takes ~1 ms.
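As a sketch of that last step (with zero-filled arrays standing in for the 25 loaded images), note that np.concatenate joins along the existing first axis, whereas np.stack would add a new leading axis instead:

```python
import numpy as np

# Stand-ins for 25 loaded 512x512 images.
imgs = [np.zeros((512, 512), dtype='uint16') for _ in range(25)]

big = np.concatenate(imgs)  # joins along axis 0 -> shape (12800, 512)
cube = np.stack(imgs)       # adds a new leading axis -> shape (25, 512, 512)
print(big.shape, cube.shape)
```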

  3. Preallocating a numpy array along the first dimension

    %%timeit    
    imgs = np.ndarray((512*25,512), dtype='uint16')    
    img_path = '/path/to/imgs/'    
    for num, img in enumerate(os.listdir(img_path)):    
        imgs[num*512:(num+1)*512, :] = io.imread(os.path.join(img_path, img))    
    ## 33.5 ms ± 804 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
    
  4. Preallocating a numpy array along the third dimension

    %%timeit    
    imgs = np.ndarray((512,512,25), dtype='uint16')    
    img_path = '/path/to/imgs/'    
    for num, img in enumerate(os.listdir(img_path)):    
        imgs[:, :, num] = io.imread(os.path.join(img_path, img))    
    ## 71.2 ms ± 2.22 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    

I initially thought the numpy preallocation approaches would be faster, since there is no dynamic variable expansion in the loop, but this does not seem to be the case. The approach that I find the most intuitive is the last one, where each image occupies a separate slice along the third axis of the array, but this is also the slowest. The additional time taken is not due to the preallocation itself, which only takes ~1 ms.

I have three questions regarding this:

  1. Why are the numpy preallocation approaches not faster than the dictionary and list solutions?
  2. What is the fastest way to read thousands of images into one big numpy array?
  3. Could I benefit from looking outside numpy and scikit-image for an even faster module for reading in images? I tried plt.imread(), but the scikit-image.io module is faster.
joelostblom
  • Did you try initializing a `(25, 512, 512)` array? The first dimension is the outer one. `np.array(imgs)` from the first list approach produces this shape. Most of that 33 ms is the load, not the storage. To test that, try loading without accumulating the arrays. – hpaulj May 19 '17 at 20:26
  • Thanks @hpaulj! Your tips about the outer dimensions and that most of my time spent was from overhead were helpful. I tried with 300 higher resolution tiffs (1024x1024 pixels) and the numpy outer dimension approach (either [300, 1024, 1024] or [1024, 300, 1024]) is now the fastest (~1s). Second is the list and dictionary solutions (~1.7s), and the numpy inner(?) dimension [1024, 1024, 300] is dead last (~4.6s). If you add an answer, I can add these benchmarks and accept it. – joelostblom May 19 '17 at 21:05
  • @hpaulj If you could add a link going into a bit more detail regarding what it means that the first (and second?) dimensions are the outer one, I would really appreciate it. I can't find anything relevant in my searches. – joelostblom May 19 '17 at 21:14
  • Look up `order`, as in `C order` v `F order`. – hpaulj May 19 '17 at 23:07
  • How about skipping the load stage altogether? Option 1: `numpy.memmap / numpy.load(..., mmap_mode="r")` Option 2: `blaze` – Dima Tisnek May 22 '17 at 19:39

3 Answers


Part A : Accessing and assigning NumPy arrays

Since elements are stored in row-major order in NumPy arrays, you are doing the right thing when each iteration stores an image in a block that varies along the last axes. Such a block occupies contiguous memory locations, which is the most efficient layout for accessing and assigning values. Thus, initializations like np.ndarray((512*25,512), dtype='uint16') or np.ndarray((25,512,512), dtype='uint16') would work best, as also mentioned in the comments.
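That contiguity claim is easy to check directly with the `flags` attribute (a small sketch, using the same shapes as in the question):

```python
import numpy as np

n, N = 25, 512
a = np.empty((n, N, N), dtype='uint16')  # image per slice along the first axis
b = np.empty((N, N, n), dtype='uint16')  # image per slice along the last axis

# NumPy defaults to C (row-major) order: the last axis varies fastest,
# so a whole image a[i] is one contiguous block, while b[:, :, i] is strided.
print(a[0].flags['C_CONTIGUOUS'])        # True
print(b[:, :, 0].flags['C_CONTIGUOUS'])  # False
```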

After wrapping those approaches as functions for timing, and feeding in random arrays instead of images -

N = 512
n = 25
a = np.random.randint(0,255,(N,N))

def app1():
    imgs = np.empty((N,N,n), dtype='uint16')
    for i in range(n):
        imgs[:,:,i] = a
        # Storing along the first two axes
    return imgs

def app2():
    imgs = np.empty((N*n,N), dtype='uint16')
    for num in range(n):    
        imgs[num*N:(num+1)*N, :] = a
        # Storing along the last axis
    return imgs

def app3():
    imgs = np.empty((n,N,N), dtype='uint16')
    for num in range(n):    
        imgs[num,:,:] = a
        # Storing along the last two axes
    return imgs

def app4():
    imgs = np.empty((N,n,N), dtype='uint16')
    for num in range(n):    
        imgs[:,num,:] = a
        # Storing along the first and last axes
    return imgs

Timings -

In [45]: %timeit app1()
    ...: %timeit app2()
    ...: %timeit app3()
    ...: %timeit app4()
    ...: 
10 loops, best of 3: 28.2 ms per loop
100 loops, best of 3: 2.04 ms per loop
100 loops, best of 3: 2.02 ms per loop
100 loops, best of 3: 2.36 ms per loop

Those timings confirm the performance theory proposed at the start. I expected the timings for the last setup (app4) to fall between those of app3 and app1, but perhaps the effect of moving the storage axis from last to first isn't linear. More investigation on this could be interesting (follow up question here).

To clarify schematically, consider that we are storing two image arrays, denoted by x (image 1) and o (image 2); we would have:

App1 :

[[[x o]
  [x o]
  [x o]
  [x o]
  [x o]]

 [[x o]
  [x o]
  [x o]
  [x o]
  [x o]]

 [[x o]
  [x o]
  [x o]
  [x o]
  [x o]]]

Thus, in memory space, it would be: [x,o,x,o,x,o,...], following row-major order.

App2 :

[[x x x x x]
 [x x x x x]
 [x x x x x]
 [o o o o o]
 [o o o o o]
 [o o o o o]]

Thus, in memory space, it would be: [x,x,x,x,x,x,...,o,o,o,o,o,...].

App3 :

[[[x x x x x]
  [x x x x x]
  [x x x x x]]

 [[o o o o o]
  [o o o o o]
  [o o o o o]]]

Thus, in memory space, it would be the same as the previous one.
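One quick way to see that the App2 and App3 layouts are byte-for-byte identical is to compare their flattened buffers (a small sketch with toy sizes standing in for the real images):

```python
import numpy as np

N, n = 3, 2
x = np.full((N, N), 1, dtype='uint16')  # "image 1"
o = np.full((N, N), 2, dtype='uint16')  # "image 2"

flat = np.empty((n * N, N), dtype='uint16')  # App2: stacked rows
flat[:N] = x
flat[N:] = o

cube = np.empty((n, N, N), dtype='uint16')   # App3: image per outer slice
cube[0] = x
cube[1] = o

# Same bytes in the same order: [x,x,x,...,o,o,o,...]
print(np.array_equal(flat.ravel(), cube.ravel()))  # True
```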


Part B : Reading image from disk as arrays

Now, for the part on reading images from disk, I have seen OpenCV's imread to be much faster.

As a test, I downloaded the Mona Lisa image from its Wikipedia page and tested performance on reading it -

import cv2 # OpenCV

In [521]: %timeit io.imread('monalisa.jpg')
100 loops, best of 3: 3.24 ms per loop

In [522]: %timeit cv2.imread('monalisa.jpg')
100 loops, best of 3: 2.54 ms per loop
Divakar
  • Thanks for the reply! Could you clarify why it is faster to read `[x,x,x,...o,o,o,...]` than to read `[x,o,x,o,x,o...]`? `x` and `o` would be the same type of object, and integer of the pixel intensity, right? – joelostblom May 21 '17 at 23:05
  • Also, could you expand upon "you are doing the right thing when storing those elements along the last axis per iteration"? Don't your examples show that the fastest alternative is to store the elements along the first axis/dimension, i.e. `x` in `np.ndarray([x, y, z])`? Or are "the elements" not referring to the separate images? – joelostblom May 21 '17 at 23:12
  • @JoelOstblom To first query - As I have pointed out in the post and repeating again : `These would occupy contiguous memory locations and as such would be the most efficient for accessing and assigning values into. `, because for `[x,x,x,...o,o,o,...]` we are storing image1, i.e. `x's` first and then `image2`, i.e. `o's` next. To the second query - Edited codes with comments. – Divakar May 22 '17 at 07:35
  • I see, I didn't realize `x` and `o` were referring to two different images rather than the x, y coordinates of the same image. Your edit together with [chapter 2.3 in Oliphant's "Guide to NumPy"](http://web.mit.edu/dvp/Public/numpybook.pdf), helped me understand, thanks! I initially thought that the separate images should be stored contiguously in memory, but it is really the coordinates of each image, since this is what needs to be accessed in order to fetch an image from memory. I tried your examples with Fortran ordered arrays, and the results line up with what I would expect given... – joelostblom May 22 '17 at 12:55
  • ...that their fastest varying index is the first rather than the last. However, there is one thing that is still confusing to me. Oliphant writes (for C ordered arrays): "... to move through computer memory sequentially, the last index is incremented first, followed by the second-to-last index and so forth". However, accessing `A = np.empty((512,25,512))` is as fast as `B = np.empty((25,512,512))`. Should it not be slower to access `A` since its information is in the last and third-last index, compared to the last and second-last as in `B`? – joelostblom May 22 '17 at 12:55
  • I expected the same as you in regards to the access speed of `app4()`. If you increase `n`, it becomes clear that `app4()` is indeed slower than `app2()` and `app3()`, but only by a tiny amount. I asked a [follow up question regarding this](https://stackoverflow.com/questions/44115571/accessing-the-last-and-third-last-indexes-is-almost-as-fast-as-accessing-the-las) – joelostblom May 22 '17 at 14:36

In this case, most of the time will be spent reading the files from disk, and I wouldn't worry too much about the time to populate a list.

In any case, here is a script comparing four methods without the overhead of reading an actual image from disk - it just reads an object from memory.

import numpy as np
import time
from functools import wraps


x, y = 512, 512
img = np.random.randn(x, y)
n = 1000


def timethis(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        r = func(*args, **kwargs)
        end = time.perf_counter()
        print('{}.{} : {} milliseconds'.format(func.__module__, func.__name__, (end - start)*1e3))
        return r
    return wrapper


@timethis
def static_list(n):
    imgs = [None]*n
    for i in range(n):
        imgs[i] = img
    return imgs


@timethis
def dynamic_list(n):
    imgs = []
    for i in range(n):
        imgs.append(img)
    return imgs


@timethis
def list_comprehension(n):
    return [img for i in range(n)]


@timethis
def numpy_flat(n):
    imgs = np.empty((x*n, y))
    for i in range(n):
        imgs[x*i:(i+1)*x, :] = img
    return imgs

static_list(n)
dynamic_list(n)
list_comprehension(n)
numpy_flat(n)

The results show:

__main__.static_list : 0.07004200006122119 milliseconds
__main__.dynamic_list : 0.10294799994881032 milliseconds
__main__.list_comprehension : 0.05021800006943522 milliseconds
__main__.numpy_flat : 309.80870099983804 milliseconds

Obviously your best bet is the list comprehension; however, even populating a numpy array takes just 310 ms for 1000 images (from memory). So again, the dominant cost will be the disk read.

Why is numpy slower?

It is the way numpy stores arrays in memory: the list approaches above only store references to the same image object, while assigning into a numpy array copies the pixel data into one contiguous block. If we modify the python list functions to convert the list to a numpy array (which also copies the data), the times become similar.

The modified functions return values:

@timethis
def static_list(n):
    imgs = [None]*n
    for i in range(n):
        imgs[i] = img
    return np.array(imgs)


@timethis
def dynamic_list(n):
    imgs = []
    for i in range(n):
        imgs.append(img)
    return np.array(imgs)


@timethis
def list_comprehension(n):
    return np.array([img for i in range(n)])

and the timing results:

__main__.static_list : 303.32892100022946 milliseconds
__main__.dynamic_list : 301.86925499992867 milliseconds
__main__.list_comprehension : 300.76925699995627 milliseconds
__main__.numpy_flat : 305.9309459999895 milliseconds

So the extra time is simply the cost numpy pays to copy the data into one contiguous array, and it is roughly the same regardless of which method builds the list...
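The reference-vs-copy difference can be seen directly (a small sketch with toy sizes): the list approaches store references to one and the same object, while assigning into a preallocated array copies the pixel data:

```python
import numpy as np

img = np.random.randn(4, 4)

lst = [img for _ in range(3)]
print(all(a is img for a in lst))  # True: three references, no copy

buf = np.empty((3, 4, 4))
for i in range(3):
    buf[i] = img                   # copies 16 floats each iteration
print(buf[0] is img)               # False: buf holds its own data
print(np.shares_memory(buf, img))  # False: independent storage
```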

Gerges


I think you could try glob.glob, which should help

import csv
import glob

import cv2

image_list = []
with open('train_train_.csv', 'w') as csv_file:
    csv_writer = csv.writer(csv_file, delimiter='-')

    for filename in glob.glob(r'C:\your path to\file*.png'):
        img = cv2.imread(filename)
        image_list.append(img)
        csv_writer.writerow(img)
        print(img)

Cheers

horizon