This is a follow-up to an answer to my previous question, "Fastest approach to read thousands of images into one big numpy array".
In chapter 2.3, "Memory allocation of the ndarray", Travis Oliphant writes the following about how indexes map to memory for C-ordered numpy arrays:
"...to move through computer memory sequentially, the last index is incremented first, followed by the second-to-last index and so forth."
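To make the quote concrete for myself, here is a minimal sketch on a small array. The array x and its shape are illustrative choices of mine, not something taken from the book.

```python
import numpy as np

# Small C-ordered array whose values equal their position in memory.
x = np.arange(24, dtype=np.uint16).reshape(2, 3, 4)

# Strides give the number of bytes to skip to advance one step along each axis.
# With 2-byte elements, the last axis has the smallest stride.
print(x.strides)                      # (24, 8, 2)

# np.ndindex walks the indexes in C order: the last index changes fastest.
first_indexes = list(np.ndindex(x.shape))[:6]
print(first_indexes)

# The values read in that order sit at consecutive memory positions.
first_values = [int(x[idx]) for idx in first_indexes]
print(first_values)                   # [0, 1, 2, 3, 4, 5]
```

The last index runs through all four of its values before the second-to-last index is incremented once.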
The quoted statement can be confirmed by benchmarking how long it takes to store 2-D arrays along either the first two or the last two indexes of a 3-D array (for my purposes, this simulates loading 500 images of 512x512 pixels):
import numpy as np

N = 512
n = 500
a = np.random.randint(0, 255, (N, N))

def last_and_second_last():
    '''Store along the two last indexes'''
    imgs = np.empty((n, N, N), dtype='uint16')
    for num in range(n):
        imgs[num, :, :] = a
    return imgs

def second_and_third_last():
    '''Store along the two first indexes'''
    imgs = np.empty((N, N, n), dtype='uint16')
    for num in range(n):
        imgs[:, :, num] = a
    return imgs
Benchmark
In [2]: %timeit last_and_second_last()
136 ms ± 2.18 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [3]: %timeit second_and_third_last()
1.56 s ± 10.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
So far so good. However, when I store the arrays along the last and third-to-last dimensions, this is almost as fast as storing them along the last two dimensions.
def last_and_third_last():
    '''Store along the last and first indexes'''
    imgs = np.empty((N, n, N), dtype='uint16')
    for num in range(n):
        imgs[:, num, :] = a
    return imgs
Benchmark
In [4]: %timeit last_and_third_last()
149 ms ± 227 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
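One thing I can inspect is how much of each assignment target is contiguous in memory. The sketch below is only an illustration: contiguous_run_bytes is a hypothetical helper of my own (not part of numpy), it assumes a C-style layout, and I am only guessing that the size of the contiguous run is relevant to the timings.

```python
import numpy as np

N, n = 512, 500

def contiguous_run_bytes(view):
    """Bytes that sit next to each other in memory, counted from the last axis.

    Hypothetical helper for illustration; assumes a C-style layout.
    """
    run = view.itemsize
    for size, stride in zip(reversed(view.shape), reversed(view.strides)):
        if stride != run:
            break                    # a gap in memory ends the contiguous run
        run *= size
    return run

# Shape of the full array and the slice that one loop iteration assigns to.
layouts = {
    'last_and_second_last':  ((n, N, N), np.s_[0, :, :]),
    'second_and_third_last': ((N, N, n), np.s_[:, :, 0]),
    'last_and_third_last':   ((N, n, N), np.s_[:, 0, :]),
}

runs = {}
for name, (shape, index) in layouts.items():
    imgs = np.empty(shape, dtype='uint16')
    view = imgs[index]
    runs[name] = contiguous_run_bytes(view)
    print(name, view.strides, runs[name])
    del imgs, view                   # release the large buffer before the next layout
```

With these shapes, one assignment targets a single 524288-byte block in last_and_second_last(), isolated 2-byte elements in second_and_third_last(), and 512 rows of 1024 contiguous bytes each in last_and_third_last().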
- Why is last_and_third_last() so much closer in speed to last_and_second_last() than to second_and_third_last()?
- What's a good way to visualize why the last index matters much more than the second-to-last index for access speed?