I'm working in Python with a large dataset of images, each 3600x1800. There are ~360000 images in total. Each image is added to the stack one by one, since an initial processing step is run on each image. h5py has proven effective at building the stack image by image without filling the entire memory.
The analysis I am running is calculated per grid cell, that is, on 1x1x360000 slices of the stack. Since the analysis of each slice depends on the max and min values within that slice, I think it is necessary to hold the whole 360000-long array in memory. I have a fair bit of RAM to work with (~100GB), but not enough to hold the entire 3600x1800x360000 stack in memory at once.
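To put rough numbers on the sizes above, here is a back-of-the-envelope calculation. It is only a sketch: the dtype is an assumption (4-byte values such as float32), and the 50 GB budget is an arbitrary illustrative share of the ~100GB of RAM.

```python
# Rough memory arithmetic for the stack described above.
# ASSUMPTION: 4 bytes per value (e.g. float32); the real dtype may differ.
BYTES_PER_VALUE = 4
N_ROWS, N_COLS, N_IMAGES = 3600, 1800, 360_000

# One 1x1x360000 slice (a single grid cell through the whole stack).
series_bytes = N_IMAGES * BYTES_PER_VALUE
# The full 3600x1800x360000 stack.
stack_bytes = N_ROWS * N_COLS * series_bytes

# ASSUMPTION: about half of the ~100 GB of RAM is set aside for raw data.
budget_bytes = 50 * 10**9
series_per_batch = budget_bytes // series_bytes

print(f"one series: {series_bytes / 1e6:.2f} MB")
print(f"full stack: {stack_bytes / 1e12:.2f} TB")
print(f"series that fit in the budget: {series_per_batch}")
```

Under these assumptions a single 360000-long array is small (about 1.44 MB) and tens of thousands of them fit in memory at once, while the full stack is on the order of terabytes.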
This means I need (or I think I need) a time-efficient way of accessing the 360000-long arrays. While h5py is efficient at adding each image to the stack, it seems that slicing perpendicular to the images is much, much slower (hours or more).
Am I missing an obvious method to slice the data perpendicular to the images?
The code below times one read in each of the two slice directions:
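import time
import h5py

file = "file/path/to/large/stack.h5"

# Read one full image (a slice parallel to the images).
t0 = time.time()
with h5py.File(file, 'r') as f:
    dat = f['Merged_liqprec'][:, :, 1]
print('Time = ' + str(time.time() - t0))

# Read one grid cell through the whole stack (perpendicular to the images).
t1 = time.time()
with h5py.File(file, 'r') as f:
    dat = f['Merged_liqprec'][500, 500, :]
print('Time = ' + str(time.time() - t1))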
Output:
## time to read an image slice, e.g. [:,:,1]:
Time = 0.0701
## time to read a slice through the image stack, e.g. [500,500,:]:
Time = multiple hours (did not finish; the server went offline for maintenance while it was running)
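One possible explanation for the gap between the two timings is the dataset's chunk layout. The sketch below counts how many chunks each slice would overlap. The chunk shape of one chunk per image is purely an assumption for illustration (the real layout is whatever `f['Merged_liqprec'].chunks` reports), as is the 4-byte dtype, and `chunks_touched` is a hypothetical helper, not part of h5py.

```python
import math


def chunks_touched(selection, chunk_shape):
    """Count the chunks overlapped by a selection given as (start, stop) per axis."""
    count = 1
    for (start, stop), size in zip(selection, chunk_shape):
        first = start // size        # index of the first chunk hit on this axis
        last = (stop - 1) // size    # index of the last chunk hit on this axis
        count *= last - first + 1
    return count


# ASSUMPTION: one chunk per image; check dset.chunks for the real layout.
CHUNK_SHAPE = (3600, 1800, 1)
# ASSUMPTION: 4 bytes per value (e.g. float32).
BYTES_PER_VALUE = 4

image_slice = ((0, 3600), (0, 1800), (1, 2))             # [:, :, 1]
pixel_series = ((500, 501), (500, 501), (0, 360_000))    # [500, 500, :]

chunk_bytes = math.prod(CHUNK_SHAPE) * BYTES_PER_VALUE
image_chunks = chunks_touched(image_slice, CHUNK_SHAPE)
series_chunks = chunks_touched(pixel_series, CHUNK_SHAPE)

for name, n in (("image slice", image_chunks), ("pixel series", series_chunks)):
    # Worst case: every overlapped chunk has to be read in full.
    print(f"{name}: {n} chunks, up to {n * chunk_bytes / 1e9:.2f} GB read")
```

If the layout really were one chunk per image, the image slice would overlap a single chunk, while the 360000-long slice would overlap every chunk in the file. In the worst case, where whole chunks must be read (for example when compression is enabled), that would mean reading close to the entire stack to extract one grid cell.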