Using the maxshape parameter allows you to resize the dataset later. Note that maxshape needs to match the number of dimensions of your image dataset. You entered 1 dimension, but you need 3 for all of the image data (1000, 2048, 2048). Also, the initial dataset shape in your code is taken from the data=img array, so it will be (2048, 2048). The dataset needs a third dimension to hold all of the images.
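For example, a dataset sized for all of your image data would be created with a 3-D shape and a matching 3-D maxshape. This is a minimal sketch; the file name, dataset name, and dtype are placeholders, so adjust them to match your data:

import h5py

with h5py.File('images.h5', 'w') as f:
    # 3 dimensions: the image counter first, then the 2 image axes
    img_ds = f.create_dataset('/array', shape=(1000, 2048, 2048),
                              maxshape=(None, 2048, 2048),
                              chunks=True, dtype='uint8')  # dtype is an assumption; match your image data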
There are 3 approaches to load all your image data:
1. Set shape=(nfiles,a1,a2) to initially size the dataset for all images. There is no need to resize unless you want to add more images later.
2. Initially set shape=(1,a1,a2) (for 1 image), then use .resize() to increase the size as you add images. This method is not very efficient as your datasets grow.
3. Initially set shape=(N,a1,a2) (for N images), then use .resize() to increase the size by N when the dataset is full. (N can be any number. I used 10 in the example below, but you might use 100 or 1000 in a real-world application.)
All 3 methods are in the example below for 30 images with a smaller image size. I create random integer data for the images. Replace np.random.randint() with np.array(Image.open(files[i])) for your files.
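For example, you could build the files list like this (a sketch; the glob pattern is an assumption, so adjust it to match your file names):

import glob
from PIL import Image
import numpy as np

files = sorted(glob.glob('*.tif'))   # assumed pattern; use your own
nfiles = len(files)
# then, inside the loop, replace the random data with:
# img_arr = np.array(Image.open(files[i]))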
The examples demonstrate the process. Note that Methods 1 and 2 only work when you create the HDF5 file and populate the image data in one pass (because the dataset index is the same as the image counter). Method 3 shows how to add data incrementally. It uses an attribute that counts the number of images loaded. The counter sets the position where the next image is written, and it is also used to check the current dataset size (and resize it as needed).
In production code you need additional checks that the image size and shape match the dataset size and shape.
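A minimal sketch of such a check (assuming img_ds and img_arr as defined in the examples below) could look like this:

# verify the new image matches the dataset's image axes before writing
if img_arr.shape != img_ds.shape[1:]:
    raise ValueError(f'image shape {img_arr.shape} does not match '
                     f'dataset image shape {img_ds.shape[1:]}')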
import h5py
import numpy as np

nfiles = 30
a0 = nfiles          # number of images
a1 = 256; a2 = 256   # image size

# Method 1: size the dataset for all images up front
with h5py.File('input_images1.h5', 'w') as f:
    for i in range(nfiles):
        img_arr = np.random.randint(0, 254, (a1, a2), int)
        if i == 0:
            img_ds = f.create_dataset('/array', shape=(a0, a1, a2),
                                      maxshape=(None, a1, a2), chunks=True)
        f['/array'][i, :, :] = img_arr
        print(i)
# Method 2: start with room for 1 image and resize for each new image
with h5py.File('input_images2.h5', 'w') as f:
    for i in range(nfiles):
        img_arr = np.random.randint(0, 254, (a1, a2), int)
        if i == 0:
            img_ds = f.create_dataset('/array', shape=(1, a1, a2),
                                      maxshape=(None, a1, a2), chunks=True)
        else:
            f['/array'].resize(i + 1, axis=0)
        f['/array'][i, :, :] = img_arr
        print(i)
# Method 3: allocate in blocks of 10 and track a counter attribute
with h5py.File('input_images3.h5', 'a') as f:
    for i in range(nfiles):
        img_arr = np.random.randint(0, 254, (a1, a2), int)
        if 'array' not in f.keys():
            img_ds = f.create_dataset('/array', shape=(10, a1, a2),
                                      maxshape=(None, a1, a2), chunks=True)
            img_ds.attrs['n_images'] = 0
        else:
            img_ds = f['/array']
        n_images = img_ds.attrs['n_images']
        if n_images == img_ds.shape[0]:
            print('adding 10 rows to /array')
            img_ds.resize(img_ds.shape[0] + 10, axis=0)
        img_ds[n_images, :, :] = img_arr
        img_ds.attrs['n_images'] = n_images + 1
        print(img_ds.attrs['n_images'])
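To confirm what Method 3 wrote, you can reopen the file in read mode and compare the counter attribute with the allocated dataset size (a small sketch using the file created above):

import h5py

with h5py.File('input_images3.h5', 'r') as f:
    img_ds = f['/array']
    print('images loaded :', img_ds.attrs['n_images'])  # 30 for this example
    print('allocated rows:', img_ds.shape[0])           # grows in blocks of 10
    first_img = img_ds[0, :, :]                         # read one image back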