I have many 100x100 px black/white GIF images that I want to use in NumPy to train a machine learning algorithm, and I would like to save them in a single file that is easily readable from Python/NumPy. By "many" I mean several hundred thousand, so I would like to take advantage of the fact that the images carry only 1 bit per pixel.

Any ideas on how I can do this?
EDIT:
I used a BitArray object from the bitstring module and then saved it with numpy.savez. The problem is that saving takes ages. I never managed to see the process finish on the full dataset; saving a small subset took 10 minutes and produced a file about 20 times the size of the subset itself.
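For reference, the slow approach looked roughly like this (a rough reconstruction, not my exact code; gifs is the list of file paths and the fixed threshold of 127 is an assumption):

import numpy as np
from bitstring import BitArray
from imageio import imread  # any imread returning a 2D array works here

arrays = []
for file_path in gifs:
    img = imread(file_path)
    # BitArray's auto-initializer treats a list as a sequence of booleans
    arrays.append(BitArray((img.ravel() > 127).tolist()))

# savez has no native support for BitArray objects, which makes the
# conversion slow and the resulting file large
np.savez('dataset_bitarray.npz', images=arrays)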
I will try with the BoolArray, thanks for the reference.
EDIT (solved):
I solved the problem using a different approach from the ones in the questions you linked. I found the numpy.packbits function here: numpy boolean array with 1 bit entries
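As a quick illustration of what packbits/unpackbits do (a minimal standalone sketch, not part of the pipeline below):

import numpy as np

bits = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1], dtype=np.uint8)
packed = np.packbits(bits)        # array([178, 128], dtype=uint8): 9 bits padded to 16
restored = np.unpackbits(packed)  # 16 values, the last 7 are zero padding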
I'm reporting my code here so it can be useful to others:
import numpy as np
from imageio import imread  # or scipy.misc.imread on older setups

accepted_shape = (100, 100)
images = []
for file_path in gifs:  # gifs: list of paths to the GIF files
    img_data = imread(file_path)
    if img_data.shape != accepted_shape:
        continue
    # threshold halfway between the darkest and brightest pixel
    max_value = img_data.max()
    min_value = img_data.min()
    middle_value = (min_value + max_value) // 2
    # binarize, then pack 8 pixels into each byte
    image = np.packbits((img_data.ravel() > middle_value).astype(np.uint8))
    images.append(image)
images = np.vstack(images)  # one row per image
np.savez_compressed('dataset.npz', shape=accepted_shape, images=images)
This just requires some care when unpacking, because if the number of bits per image is not a multiple of 8, numpy.packbits pads the last byte with zeros. (Here 100x100 = 10000 bits is exactly 1250 bytes, so there is no actual padding, but the trimming below keeps the code general.) This is how I load and unpack the dataset:
import numpy as np

data = np.load('dataset.npz')
shape = data['shape']
images = data['images']
nf = np.prod(shape)                     # pixels (bits) per image, here 10000
images = np.unpackbits(images, axis=1)  # one uint8 per pixel again
images = images[:, :nf]                 # drop the zero padding, if any
ne = images.shape[0]                    # number of images
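If you then want each image back as a 2D array, you can reshape after trimming (using shape and ne from above):

images = images.reshape(ne, shape[0], shape[1])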