I have a 3D array that only contains the values 0, 1 and 2, and I want to translate those values to 0, 128 and 255 respectively. I have looked around and this thread (Translate every element in numpy array according to key) seems like the way to go.
So I tried implementing it and it worked; the relevant part of the code is below. (I read and write the data from and to h5 files, but I doubt that's important; I only mention it in case it is.)
import h5py
import numpy as np

# fetch dataset from disk
f = h5py.File('input/A.h5', 'r')  # size = 572kB

# read and translate the array
array = f['data'].value  # type = numpy.ndarray
f.close()
my_dict = {1: 128, 2: 255, 0: 0}
array = np.vectorize(my_dict.get)(array)

# write the translated dataset to disk
h5 = h5py.File('output/B.h5', driver=None)  # final size = 4.5MB
h5.create_dataset('data', data=array)
h5.close()
The problem is that the input file (A.h5) is 572kB, while the output file (B.h5) is about 8 times as large (4.5MB).
What is going on here? I have another array with the same dimensions, full of values from 0 to 255, and it is also 572kB, so the values being larger shouldn't matter. My first guess was that maybe Python was creating objects instead of ints, so I tried casting to int, but the size stays the same.
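For reference, the cast I mean is roughly the following, applied before the create_dataset call above (just a sketch; the exact call I used may have differed slightly):

# cast the translated array to plain ints before writing it out
array = array.astype(int)  # the resulting B.h5 is still 4.5MB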
Side note: if I do the translation with 3 nested for loops instead, the output stays at 572kB (but the code is much slower). A sketch of what I mean is below.
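The loop version is roughly the following, used in place of the np.vectorize line above (a sketch; the variable names may differ from what I actually ran):

# translate every element in place with three nested loops over the 3D array
for i in range(array.shape[0]):
    for j in range(array.shape[1]):
        for k in range(array.shape[2]):
            array[i, j, k] = my_dict[array[i, j, k]]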