I'm creating a dataset of windowed data for deep learning. I generated the data as NumPy arrays: 4 arrays with shape (141038, 360) and 1 array of labels with shape (141038,). I saved the arrays in an .npz file, but the file size seems too big: 1.5 GB. I'm new to Python and programming, so I have no idea how big such a file should be. I also converted the arrays to Pandas DataFrames, and the memory usage was in the same range. The problem is that I have 6 such files totaling 9 GB, and probably another dataset with overlapping windows that is 7 times larger, so possibly 63 GB.
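For context, my back-of-the-envelope math for the raw array sizes (assuming the arrays use NumPy's default float64 dtype, 8 bytes per value):

```python
# 4 feature arrays of shape (141038, 360) at 8 bytes per float64 value
features_bytes = 4 * 141038 * 360 * 8   # 1,624,757,760 bytes
labels_bytes = 141038                   # a 1-byte-per-value label array is negligible

print((features_bytes + labels_bytes) / 2**30)  # ~1.51 GiB
```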
Is such a file size realistic, or have I done something wrong? (It's just a file with some numbers, not a game.)
Is there another format that would store my arrays using less space? (I tried HDF5, but I got the same file size.)
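For reference, this is roughly how I'm saving the files (the array names X1..X4 are placeholders, and I've shrunk the first dimension here to keep the example small; my real shape is (141038, 360)):

```python
import numpy as np
import h5py

rng = np.random.default_rng(0)
X1, X2, X3, X4 = (rng.random((1000, 360)) for _ in range(4))  # stand-ins for my windowed arrays
labels = rng.integers(0, 2, size=1000, dtype=np.uint8)

# everything in one uncompressed .npz archive
np.savez("dataset.npz", X1=X1, X2=X2, X3=X3, X4=X4, labels=labels)

# HDF5 via h5py -- for me this came out at essentially the same size
with h5py.File("dataset.h5", "w") as f:
    for name, arr in [("X1", X1), ("X2", X2), ("X3", X3), ("X4", X4), ("labels", labels)]:
        f.create_dataset(name, data=arr)
```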
I tried changing the datatypes, and it reduced the size slightly (3 arrays are f8, 1 is int8, and 1 is uint8). Are there other datatypes that could reduce the size further? For 0/1 values, is there a datatype more efficient than uint8?
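To illustrate the kind of dtype change I mean (a sketch with a made-up 0/1 label array):

```python
import numpy as np

labels = np.random.default_rng(0).integers(0, 2, size=141038)  # int64 by default

labels_u8 = labels.astype(np.uint8)  # 1 byte per value instead of 8
print(labels.nbytes, "->", labels_u8.nbytes)  # 1128304 -> 141038 bytes
```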
For the float arrays, would reducing the precision help, or is there another way to reduce their size?
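By "reducing the precision" I mean casting like this (a sketch; X stands in for one of my float arrays):

```python
import numpy as np

X = np.random.default_rng(0).random((141038, 360))  # float64 by default

X32 = X.astype(np.float32)  # half the bytes per value
X16 = X.astype(np.float16)  # a quarter of the bytes, much lower precision

print(X.nbytes / 2**20, X32.nbytes / 2**20, X16.nbytes / 2**20)  # sizes in MiB
```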
Some of my files are filled with zero padding, some with edge padding, and others with interpolation, yet all the files have almost the same size. Shouldn't the zero-padded files be smaller?