I have a large data set.
The best I could achieve so far is to put the data in a numpy array, dump it to a binary string, and then compress that:
import zlib
import numpy as np

my_array = np.array([1.0, 2.0, 3.0, 4.0])        # float64 by default
compressed = zlib.compress(my_array.tobytes())   # serialize to raw bytes, then compress
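For reference, the reverse direction looks roughly like this (just a sketch, assuming the dtype, and the shape if multidimensional, are known when loading):

restored = np.frombuffer(zlib.decompress(compressed), dtype=np.float64)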
With my real data, however, the compressed binary file comes out at 22 MB, so I am hoping to make it even smaller. I read that on 64-bit machines a float defaults to float64, which takes up 24 bytes in memory: 8 bytes for a pointer to the value, 8 bytes for the double-precision value itself, and 8 bytes for the garbage collector. If I change it to float32 I gain a lot of memory but lose precision, and I am not sure I want that. But what about the 8 bytes for the garbage collector, are they automatically stripped away?
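To illustrate what I mean by switching to float32 (just a sketch; `astype` makes a converted copy, and the names here are only for comparison):

small = my_array.astype(np.float32)              # half the bytes per element, lower precision
compressed32 = zlib.compress(small.tobytes())
print(len(compressed), len(compressed32))        # compare the two compressed sizes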
Observations: I have already tried pickle, hickle, and msgpack, but 22 MB is the best size I managed to reach.