I was surprised to find that if you save the same numpy object to a file twice using numpy.savez, the resulting files are not byte-for-byte identical.
For example,
import numpy
x = numpy.random.rand(1000, 1000)
numpy.savez('foo.npz', x)
numpy.savez('bar.npz', x)
And then
md5sum foo.npz bar.npz
d1b8b7d2000055b8bf62dddc4a5c77b5 foo.npz
1c6e13bb9efca3ec144e81b88b6cdc75 bar.npz
From what I have read, it looks like this has something to do with the timestamp stored in the npz zip file.
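A quick way to check that guess is to open both archives with the standard zipfile module and compare the per-entry metadata. This is only a sketch: describe_npz is a helper name made up for this post, and whether the timestamps actually differ probably depends on the numpy version and on how far apart the two saves happen. If the timestamp theory is right, the CRC-32 values should match while the date_time fields may not.

```python
import os
import tempfile
import zipfile

import numpy


def describe_npz(path):
    """Return (member name, CRC-32, timestamp) for each entry in an .npz archive.

    Hypothetical helper for illustration; an .npz file is an ordinary zip archive.
    """
    with zipfile.ZipFile(path) as archive:
        return [(info.filename, info.CRC, info.date_time)
                for info in archive.infolist()]


x = numpy.random.rand(100, 100)
tmpdir = tempfile.mkdtemp()
foo = os.path.join(tmpdir, 'foo.npz')
bar = os.path.join(tmpdir, 'bar.npz')
numpy.savez(foo, x)
numpy.savez(bar, x)

foo_entries = describe_npz(foo)
bar_entries = describe_npz(bar)
# Print the metadata so the two archives can be compared by eye.
for entry in foo_entries + bar_entries:
    print(entry)
```

The CRC-32 covers only the stored array data, so matching CRCs with different file checksums would point at the zip metadata rather than the array contents.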
For testing purposes, I want to verify that the data files that my code creates are identical. I usually do this with a checksum on pickle files, e.g.
import cPickle as pickle  # Python 2; on Python 3 use "import pickle"
with open('foo.pkl', 'wb') as f:
    pickle.dump(x, f, protocol=2)
with open('bar.pkl', 'wb') as f:
    pickle.dump(x, f, protocol=2)
And then
md5sum foo.pkl bar.pkl
3139d9142d57bdde0970013f39b4854f foo.pkl
3139d9142d57bdde0970013f39b4854f bar.pkl
Is there any workaround for doing the same thing with numpy.savez?
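The only fallback I can think of is to checksum the array contents instead of the file bytes. Below is a minimal sketch of that idea, assuming it is acceptable to load each archive back in; npz_content_digest is a name I made up, and it is not part of numpy. It hashes each array's name, dtype, shape and raw bytes, so it should ignore any zip metadata.

```python
import hashlib
import io

import numpy


def npz_content_digest(path_or_file):
    """MD5 over the arrays stored in an .npz archive (hypothetical helper).

    Only the array names, dtypes, shapes and raw data are hashed,
    so zip-level metadata such as timestamps does not affect the result.
    """
    digest = hashlib.md5()
    with numpy.load(path_or_file) as data:
        for name in sorted(data.files):
            arr = numpy.ascontiguousarray(data[name])
            digest.update(name.encode('utf-8'))
            digest.update(str(arr.dtype).encode('utf-8'))
            digest.update(str(arr.shape).encode('utf-8'))
            digest.update(arr.tobytes())
    return digest.hexdigest()


def save_to_buffer(array):
    """Write an array with numpy.savez into an in-memory file and rewind it."""
    buf = io.BytesIO()
    numpy.savez(buf, array)
    buf.seek(0)
    return buf


x = numpy.random.rand(50, 50)
digest_foo = npz_content_digest(save_to_buffer(x))
digest_bar = npz_content_digest(save_to_buffer(x))
digest_other = npz_content_digest(save_to_buffer(x + 1.0))
print(digest_foo == digest_bar)
```

This works on file paths as well as file objects, but it is not the same as checksumming the files directly, which is what I would really like to do.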