Before posting this question, I read "Why does pickle take so much longer than np.save?".
From the answers there, one would expect NumPy to be faster than pickle when working with ndarrays.
But look at these experiments!
Functions we test:
import numpy as np
import pickle as pkl

# Write the same 1000x5 array once in each format.
a = np.random.randn(1000, 5)
with open("test.npy", "wb") as f:
    np.save(f, a)
with open("test.pkl", "wb") as f:
    pkl.dump(a, f)

# Each function reopens and loads the file 1000 times.
def load_with_numpy(name):
    for i in range(1000):
        with open(name, "rb") as f:
            np.load(f)

def load_with_pickle(name):
    for i in range(1000):
        with open(name, "rb") as f:
            pkl.load(f)
Experiment results:
%timeit load_with_numpy("test.npy")
296 ms ± 1.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit load_with_pickle("test.pkl")
28.2 ms ± 994 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Why is that so?
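
In case it helps anyone reproduce this outside IPython, here is a minimal self-contained sketch of the same comparison using the standard `timeit` module instead of `%timeit`. The `run_benchmark` helper, its `runs` and `repeats` parameters, and the use of a temporary directory are my own additions for illustration. It reports the best of a few runs rather than the mean ± std. dev. that `%timeit` prints, so the numbers will not match the ones above exactly.

```python
import pickle as pkl
import tempfile
import timeit
from pathlib import Path

import numpy as np


def load_with_numpy(name, repeats=1000):
    # Reopen and load the .npy file on every iteration, as in the post.
    for _ in range(repeats):
        with open(name, "rb") as f:
            np.load(f)


def load_with_pickle(name, repeats=1000):
    # Same loop, but unpickling the array instead.
    for _ in range(repeats):
        with open(name, "rb") as f:
            pkl.load(f)


def run_benchmark(runs=3, repeats=1000):
    """Hypothetical helper: returns best-of-`runs` seconds for each loader."""
    a = np.random.randn(1000, 5)
    with tempfile.TemporaryDirectory() as tmp:
        npy_path = Path(tmp) / "test.npy"
        pkl_path = Path(tmp) / "test.pkl"
        with open(npy_path, "wb") as f:
            np.save(f, a)
        with open(pkl_path, "wb") as f:
            pkl.dump(a, f)

        # Sanity check: both files must round-trip to the original array.
        with open(npy_path, "rb") as f:
            assert np.array_equal(np.load(f), a)
        with open(pkl_path, "rb") as f:
            assert np.array_equal(pkl.load(f), a)

        t_numpy = min(timeit.repeat(
            lambda: load_with_numpy(npy_path, repeats), number=1, repeat=runs))
        t_pickle = min(timeit.repeat(
            lambda: load_with_pickle(pkl_path, repeats), number=1, repeat=runs))
    return t_numpy, t_pickle


if __name__ == "__main__":
    t_numpy, t_pickle = run_benchmark()
    print(f"np.load:  {t_numpy * 1e3:.1f} ms per 1000 loads")
    print(f"pkl.load: {t_pickle * 1e3:.1f} ms per 1000 loads")
```

Running the file as a script prints one timing line per loader. Lowering `repeats` gives a quick smoke test.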