
I read "Why does pickle take so much longer than np.save?" before posting this question. From the answers there, one would expect numpy to be faster when working with ndarrays.
But look at these experiments!

Functions we test:

import numpy as np
import pickle as pkl

a = np.random.randn(1000,5)

with open("test.npy", "wb") as f:
    np.save(f, a)

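with open("test.pkl", "wb") as f:
    pkl.dump(a, f)


def load_with_numpy(name):
    # Open and load the file 1000 times so that %timeit measures repeated loads.
    for _ in range(1000):
        with open(name, "rb") as f:
            np.load(f)


def load_with_pickle(name):
    # Same loop, but deserializing with pickle instead of np.load.
    for _ in range(1000):
        with open(name, "rb") as f:
            pkl.load(f)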

Experiment results:

%timeit load_with_numpy("test.npy")
296 ms ± 1.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit load_with_pickle("test.pkl")
28.2 ms ± 994 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Why is that so?

Ladenkov Vladislav
  • We can't reproduce any of this without access to the files you're using, and you haven't shown the file creation. – user2357112 Feb 18 '19 at 22:15
  • Also, even what you *have* shown us isn't in a state where it could actually work. For example, `load_with_pickle` attempts to make use of an unqualified `load` function that was never imported or defined. – user2357112 Feb 18 '19 at 22:17
  • @user2357112 fixed – Ladenkov Vladislav Feb 18 '19 at 22:21
  • Increase the size of `a`; once it has 100,000 elements or so, I'm seeing better performance from numpy. – user2699 Feb 18 '19 at 22:30
  • @user2699 That's true, now I see. In the question I linked, it was said that pickle uses numpy to save and load numpy arrays, but it seems this is not true. – Ladenkov Vladislav Feb 18 '19 at 22:33
  • The 2 files are close to the same size. `np.load` can load both. We can see from the `np.load` code that it checks the first few bytes for a `MAGIC_PREFIX`. Failing to find that, it actually calls `pickle.load`. `pickle.load` is compiled. For an `npy` file, `load` uses `np.lib.npyio.format.read_array`, which you can read yourself. – hpaulj Feb 19 '19 at 00:01
  • My answer in your link gets faster pickle load times. https://stackoverflow.com/a/51845434/901925 – hpaulj Feb 19 '19 at 01:33
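
The prefix check and pickle fallback described in hpaulj's comment can be observed directly. The following is only a minimal sketch, not a look at NumPy's internals. It assumes a recent NumPy, where `allow_pickle=True` has to be passed explicitly before `np.load` will fall back to unpickling, and it writes to a temporary directory (my choice, not part of the question) so that no existing files are touched.

```python
import os
import pickle
import tempfile

import numpy as np

# Prefix documented in the NPY format specification.
MAGIC_PREFIX = b"\x93NUMPY"

a = np.random.randn(1000, 5)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names, mirroring the ones in the question.
    npy_path = os.path.join(tmp, "test.npy")
    pkl_path = os.path.join(tmp, "test.pkl")

    np.save(npy_path, a)
    with open(pkl_path, "wb") as f:
        pickle.dump(a, f)

    # Read the first six bytes of each file.
    with open(npy_path, "rb") as f:
        npy_head = f.read(6)
    with open(pkl_path, "rb") as f:
        pkl_head = f.read(6)

    print("npy starts with magic prefix:", npy_head == MAGIC_PREFIX)
    print("pkl starts with magic prefix:", pkl_head == MAGIC_PREFIX)

    # np.load reads the .npy file through its own format reader.
    from_npy = np.load(npy_path)
    # For the pickle file the prefix is absent, so np.load can only
    # succeed by unpickling, which must be explicitly allowed.
    from_pkl = np.load(pkl_path, allow_pickle=True)

print("same data from both files:", np.array_equal(from_npy, from_pkl))
```

This shows only that the two loaders take different code paths depending on the file header. It does not by itself explain the timing gap, which the comments above suggest depends on the array size.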
