Why is npy (the NumPy array format) faster to load and smaller on disk than pickle?

Asked Jul 23 '19 at 08:15

I usually use pickle to save and manage data. However, after reading a Stack Overflow question and its answers (best way to preserve numpy arrays on disk), I realized that there is a huge difference between pickle and npy in both save/load speed and file size.

I have been trying to find the reason for this, but I haven't found it yet.

Does anyone know what causes the difference?

frhyme

Can you show some example code and some experiments that demonstrate the difference? – Nils Werner Jul 23 '19 at 08:20
Without knowing specifics (which you can look up), pickle serializes way more stuff. It is a general-purpose format for arbitrary Python objects, so it records type information along with the data, and I suppose each value is stored less efficiently, even for simple types (one extra byte for the type or something like that). In the NumPy [npy format](http://numpy.org/devdocs/reference/generated/numpy.lib.format.html), you basically have a short header with the dtype and shape of the array and then a long sequence of values one after another without additional overhead. Being a more goal-specific format, it can simplify things a lot. – jdehesa Jul 23 '19 at 08:49
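
A minimal sketch of the kind of experiment asked for above, not a benchmark from the original poster. The sample array (1000 `float64` values) is made up for illustration. It serializes the same values as `.npy`, as a pickled `ndarray`, and as a pickled plain Python list, then reads the `.npy` header back to show what it contains. Exact sizes depend on the NumPy version, the pickle protocol, and the dtype, so none are quoted here.

```python
import io
import pickle

import numpy as np

# Hypothetical sample data, chosen only for illustration.
arr = np.arange(1000, dtype=np.float64)

# Serialize to the .npy format in memory instead of writing a file.
npy_buf = io.BytesIO()
np.save(npy_buf, arr)
npy_bytes = npy_buf.getvalue()

# Pickle the same values as an ndarray and as a plain Python list.
pickled_array = pickle.dumps(arr, protocol=pickle.HIGHEST_PROTOCOL)
pickled_list = pickle.dumps(arr.tolist(), protocol=pickle.HIGHEST_PROTOCOL)

# Everything in the .npy payload that is not raw array data is header.
header_size = len(npy_bytes) - arr.nbytes

sizes = {
    "raw data": arr.nbytes,
    "npy": len(npy_bytes),
    "pickle (ndarray)": len(pickled_array),
    "pickle (list)": len(pickled_list),
}
for name, size in sizes.items():
    print(f"{name:>18}: {size} bytes")
print(f"{'npy header':>18}: {header_size} bytes")

# Read the header back: it stores only shape, memory order, and dtype.
npy_buf.seek(0)
version = np.lib.format.read_magic(npy_buf)
shape, fortran_order, dtype = np.lib.format.read_array_header_1_0(npy_buf)
print("header:", version, shape, fortran_order, dtype)
```

In this sketch the `.npy` bytes are a small fixed header followed by exactly the array's raw buffer, while the pickled list stores a type marker next to every value and so comes out larger than the raw data. A pickled `ndarray` is a different case: NumPy hands pickle its buffer as one block of bytes, so with a binary protocol the size gap is likely much smaller than with a list, and any large difference you measured may come from the protocol or the object type being pickled.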
