
I'm implementing a program that needs to serialize and deserialize large objects, so I ran some tests with the pickle, cPickle and marshal modules to choose the best one. Along the way I found something very interesting:

I'm using dumps and then loads (for each module) on a list of dicts, tuples, ints, floats and strings.
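Roughly, the benchmark looks like this (a simplified sketch, not my exact code; the real test data is larger and more varied):

import time
import pickle
import cPickle
import marshal

# sample data standing in for the real payload: dicts, tuples, ints, floats, strings
data = [({'key': i}, (i, i + 1), i, float(i), str(i)) for i in xrange(1000000)]

for module in (pickle, cPickle, marshal):
    start = time.time()
    serialized = module.dumps(data)
    print '%s dumps => %.3f seconds, %d bytes' % (
        module.__name__, time.time() - start, len(serialized))

    start = time.time()
    restored = module.loads(serialized)
    print '%s loads => %.3f seconds (same length? %d == %d)' % (
        module.__name__, time.time() - start, len(restored), len(data))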

This is the output of my benchmark:

DUMPING a list of length 7340032
----------------------------------------------------------------------
pickle => 14.675 seconds
length of pickle serialized string: 31457430

cPickle => 2.619 seconds
length of cPickle serialized string: 31457457

marshal => 0.991 seconds
length of marshal serialized string: 117440540

LOADING a list of length: 7340032
----------------------------------------------------------------------
pickle => 13.768 seconds
(same length?) 7340032 == 7340032

cPickle => 2.038 seconds
(same length?) 7340032 == 7340032

marshal => 6.378 seconds
(same length?) 7340032 == 7340032

So, from these results we can see that marshal was extremely fast in the dumping part of the benchmark:

14.8x faster than pickle and 2.6x faster than cPickle.

But, to my big surprise, marshal was by far slower than cPickle in the loading part:

2.2x faster than pickle, but 3.1x slower than cPickle.

And as for RAM, marshal's performance while loading was also very inefficient:

[Ubuntu System Monitor screenshot showing memory usage during loading]

I'm guessing the reason loading with marshal is so slow is somehow related to the length of its serialized string (much longer than pickle's and cPickle's).

  • Why does marshal dump faster but load slower?
  • Why is the marshal serialized string so long?
  • Why is marshal's loading so RAM-inefficient?
  • Is there a way to improve marshal's loading performance?
  • Is there a way to combine marshal's fast dumping with cPickle's fast loading?
juliomalegria
  • Your question is a dead-end. The `marshal` module is not meant to be used as an alternative to `pickle`. There is no official documentation for the marshal file format and it might change from version to version, so your benchmark results might be false in the future. – Ferdinand Beyer Dec 15 '11 at 16:37
  • Concerning the speed differences: I suspect it's all about file IO: The file produced by marshal is nearly four times as large (112MB vs 30MB). – Ferdinand Beyer Dec 15 '11 at 16:40
  • possible duplicate of [Why is marshal so much faster than pickle?](http://stackoverflow.com/questions/329249/why-is-marshal-so-much-faster-than-pickle) – Ferdinand Beyer Dec 15 '11 at 17:05
  • @FerdinandBeyer I saw that question, it is totally different from what I'm asking. Read my question again if you have doubts. About the downvoting, the downvote tooltip says: "This question does not show any research effort; it is unclear or not useful.". Is my question unclear? Not useful? Or doesn't it show any research? – juliomalegria Dec 15 '11 at 17:09
  • In the core, your question is *exactly* the same: Why is marshal faster? Speed/memory usage is the usual tradeoff in computing. And yes, your question is not useful. But of course this is just my opinion. – Ferdinand Beyer Dec 15 '11 at 17:13
  • It is not guaranteed that a file created by marshal now will be readable by all future versions of Python. Your research is pointless. – John Machin Dec 18 '11 at 08:58
  • @JohnMachin I don't get the relation between the marshal compatibility with future versions of Python and what I'm asking – juliomalegria Dec 18 '11 at 16:27
  • [Related question](http://stackoverflow.com/q/329249/183066) – jcollado Dec 19 '11 at 09:35
  • @FerdinandBeyer the [link](http://stackoverflow.com/questions/329249/whats-the-difference-between-marshal-and-cpickle) you've mentioned seems to be broken as of now. Can you possibly update the same or perhaps point us to another relevant one? – Tejas Shah Apr 18 '16 at 17:29

6 Answers


cPickle has a smarter algorithm than marshal and is able to do tricks to reduce the space used by large objects. That means it's slower to encode but faster to decode, as the resulting output is smaller. marshal is simplistic and serializes the object straight as-is, without analyzing it any further. That also answers why marshal's loading is so inefficient: it simply has to do more work, as in reading more data from disk, to do the same thing as cPickle.

marshal and cPickle are really different things in the end; you can't get both fast saving and fast loading, since fast saving implies analyzing the data structures less, which implies writing a lot more data to disk.
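For instance, a quick sketch of the memoization difference (illustrative only; exact byte counts will vary by machine and Python version):

import cPickle
import marshal

# 1000 references to one and the same (non-interned) string object
data = ["a string that is not interned"] * 1000

print len(cPickle.dumps(data, 2))   # small: written once, then back-referenced via the memo
print len(marshal.dumps(data))      # large: the full string is written out 1000 times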

Regarding the fact that marshal might be incompatible with other versions of Python, you should generally use cPickle:

"This is not a general “persistence” module. For general persistence and transfer of Python objects through RPC calls, see the modules pickle and shelve. The marshal module exists mainly to support reading and writing the “pseudo-compiled” code for Python modules of .pyc files. Therefore, the Python maintainers reserve the right to modify the marshal format in backward incompatible ways should the need arise. If you’re serializing and de-serializing Python objects, use the pickle module instead – the performance is comparable, version independence is guaranteed, and pickle supports a substantially wider range of objects than marshal." (the python docs about marshal)

Johan Dahlin

Some people might think this too much of a hack, but I've had great success by simply wrapping the pickle dump calls with gc.disable() and gc.enable(). For example, with the snippet below, writing a ~50MB list of dictionaries goes from 78 seconds to 4.

# not a complete example: params is the (large) object being dumped,
# fout an already-open binary file
import gc
import cPickle

gc.disable()   # keep the garbage collector from scanning pickle's many temporaries
cPickle.dump(params, fout, cPickle.HIGHEST_PROTOCOL)
fout.close()
gc.enable()
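The same wrapping should work on the load side as well (an untested sketch along the same lines; 'params.pickle' is a placeholder filename):

import gc
import cPickle

gc.disable()   # again, keep the collector out of the way while loading
fin = open('params.pickle', 'rb')
params = cPickle.load(fin)
fin.close()
gc.enable()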
Chris
  • This works perfectly! Total time required dropped by 20x for me as well. Though @Chris, can you point us towards any repercussions (if any) of the same? – Tejas Shah Apr 18 '16 at 17:32
  • @tdc, Tejas, you won't be able to dump cyclic objects anymore, e.g. `x` in `x = []; x.append(x)` will cause a ValueError if Pickler.fast is enabled. – Kijewski May 17 '16 at 15:52
  • what about the load? – Moj Jul 29 '16 at 10:22

The difference between these benchmarks gives one idea for speeding up cPickle:

Input: ["This is a string of 33 characters" for _ in xrange(1000000)]
cPickle dumps 0.199 s loads 0.099 s 2002041 bytes
marshal dumps 0.368 s loads 0.138 s 38000005 bytes

Input: ["This is a string of 33 "+"characters" for _ in xrange(1000000)]
cPickle dumps 1.374 s loads 0.550 s 40001244 bytes
marshal dumps 0.361 s loads 0.141 s 38000005 bytes

In the first case, the list repeats the same string object over and over. The second list is equivalent, but each string is a separate object, because it is the result of an expression. Now, if you are originally reading your data in from an external source, you could consider some kind of string deduplication.
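For example, a minimal deduplication sketch (read_records is a hypothetical source of raw strings):

seen = {}

def dedup(s):
    # return a canonical instance of s, so equal values share one object
    return seen.setdefault(s, s)

records = [dedup(line) for line in read_records()]

For plain byte strings in Python 2, the built-in intern() achieves something similar.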

Janne Karila

You can make cPickle approx. 50x (!) faster by creating an instance of cPickle.Pickler and then setting the undocumented attribute 'fast' to 1:

import cPickle

outfile = open('outfile.pickle', 'wb')   # must be opened for (binary) writing
fastPickler = cPickle.Pickler(outfile, cPickle.HIGHEST_PROTOCOL)
fastPickler.fast = 1   # skip the memo: repeated objects are no longer tracked
fastPickler.dump(myHugeObject)
outfile.close()

But if your myHugeObject has cyclic references, the dump will not finish normally: fast mode skips the pickler's memo, which is what cycle handling relies on (in practice cPickle raises a ValueError for cyclic objects in fast mode).

Michel Samia

You can improve the storage efficiency by compressing the serialized result.

My hunch is that reading compressed data from disk and decompressing it on its way into the deserializer would be faster than reading the raw data from an HDD.

The test below was meant to show that compression speeds up deserialization. The results weren't as expected, since the machine was equipped with an SSD. On an HDD-equipped machine, compressing the data with LZ4 should come out faster, since reads from disk there average around 60-70 MB/s.

LZ4: at an 18% speed penalty, compression saved 77.6% of the storage.

marshal - compressor / time (s) / output size (bytes)
Bz2  7.492605924606323   10363490
Lz4  1.3733329772949219  46018121
---  1.126852035522461   205618472

cPickle - compressor / time (s) / output size (bytes)
Bz2  15.488649845123291  10650522
Lz4  9.192650079727173   55388264
---  8.839831113815308   204340701
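The idea in a minimal sketch, using the stdlib bz2 module (the lz4 bindings are third-party and their API has changed between releases):

import bz2
import marshal

data = range(1000000)   # stand-in for the real payload

blob = bz2.compress(marshal.dumps(data))          # dump, then compress
restored = marshal.loads(bz2.decompress(blob))    # decompress, then load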
Cong Do
  • Interesting results! Are you implying that you somehow avoided having to decompress the data before unserializing? If so, how? – seaotternerd Jun 22 '13 at 23:44

As you can see, the output produced by cPickle.dump is about 1/4 the length of the output produced by marshal.dump. This suggests that cPickle uses a more sophisticated algorithm to dump the data, dropping things that aren't needed. When loading the dumped list, marshal has to work through much more data, while cPickle can process its data quickly since there is less of it to analyse.

Regarding the fact that marshal might be incompatible with other versions of Python, you should generally use cPickle:

"This is not a general “persistence” module. For general persistence and transfer of Python objects through RPC calls, see the modules pickle and shelve. The marshal module exists mainly to support reading and writing the “pseudo-compiled” code for Python modules of .pyc files. Therefore, the Python maintainers reserve the right to modify the marshal format in backward incompatible ways should the need arise. If you’re serializing and de-serializing Python objects, use the pickle module instead – the performance is comparable, version independence is guaranteed, and pickle supports a substantially wider range of objects than marshal." (the python docs about marshal)

juliomalegria
hlt