14

Which is faster:

(A) 'Unpickling' (Loading) a pickled dictionary object, using pickle.load()

or

(B) Loading a JSON file to a dictionary using simplejson.load()

Assuming: The pickled object file exists already in case A, and that the JSON file exists already in case B.

alecxe
Pranjal Mittal
  • 9
    Why ask random strangers on the Internet? Measure yourself! – NPE Aug 29 '13 at 18:02
  • 3
    It will actually depend on the content types, length, and overall size... Also while you're at it, you might also want to try cPickle and cjson (the latter is for 2.x.x, cjson is not available for 3.x.x) in your time trials. – Nisan.H Aug 29 '13 at 18:08
  • 2
    In addition to what @Nisan.H said, there are also third party (i.e. on PyPI) JSON libraries which claim to be significantly faster. –  Aug 29 '13 at 18:33

1 Answer

27

The speed actually depends on the data: its content and size.

But anyway, let's take some example JSON data and see which is faster (Ubuntu 12.04, Python 2.7.3):

Given this JSON structure dumped into test.json and test.pickle files:

{
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "title": "S",
            "GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef": {
                        "para": "A meta-markup language, used to create markup languages such as DocBook.",
                        "GlossSeeAlso": ["GML", "XML"]
                    },
                    "GlossSee": "markup"
                }
            }
        }
    }
}
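
The answer does not show how the two files were produced. A minimal sketch, assuming Python 3 and only the standard-library json and pickle modules; the helper name `write_test_files` and the use of a temporary directory are illustrative and not part of the original answer:

```python
import json
import os
import pickle
import tempfile

# The sample structure shown above.
DATA = {
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "title": "S",
            "GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef": {
                        "para": "A meta-markup language, used to create "
                                "markup languages such as DocBook.",
                        "GlossSeeAlso": ["GML", "XML"],
                    },
                    "GlossSee": "markup",
                }
            },
        },
    }
}


def write_test_files(directory):
    """Hypothetical helper: dump DATA as test.json and test.pickle."""
    json_path = os.path.join(directory, "test.json")
    pickle_path = os.path.join(directory, "test.pickle")
    with open(json_path, "w") as f:
        json.dump(DATA, f)
    # On Python 3, pickle files must be opened in binary mode.
    with open(pickle_path, "wb") as f:
        pickle.dump(DATA, f)
    return json_path, pickle_path


json_path, pickle_path = write_test_files(tempfile.mkdtemp())
```

Both files then hold the same dictionary, so any timing difference comes from the format and the parser rather than from the data.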

Testing script:

import timeit

import pickle
import cPickle

import json
import simplejson
import ujson
import yajl


def load_pickle(f):
    return pickle.load(f)


def load_cpickle(f):
    return cPickle.load(f)


def load_json(f):
    return json.load(f)


def load_simplejson(f):
    return simplejson.load(f)


def load_ujson(f):
    return ujson.load(f)


def load_yajl(f):
    return yajl.load(f)


print "pickle:"
print timeit.Timer('load_pickle(open("test.pickle"))', 'from __main__ import load_pickle').timeit()

print "cpickle:"
print timeit.Timer('load_cpickle(open("test.pickle"))', 'from __main__ import load_cpickle').timeit()

print "json:"
print timeit.Timer('load_json(open("test.json"))', 'from __main__ import load_json').timeit()

print "simplejson:"
print timeit.Timer('load_simplejson(open("test.json"))', 'from __main__ import load_simplejson').timeit()

print "ujson:"
print timeit.Timer('load_ujson(open("test.json"))', 'from __main__ import load_ujson').timeit()

print "yajl:"
print timeit.Timer('load_yajl(open("test.json"))', 'from __main__ import load_yajl').timeit()

Output (total seconds for timeit's default of 1,000,000 calls per test):

pickle:
107.936687946

cpickle:
28.4231381416

json:
31.6450419426

simplejson:
20.5853149891

ujson:
16.9352178574

yajl:
18.9763481617

As you can see, unpickling via pickle is not fast at all; cPickle is definitely the way to go if you choose the pickling/unpickling option. ujson looks promising among these JSON parsers on this particular data.
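
A common way to pick up cPickle on Python 2 while keeping the code runnable on Python 3 is a guarded import. A minimal sketch; the sample dictionary is only an illustration:

```python
# On Python 2, prefer the C implementation. On Python 3 there is no
# cPickle module: the plain pickle module already uses its C
# accelerator when one is available.
try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3

data = {"ID": "SGML", "GlossSeeAlso": ["GML", "XML"]}
restored = pickle.loads(pickle.dumps(data))
print(restored == data)
```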

Also, the json and simplejson libraries load much faster on PyPy (see Python JSON Performance).

Note that the results may differ on your particular system and with other types and sizes of data.
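
To repeat the comparison on your own system and data, something along these lines may be a starting point. It is a sketch under several assumptions: Python 3, standard-library parsers only (the third-party libraries are left out), a small stand-in dictionary, and parsing from memory with `loads()` so that file I/O is excluded, unlike the script above. `RUNS` is an arbitrary choice.

```python
import json
import pickle
import timeit

# Small stand-in for your own data; replace with a representative sample.
data = {"glossary": {"title": "example glossary",
                     "GlossSeeAlso": ["GML", "XML"]}}

# Serialize once, outside the timed code.
json_text = json.dumps(data)
pickle_bytes = pickle.dumps(data)

RUNS = 10000  # far fewer than timeit's default of 1,000,000

results = {
    "json": timeit.timeit(lambda: json.loads(json_text), number=RUNS),
    "pickle": timeit.timeit(lambda: pickle.loads(pickle_bytes), number=RUNS),
}

# Print the fastest first.
for name, seconds in sorted(results.items(), key=lambda item: item[1]):
    print("%-8s %.4f s for %d loads" % (name, seconds, RUNS))
```

Other parsers can be added as further entries in `results`, provided they are installed and expose a compatible `loads()` function.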

Community
alecxe
  • Wow, pretty illustrative answer. I didn't even know simplejson is faster than json. Further, what could be the reason for the JSON way being faster? (I speculate that's because it's easier to parse and convert JSON to dictionaries than to parse a pickled string, which could represent any Python object.) – Pranjal Mittal Aug 29 '13 at 18:24
  • 1
    What time do you get when you use cpickle with your example? – Kevin London Aug 29 '13 at 18:26
  • Just a sec, will add `ujson` and `cPickle`. – alecxe Aug 29 '13 at 18:38
  • 3
    Please see updated answer. `cPickle` and `ujson` changed the whole picture here :) – alecxe Aug 29 '13 at 18:43
  • 1
    Added `yajl` to the benchmark. – alecxe Aug 29 '13 at 18:53
  • @pramttl answering your question: there was a discussion with useful links here: http://stackoverflow.com/questions/2259270/pickle-or-json. Please check. – alecxe Aug 29 '13 at 18:55
  • 4
    Repeatedly opening the same file often results in the file being cached by the disk. Do your times change if you change the order of your tests? – Steven Rumbalski Aug 29 '13 at 19:10
  • @StevenRumbalski good point, but it doesn't affect the results - tested. Thank you, anyway. – alecxe Aug 29 '13 at 19:15
  • FWIW, in my own testing of just `pickle` vs `json`, using `cPickle` with the highest pickle protocol was faster than using json (Python 2.7.8 on Windows 7). Complete code and the results obtained have been posted [here](https://dl.dropboxusercontent.com/u/5508445/stackoverflow/what-is-faster-loading-a-pickled-dictionary-object-or-loading-a-json-file.txt) (since I can't post them as an answer to a closed question). – martineau Dec 10 '14 at 20:02
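
The protocol point in the last comment can be checked with a short sketch. It assumes Python 3 and the standard-library pickle module; the sample dictionary and `RUNS` are illustrative, and no particular outcome is implied:

```python
import pickle
import timeit

data = {"ID": "SGML", "GlossSeeAlso": ["GML", "XML"]}

RUNS = 10000  # arbitrary; raise it for steadier numbers
timings = {}

for protocol in (0, pickle.HIGHEST_PROTOCOL):
    # Serialize once per protocol, then time only the loading step.
    payload = pickle.dumps(data, protocol=protocol)
    assert pickle.loads(payload) == data
    timings[protocol] = timeit.timeit(
        lambda: pickle.loads(payload), number=RUNS)
    print("protocol %d: %d bytes, %.4f s for %d loads"
          % (protocol, len(payload), timings[protocol], RUNS))
```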