I have an application that periodically dumps a JSON file and loads it back into Python using the standard json facilities.
Early on, we decided it was much more convenient to work with the loaded JSON data as objects rather than dictionaries. This really comes down to the convenience of "dot" member access, as opposed to [] notation for dictionary key lookup. One of the advantages of JavaScript is that there is no real difference between dictionary lookup and member access (which is why JSON is particularly suited to JavaScript, I guess). But in Python, dictionary keys and object attributes are different things.
So, our solution was to just use a custom JSON decoder with an object_hook function that returns objects instead of dictionaries.
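For concreteness, the pattern is essentially the following (the class name is illustrative, not our real code):

```python
import json

class AttrDict:
    """Turn each decoded JSON object into an object with attribute access."""
    def __init__(self, d):
        # object_hook is applied bottom-up, so nested dicts have
        # already been converted by the time we get here
        self.__dict__.update(d)

data = json.loads('{"user": {"name": "ada"}}', object_hook=AttrDict)
print(data.user.name)  # dot access instead of data["user"]["name"]
```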
And we lived happily ever after... until now, when this design decision may turn out to have been a mistake. You see, the JSON dump file has grown rather large (> 400 MB). As far as I know, the standard Python 3 json module uses native code to do the actual parsing, so it is quite fast. But if you provide a custom object_hook, it still has to execute interpreted bytecode for every JSON object decoded, which seriously slows things down. Without the object_hook, decoding the whole 400 MB file takes only about 20 seconds. With the hook, it takes over half an hour!
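Here is roughly how I measured it (data.json is a placeholder for our actual dump file, and SimpleNamespace stands in for our real hook):

```python
import json
import time
from types import SimpleNamespace

with open("data.json") as f:  # placeholder path
    raw = f.read()

t0 = time.perf_counter()
json.loads(raw)  # plain decode: the C-accelerated fast path
print(f"no hook:   {time.perf_counter() - t0:.1f} s")

t0 = time.perf_counter()
# the hook forces a Python-level call for every JSON object in the file
json.loads(raw, object_hook=lambda d: SimpleNamespace(**d))
print(f"with hook: {time.perf_counter() - t0:.1f} s")
```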
So, at this point two options come to mind, neither of which is very pleasant. One is to forget about the convenience of "dot" member access and just use plain Python dictionaries (which means changing a significant amount of code). The other is to write a C extension module, use it as the object_hook, and see whether we get any speedup.
I am wondering if there is some better solution I am not thinking of - perhaps an easier way to get "dot" member access while still initially decoding to a Python dictionary.
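For example, something along these lines (a rough, untested sketch): let the C parser produce plain dicts, and add a thin wrapper whose __getattr__ defers to the dict, so the per-object Python cost is only paid on access rather than during decoding:

```python
import json

class DotView:
    """Read-only attribute view over a plain dict; wrapping happens lazily."""
    __slots__ = ("_d",)

    def __init__(self, d):
        self._d = d

    def __getattr__(self, name):
        try:
            value = self._d[name]
        except KeyError:
            raise AttributeError(name) from None
        # wrap nested dicts on demand instead of up front
        return DotView(value) if isinstance(value, dict) else value

data = DotView(json.loads('{"user": {"name": "ada"}}'))  # fast C-level parse
print(data.user.name)
```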
Any suggestions or solutions to this problem?