Mike Brennan's answer is close, but there isn't any reason to retraverse the entire structure. If you use the object_pairs_hook (Python 2.7+) parameter:
object_pairs_hook is an optional function that will be called with the result of any object literal decoded with an ordered list of pairs. The return value of object_pairs_hook will be used instead of the dict. This feature can be used to implement custom decoders that rely on the order that the key and value pairs are decoded (for example, collections.OrderedDict will remember the order of insertion). If object_hook is also defined, the object_pairs_hook takes priority.
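To see what the decoder actually hands to the hook, here is a small sketch written for Python 3 (where json already returns str, so there is nothing to deunicodify; it only illustrates the hook mechanics). The names record_pairs and seen are hypothetical. It shows that the hook receives a list of (key, value) tuples, that the inner object is handed over before the outer one, and that object_pairs_hook wins when object_hook is also given.

```python
import json
from collections import OrderedDict

# Hypothetical recorder: keeps every argument the decoder passes to the hook.
seen = []

def record_pairs(pairs):
    # pairs is a list of (key, value) tuples in document order
    seen.append(list(pairs))
    return OrderedDict(pairs)

text = '{"b": 1, "a": {"d": 2, "c": 3}}'

# object_hook is also supplied, but object_pairs_hook takes priority,
# so the lambda below is never called.
result = json.loads(
    text,
    object_pairs_hook=record_pairs,
    object_hook=lambda d: "object_hook was called",
)

print(list(result.keys()))  # ['b', 'a']
print(seen[0])              # [('d', 2), ('c', 3)]  (inner object first)
```

Because each nested object is converted before its parent is assembled, the parent's pairs already contain the hook's return value for the child.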
With it, you get each JSON object handed to you, so you can do the decoding with no need for recursion:
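```python
def deunicodify_hook(pairs):
    # Python 2: called once per decoded JSON object with its (key, value) pairs.
    new_pairs = []
    for key, value in pairs:
        # Only values that are themselves unicode strings are converted;
        # lists are passed through unchanged.
        if isinstance(value, unicode):
            value = value.encode('utf-8')
        if isinstance(key, unicode):
            key = key.encode('utf-8')
        new_pairs.append((key, value))
    return dict(new_pairs)
```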
```
In [52]: open('test.json').read()
Out[52]: '{"1": "hello", "abc": [1, 2, 3], "def": {"hi": "mom"}, "boo": [1, "hi", "moo", {"5": "some"}]}'

In [53]: json.load(open('test.json'))
Out[53]:
{u'1': u'hello',
 u'abc': [1, 2, 3],
 u'boo': [1, u'hi', u'moo', {u'5': u'some'}],
 u'def': {u'hi': u'mom'}}

In [54]: json.load(open('test.json'), object_pairs_hook=deunicodify_hook)
Out[54]:
{'1': 'hello',
 'abc': [1, 2, 3],
 'boo': [1, u'hi', u'moo', {'5': 'some'}],
 'def': {'hi': 'mom'}}
```
Notice that I never have to call the hook recursively, since every object is handed to the hook when you use object_pairs_hook. You do still have to take care of lists: the hook only converts values that are themselves strings, so strings sitting directly inside a list stay unicode. But as you can see, an object within a list is converted properly, and you don't have to recurse to make that happen.
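If you also want the strings inside lists converted, the only recursion needed is over lists, because any dict found there has already been through the hook. Below is a minimal sketch of that idea, assuming Python 3, where the analogue of encoding unicode to a byte string is encoding str to bytes. The helper _encode and the hook encode_hook are hypothetical names, not part of the json module.

```python
import json

def _encode(value):
    # Hypothetical helper. Strings become UTF-8 bytes; lists are walked
    # because the hook never sees them directly. Dicts are returned as-is,
    # since the hook already converted them when they were decoded.
    if isinstance(value, str):
        return value.encode('utf-8')
    if isinstance(value, list):
        return [_encode(item) for item in value]
    return value

def encode_hook(pairs):
    # Called once per JSON object, innermost objects first.
    return {_encode(key): _encode(value) for key, value in pairs}

doc = ('{"1": "hello", "abc": [1, 2, 3], "def": {"hi": "mom"}, '
       '"boo": [1, "hi", "moo", {"5": "some"}]}')
print(json.loads(doc, object_pairs_hook=encode_hook))
# {b'1': b'hello', b'abc': [1, 2, 3], b'def': {b'hi': b'mom'},
#  b'boo': [1, b'hi', b'moo', {b'5': b'some'}]}
```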
A coworker pointed out that Python 2.6 doesn't have object_pairs_hook. You can still use this with Python 2.6 by making a very small change. In the hook above, change:

```python
for key, value in pairs:
```

to

```python
for key, value in pairs.iteritems():
```

Then use object_hook instead of object_pairs_hook:
```
In [66]: json.load(open('test.json'), object_hook=deunicodify_hook)
Out[66]:
{'1': 'hello',
 'abc': [1, 2, 3],
 'boo': [1, u'hi', u'moo', {'5': 'some'}],
 'def': {'hi': 'mom'}}
```
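The difference between the two hooks is only the shape of the argument: object_pairs_hook gets a list of (key, value) tuples, while object_hook gets a dict the decoder has already built. A Python 3 sketch of the two variants side by side (the names _to_bytes, pairs_hook and dict_hook are hypothetical; .items() is Python 3's counterpart of .iteritems()):

```python
import json

def _to_bytes(s):
    # Hypothetical helper: encode str to UTF-8 bytes, leave anything else alone.
    return s.encode('utf-8') if isinstance(s, str) else s

def pairs_hook(pairs):
    # object_pairs_hook receives a list of (key, value) tuples
    assert isinstance(pairs, list)
    return {_to_bytes(k): _to_bytes(v) for k, v in pairs}

def dict_hook(obj):
    # object_hook receives an already-built dict, so iterate with .items()
    assert isinstance(obj, dict)
    return {_to_bytes(k): _to_bytes(v) for k, v in obj.items()}

doc = '{"1": "hello", "def": {"hi": "mom"}}'
a = json.loads(doc, object_pairs_hook=pairs_hook)
b = json.loads(doc, object_hook=dict_hook)
print(a == b)  # True
```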
Using object_pairs_hook results in one fewer dictionary being instantiated for each object in the JSON document, which might be worthwhile if you were parsing a huge document.
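Whether that saving matters depends on your interpreter and your data, so it is worth measuring rather than assuming. A rough Python 3 sketch using timeit on a synthetic document (the hooks, the document shape and the repetition count are arbitrary assumptions; no particular result is implied):

```python
import json
import timeit

def pairs_hook(pairs):
    # One dict per JSON object, built straight from the pairs list.
    return dict(pairs)

def dict_hook(obj):
    # The decoder has already built obj; copying it creates a second dict,
    # as a hook that returns a fresh dict would.
    return dict(obj)

# Synthetic document: 2,000 objects with 20 keys each.
doc = json.dumps([{"k%d" % i: i for i in range(20)} for _ in range(2000)])

t_pairs = timeit.timeit(lambda: json.loads(doc, object_pairs_hook=pairs_hook), number=20)
t_dict = timeit.timeit(lambda: json.loads(doc, object_hook=dict_hook), number=20)
print("object_pairs_hook: %.3fs, object_hook: %.3fs" % (t_pairs, t_dict))
```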