I'm having trouble understanding how the object_hook parameter of json.loads() actually works. I found a similar question, "object_hook does not address the full json", and tried to follow what I understood from it, but it's still not working for me. I gather that the object_hook function is called recursively in some way, but I don't understand how to use it to construct complex object hierarchies from a JSON string. Consider the following JSON string, classes, and object_hook function:

import json
from pprint import pprint

jstr = '{"person":{ "name": "John Doe", "age": "46", \
           "address": {"street": "4 Yawkey Way", "city": "Boston", "state": "MA"} } }'

class Address:
    def __init__(self, street=None, city=None, state=None):
        self.street = street
        self.city = city
        self.state = state

class Person:
    def __init__(self, name=None, age=None, address=None):
        self.name = name
        self.age = int(age)
        self.address = Address(**address)

def as_person(jdict):
    if u'person' in jdict:
        print('person found')
        person = jdict[u'person']
        return Person(name=person[u'name'], age=person[u'age'], 
                      address=person[u'address'])
    else:
        return('person not found')
        return jdict

(I define classes with keyword args to provide defaults so that the json need not contain all elements, and I can still ensure that the attributes are present in the class instance. I will also eventually associate methods with the classes, but want to populate the instances from json data.)

If I run:

>>> p = as_person(json.loads(jstr))

I get what I expect, i.e.:

person found

and p becomes a Person object, i.e.:

>>> pprint(p.__dict__)
{'address': <__main__.Address instance at 0x0615F3C8>,
 'age': 46,
 'name': u'John Doe'}
>>> pprint(p.address.__dict__)
{'city': u'Boston', 'state': u'MA', 'street': u'4 Yawkey Way'}

However, if instead, I try to use:

>>> p = json.loads(jstr, object_hook=as_person)

I get:

person found
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
  File "C:\Program Files (x86)\Python27\lib\json\__init__.py", line 339, in loads
    return cls(encoding=encoding, **kw).decode(s)
  File "C:\Program Files (x86)\Python27\lib\json\decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Program Files (x86)\Python27\lib\json\decoder.py", line 382, in raw_decode
    obj, end = self.scan_once(s, idx)
  File "<interactive input>", line 5, in as_person
TypeError: string indices must be integers, not unicode

I have no idea why this would happen, and suspect there is some subtlety around how the object_hook mechanism works that I'm missing.

In an attempt to apply the idea from the aforementioned question, namely that object_hook is called on each nested dictionary from the bottom up (and its return value replaces that dictionary during the traversal?), I also tried:

def as_person2(jdict):
    if u'person' in jdict:
        print('person found')
        person = jdict[u'person']
        return Person2(name=person[u'name'], age=person[u'age'], address=person[u'address'])
    elif u'address' in jdict:
        print('address found')
        return Address(jdict[u'address'])
    else:
        return('person not found')
        return jdict

>>> json.loads(jstr, object_hook=as_person2)
address found
person found
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
  File "C:\Program Files (x86)\Python27\lib\json\__init__.py", line 339, in loads
    return cls(encoding=encoding, **kw).decode(s)
  File "C:\Program Files (x86)\Python27\lib\json\decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Program Files (x86)\Python27\lib\json\decoder.py", line 382, in raw_decode
    obj, end = self.scan_once(s, idx)
  File "<interactive input>", line 5, in as_person2
AttributeError: Address instance has no attribute '__getitem__'

So, clearly, the proper form of the object_hook function is escaping me.

Can someone explain in detail how the object_hook mechanism works, and how the resulting object tree is supposed to be recursively constructed from the bottom up, why my code doesn't work as expected, and either fix my example or provide one that uses an object_hook function to build a complex class, given that you only get the one object_hook function?

W. Sadkin
  • Have you confirmed what the contents of `person` is? – Scott Hunter Apr 07 '17 at 20:03
  • I realize I forgot to post my Person2 class above; I changed the address assignment in the class Person to just say self.address = address – W. Sadkin Apr 07 '17 at 20:16
  • @ScottHunter: if you mean p, yes; it's a Person instance: <__main__.Person instance at 0x0615F148>. (the other calls never got that far.) – W. Sadkin Apr 07 '17 at 20:40

2 Answers

Through experimentation, I have answered my own question; this may not be the best solution, and I welcome further analysis or a better way, but this sheds light on how the object_hook process works, so it may be instructive to others facing the same issues.

The key observation was that the hook is called at every level of the JSON tree walk, innermost objects first, and that whatever it returns takes the place of that dictionary in its parent. In the solution below the hook always returns a dictionary, so to turn the subdictionaries into class instances it replaces the relevant values of its input dictionary with objects, rather than returning the object instances directly.
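
To make the call order concrete, here is a minimal sketch that only records what the hook receives. The names tracing_hook and seen are hypothetical helpers introduced for illustration; the JSON string is the one from the question.

```python
import json

jstr = ('{"person": {"name": "John Doe", "age": "46", '
        '"address": {"street": "4 Yawkey Way", "city": "Boston", "state": "MA"}}}')

seen = []  # hypothetical: records the sorted keys of each dict passed to the hook

def tracing_hook(jdict):
    # Called once per JSON object, after that object has been fully parsed.
    seen.append(sorted(jdict))
    # Whatever is returned here is what the parent object sees as the value.
    return jdict

result = json.loads(jstr, object_hook=tracing_hook)

for keys in seen:
    print(keys)
```

The innermost address dictionary is recorded first and the outer dictionary holding 'person' last, which is the bottom-up order the rest of this answer relies on.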

The solution below provides a bottom-up means of building the object hierarchy. I've inserted print statements to show how the object_hook passed to loads is called on subsections of the JSON string as it is processed, which I found quite illuminating and helpful in building a working function.

import json
from pprint import pprint

jstr = '{"person":{ "name": "John Doe", "age": "46", \
         "address": {"street": "4 Yawkey Way", "city": "Boston", "state": "MA"} } }'

class Address:
    def __init__(self, street=None, city=None, state=None):
        self.street = street
        self.city = city
        self.state = state
    def __repr__(self):
        return('Address(street={self.street!r}, city={self.city!r}, '
               'state={self.state!r})'.format(self=self))

class Person:
    def __init__(self, name=None, age=None, address=None):
        self.name = name
        self.age = int(age)
        self.address = address
    def __repr__(self):
        return('Person(name={self.name!r}, age={self.age!r},\n'
               '       address={self.address!r})'.format(self=self))

def as_person4(jdict):
    if 'person' in jdict:
        print('person in jdict; (before substitution):')
        pprint(jdict)
        jdict['person'] = Person(**jdict['person'])
        print('after substitution:')
        pprint(jdict)
        print
        return jdict
    elif 'address' in jdict:
        print('address in jdict; (before substitution):')
        pprint(jdict)
        jdict['address'] = Address(**jdict['address'])
        print('after substitution:')
        pprint(jdict)
        print
        return jdict
    else:
        print('jdict:')
        pprint(jdict)
        print
        return jdict

>>> p = json.loads(jstr, object_hook=as_person4)
jdict:
{u'city': u'Boston', u'state': u'MA', u'street': u'4 Yawkey Way'}

address in jdict; (before substitution):
{u'address': {u'city': u'Boston', u'state': u'MA', u'street': u'4 Yawkey Way'},
u'age': u'46', u'name': u'John Doe'}
after substitution:
{u'address': Address(street=u'4 Yawkey Way', city=u'Boston', state=u'MA'),
u'age': u'46', u'name': u'John Doe'}

person in jdict; (before substitution):
{u'person': {u'address': Address(street=u'4 Yawkey Way', city=u'Boston', state=u'MA'),
         u'age': u'46', u'name': u'John Doe'}}
after substitution:
{u'person': Person(name=u'John Doe', age=46,
   address=Address(street=u'4 Yawkey Way', city=u'Boston', state=u'MA'))}

>>> p
{u'person': Person(name=u'John Doe', age=46,
   address=Address(street=u'4 Yawkey Way', city=u'Boston', state=u'MA'))}
>>> 

Note that the result is still a dictionary whose key is 'person' and whose value is the Person object (rather than a bare Person object), but this approach does provide an extensible, bottom-up way to construct the objects.
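
If a bare Person is wanted instead of the wrapping dictionary, one possible variation is to return the instance itself from the outermost call. This is a sketch under the assumption that the 'person' and 'address' keys only ever appear in the roles shown in the question; as_person_unwrapped is a hypothetical name, and the classes are trimmed copies of the ones above.

```python
import json

class Address(object):
    def __init__(self, street=None, city=None, state=None):
        self.street = street
        self.city = city
        self.state = state

class Person(object):
    def __init__(self, name=None, age=None, address=None):
        self.name = name
        self.age = int(age)
        self.address = address

def as_person_unwrapped(jdict):  # hypothetical variation of as_person4
    # Inner levels: swap the already-parsed address dict for an Address.
    if 'address' in jdict:
        jdict['address'] = Address(**jdict['address'])
    # Outermost level: return the Person itself instead of the wrapper dict.
    if 'person' in jdict:
        return Person(**jdict['person'])
    return jdict

jstr = ('{"person": {"name": "John Doe", "age": "46", '
        '"address": {"street": "4 Yawkey Way", "city": "Boston", "state": "MA"}}}')

p = json.loads(jstr, object_hook=as_person_unwrapped)
print(type(p).__name__, p.name, p.age, p.address.city)
```

By the time the outer dictionary reaches the hook, its 'person' value already contains an Address, so Person(**...) receives a finished object rather than a raw dictionary.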

W. Sadkin
  • Nicely done and looks fairly efficient to me—no need to apologize. From the lack of other answers, appears to be something most folks—here at least—didn't know how or a good way to do (myself included). Thanks for enlightening us. – martineau Apr 10 '17 at 21:21

I agree it's non-intuitive, but you can simply return the dictionary you were passed unchanged when it's not the kind of object you're interested in, which means that this would probably be the simplest way:

(As you can also see, you don't need all those u string prefixes, either.)

import json

jstr = '{"person": { "name": "John Doe", "age": "46", \
           "address": {"street": "4 Yawkey Way", "city": "Boston", "state": "MA"} } }'

class Address:
    def __init__(self, street=None, city=None, state=None):
        self.street = street
        self.city = city
        self.state = state

    def __repr__(self):  # optional - added so print displays something useful
        return('Address(street={self.street!r}, city={self.city!r}, '
               'state={self.state!r})'.format(self=self))

class Person:
    def __init__(self, name=None, age=None, address=None):
        self.name = name
        self.age = int(age)
        self.address = address

    def __repr__(self):  # optional - added so print displays something useful
        return('Person(name={self.name!r}, age={self.age!r},\n'
               '       address={self.address!r})'.format(self=self))

def as_person3(jdict):
    if 'person' not in jdict:
        return jdict
    else:
        person = jdict['person']
        address = Address(**person['address'])
        return Person(name=person['name'], age=person['age'], address=address)

p = json.loads(jstr, object_hook=as_person3)
print(p)

Output:

Person(name=u'John Doe', age=46,
       address=Address(street=u'4 Yawkey Way', city=u'Boston', state=u'MA'))
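
For contrast, the hooks in the question return the string 'person not found' from their else branch, so the later return jdict is never reached. The sketch below is one reading of why that leads to the first traceback, not a confirmed diagnosis; both hook names and the shortened JSON string are hypothetical.

```python
import json

jstr = '{"person": {"name": "John Doe", "address": {"city": "Boston"}}}'

def hook_returning_placeholder(jdict):
    # Mirrors the question's else branch: inner dicts are replaced by a string.
    if 'person' in jdict:
        return jdict['person']['name']
    return 'person not found'

def hook_returning_dict(jdict):
    # Returns dicts it is not interested in unchanged, as this answer does.
    if 'person' in jdict:
        return jdict['person']['name']
    return jdict

try:
    json.loads(jstr, object_hook=hook_returning_placeholder)
    outcome = 'no error'
except TypeError:
    # jdict['person'] is the placeholder string by now, so ['name'] fails.
    outcome = 'TypeError'

name = json.loads(jstr, object_hook=hook_returning_dict)
print(outcome, name)
```

With the placeholder version, the outer call finds a string where it expects the person dictionary, which matches the "string indices must be integers" error reported in the question.
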
martineau
  • The problem with the example is that p.address ends up just a dictionary, and not an Address instance. How would you modify this so that I could build up the class instance hierarchy (possibly several levels deep), using the object_hook? – W. Sadkin Apr 09 '17 at 22:44
  • W.Sadkin: Indeed...thought of that myself a while after posting my original answer—see updated version. – martineau Apr 09 '17 at 23:49
  • I am not sure how this is very different from my first example, where I received the first traceback; the only difference seems to be that you are constructing the Address object in the object_hook function, rather than in the constructor of the Person object. And again, this feels like the wrong approach anyway, because it effectively does all the dictionary-to-object conversion at the same level. What I'd like to see is how to have the object hierarchy built using the tree-walk that the documentation suggests happens inside the json module, from the deepest level to the top. – W. Sadkin Apr 10 '17 at 14:40
  • @W. Sadkin: Where is "the tree-walk that the documentation suggests happens"? Seems to me it's walking top-down, which I agree is not useful with respect to doing what you want. – martineau Apr 10 '17 at 15:49
  • Actually, it's not in the documentation (which was part of the problem), but it is stated, in the related post I cited in my original question, that this is what takes place, and my solution above suggests that it is, indeed, true. – W. Sadkin Apr 10 '17 at 20:48