Using the approach described here, I pass OrderedDict as object_pairs_hook when loading a nested JSON file, to preserve order.

Order is preserved, and this is fine for most of the JSON object. But some parts of the JSON (at the lowest level of nesting) look like this:

"In Content" : { "Sulvo" : "abc.com_336x280_he-inlinecontentmobile", "Sulvo" : "abc.com_336x280_he-inlinecontentmobile_level2", "Sulvo" : "abc.com_336x280_he-inlinecontentmobile_level3", "Adsense" : "" },

And when processed, only one of these identical keys gets preserved:

OrderedDict([(u'Sulvo', u'homeepiphany.com_336x280_he-inlinecontentmobile_level3'), (u'Adsense', u'')])),

I know that we can have a dictionary which has multiple items of the same key name with a defaultdict. The following doesn't work though, and even if it did, I presume we would gain the keys but lose the order, so we'd be no better off:

j = json.load(open('he.json'), object_pairs_hook=defaultdict)

Is it possible to maintain order AND preserve all keys in one go?

Python 2.7.12

Pyderman

1 Answer

If you look at the docs for json.load, they outline what the object_pairs_hook parameter does:

object_pairs_hook is an optional function that will be called with the result of any object literal decoded with an ordered list of pairs. The return value of object_pairs_hook will be used instead of the dict.

All you need to do is write a function that, given a list of (key, value) pairs, constructs your object.
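
To see what the hook actually receives, here is a minimal sketch. The `show_pairs` name and the sample string are illustrative assumptions, not taken from the question. It records each call and returns the pairs unchanged; the decoder calls the hook once per JSON object, innermost object first, and the duplicate keys are still present in the list at that point.

```python
import json

calls = []

def show_pairs(pairs):
    # Hypothetical helper: record the raw (key, value) list, then pass it through.
    calls.append(list(pairs))
    return pairs

# Made-up sample with a duplicated key in the nested object.
json.loads('{"outer": {"k": 1, "k": 2}}', object_pairs_hook=show_pairs)

# One recorded entry per JSON object; the nested object is recorded first.
for pairs in calls:
    print(pairs)
```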

One approach would be to do nothing and just pass the items list straight through without constructing a dictionary:

def handle_mapping(items):
    return items

Then, your JSON is parsed like so:

[(u'In Content',
  [(u'Sulvo', u'abc.com_336x280_he-inlinecontentmobile'),
   (u'Sulvo', u'abc.com_336x280_he-inlinecontentmobile_level2'),
   (u'Sulvo', u'abc.com_336x280_he-inlinecontentmobile_level3'),
   (u'Adsense', u'')])]
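
Since every JSON object is now a list of pairs rather than a dictionary, lookups have to scan the list. A rough sketch of how that could look, assuming a hypothetical `values_for` helper and shortened placeholder values instead of the real ad-unit names:

```python
import json

def handle_mapping(items):
    # Return the ordered (key, value) list untouched, so duplicates survive.
    return items

def values_for(pairs, key):
    # Hypothetical helper: every value stored under `key`, in document order.
    return [v for k, v in pairs if k == key]

# Placeholder data shaped like the snippet in the question.
raw = '{"In Content": {"Sulvo": "a", "Sulvo": "b", "Adsense": ""}}'
parsed = json.loads(raw, object_pairs_hook=handle_mapping)

in_content = values_for(parsed, "In Content")[0]
sulvo = values_for(in_content, "Sulvo")
print(sulvo)
```

The trade-off is that nothing is lost or reordered, but you give up `d[key]` style access everywhere, including for objects that have no duplicate keys at all.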

If you do want to merge the values of duplicate keys into a list, you can use OrderedDict:

from collections import OrderedDict

def handle_mapping(items):
    d = OrderedDict()
    duplicate_keys = set()

    for key, value in items:
        # So [('k', 'v')] becomes {'k': 'v'}
        if key not in d:
            d[key] = value
        else:
            # So [('k', 'v1'), ('k', 'v2')] becomes {'k': ['v1', 'v2']}
            if key not in duplicate_keys:
                duplicate_keys.add(key)
                d[key] = [d[key]]

            d[key].append(value)

    return d

Then, your object would be parsed as:

OrderedDict([(u'In Content',
              OrderedDict([(u'Sulvo',
                            [u'abc.com_336x280_he-inlinecontentmobile',
                             u'abc.com_336x280_he-inlinecontentmobile_level2',
                             u'abc.com_336x280_he-inlinecontentmobile_level3']),
                           (u'Adsense', u'')]))])
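
One thing to keep in mind with the merged form: a key that appeared once holds a plain value, while a repeated key holds a list, so calling code has to cope with both shapes. A minimal sketch, assuming a hypothetical `as_list` helper and placeholder values in place of the real ones:

```python
from collections import OrderedDict

def as_list(value):
    # Hypothetical helper: wrap single values so callers can always iterate.
    return value if isinstance(value, list) else [value]

# Placeholder data in the same shape as the merged result for "In Content".
in_content = OrderedDict([("Sulvo", ["a", "b", "c"]), ("Adsense", "")])

# Flatten back to (key, value) pairs, preserving the original order.
flat = [(key, item)
        for key, value in in_content.items()
        for item in as_list(value)]
print(flat)
```

Note that this cannot tell a merged list apart from a key whose single value was a genuine JSON array; if your data contains arrays, the pass-through version is the safer choice.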
Blender
  • That's one hell of an answer. I got as far as the `handle_mapping()` suggestion. It was surprising to me that passing just the json file to `json.load()` - i.e. not specifying an `object_pairs_hook` - also resulted in loss of items, whereas passing the dummy function resulted in them being preserved. Will try your full solution tomorrow. Thanks. – Pyderman Oct 31 '16 at 02:56
  • @Pyderman: In Python, `dict([('a', 1), ('a', 1)]) == {'a': 1} == {'a': 1, 'a': 1}`, since having duplicate keys in a dictionary doesn't make sense (what would `d['a']` do?). JavaScript (the J in **J**SON) does the same thing. You have to tell Python to use something other than a dictionary to hold your elements if your JSON is weird like yours. – Blender Oct 31 '16 at 03:10