Dealing with JSON with duplicate keys

Question

If I have JSON with duplicate keys, and each duplicate key has a different value, how can I extract both values in Python?

Example:

{
   "posting": {
                "content": "stuff",
                "timestamp": "123456789"
              },
   "posting": {
                "content": "weird stuff",
                "timestamp": "93828492"
              }
}

If I wanted to grab both timestamps, how would I do so?

I tried a = json.loads(json_str) and then a['posting']['timestamp'], but that only returns one of the values.

I suppose you could parse it by hand but this seems like a bad idea. Best option is to change the JSON as it is invalid. You should use a list instead. — Cfreak, Mar 23 '15 at 04:18
@user2357112 I didnt plan this, someone else did and I have to deal with it =[ — Liondancer, Mar 23 '15 at 05:49

sumit-sampang-rai · Accepted Answer · 2015-03-23T05:40:04.450

3

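You can't rely on duplicate keys: a parsed object holds only one value per key, so one of the two postings is lost. If you can change the format, use an array instead of an object.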

[
    {
        "content": "stuff",
        "timestamp": "123456789"
    },
    {
        "content": "weird stuff",
        "timestamp": "93828492"
    }
]
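
If the producer can be changed to emit the array form, reading both timestamps in Python becomes a plain loop over a list. A minimal sketch, assuming the array is written with double quotes (as valid JSON requires) and held in a string called json_str, a name borrowed from the question for illustration:

```python
import json

# Assumed input: the array form above, serialized as valid JSON.
json_str = """[
    {"content": "stuff", "timestamp": "123456789"},
    {"content": "weird stuff", "timestamp": "93828492"}
]"""

postings = json.loads(json_str)  # parses to a list of dicts

# Collect the timestamp of every posting, in document order.
timestamps = [posting["timestamp"] for posting in postings]
print(timestamps)
```

Because the postings live in a list, nothing is overwritten and the order in the file is preserved.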

edited Mar 23 '15 at 05:40

answered Mar 23 '15 at 04:22


score 2 · Answer 2 · answered Mar 23 '15 at 04:53

Duplicate keys overwrite the previous entry, so only the last value for a repeated key survives parsing. Instead, keep an array under that key. Example JSON:

{
    "posting": [
        {
            "content": "stuff",
            "timestamp": "123456789"
        },
        {
            "content": "weird stuff",
            "timestamp": "93828492"
        }
    ]
}

You can now access the individual elements under the posting key by index. In Python, after data = json.loads(json_str):

data['posting'][0], data['posting'][1]
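
To see the overwrite behaviour from Python, the sketch below parses the duplicate-key form and the array form side by side. The variable names (dup_str, arr_str and so on) are made up for illustration; only json.loads comes from the standard library.

```python
import json

# Duplicate-key form: both members are named "posting".
dup_str = '{"posting": {"timestamp": "123456789"}, "posting": {"timestamp": "93828492"}}'
dup = json.loads(dup_str)

# With the default settings, only the last "posting" is kept.
last_only = dup["posting"]["timestamp"]

# Array form: one "posting" key holding a list of objects.
arr_str = '{"posting": [{"timestamp": "123456789"}, {"timestamp": "93828492"}]}'
arr = json.loads(arr_str)

# Both entries are reachable by index.
both = [arr["posting"][0]["timestamp"], arr["posting"][1]["timestamp"]]

print(last_only)
print(both)
```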

score 0 · Answer 3 · answered Sep 27 '22 at 14:25

As has already been covered: duplicate keys go against the standard (RFC 8259 says the names within an object should be unique), and how parsers handle them varies across systems, so avoid them.

Yet, if a third-party software component forces this upon you, note the section about this topic in the standard library documentation: https://docs.python.org/3/library/json.html#repeated-names-within-an-object

By default, this module does not raise an exception; instead, it ignores all but the last name-value pair for a given name [...] The object_pairs_hook parameter can be used to alter this behavior.

So let's do it!

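import itertools
import json


def duplicate_object_pairs_hook(pairs):
    """Build a dict from (key, value) pairs, merging adjacent duplicate keys into a list."""
    def _key(pair):
        (k, v) = pair
        return k

    def gpairs():
        # groupby only merges consecutive pairs that share the same key
        for (k, group) in itertools.groupby(pairs, _key):
            ll = [v for (_, v) in group]
            (v, *extra) = ll
            # a key seen once keeps its plain value; repeated keys get a list
            yield (k, ll if extra else v)
    return dict(gpairs())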


badj = """{ 
   "posting": {"content": "stuff", "timestamp": "123456789"},
   "posting": {"content": "weird stuff", "timestamp": "93828492"}
}"""

data = json.loads(badj, object_pairs_hook=duplicate_object_pairs_hook)

Now data evaluates to

{
    'posting': [
        {'content': 'stuff', 'timestamp': '123456789'},
        {'content': 'weird stuff', 'timestamp': '93828492'},
    ],
}

Remember that this hook will be called for every JSON object parsed, nested ones included, each time with the list of (key, value) tuples parsed for that object. The default behavior should be equivalent to passing that list to the dict constructor.
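
A quick way to observe this is a hook that records each list of pairs it receives before handing them to dict. This is only an illustrative sketch; recording_hook and calls are hypothetical names, not part of the json module.

```python
import json

calls = []  # one entry per JSON object the parser completes


def recording_hook(pairs):
    # pairs is the list of (key, value) tuples for one JSON object
    calls.append(list(pairs))
    # returning dict(pairs) mimics what the parser does without a hook
    return dict(pairs)


data = json.loads('{"outer": {"inner": 1}, "n": 2}', object_pairs_hook=recording_hook)

for recorded in calls:
    print(recorded)
```

The nested object is completed first, so its pairs are recorded before those of the enclosing object, and the value the hook returned for it shows up inside the outer pairs.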

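Also, I assumed duplicate keys are adjacent, as that's my use case, and itertools.groupby only groups consecutive items. If your duplicates can be separated by other keys, you might have to sort the pairs by key before grouping them.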
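
If duplicates may be separated by other keys, one alternative to sorting is to accumulate the values per key in a dict, which also keeps keys in first-seen order. A sketch under that assumption; collecting_pairs_hook is a hypothetical name, and the "other" member in the sample input is invented to separate the duplicates:

```python
import json


def collecting_pairs_hook(pairs):
    grouped = {}
    for key, value in pairs:
        # gather every value seen for this key, adjacent or not
        grouped.setdefault(key, []).append(value)
    # a key seen once keeps its plain value; repeated keys get a list
    return {k: (vs[0] if len(vs) == 1 else vs) for k, vs in grouped.items()}


# Assumed input: the duplicates are separated by an unrelated key.
badj = """{
   "posting": {"content": "stuff", "timestamp": "123456789"},
   "other": 1,
   "posting": {"content": "weird stuff", "timestamp": "93828492"}
}"""

data = json.loads(badj, object_pairs_hook=collecting_pairs_hook)

# Both timestamps, which is what the question asked for.
timestamps = [posting["timestamp"] for posting in data["posting"]]
print(timestamps)
```

As with the grouping version, a key that appears only once stays a plain value rather than a one-element list, so calling code has to be ready for both shapes.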
