3

If I have JSON with duplicate keys and different values in each of the duplicate keys, how can I extract both in python?

ex:

{ 
   'posting': {
                'content': 'stuff',
                'timestamp': '123456789'
              },
   'posting': {
                'content': 'weird stuff',
                'timestamp': '93828492'
              }
}

If I wanted to grab both timestamps, how would I do so?

I tried a = json.loads(json_str) and then a['posting']['timestamp'], but that only returns one of the values.

Liondancer

3 Answers

3

You can't have duplicate keys: a Python dict holds only one value per key, so only one of the duplicates survives parsing. Change the object to an array instead.

[
    {
        'content': 'stuff',
        'timestamp': '123456789'
    },
    {
        'content': 'weird stuff',
        'timestamp': '93828492'
    }
]
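
To tie this back to the question, here is a minimal sketch of reading both timestamps once the data is an array. It assumes the array is the top-level value and is written with double quotes, since JSON does not accept single-quoted strings; the names json_str, postings and timestamps are made up for illustration.

```python
import json

# Hypothetical input: the question's data restructured as a top-level array.
json_str = """[
    {"content": "stuff", "timestamp": "123456789"},
    {"content": "weird stuff", "timestamp": "93828492"}
]"""

postings = json.loads(json_str)  # a JSON array parses to a Python list

# Collect the timestamp of every posting, in document order.
timestamps = [p["timestamp"] for p in postings]
print(timestamps)
```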
sumit-sampang-rai
2

Duplicate keys overwrite the previous entry, so only the last value survives. Instead, keep an array under that key. Example JSON:

{

'posting' : [
              {
                'content': 'stuff',
                'timestamp': '123456789'
              },
              {
                'content': 'weird stuff',
                'timestamp': '93828492'
              }
            ]

}

You can now access the individual elements under the posting key like this (assuming the parsed result is stored in data):

data['posting'][0], data['posting'][1]

0

As has already been covered: the JSON standard (RFC 8259) says names within an object should be unique, and the behavior of software that receives duplicates is unpredictable, so avoid duplicate keys.

Yet, if a third-party software component forces this upon you, note the section about this topic in the standard library documentation: https://docs.python.org/3/library/json.html#repeated-names-within-an-object

By default, this module does not raise an exception; instead, it ignores all but the last name-value pair for a given name [...] The object_pairs_hook parameter can be used to alter this behavior.
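
A small sketch of what that quote means in practice, using the question's data rewritten as valid JSON. The variable names are made up for illustration; passing list as object_pairs_hook is just a quick way to see the raw pairs the decoder hands over.

```python
import json

dup_json = """{
   "posting": {"content": "stuff", "timestamp": "123456789"},
   "posting": {"content": "weird stuff", "timestamp": "93828492"}
}"""

# Default decoding: no error is raised, and the last "posting" wins.
last_wins = json.loads(dup_json)
print(last_wins["posting"]["timestamp"])

# With a hook, the decoder passes every (key, value) pair, duplicates included.
raw_pairs = json.loads('{"a": 1, "a": 2}', object_pairs_hook=list)
print(raw_pairs)
```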

So let's do it!

import itertools, json


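def duplicate_object_pairs_hook(pairs):
    """Build a dict, collecting the values of adjacent duplicate keys into a list."""
    def _key(pair):
        (k, v) = pair
        return k
    def gpairs():
        # groupby only merges consecutive pairs that share the same key
        for (k, group) in itertools.groupby(pairs, _key):
            ll = [v for (_, v) in group]
            (v, *extra) = ll
            # a key seen once keeps its plain value; repeated keys become a list
            yield (k, ll if extra else v)
    return dict(gpairs())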


badj = """{ 
   "posting": {"content": "stuff", "timestamp": "123456789"},
   "posting": {"content": "weird stuff", "timestamp": "93828492"}
}"""

data = json.loads(badj, object_pairs_hook=duplicate_object_pairs_hook)

Now data evaluates to

{
    'posting': [
        {'content': 'stuff', 'timestamp': '123456789'},
        {'content': 'weird stuff', 'timestamp': '93828492'},
    ],
}

Remember that this hook is called for every JSON object decoded, nested ones included, and receives that object's key-value pairs as a list of tuples in document order. The default behavior is equivalent to passing that list to the dict constructor.
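
To make the calling pattern concrete, here is a minimal sketch that records each call. tracing_hook and calls are hypothetical names; the hook returns dict(pairs), so the decoded result matches the default.

```python
import json

calls = []  # one entry per hook invocation, in call order

def tracing_hook(pairs):
    # pairs holds the (key, value) tuples of a single JSON object
    calls.append(list(pairs))
    return dict(pairs)

nested = '{"outer": {"inner": 1}, "n": 2}'
result = json.loads(nested, object_pairs_hook=tracing_hook)

for call in calls:
    print(call)
```

The inner object is reported first, because its decoded value has to exist before the pairs of the enclosing object can be assembled.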

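Also, I assumed duplicate keys are adjacent, as that's my use case. itertools.groupby only merges consecutive pairs with the same key, so if other keys can sit between the duplicates you will have to sort the pairs by key before grouping them.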
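
If the duplicates may not be adjacent, an alternative to sorting is to accumulate values in a plain dict, which also keeps the keys in first-seen order. This is a sketch under that assumption, not a replacement for the hook above; collect_duplicates_hook and the sample input are made up for illustration.

```python
import json

def collect_duplicates_hook(pairs):
    result = {}
    counts = {}  # how many times each key has been seen in this object
    for key, value in pairs:
        counts[key] = counts.get(key, 0) + 1
        if counts[key] == 1:
            result[key] = value                 # first occurrence: plain value
        elif counts[key] == 2:
            result[key] = [result[key], value]  # second: promote to a list
        else:
            result[key].append(value)           # later ones: extend the list
    return result

# Hypothetical input where another key sits between the duplicates.
scattered = '{"posting": {"timestamp": "1"}, "other": 0, "posting": {"timestamp": "2"}}'
grouped = json.loads(scattered, object_pairs_hook=collect_duplicates_hook)
timestamps = [p["timestamp"] for p in grouped["posting"]]
print(timestamps)
```

As with the groupby version, a key that appears once with an array value is indistinguishable in the result from a key that was duplicated, so the caller has to know which keys may repeat.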

N1ngu