I have a string that can be parsed as a JSON object (a Python dict). My string variable looks like this:

my_string_variable = """{
    "a": 1,
    "b": {
        "b1": 1,
        "b2": 2
    },
    "b": {
        "b1": 3,
        "b2": 2,
        "b4": 8
    }
}"""

When I call json.loads(my_string_variable), I get a dict, but only the second value of the key "b" is kept. That is expected, because a dict can't contain duplicate keys.
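For reference, a minimal reproduction of that behavior. The one-line string is the same data as above with the whitespace removed, and the variable names are only illustrative:

```python
import json

# Same data as my_string_variable, written on one line
text = '{"a": 1, "b": {"b1": 1, "b2": 2}, "b": {"b1": 3, "b2": 2, "b4": 8}}'

parsed = json.loads(text)

# Only the last value seen for the repeated key "b" survives
print(parsed)
# {'a': 1, 'b': {'b1': 3, 'b2': 2, 'b4': 8}}
```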

What would be the best way to get a defaultdict-like result in which the values of a duplicated key are collected in a list, like this:

result = {
    "a": 1,
    "b": [{"b1": 1, "b2": 2}, {"b1": 3, "b2": 2, "b4": 8}],
}

I have already looked for similar questions but they all deal with dicts or lists as an input and then create defaultdicts to handle the duplicate keys.

In my case I have a string variable and I would want to know if there is a simple way to achieve this.

S.B
Samiella
  • Take a look at this thread, it should help you get going: http://stackoverflow.com/questions/5946236/how-to-merge-multiple-dicts-with-same-key – Ma0 Jul 11 '16 at 12:41
  • There probably isn't. Having duplicate keys simply doesn't make sense for mappings. You'd require a custom parser that switches from single-value to multiple-value upon encountering a key again. You might be able to modify the [JSON decoder](https://docs.python.org/3/library/json.html#encoders-and-decoders) with some effort. Note that things get a lot easier if *all* keys have sequence values (even if they are just length 1). – MisterMiyagi Jul 11 '16 at 12:43
  • Looks more like a dupe: `object_pairs_hook`: http://stackoverflow.com/questions/14902299/json-loads-allows-duplicate-keys-in-a-dictionary-overwriting-the-first-value – Moses Koledoye Jul 11 '16 at 12:44
  • @MosesKoledoye : I didn't know about object_pairs_hook, it solved my problem :D thanks – Samiella Jul 11 '16 at 13:33

2 Answers

Something like the following can be done, using the object_pairs_hook parameter of json.loads:

import json

def join_duplicate_keys(ordered_pairs):
    """Build a dict from (key, value) pairs, collecting the values of repeated keys in a list."""
    d = {}
    for k, v in ordered_pairs:
        if k in d:
            if type(d[k]) == list:
                # The key already holds a list: add the new value to it
                d[k].append(v)
            else:
                # Second occurrence of the key: wrap both values in a list
                d[k] = [d[k], v]
        else:
            # First occurrence: store the value as-is
            d[k] = v
    return d

raw_post_data = '{"a":1, "b":{"b1":1,"b2":2}, "b": { "b1":3, "b2":2,"b4":8} }'
newdict = json.loads(raw_post_data, object_pairs_hook=join_duplicate_keys)
print(newdict)

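Please note that the code above depends on the value type, through the check type(d[k]) == list. If a value in the original string is itself a list, a later duplicate of that key is appended to that list instead of being paired with it, so some extra handling is required to make the code robust.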
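To make that caveat concrete: with the type check, '{"a": [1, 2], "a": 3}' parses to {'a': [1, 2, 3]}, so the original list and the duplicate value can no longer be told apart. One possible way around it, as a sketch rather than a definitive fix, is to remember which keys have already been turned into a list by the hook. The function name join_duplicate_keys_tracked and the merged set are illustrative, not part of the original code:

```python
import json


def join_duplicate_keys_tracked(ordered_pairs):
    """Collect values of repeated keys in a list, without inspecting value types."""
    d = {}
    merged = set()  # keys whose value is a list created by this hook
    for k, v in ordered_pairs:
        if k not in d:
            # First occurrence: store the value as-is (even if it is a list)
            d[k] = v
        elif k in merged:
            # Third or later occurrence: extend the list created earlier
            d[k].append(v)
        else:
            # Second occurrence: pair the stored value with the new one
            d[k] = [d[k], v]
            merged.add(k)
    return d


data = '{"a": [1, 2], "a": 3}'
result = json.loads(data, object_pairs_hook=join_duplicate_keys_tracked)
print(result)
# {'a': [[1, 2], 3]}
```

A consumer of the result still cannot tell a merged list from a genuine JSON list by looking at the value alone, so this only fixes the merging step.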

Kevin
  • Maybe use `defaultdict(list)` instead of checking for type? Also removes the potential problem you're mentioning. – user1337 Jul 11 '16 at 13:37

The accepted answer is perfectly fine; I just wanted to show another approach.

First, you store the values of every key in a list, so that later values for the same key are easy to accumulate. At the end, you call pop on the lists that hold only one item, because a single-item list means the key was not duplicated:

import json
from collections import defaultdict

my_string_variable = '{"a":1, "b":{"b1":1,"b2":2}, "b": { "b1":3, "b2":2,"b4":8} }'


def join_duplicate_keys(ordered_pairs):
    d = defaultdict(list)
    for k, v in ordered_pairs:
        d[k].append(v)
    return {k: v.pop() if len(v) == 1 else v for k, v in d.items()}


d = json.loads(my_string_variable, object_pairs_hook=join_duplicate_keys)
print(d)

output:

{'a': 1, 'b': [{'b1': 1, 'b2': 2}, {'b1': 3, 'b2': 2, 'b4': 8}]}
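As noted in the comments, things get simpler if every key maps to a list, even one of length 1. A minimal sketch of that variant simply skips the final pop step. The name collect_all_values is illustrative, and because the hook runs for every JSON object, the values inside nested objects are wrapped in lists as well:

```python
import json
from collections import defaultdict


def collect_all_values(ordered_pairs):
    """Map every key to the list of all values seen for it."""
    d = defaultdict(list)
    for k, v in ordered_pairs:
        d[k].append(v)
    # Return a plain dict so that missing keys raise KeyError as usual
    return dict(d)


text = '{"a":1, "b":{"b1":1,"b2":2}, "b": { "b1":3, "b2":2,"b4":8} }'
parsed = json.loads(text, object_pairs_hook=collect_all_values)

print(parsed["a"])       # [1]
print(len(parsed["b"]))  # 2
```

The result has a uniform shape, so the caller never has to check whether a value is a single item or a list of duplicates.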
S.B