Summing two levels of nested dictionaries in Python

Question

I have two nested dictionaries that share the same keys but hold different values:

data1 = {
"2010":{
        'A':2,
        'B':3,
        'C':5
    },
"2011":{
        'A':1,
        'B':2,
        'C':3
    },
"2012":{
        'A':1,
        'B':2,
        'C':4
    }
}

data2 = {
"2010":{
        'A':4,
        'B':4,
        'C':5
    },
"2011":{
        'A':1,
        'B':1,
        'C':3
    },
"2012":{
        'A':3,
        'B':2,
        'C':4
    }
}

I need to sum the values of both dictionaries key by key, so that the result looks like this:

data3 = {
"2010":{
        'A':6,
        'B':7,
        'C':10
    },
"2011":{
        'A':2,
        'B':3,
        'C':6
    },
"2012":{
        'A':4,
        'B':4,
        'C':8
    }
}

How can I do that?

Is it guaranteed that the structure of the two dictionaries is the same? — Willem Van Onsem, Feb 14 '17 at 12:36

score 2 · Accepted Answer · answered Feb 14 '17 at 12:36


Given that the structure of the two dictionaries is the same, you can use a dictionary comprehension for that:

data3 = {key:{key2:val1+data2[key][key2] for key2,val1 in subdic.items()} for key,subdic in data1.items()}

In the repl:

>>> {key:{key2:val1+data2[key][key2] for key2,val1 in subdic.items()} for key,subdic in data1.items()}
{'2010': {'B': 7, 'C': 10, 'A': 6}, '2012': {'B': 4, 'C': 8, 'A': 4}, '2011': {'B': 3, 'C': 6, 'A': 2}}

The comprehension works as follows: in the outer loop, we iterate over the (key, subdic) pairs of data1. So in your case, key is a year and subdic is the dictionary (of data1) for that year.

Now for each of these years, we iterate over the items of the subdic and here key2 is 'A', 'B' and 'C'. val1 is the value that we find in data1 for these keys. We get the other value by querying data2[key][key2]. We sum these up and construct new dictionaries for that.
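The same comprehension can be unrolled into explicit loops, which may be easier to follow. This is only a sketch, assuming (as above) that both dictionaries have identical keys at both levels; the sample data is trimmed from the question.

```python
data1 = {"2010": {"A": 2, "B": 3, "C": 5}, "2011": {"A": 1, "B": 2, "C": 3}}
data2 = {"2010": {"A": 4, "B": 4, "C": 5}, "2011": {"A": 1, "B": 1, "C": 3}}

data3 = {}
for key, subdic in data1.items():        # key is a year, subdic its inner dict
    data3[key] = {}
    for key2, val1 in subdic.items():    # key2 is 'A', 'B' or 'C'
        # data2[key][key2] raises KeyError if data2 lacks the year or the letter
        data3[key][key2] = val1 + data2[key][key2]

print(data3)
```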

answered Feb 14 '17 at 12:36

Willem Van Onsem


Thank you very much, Willem, it solves my problem in a more complex case too. – Faizalprbw Feb 14 '17 at 12:51
@Faizalprbw: mind, however, that as the answer says this only works if the structure is identical: both dictionaries need to contain `2010`, with `A`, `B`, and `C` under `2010`, etc. – Willem Van Onsem Feb 14 '17 at 12:52

Yes, I do understand. Actually I have two JSON data sets that have the same structure with the same keys, just like my example above. – Faizalprbw Feb 14 '17 at 13:07
Btw, how can I make that dictionary sorted just like my example (data3)? Thanks. @Willem Van Onsem – Faizalprbw Feb 15 '17 at 10:45
@Faizalprbw: dictionaries are *unsorted*. So there are no guarantees on the order at all. Usually if you add to the dictionary manually, it will keep the order for a while until it rehashes. But none of the answers can give guarantees. In order to solve it, you need an [`OrderedDict`](http://stackoverflow.com/questions/1867861/python-dictionary-how-to-keep-keys-values-in-same-order-as-declared). – Willem Van Onsem Feb 15 '17 at 10:54
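
Since Python 3.7, a plain dict preserves insertion order as a language guarantee, so on current interpreters it should be enough to insert the keys in sorted order; on older versions `collections.OrderedDict` is still needed. A minimal sketch, assuming both inputs have identical structure; `sum_sorted` is a hypothetical helper name, not something from the answers.

```python
def sum_sorted(d1, d2):
    """Sum two 2-level dicts, inserting keys in sorted order at both levels."""
    return {
        year: {k: d1[year][k] + d2[year][k] for k in sorted(d1[year])}
        for year in sorted(d1)
    }

# Deliberately unsorted input to show that the result comes out ordered.
data1 = {"2011": {"B": 2, "A": 1}, "2010": {"C": 5, "A": 2}}
data2 = {"2010": {"A": 4, "C": 5}, "2011": {"A": 1, "B": 1}}

data3 = sum_sorted(data1, data2)
print(data3)
```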

Kruupös · Answer 2 · 2017-02-14T16:30:49.490

1

Another solution :) You can also use zip to iterate over data1 and data2 in the same loop, and then use collections.Counter to add the values of each pair of inner dicts.

>>> from collections import Counter
>>> {k1: Counter(v1) + Counter(v2) for (k1, v1), (k2, v2) in zip(sorted(data1.items()), sorted(data2.items()))}
{'2011': Counter({'C': 6, 'B': 3, 'A': 2}), '2010': Counter({'C': 10, 'B': 7, 'A': 6}), '2012': Counter({'C': 8, 'A': 4, 'B': 4})}

You will end up with Counter objects, but since Counter is a subclass of dict you can still use the same methods as on a regular dict.
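
One caveat worth knowing: adding two Counter objects keeps only positive totals, so a key whose sum is zero or negative disappears from the result. The sketch below uses made-up values to show the effect next to a plain-dict alternative; `add_dicts` is a hypothetical helper, not part of the answer.

```python
from collections import Counter

a = {"A": 2, "B": -3, "C": 0}
b = {"A": 1, "B": 1, "C": 0}

# Counter addition drops keys whose summed count is zero or negative.
with_counter = dict(Counter(a) + Counter(b))

def add_dicts(x, y):
    """Sum values over the union of keys, keeping zero and negative totals."""
    return {k: x.get(k, 0) + y.get(k, 0) for k in x.keys() | y.keys()}

plain = add_dicts(a, b)

print(with_counter)  # only 'A' survives
print(plain)
```

For strictly positive data such as the question's, both approaches give the same totals.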

edited Feb 14 '17 at 16:30

answered Feb 14 '17 at 13:06


This will not always work since `k1` can be different than `k2`. Note that the keys are **not** ordered in a dictionary. If you however do a lookup in `dict2` it will work. – Willem Van Onsem Feb 14 '17 at 13:08
Your remark is incorrect, `zip` will always iterate over the same `keys` of the dictionaries. Or maybe I misunderstood what you said. Can you please provide a simple example? Your solution is good too, but the problem lies where there are `n` dictionaries to combine. – Kruupös Feb 14 '17 at 13:13
Just want to add: `zip` will iterate over the same `keys` of the two dictionaries if _and only if_ they have _all_ their `keys` in common; otherwise it won't work. – Kruupös Feb 14 '17 at 14:09
I don't know if that is true and even if it is, it is as far as I know not documented, so you cannot guarantee that: a person that constructs a valid Python interpreter can freely implement this aspect. – Willem Van Onsem Feb 14 '17 at 14:10
OK, I'm totally wrong https://docs.python.org/3/library/stdtypes.html#mapping-types-dict ... my bad, sorry. It's very unlikely to happen on short dicts but it could happen on huge ones. I updated my answer with `sorted()`, but now the one-liner is getting too long. Another solution would be to rebuild the data with `OrderedDict`. – Kruupös Feb 14 '17 at 16:30
It can be very problematic if one **adds** elements to the dictionary later, since in that case the buckets are usually not incremented immediately nor is rehashing performed. – Willem Van Onsem Feb 14 '17 at 16:31

Spherical Cowboy · Answer 3 · 2017-02-14T16:58:56.463

If you add dict() to Max Chrétien's nice short solution from above, you will end up with regular dictionaries:

from collections import Counter

data3 = {k1: dict(Counter(v1) + Counter(v2)) for (k1, v1), (k2, v2) in
         zip(sorted(data1.items()), sorted(data2.items()))}

This will, however, only work correctly if both dictionaries share exactly the same keys, as already discussed above. Willem Van Onsem's solution will not work either if the keys differ: it raises a KeyError when data1 contains a key that data2 lacks (and silently ignores keys found only in data2), whereas Max Chrétien's solution will in this case merge items incorrectly without any error. Since you mentioned that your JSON data always has the same structure with the same keys, this should not be a problem, and Max Chrétien's solution should work nicely.
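
To make the two failure modes concrete, here is a small sketch with made-up data in which the top-level keys differ. The variable names are illustrative only.

```python
from collections import Counter

d1 = {"2010": {"A": 1}, "2011": {"A": 2}}
d2 = {"2011": {"A": 10}, "2012": {"A": 20}}   # "2010" missing, "2012" extra

# zip pairs items by position, so "2010" of d1 meets "2011" of d2, and so on.
zipped = {k1: dict(Counter(v1) + Counter(v2))
          for (k1, v1), (k2, v2) in zip(sorted(d1.items()), sorted(d2.items()))}
print(zipped)

# The lookup-based comprehension fails loudly instead.
lookup_failed = False
try:
    looked_up = {k: {k2: v + d2[k][k2] for k2, v in sub.items()}
                 for k, sub in d1.items()}
except KeyError as exc:
    lookup_failed = True
    print("lookup failed on missing key:", exc)
```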

In case you do want to make sure that only keys shared by both dictionaries (and their subdictionaries) are used, the following will work. Notice how I added 'X': 111111 as a key-value pair to the 2012 subdictionary and "1999": { 'Z': 999999 } as an entire subdictionary.

def sum_two_nested_dicts(d1, d2):
    dicts = [d1, d2]
    d_sum = {}
    for topkey in dicts[0]:
        if topkey in dicts[1]:
            d_sum[topkey] = {}
            for key in dicts[0][topkey]:
                if key in dicts[1][topkey]:
                    new_val = sum([d[topkey][key] for d in dicts])
                    d_sum[topkey][key] = new_val
    return d_sum


data1 = {
    "2010": {
        'A': 2,
        'B': 3,
        'C': 5
    },
    "2011": {
        'A': 1,
        'B': 2,
        'C': 3
    },
    "2012": {
        'A': 1,
        'B': 2,
        'C': 4,
        'X': 111111
    },
    "1999": {
        'Z': 999999
    }
}

data2 = {
    "2010": {
        'A': 4,
        'B': 4,
        'C': 5
    },
    "2011": {
        'A': 1,
        'B': 1,
        'C': 3
    },
    "2012": {
        'A': 3,
        'B': 2,
        'C': 4
    }
}

data3 = sum_two_nested_dicts(data1, data2)

print(data3)

# different order of arguments

data4 = sum_two_nested_dicts(data2, data1)

print(data4)

# {'2010': {'C': 10, 'A': 6, 'B': 7}, '2012': {'C': 8, 'A': 4, 'B': 4}, '2011': {'C': 6, 'A': 2, 'B': 3}}
# {'2010': {'C': 10, 'A': 6, 'B': 7}, '2012': {'C': 8, 'A': 4, 'B': 4}, '2011': {'C': 6, 'A': 2, 'B': 3}}

I realize this is far from as concise and elegant as it could be, but since I had already written it, I am posting it here in case someone is trying to achieve this particular functionality.
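
For comparison, the shared-keys-only behaviour can probably be written more compactly with the set intersection that dictionary key views support. This is a sketch rather than a drop-in replacement; `sum_shared` is a hypothetical name and the data is a trimmed version of the example above.

```python
def sum_shared(d1, d2):
    """Sum values only for keys present in both dicts at both levels."""
    return {
        top: {k: d1[top][k] + d2[top][k]
              for k in d1[top].keys() & d2[top].keys()}
        for top in d1.keys() & d2.keys()
    }

data1 = {
    "2010": {"A": 2, "B": 3, "C": 5},
    "2012": {"A": 1, "B": 2, "C": 4, "X": 111111},
    "1999": {"Z": 999999},
}
data2 = {
    "2010": {"A": 4, "B": 4, "C": 5},
    "2012": {"A": 3, "B": 2, "C": 4},
}

shared = sum_shared(data1, data2)
print(shared)
```

Because the intersection is symmetric, swapping the arguments gives the same result.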

Long and bloated version which retains unshared keys/values, just because I already wrote it...

def sum_nested_dicts(dic1, dic2):
    # create list of both dictionaries
    dicts = [dic1, dic2]
    # create a set of all unique keys from both dictionaries
    topkeys = set(sum([list(dic.keys()) for dic in dicts], []))
    # this is the merged dictionary to be returned
    d_sum = {}
    for topkey in topkeys:
        # if topkey is shared by both dictionaries
        if topkey in dic1 and topkey in dic2:
            d_sum[topkey] = {}
            keys = set(sum([list(dic[topkey].keys()) for dic in
                            dicts], []))
            for key in keys:
                # if key is shared by both subdictionaries
                if key in dic1[topkey] and key in dic2[topkey]:
                    new_val = sum([d[topkey][key] for d in dicts])
                    d_sum[topkey][key] = new_val
                # if key is only contained in one subdictionary
                elif key in dic1[topkey]:
                    d_sum[topkey][key] = dic1[topkey][key]
                elif key in dic2[topkey]:
                    d_sum[topkey][key] = dic2[topkey][key]
        # if topkey is only contained in one dictionary
        elif topkey in dic1:
            d_sum[topkey] = dic1[topkey]
        elif topkey in dic2:
            d_sum[topkey] = dic2[topkey]
    return d_sum

See Crystal's solution for what seems to be the most concise and functional solution posted thus far.

Thx for upgrading my solution :D Your solution looks like it adapts to any situation; however, I think it would be nice to have a function that could merge `n` dictionaries when the keys are identical. Also, add the other values even if they aren't in both dicts (e.g. your `'X': 111111`) — Kruupös, Feb 14 '17 at 16:39
Added my for-loop-and-if-condition-frenzy version above. But see Crystal's solution for what seems to be the most elegant solution so far. :) — Spherical Cowboy, Feb 14 '17 at 17:00
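
The `n`-dictionary merge asked for in the comment above could be sketched with nested `collections.defaultdict` objects. `sum_nested_dicts_n` is a hypothetical name and the sample values are made up for illustration.

```python
from collections import defaultdict

def sum_nested_dicts_n(*dicts):
    """Sum any number of 2-level dicts, keeping keys found in only some of them."""
    totals = defaultdict(lambda: defaultdict(int))
    for d in dicts:
        for topkey, subdic in d.items():
            for key, value in subdic.items():
                totals[topkey][key] += value
    # Convert the nested defaultdicts back into plain dicts.
    return {topkey: dict(subdic) for topkey, subdic in totals.items()}

merged = sum_nested_dicts_n(
    {"2010": {"A": 2, "B": 3}},
    {"2010": {"A": 4}, "2011": {"A": 1}},
    {"2010": {"B": 1, "X": 10}},
)
print(merged)
```

Note that in this sketch a top-level key whose inner dictionary is empty in every input does not appear in the result.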

Darth · Answer 4 · 2017-02-14T15:53:09.293

I hope this helps:

    data1 = { "2010":{ 'A':2, 'B':3, 'C':5 }, "2011":{ 'A':1, 'B':2, 'C':3 }, "2012":{ 'A':1, 'B':2, 'C':4 } } 
    data2 = { "2010":{ 'A':4, 'B':4, 'C':5 }, "2011":{ 'A':1, 'B':1, 'C':3 }, "2012":{ 'A':3, 'B':2, 'C':4 } }

    data3 = {}

    for data in [data1, data2]:
        for year in data:
            for x, y in data[year].items():
                if year not in data3:
                    # first time this year is seen: start its inner dict
                    data3[year] = {x: y}
                elif x not in data3[year]:
                    # year exists but this inner key is new
                    data3[year][x] = y
                else:
                    # both exist: accumulate
                    data3[year][x] += y
    print(data3)

This works for arbitrary lengths of the inner and outer dictionaries.

This is great! It will also work if there are keys in one dictionary but not in the other. Furthermore, it will work if one subkey (key of a nested dictionary, i.e. a dictionary that is a value of the outer dictionary) is not shared by both dictionaries. Seems to cover all the cases I tested. :-) — Spherical Cowboy, Feb 14 '17 at 16:55
