10

I have a list of dicts and would like to design a function to output a new dict which contains the sum for each unique key across all the dicts in the list.

For the list:

[
    {
         'apples': 1,
         'oranges': 1,
         'grapes': 2
    },
    {
         'apples': 3,
         'oranges': 5,
         'grapes': 8
    },
    {
         'apples': 13,
         'oranges': 21,
         'grapes': 34
    }
]

So far so good, this can be done with a counter:

def sumDicts(listToProcess):
    c = Counter()
    for entry in listToProcess:
        c.update(entry)
    return (dict(c))

Which correctly returns:

{'apples': 17, 'grapes': 44, 'oranges': 27}

The trouble comes when the dicts in my list start to contain nested dicts:

[
    {
        'fruits': {
            'apples': 1,
            'oranges': 1,
            'grapes': 2
            },
        'vegetables': {
            'carrots': 6,
            'beans': 3,
            'peas': 2
        },
        'grains': 4,
        'meats': 1  
    },
    {
        'fruits': {
            'apples': 3,
            'oranges': 5,
            'grapes': 8
            },
        'vegetables': {
            'carrots': 7,
            'beans': 4,
            'peas': 3
        },
        'grains': 3,
        'meats': 2  
    },
    {
        'fruits': {
            'apples': 13,
            'oranges': 21,
            'grapes': 34
            },
        'vegetables': {
            'carrots': 8,
            'beans': 5,
            'peas': 4
        },
        'grains': 2,
        'meats': 3
    },
]

Now the same function will give a TypeError because the counter can't add two Dicts.

The desired result would be:

{
    'fruits': {
        'apples': 17,
        'oranges': 27,
        'grapes': 44
        },
    'vegetables': {
        'carrots': 21,
        'beans': 12,
        'peas': 9
    },
    'grains': 9,
    'meats': 6  
}

Any ideas on how to do this in a reasonably efficient, Pythonic, generalizable way?

funnydman
  • 9,083
  • 4
  • 40
  • 55
Ben Franske
  • 330
  • 2
  • 11

2 Answers2

5

I would do this by performing a recursive merge on a recursively defined collections.defaultdict object.

from collections import defaultdict

def merge(d, new_d):
    for k, v in new_d.items():
        if isinstance(v, dict):
            merge(d[k], v)
        else: 
            d[k] = d.setdefault(k, 0) + v

# https://stackoverflow.com/a/19189356/4909087    
nested = lambda: defaultdict(nested)
d = nested()

for subd in data:
    merge(d, subd)

Using default_to_regular to convert it back, we have:

default_to_regular(d)
# {
#     "fruits": {
#         "apples": 17,
#         "oranges": 27,
#         "grapes": 44
#     },
#     "vegetables": {
#         "carrots": 21,
#         "beans": 12,
#         "peas": 9
#     },
#     "grains": 9,
#     "meats": 6
# }
cs95
  • 379,657
  • 97
  • 704
  • 746
  • When I run `for subd in data: print(merge(d, subd))` I get 3 x None. – Mykola Zotko Jan 20 '19 at 10:32
  • 1
    @MykolaZotko You are not meant to print it. It modifies `d` in-place. – cs95 Jan 20 '19 at 10:34
  • 1
    Of course. Your function doesn’t return any value. It wasn’t clear where the final result was. – Mykola Zotko Jan 20 '19 at 10:55
  • Thanks for reminding me about using them for this! I had actually been playing with defaultdicts before for the original problem of summing dicts by keys but had settled on using a counter as it seemed to be a simpler solution for the non-nested variant. – Ben Franske Jan 21 '19 at 17:33
0

You can use recursion. This solution finds all the dictionary keys in the input passed to merge, and then sums the values for each key if the values are integers. If the values are dictionaries, however, merge is called again:

def merge(c):
  _keys = {i for b in c for i in b}
  return {i:[sum, merge][isinstance(c[0][i], dict)]([h[i] for h in c]) for i in _keys}

d = [{'fruits': {'apples': 1, 'oranges': 1, 'grapes': 2}, 'vegetables': {'carrots': 6, 'beans': 3, 'peas': 2}, 'grains': 4, 'meats': 1}, {'fruits': {'apples': 3, 'oranges': 5, 'grapes': 8}, 'vegetables': {'carrots': 7, 'beans': 4, 'peas': 3}, 'grains': 3, 'meats': 2}, {'fruits': {'apples': 13, 'oranges': 21, 'grapes': 34}, 'vegetables': {'carrots': 8, 'beans': 5, 'peas': 4}, 'grains': 2, 'meats': 3}]

import json
print(json.dumps(merge(d), indent=4))

Output:

{
 "meats": 6,
 "grains": 9,
 "fruits": {
    "grapes": 44,
    "oranges": 27,
    "apples": 17
 },
"vegetables": {
     "beans": 12,
     "peas": 9,
     "carrots": 21
  }
}
Ajax1234
  • 69,937
  • 8
  • 61
  • 102
  • 1
    Whoever downvoted my answer and upvoted this must have very skewed beliefs as to what readable code looks like. I understand what you are trying to do with `[sum, merge][isinstance(c[0][i], dict)]([h[i] for h in c])` but using booleans to index this way is discouraged. A simple if-else can do the same thing and look twice as readable in the process. Granted you won't be able to fit it in a single line of 80 characters, but the beauty of it is that you don't have to. – cs95 Jan 20 '19 at 21:46