How to sum a list of dicts

Question

What is the most Pythonic way to take a list of dicts and sum up all the values for matching keys from every row in the list?

I did this but I suspect a comprehension is more Pythonic:

from collections import defaultdict
demandresult = defaultdict(int)   # new blank dict to store results 
for d in demandlist:
    for k, v in d.items():   # items() works on both Python 2 and 3
        demandresult[k] = demandresult[k] + v

In "Python - sum values in dictionary", the question involved the same key every time, but in my case the key in each row might be a new key never encountered before.

Could you help me understand, `demandlist` is, what, a list of dicts whose values somehow have rows? Can you give an example? — Ahmed Fasih, Apr 18 '18 at 00:19
Here are 3 rows of demandlist {u'2018-04-29': 1, u'2018-04-30': 1, u'2018-05-01': 1} {u'2018-04-21': 1} {u'2018-04-18': 1, u'2018-04-19': 1, u'2018-04-17' : 1} — Mark Ginsburg, Apr 18 '18 at 00:19
Got it. The fact that you're *adding* the values of duplicate keys makes me think strongly of a [reduction](https://docs.python.org/3/library/functools.html#functools.reduce), which is a general tool to express any such combinations (not just add). — Ahmed Fasih, Apr 18 '18 at 00:25
This solution is totally fine. Maybe just `demandresult[k] += v` — juanpa.arrivillaga, Apr 18 '18 at 00:26
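
As a minimal sketch of the reduction idea raised in the comments above, assuming Python 3 and the three sample rows quoted there: `functools.reduce` folds each row into a running dict of totals. The helper name `add_row` is hypothetical and not from the original post.

```python
from functools import reduce

# Sample rows copied from the comment above.
demandlist = [
    {'2018-04-29': 1, '2018-04-30': 1, '2018-05-01': 1},
    {'2018-04-21': 1},
    {'2018-04-18': 1, '2018-04-19': 1, '2018-04-17': 1},
]

def add_row(totals, row):
    """Fold one row into the running totals (hypothetical helper)."""
    for key, value in row.items():
        totals[key] = totals.get(key, 0) + value
    return totals

# Start from an empty dict and combine the rows one at a time.
demandresult = reduce(add_row, demandlist, {})
```

This does the same work as the explicit loop in the question; whether it reads better is a matter of taste.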

Primusa · Accepted Answer · 2018-04-18T01:23:30.547

2

I think that your method is quite Pythonic. Comprehensions are nice, but they shouldn't be overdone, and they can lead to really messy one-liners, like the one below :).

If you insist on a dict comp:

demand_list = [{u'2018-04-29': 1, u'2018-04-30': 1, u'2018-05-01': 1}, 
               {u'2018-04-21': 1},
               {u'2018-04-18': 1, u'2018-04-19': 1, u'2018-04-17' : 1}]

d = {key:sum(i[key] for i in demand_list if key in i) 
     for key in set(a for l in demand_list for a in l.keys())}

print(d)
>>>{'2018-04-21': 1, '2018-04-17': 1, '2018-04-29': 1, '2018-04-30': 1, '2018-04-19': 1, '2018-04-18': 1, '2018-05-01': 1}

edited Apr 18 '18 at 01:23

answered Apr 18 '18 at 00:24

This dict comp did indeed produce the same output after processing the 494 elements in the list as the for loop in my original question. – Mark Ginsburg Apr 18 '18 at 00:42
it does but the for loop is much cleaner and should be much faster. – Primusa Apr 18 '18 at 00:43
I do like the dict comprehension (make a set of all the keys, then for each key search the list for entries with it and sum them; convoluted but cool), but yes, it's going to be slow because you're looping over the data more times than you need to (accidentally quadratic). A reasonable compromise might be to use `itertools.chain`? – Ahmed Fasih Apr 18 '18 at 00:45
Take out those brackets in the function calls to `sum()` and `set()`; they force Python to go through the middle step of creating a list and then passing it to the function rather than allowing the function to just use the generator expression directly. – Apr 18 '18 at 01:22
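
To make the comparison in these comments concrete, here is a rough sketch, assuming Python 3 and a small made-up input, that checks the dict comprehension and the single-pass loop agree. The function names `sum_by_comprehension` and `sum_by_loop` are hypothetical.

```python
from collections import defaultdict

def sum_by_comprehension(rows):
    # Builds the set of all keys, then rescans every row once per key.
    keys = set(k for row in rows for k in row)
    return {k: sum(row[k] for row in rows if k in row) for k in keys}

def sum_by_loop(rows):
    # Visits each key/value pair exactly once.
    totals = defaultdict(int)
    for row in rows:
        for k, v in row.items():
            totals[k] += v
    return dict(totals)

rows = [{'a': 1, 'b': 1}, {'b': 1, 'c': 1}, {'a': 2}]  # made-up sample
same = sum_by_comprehension(rows) == sum_by_loop(rows)
```

The comprehension performs roughly (number of distinct keys) x (number of rows) membership checks, while the loop touches each pair once, which is the source of the slowdown discussed above.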

Paul Panzer · Answer 2 · 2018-04-18T01:28:05.950

1

Here is another one-liner (ab-)using collections.ChainMap to get the combined keys:

>>> from collections import ChainMap
>>> {k: sum(d.get(k, 0) for d in demand_list) for k in ChainMap(*demand_list)}
{'2018-04-17': 1, '2018-04-21': 1, '2018-05-01': 1, '2018-04-30': 1, '2018-04-19': 1, '2018-04-29': 1, '2018-04-18': 1}

This is easily the slowest of the methods proposed here.
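
A small sketch of why ChainMap works here, assuming Python 3 and a made-up input: iterating a ChainMap yields each distinct key once, while a lookup returns only the value from the first dict containing the key, so the ChainMap is used purely as a source of keys.

```python
from collections import ChainMap

rows = [{'a': 1, 'b': 1}, {'b': 1, 'c': 1}, {'a': 2}]  # made-up sample

combined = ChainMap(*rows)
keys = set(combined)      # each distinct key appears once
first_a = combined['a']   # value from the first dict that has 'a', not a sum

# The actual summing still rescans every row for every key.
totals = {k: sum(d.get(k, 0) for d in rows) for k in combined}
```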

edited Apr 18 '18 at 01:28

answered Apr 18 '18 at 00:49

K.Marker · Answer 3 · 2018-04-18T01:30:11.767

0

I suppose you want to return a list of summed values of each dictionary.

list_of_dict = [
    {'a':1, 'b':2, 'c':3},
    {'d':4, 'e':5, 'f':6}
]

sum_of_each_row = [sum(d.values()) for d in list_of_dict]  # [6, 15]

If you want the total sum, simply wrap "sum_of_each_row" in sum().

EDIT:

The main problem is that you don't have a default value for each of the keys, so you can make use of the method dict.setdefault() to set the default value when there's a new key.

list_of_dict = [
    {'a':1, 'b':1},
    {'b':1, 'c':1},
    {'a':2}
]

d = {}
for row in list_of_dict:
    for k, v in row.items():
        # setdefault() stores 0 for a new key, then returns the running total
        d[k] = d.setdefault(k, 0) + v
# d is now {'a': 3, 'b': 2, 'c': 1}
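
For readers unfamiliar with dict.setdefault(), a short sketch of its behaviour, assuming Python 3; the variable names are illustrative only.

```python
totals = {}

# Key is missing: the default is stored and returned.
first = totals.setdefault('a', 0)    # first is 0, totals is {'a': 0}

totals['a'] += 5

# Key is present: the existing value is returned and the default is ignored.
second = totals.setdefault('a', 0)   # second is 5, totals is {'a': 5}
```

Because the stored value is never overwritten by a later setdefault() call, any running total has to be written back with an explicit assignment.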

edited Apr 18 '18 at 01:30

answered Apr 18 '18 at 00:32

From your example, my goal is to pick up the '1' value for key 'a' in row 1, and when I encounter key 'a' in a subsequent row, sum this '1' with whatever value the next occurrence of 'a' contains. So it's a key matching and summing problem. I edited the original question to make this clearer. – Mark Ginsburg Apr 18 '18 at 00:36
Totally understand your problem now. Please see my edit;) – K.Marker Apr 18 '18 at 01:35

Ahmed Fasih · Answer 4 · 2018-04-18T00:57:58.490

The only thing that seemed unclear in your code was the double for-loop. It may be clearer to collapse the demandlist into a flat iterable; then the loop presents the logic as simply as possible. Consider:

demandlist = [{
    u'2018-04-29': 1,
    u'2018-04-30': 1,
    u'2018-05-01': 1
}, {
    u'2018-04-21': 1
}, {
    u'2018-04-18': 1,
    u'2018-04-19': 1,
    u'2018-04-17': 1
}]

import itertools as it
from collections import defaultdict

demandresult = defaultdict(int)

for k, v in it.chain.from_iterable(map(lambda d: d.items(), demandlist)):
    demandresult[k] = demandresult[k] + v

(With this, print(demandresult) prints defaultdict(<class 'int'>, {'2018-04-29': 1, '2018-04-30': 1, '2018-05-01': 1, '2018-04-21': 1, '2018-04-18': 1, '2018-04-19': 1, '2018-04-17': 1}).)

Imagining myself reading this for the first time (or a few months later), I can see myself thinking, "Ok, I'm collapsing demandlist into a key-val iterable, I don't particularly care how, and then summing values of matching keys."

It's unfortunate that I need that map there to ensure the final iterable has key-val pairs… it.chain.from_iterable(demandlist) is a key-only iterable, so I need to call items on each dict.

Note that unlike many of the answers proposed, this implementation (like yours!) makes only a single pass over the data, which is a performance win (and I try to pick up as many easy performance wins as I can).
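
One possible variation, sketched under the assumption of Python 3: a generator expression can stand in for the map/lambda pair, and wrapping the loop in a function makes it reusable. The name `sum_matching_keys` and the sample input are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def sum_matching_keys(rows):
    """Sum values per key in one pass (hypothetical helper name)."""
    totals = defaultdict(int)
    # The generator expression replaces map(lambda d: d.items(), rows).
    for key, value in chain.from_iterable(row.items() for row in rows):
        totals[key] += value
    return dict(totals)

result = sum_matching_keys([{'a': 1, 'b': 1}, {'b': 1, 'c': 1}, {'a': 2}])
```

Returning a plain dict avoids the defaultdict wrapper in the printed output; keep the defaultdict if callers should get 0 for missing keys.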
