
I have a large numpy array, with each row containing a dict of words, in a similar format to below:

data = [{'a': 1, 'c': 2}, {'ba': 3, 'a': 4}, ... ]

Could someone please point me in the right direction for computing, for each unique key, the sum of its values across the dicts in all rows of the numpy array? From the example above, I would hope to obtain something like this:

result = {'a': 5, 'c': 2, 'ba': 3, ...}

At the moment, the only way I can think of is to iterate through each row of the data and then through each key of the dict: if the key is not yet in the result dict, add it with its value; if it is already there, add the value to the existing entry in 'result'. This seems like an inefficient way to do it, though.

CSCPNX
  • You might want to check out some of the ideas here: https://stackoverflow.com/questions/16458340/python-equivalent-of-zip-for-dictionaries – Pablo Oliva Nov 21 '17 at 19:41
  • Would it be possible to use [`Counter`s](https://docs.python.org/3/library/collections.html#collections.Counter) instead? Then this could be as simple as `sum(data, Counter())` – Patrick Haugh Nov 21 '17 at 19:43
  • That looks like a list, not a numpy array. If it is indeed an array, why are you using an array to hold dicts? – Mad Physicist Nov 21 '17 at 19:47

2 Answers


You could use a Counter() and update it with each dictionary contained in data, in a loop:

from collections import Counter

data = [{'a': 1, 'c': 2}, {'ba': 3, 'a': 4}]
c = Counter()
for d in data:
    c.update(d)

output:

Counter({'a': 5, 'ba': 3, 'c': 2})

alternative one-liner:

(as proposed by @AntonVBR in the comments)

sum((Counter(dict(x)) for x in data), Counter())
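
One difference between the two forms may matter depending on the data: as far as I know, adding Counters with `+` discards any key whose total is zero or negative, while `update` keeps it. A minimal sketch, using made-up sample data with a negative value (not the data from the question):

```python
from collections import Counter

# Hypothetical sample data containing a negative value, for illustration only.
data = [{'a': 1, 'c': 2}, {'ba': 3, 'a': -1}]

# update() adds the values in place and keeps zero or negative totals.
updated = Counter()
for d in data:
    updated.update(d)

# sum() relies on Counter addition, which drops totals that are not positive.
summed = sum((Counter(d) for d in data), Counter())

print(dict(updated))  # 'a' is kept with a total of 0
print(dict(summed))   # 'a' is missing
```

If all values are positive counts, as in the question, both forms give the same totals.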
Reblochon Masque
  • Can be written in one line like this: `sum((Counter(dict(x)) for x in data),Counter())` according to https://stackoverflow.com/questions/11290092/python-elegantly-merge-dictionaries-with-sum-of-values – Anton vBR Nov 21 '17 at 19:52

A pure Python solution using for-loops:

data = [{'a': 1, 'c': 2}, {'ba': 3, 'a': 4}]
result = {}
for d in data:
    for k, v in d.items():
        if k in result:
            result[k] += v
        else:
            result[k] = v

output:

{'c': 2, 'a': 5, 'ba': 3}
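
The if/else branch can also be folded into one line with `dict.get`. A sketch of the same loop wrapped in a helper (the name `sum_dicts` is just made up for this example); it should accept any iterable of dicts:

```python
def sum_dicts(dicts):
    """Sum the values per key across an iterable of dicts."""
    result = {}
    for d in dicts:
        for k, v in d.items():
            # get() returns 0 for a key not seen yet, so no if/else is needed.
            result[k] = result.get(k, 0) + v
    return result


data = [{'a': 1, 'c': 2}, {'ba': 3, 'a': 4}]
totals = sum_dicts(data)
print(totals)
```

This is still one pass over every key of every dict, so it does the same amount of work as the loop above, just with less branching.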
Joe Iddon