103

I have the dictionary below, and I want to merge it into another dictionary whose keys are not necessarily distinct, summing the values for shared keys. Is there any built-in function for this, or will I need to make my own?

{
  '6d6e7bf221ae24e07ab90bba4452267b05db7824cd3fd1ea94b2c9a8': 6,
  '7c4a462a6ed4a3070b6d78d97c90ac230330603d24a58cafa79caf42': 7,
  '9c37bdc9f4750dd7ee2b558d6c06400c921f4d74aabd02ed5b4ddb38': 9,
  'd3abb28d5776aef6b728920b5d7ff86fa3a71521a06538d2ad59375a': 15,
  '2ca9e1f9cbcd76a5ce1772f9b59995fd32cbcffa8a3b01b5c9c8afc2': 11
}

The number of elements in the dictionary is also unknown.

Where the merge encounters two identical keys, the values of these keys should be summed instead of overwritten.

Clemens Tolboom
badc0re

12 Answers

239

You didn't say how exactly you want to merge, so take your pick:

x = {'both1': 1, 'both2': 2, 'only_x': 100}
y = {'both1': 10, 'both2': 20, 'only_y': 200}

print({k: x.get(k, 0) + y.get(k, 0) for k in set(x)})
print({k: x.get(k, 0) + y.get(k, 0) for k in set(x) & set(y)})
print({k: x.get(k, 0) + y.get(k, 0) for k in set(x) | set(y)})

Results:

{'both2': 22, 'only_x': 100, 'both1': 11}
{'both2': 22, 'both1': 11}
{'only_y': 200, 'both2': 22, 'both1': 11, 'only_x': 100}
georg
    how do we implement this if we have n number of dictionaries ? – Tony Mathew Sep 23 '18 at 18:57
  • I liked this approach. However in my case, for the same above dictionary values, I am trying to take the difference. i.e `x-y`. `diff= { k: x.get(k, 0) - y.get(k, 0) for k in set(x) | set(y) }` `print(diff)` And this gives me : `{'only_y': -200, 'both2': -18, 'only_x': 100, 'both1': -9}` I am concerned about the `only_y` value above, as it changed to negative `200` instead of retaining `200`. Even though you already answered the actual question, could you please suggest the better way of catching the negative values for the keys that are unique? – Panchu Sep 29 '18 at 22:29
  • @Panchu: how about `sub = lambda a, b: a if b is None else b if a is None else a -b` and then `{k: sub(x.get(k), y.get(k)) for ... etc` – georg Sep 30 '18 at 00:51
  • @georg I was using two `for in` loops to accomplish the same thing so your third option is perfect for my needs because it sums all matching keys and still keeps the non-matching keys. When we do away with loops what are these type of expressions called in python? Just a kind of mapping? Thanks. – Edison May 11 '20 at 11:49
  • @tymac: those are [dict comprehensions](https://www.python.org/dev/peps/pep-0274/) – georg May 12 '20 at 07:02
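To address the comment about n dictionaries: the third comprehension generalizes with sum() and a union over all key sets. A minimal sketch, assuming all values support +:

```python
# Generalization of the set-union comprehension to any number of dicts.
# Missing keys default to 0, so every key from every dict survives.
def merge_sum(*dicts):
    keys = set().union(*dicts)  # union of all key sets
    return {k: sum(d.get(k, 0) for d in dicts) for k in keys}

x = {'both1': 1, 'both2': 2, 'only_x': 100}
y = {'both1': 10, 'both2': 20, 'only_y': 200}
z = {'both1': 5}

print(merge_sum(x, y, z))
# → {'both1': 16, 'both2': 22, 'only_x': 100, 'only_y': 200} (key order may vary)
```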
91

You can perform +, -, &, and | (intersection and union) with collections.Counter().

We can do the following (Note: only positive count values will remain in the dictionary):

from collections import Counter

x = {'both1':1, 'both2':2, 'only_x': 100 }
y = {'both1':10, 'both2': 20, 'only_y':200 }

z = dict(Counter(x)+Counter(y))

print(z)
[out]:
{'both2': 22, 'only_x': 100, 'both1': 11, 'only_y': 200}

To address adding values where the result may be zero or negative, use Counter.update() for addition, and Counter.subtract() for subtraction:

x = {'both1':0, 'both2':2, 'only_x': 100 }
y = {'both1':0, 'both2': -20, 'only_y':200 }
xx = Counter(x)
yy = Counter(y)
xx.update(yy)
dict(xx)
[out]:
{'both2': -18, 'only_x': 100, 'both1': 0, 'only_y': 200}
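For completeness, a short sketch of the subtraction case with Counter.subtract(), which, unlike the - operator, keeps zero and negative results:

```python
from collections import Counter

x = {'both1': 1, 'both2': 2, 'only_x': 100}
y = {'both1': 10, 'both2': 20, 'only_y': 200}

xx = Counter(x)
xx.subtract(Counter(y))  # in-place; keys missing from xx start at 0
print(dict(xx))
# → {'both1': -9, 'both2': -18, 'only_x': 100, 'only_y': -200}
```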
Trenton McKinney
Scott
34

Additional notes based on the answers of georg, NPE, Scott and Havok.

I was trying to perform this action on collections of 2 or more dictionaries and was interested in seeing the time it took for each. Because I wanted to do this on any number of dictionaries, I had to change some of the answers a bit. If anyone has better suggestions for them, feel free to edit.

Here's my test method. I've updated it recently to include tests with MUCH larger dictionaries, and again to include Havok's and Scott's newer methods:

Firstly I used the following data:

import random

x = {'xy1': 1, 'xy2': 2, 'xyz': 3, 'only_x': 100}
y = {'xy1': 10, 'xy2': 20, 'xyz': 30, 'only_y': 200}
z = {'xyz': 300, 'only_z': 300}

small_tests = [x, y, z]

# 200,000 random 8 letter keys
keys = [''.join(random.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(8)) for _ in range(200000)]

a, b, c = {}, {}, {}

# 50/50 chance of a value being assigned to each dictionary, some keys will be missed but meh
for key in keys:
    if random.getrandbits(1):
        a[key] = random.randint(0, 1000)
    if random.getrandbits(1):
        b[key] = random.randint(0, 1000)
    if random.getrandbits(1):
        c[key] = random.randint(0, 1000)

large_tests = [a, b, c]

print("a:", len(a), "b:", len(b), "c:", len(c))
#: a: 100069 b: 100385 c: 99989

Now each of the methods:

from collections import defaultdict, Counter
from functools import reduce

def georg_method(tests):
    return {k: sum(t.get(k, 0) for t in tests) for k in set.union(*[set(t) for t in tests])}

def georg_method_nosum(tests):
    # If you know you will have exactly 3 dicts
    return {k: tests[0].get(k, 0) + tests[1].get(k, 0) + tests[2].get(k, 0) for k in set.union(*[set(t) for t in tests])}

def npe_method(tests):
    ret = defaultdict(int)
    for d in tests:
        for k, v in d.items():
            ret[k] += v
    return dict(ret)

# Note: There is a bug with scott's method. See below for details.
# Scott included a similar version using counters that is fixed
# See the scott_update_method below
def scott_method(tests):
    return dict(sum((Counter(t) for t in tests), Counter()))

def scott_method_nosum(tests):
    # If you know you will have exactly 3 dicts
    return dict(Counter(tests[0]) + Counter(tests[1]) + Counter(tests[2]))

def scott_update_method(tests):
    ret = Counter()
    for test in tests:
        ret.update(test)
    return dict(ret)

def scott_update_method_static(tests):
    # If you know you will have exactly 3 dicts
    xx = Counter(tests[0])
    yy = Counter(tests[1])
    zz = Counter(tests[2])
    xx.update(yy)
    xx.update(zz)
    return dict(xx)

def havok_method(tests):
    def reducer(accumulator, element):
        for key, value in element.items():
            accumulator[key] = accumulator.get(key, 0) + value
        return accumulator
    return reduce(reducer, tests, {})

methods = {
    "georg_method": georg_method, "georg_method_nosum": georg_method_nosum,
    "npe_method": npe_method,
    "scott_method": scott_method, "scott_method_nosum": scott_method_nosum,
    "scott_update_method": scott_update_method, "scott_update_method_static": scott_update_method_static,
    "havok_method": havok_method
}

I also wrote a quick function to find whatever differences there were between the results. Unfortunately, that's when I found the problem with Scott's method: if any key's values total to 0, that key won't be included in the result at all, because of how Counter() behaves when adding.

Test setup:

  • MacBook Pro (15-inch, Late 2016), 2.9 GHz Intel Core i7, 16 GB 2133 MHz LPDDR3 RAM, running macOS Mojave Version 10.14.5
  • Python 3.6.5 via IPython 6.1.0

Finally, the results:

Results: Small Tests

for name, method in methods.items():
    print("Method:", name)
    %timeit -n10000 method(small_tests)
#: Method: georg_method
#: 7.81 µs ± 321 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: georg_method_nosum
#: 4.6 µs ± 48.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: npe_method
#: 3.2 µs ± 24.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: scott_method
#: 24.9 µs ± 326 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: scott_method_nosum
#: 18.9 µs ± 64.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: scott_update_method
#: 9.1 µs ± 90.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: scott_update_method_static
#: 14.4 µs ± 122 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: havok_method
#: 3.09 µs ± 47.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Results: Large Tests

Naturally, I couldn't run anywhere near as many loops:

for name, method in methods.items():
    print("Method:", name)
    %timeit -n10 method(large_tests)
#: Method: georg_method
#: 347 ms ± 20 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: georg_method_nosum
#: 280 ms ± 4.97 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: npe_method
#: 119 ms ± 11 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: scott_method
#: 324 ms ± 16.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: scott_method_nosum
#: 289 ms ± 14.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: scott_update_method
#: 123 ms ± 1.94 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: scott_update_method_static
#: 136 ms ± 3.19 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: havok_method
#: 103 ms ± 1.31 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Conclusion

╔═══════════════════════════╦═══════╦═════════════════════════════╗
║                           ║       ║    Best of Time Per Loop    ║
║         Algorithm         ║  By   ╠══════════════╦══════════════╣
║                           ║       ║  small_tests ║  large_tests ║
╠═══════════════════════════╬═══════╬══════════════╬══════════════╣
║ functools reduce          ║ Havok ║       3.1 µs ║   103,000 µs ║
║ defaultdict sum           ║ NPE   ║       3.2 µs ║   119,000 µs ║
║ Counter().update loop     ║ Scott ║       9.1 µs ║   123,000 µs ║
║ Counter().update static   ║ Scott ║      14.4 µs ║   136,000 µs ║
║ set unions without sum()  ║ georg ║       4.6 µs ║   280,000 µs ║
║ set unions with sum()     ║ georg ║       7.8 µs ║   347,000 µs ║
║ Counter() without sum()   ║ Scott ║      18.9 µs ║   289,000 µs ║
║ Counter() with sum()      ║ Scott ║      24.9 µs ║   324,000 µs ║
╚═══════════════════════════╩═══════╩══════════════╩══════════════╝

Important. YMMV.

Hotschke
SCB
23

You could use defaultdict for this:

from collections import defaultdict

def dsum(*dicts):
    ret = defaultdict(int)
    for d in dicts:
        for k, v in d.items():
            ret[k] += v
    return dict(ret)

x = {'both1':1, 'both2':2, 'only_x': 100 }
y = {'both1':10, 'both2': 20, 'only_y':200 }

print(dsum(x, y))

This produces

{'both1': 11, 'both2': 22, 'only_x': 100, 'only_y': 200}
NPE
22

Another option, using a reduce function. This allows you to sum-merge an arbitrary collection of dictionaries:

from functools import reduce

collection = [
    {'a': 1, 'b': 1},
    {'a': 2, 'b': 2},
    {'a': 3, 'b': 3},
    {'a': 4, 'b': 4, 'c': 1},
    {'a': 5, 'b': 5, 'c': 1},
    {'a': 6, 'b': 6, 'c': 1},
    {'a': 7, 'b': 7},
    {'a': 8, 'b': 8},
    {'a': 9, 'b': 9},
]


def reducer(accumulator, element):
    for key, value in element.items():
        accumulator[key] = accumulator.get(key, 0) + value
    return accumulator


total = reduce(reducer, collection, {})


assert total['a'] == sum(d.get('a', 0) for d in collection)
assert total['b'] == sum(d.get('b', 0) for d in collection)
assert total['c'] == sum(d.get('c', 0) for d in collection)

print(total)

Execution:

{'a': 45, 'b': 45, 'c': 3}

Advantages:

  • Simple, clear, Pythonic.
  • Schema-less, as long as all values are summable.
  • O(n) time complexity, with only the single accumulator dict as extra memory.
Havok
2
class dict_merge(dict):
    def __add__(self, other):
        result = dict_merge({})
        for key in self.keys():
            if key in other.keys():
                result[key] = self[key] + other[key]
            else:
                result[key] = self[key]
        for key in other.keys():
            if key not in self.keys():
                result[key] = other[key]
        return result


a = dict_merge({"a":2, "b":3, "d":4})
b = dict_merge({"a":1, "b":2})
c = dict_merge({"a":5, "b":6, "c":5})
d = dict_merge({"a":8, "b":6, "e":5})

print(a + b + c + d)


>>> {'a': 16, 'b': 17, 'd': 4, 'c': 5, 'e': 5}

That is operator overloading. With __add__, we have defined how the + operator behaves for dict_merge, which inherits from the built-in Python dict. You can make it more flexible by defining other operators in the same class in a similar way: * with __mul__ for multiplication, / with __truediv__ for division, or % with __mod__ for modulo, replacing the + in self[key] + other[key] with the corresponding operator, if you ever find yourself needing such merging. I have only tested the + version, but I don't foresee a problem with other operators.
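As a sketch of that idea, here is a hypothetical __mul__ on a similar dict subclass (the class name dict_mul and the choice to leave unshared keys unchanged are my assumptions; adjust to taste):

```python
class dict_mul(dict):
    # Hypothetical companion to dict_merge: '*' multiplies values of shared
    # keys and keeps unshared keys unchanged.
    def __mul__(self, other):
        result = dict_mul()
        for key in self:
            result[key] = self[key] * other[key] if key in other else self[key]
        for key in other:
            if key not in self:
                result[key] = other[key]
        return result

a = dict_mul({"a": 2, "b": 3, "d": 4})
b = dict_mul({"a": 1, "b": 2})
print(a * b)
# → {'a': 2, 'b': 6, 'd': 4}
```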

John Mutuma
1
from functools import reduce

d1 = {'apples': 2, 'banana': 1}
d2 = {'apples': 3, 'banana': 2}
merged = reduce(
    lambda d, i: (
        d.update(((i[0], d.get(i[0], 0) + i[1]),)) or d
    ),
    d2.items(),
    d1.copy(),
)

There is also a pretty simple replacement for dict.update() — note that it overwrites values of duplicate keys instead of summing them:

merged = dict(d1, **d2)
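A quick check of what this one-liner actually does — the right-hand dict wins on duplicate keys, so nothing is summed:

```python
d1 = {'apples': 2, 'banana': 1}
d2 = {'apples': 3, 'banana': 2}

# dict(d1, **d2) copies d1 and then applies d2's keys on top,
# so duplicate keys take d2's value rather than the sum.
merged = dict(d1, **d2)
print(merged)
# → {'apples': 3, 'banana': 2}
```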
renskiy
0

A fairly simple approach:

from collections import Counter
from functools import reduce

data = [
  {'x': 10, 'y': 1, 'z': 100},
  {'x': 20, 'y': 2, 'z': 200},
  {'a': 10, 'z': 300}
]

result = dict(reduce(lambda x, y: Counter(x) + Counter(y), data))
0

TL;DR

This code works for both a list of dicts and a pandas Series (where each row holds a dict). It is super fast.

@Havok's method is by far the best according to my tests, and since other answers' tests confirm this as well, I am not going to put the test results here. Instead, I share my code in addition to Havok's method. The following works for a list of dictionaries and also for a pandas Series where each row holds a dictionary.

from functools import reduce

def reducer(accumulator, element):
    """Set-unions two dictionaries' keys and sums the values of shared keys,
    see explanation here https://stackoverflow.com/a/46128481/2234161"""
    for key, value in element.items():
        accumulator[key] = accumulator.get(key, 0) + value
    return accumulator

def sum_dicts(dicts_collection, init_dict=None):
    """
    For a given collection of dictionaries, sums the values of identical keys.
    :param dicts_collection: list of dictionaries; can be a pandas Series where each row holds a dictionary
    :param init_dict: optional initial dictionary that dicts_collection is added onto, defaults to dict()
    """
    if init_dict is None:
        init_dict = dict()
    try:
        res = reduce(reducer, dicts_collection, init_dict)
    except Exception as ex:
        print(f"Error while reducing dicts: {dicts_collection}", ex)
        raise
    return res



result_dict = sum_dicts(list_of_dicts_or_pandasSeries)
Memin
0

Create two dictionaries with random int values; several keys appear in both dictionaries:
import random
import pandas as pd

def create_random_dict(txt):
    my_dict = {}
    for c in txt:
        my_dict[c] = random.randint(1,30219)
    return my_dict

dict1 = create_random_dict('abcdefg')
dict2 = create_random_dict('cxzdywuf')
print(dict1)
print(dict2)
Your printed results will differ because the values are random:

{'a': 21804, 'b': 19749, 'c': 16837, 'd': 10134, 'e': 26181, 'f': 8343, 'g': 10268}
{'z': 12763, 'x': 23583, 'c': 20710, 'd': 22395, 'y': 25782, 'f': 23376, 'w': 25857, 'u': 9154}

Collect all the keys of both dictionaries

cols = list(dict1.keys())+list(dict2.keys())

Remove duplicates from column names

cols = list(dict.fromkeys(cols))

Create data frames corresponding to the dictionaries

df1 = pd.DataFrame(dict1, columns=cols, index=[0]).fillna(0)
df2 = pd.DataFrame(dict2, columns=cols, index=[0]).fillna(0)

Sum data frames and transform them back into the dictionary

result = (df1+df2).T.to_dict()[0]
print(result)

{'a': 21804, 'b': 19749, 'c': 37547, 'd': 32529, 'e': 26181, 'f': 31719, 'g': 10268, 'z': 12763, 'x': 23583, 'y': 25782, 'w': 25857, 'u': 9154}

Dima
-1

If you want to create a new dict where duplicate keys are overwritten, as with the | operator, use:

>>> dict({'a': 1,'c': 2}, **{'c': 1})
{'a': 1, 'c': 1}
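Since Python 3.9 (PEP 584), the same overwrite-merge can be written with the | operator directly — note that values are still overwritten, not summed:

```python
d1 = {'a': 1, 'c': 2}
d2 = {'c': 1}

# Dict union operator (Python 3.9+): the right operand wins on duplicate keys.
merged = d1 | d2
print(merged)
# → {'a': 1, 'c': 1}
```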
Bartosz Foder
-1

The approach of Scott using collections.Counter is nice, but it has the disadvantage of not being usable with sum; also the need of taking care of negative or zero values is a bit counterintuitive to me, when you simply want to add values component-wise.

So I think, it might be a good idea to write a custom class for this. This has also been the idea of John Mutuma. However, I want to add my solution:

I create a class which behaves very much like a dict, passing basically all member calls to the underlying _data in the __getattr__ method. The only two things which are different are:

  1. it has a DEFAULT_VALUE (similar to collections.defaultdict) which is used as value for non-existing keys.
  2. it implements an __add__() method which (together the __radd__() method) takes care of adding the dicts component-wise.
from typing import Union, Any


class AddableDict:
    DEFAULT_VALUE = 0

    def __init__(self, data: dict) -> None:
        self._data = data

    def __getattr__(self, attr: str) -> Any:
        return getattr(self._data, attr)

    def __getitem__(self, item) -> Any:
        try:
            return self._data[item]
        except KeyError:
            return self.DEFAULT_VALUE

    def __repr__(self):
        return self._data.__repr__()

    def __add__(self, other) -> "AddableDict":
        return AddableDict({
            key: self[key] + other[key]
            for key in set(self.keys()) | set(other.keys())
        })

    def __radd__(
        self, other: Union[int, "AddableDict"]
    ) -> "AddableDict":
        if other == 0:
            return self
        return self + other

That way we can add two such objects, and sum iterables of those objects as well:

>>> alpha = AddableDict({"a": 1})
>>> beta = AddableDict({"a": 10, "b": 5})
>>> alpha + beta
{'a': 11, 'b': 5}

>>> sum([beta]*10)
{'a': 100, 'b': 50}

To my eye, this solution has the advantage of providing a simple and understandable interface for the developer to use. Of course, you can also inherit from dict instead of using composition.

Jonathan Scholbach