The most efficient way to merge two dictionaries seems to be with dict unpacking:

def merge_dicts(x, y):
    return {**x, **y}
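
For example, keys in later dicts win:

merge_dicts({'a': 1, 'b': 2}, {'b': 3})  # {'a': 1, 'b': 3}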

But what if you have more than two dicts? What if you don't know how many dicts you will have?

The following is not supported and will produce a syntax error (see PEP 448, "Variations" section):

def merge_dicts(*dicts):
    return {**d for d in dicts}

This alternative is about as readable and gives pretty good performance:

def merge_dicts_comprehension(*dicts):
    return {k: v for d in dicts for k, v in d.items()}
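
A similar spelling (a sketch along the same lines, not part of the benchmarks below) feeds the chained items straight to the dict constructor, which also keeps the last value for duplicate keys:

from itertools import chain

def merge_dicts_chain(*dicts):
    # dict() keeps the last pair for a repeated key, matching unpacking
    return dict(chain.from_iterable(d.items() for d in dicts))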

But it's still faster to use unpacking; e.g., if you know len(dicts) == 5, this would be faster:

def merge_dicts_unpack(*dicts):
    return {
        **dicts[0],
        **dicts[1],
        **dicts[2],
        **dicts[3],
        **dicts[4],
    }

But how to do this dynamically?

Here is one way, but it is hideous:

from types import FunctionType

def merge_dicts_dynamic(*dicts):
    num = len(dicts)
    func_name = f"_merge_{num}"
    try:
        # reuse a previously generated merge function for this arity
        func = globals()[func_name]
    except KeyError:
        # build source like: def _merge_3(*d): return {**d[0], **d[1], **d[2]}
        spread = ', '.join([f"**d[{x}]" for x in range(num)])
        func_text = f'def {func_name}(*d):\n    '
        func_text += f'return {{{spread}}}'
        foo_code = compile(func_text, "<string>", "exec")
        # co_consts[0] is the code object for the def compiled above
        func = FunctionType(foo_code.co_consts[0], globals(), func_name)
        globals()[func_name] = func
    return func(*dicts)

It beats the {k: v for d in dicts for k, v in d.items()} approach. But again, hideous.

How can we achieve performance equivalent to unpacking without generating a function dynamically?
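
On Python 3.9+, PEP 584's dict union operators are one candidate. A minimal sketch (not included in the benchmarks below):

def merge_dicts_union(*dicts):
    merged = {}
    for d in dicts:
        merged |= d  # in-place dict union (Python 3.9+); later dicts win
    return merged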

Other variations:

def merge_dicts_unpack_iter(*dicts):
    val = {}
    for d in dicts:
        val = {**val, **d}
    return val
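
And a standard-library variation using collections.ChainMap (a sketch, not part of the timed results; the reversed() call preserves the later-dicts-win semantics, since ChainMap gives priority to the first mapping):

from collections import ChainMap

def merge_dicts_chainmap(*dicts):
    # ChainMap looks up keys left to right, so reverse to let later dicts win
    return dict(ChainMap(*reversed(dicts)))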

Variations from the answers:

From @jizhihaoSAMA:

from functools import reduce

def merge_dicts_reduce(*dicts):
    return reduce(lambda x, y: {**x, **y}, dicts)

From @RonSerruya:

def merge_dicts_update(*dicts):
    d = dict()
    for x in dicts:
        d.update(x)
    return d

Test data:

test_dicts = [{str(x): x for x in range(num, num + 20)} for num in [20, 40, 80, 100, 120]]

Test code:

import timeit

for test_func in [
    'comprehension',
    'unpack_iter',
    'dynamic',
    'reduce',
    'update',
    'unpack',
]:
    result = timeit.timeit(
        stmt=f'merge_dicts_{test_func}(*test_dicts)',
        number=1000000,
        globals=globals(),
    )
    print(f"{test_func}: {result}")

Results:

comprehension: 7.904960700000629
unpack_iter: 7.860937651999848
dynamic: 3.4844029209998553
reduce: 8.41494683899964
update: 3.8839738759998
unpack: 3.184907333999945

Conclusion:

I agree with Ron that the update version provides the best compromise of readability and performance.

It does not perform as well as static unpack or dynamic unpack, but it's pretty close.

dstandish

2 Answers


In [1]: d1 = {x:1 for x in range(100)}

In [2]: d2 = {x:1 for x in range(200,300)}

In [3]: d3 = {x:1 for x in range(350,500)}

In [4]: d4 = {x:1 for x in range(400,500)}

In [5]: d5 = {x:1 for x in range(600,720)}

In [6]: dicts = [d1, d2, d3, d4, d5]

In [9]: def merge_dicts(dicts):
   ...:     return {**dicts[0],
   ...:     **dicts[1],
   ...:     **dicts[2],
   ...:     **dicts[3],
   ...:     **dicts[4]}
   ...:

In [10]: %timeit merge_dicts(dicts)
12.7 µs ± 53 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [11]: def merge_update(dicts):
    ...:     d = dict()
    ...:     for x in dicts:
    ...:         d.update(x)
    ...:     return d
    ...:

In [12]: %timeit merge_update(dicts)
13.3 µs ± 52.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [13]: def merge_items(dicts):
    ...:     return {k: v for d in dicts for k, v in d.items()}
    ...:

In [14]: %timeit merge_items(dicts)
32.9 µs ± 323 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

The second method, with .update, looks good enough.

Ron Serruya

Possible solution with reduce:

from functools import reduce

def merge_dict(*dicts):
    return reduce(lambda x, y: {**x, **y}, dicts)
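
One edge case worth noting (an aside, not part of the original answer): reduce raises a TypeError when dicts is empty; supplying an empty dict as the initializer avoids that:

from functools import reduce

def merge_dict(*dicts):
    # the {} initializer makes merge_dict() with zero arguments return {}
    return reduce(lambda x, y: {**x, **y}, dicts, {})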

jizhihaoSAMA