This question differs from similar dictionary merge questions in that conflicting duplicates should make the merge fail, or return False. Other solutions use a precedence rule to decide which value wins when one key is mapped to two different values.

How do I merge two dicts efficiently in Python? As an example, consider:

d1 = {'x': 'a', 'y': 'b', 'z': 'c'}
d2 = {'z': 'c', 'w': 'r'}
d3 = {'z': 'd', 'w': 'r'}

So the result of merging dictionaries 1 and 2 would be:

{'x': 'a', 'y': 'b', 'z': 'c', 'w': 'r'}

But merging 1 and 3, or 2 and 3, should fail because z has conflicting values ('c' vs. 'd').

My solution is:

def merge_dicts(d1, d2):
    k1 = d1.keys()
    k2 = d2.keys()
    unified_dict = dict()
    for k in k1:
        # look up in second dictionary
        if k in k2:
            pt = d2[k]  # pt stands for 'plain text'
            # if lookup is a contradiction, return empty dictionary
            # don't even bother with partial results
            if pt != d1[k]:
                return dict()
            else:
                unified_dict[k] = d1[k]  # safe: key is consistent
        else:
            unified_dict[k] = d1[k]  # safe: no key in k2

    # get the rest
    # already resolved intersection issues so just get set difference
    for k in d2.keys():
        if k not in d1.keys():
            unified_dict[k] = d2[k]

    return unified_dict

Any improvements?

Martijn Pieters
Michael Tuchman

5 Answers

Use dictionary views here; they let you treat dictionary keys as sets:

def merge_dicts(d1, d2):
    try:
        # Python 2
        intersection = d1.viewkeys() & d2
    except AttributeError:
        intersection = d1.keys() & d2
       
    if any(d1[shared] != d2[shared] for shared in intersection):
        return {}  # empty result if there are conflicts

    # leave the rest to C code, execute a fast merge using dict()
    return dict(d1, **d2)

The above code only tests for shared keys referencing non-matching values; the merge itself is best just left to the dict() function.
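A small sketch (Python 3, using the question's sample data) of what the guard inspects: the keys-view intersection yields the shared keys, and only shared keys whose values differ count as conflicts. The `shared_*` and `conflicts_*` names are just for illustration:

```python
d1 = {'x': 'a', 'y': 'b', 'z': 'c'}
d2 = {'z': 'c', 'w': 'r'}
d3 = {'z': 'd', 'w': 'r'}

# A keys view supports set operators; iterating the right-hand dict
# supplies its keys, so d1.keys() & d2 is the set of shared keys.
shared_12 = d1.keys() & d2   # {'z'}
shared_13 = d1.keys() & d3   # {'z'}

# Only shared keys with different values are conflicts.
conflicts_12 = {k for k in shared_12 if d1[k] != d2[k]}  # set()
conflicts_13 = {k for k in shared_13 if d1[k] != d3[k]}  # {'z'}

print(shared_12, conflicts_12)
print(shared_13, conflicts_13)
```

The `any()` call in the function stops at the first such conflict instead of collecting them all.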

I made the function work on both Python 2 and Python 3; if you only need to support one or the other, remove the try...except and replace intersection with the relevant expression. In Python 3 the dict.keys() method returns a dictionary view by default. Also, in Python 3-only code (3.5 or newer) I’d use {**d1, **d2} unpacking, which is a little faster, cleaner, and not limited to string keys.
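To illustrate the string-key limitation mentioned above, here is a sketch with integer keys (Python 3.5+ assumed for the `{**d1, **d2}` syntax). `dict(d1, **d2)` passes the second dict's items as keyword arguments, so its keys must be strings:

```python
d1 = {1: 'a', 2: 'b'}
d2 = {3: 'c'}

# Unpacking into a dict display accepts any hashable keys.
merged = {**d1, **d2}
print(merged)  # {1: 'a', 2: 'b', 3: 'c'}

# Keyword arguments must be strings, so this raises TypeError on Python 3.
raised = False
try:
    dict(d1, **d2)
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```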

You could conceivably make this a one-liner; Python 3 version:

def merge_dicts(d1, d2):
    return (
        {} if any(d1[k] != d2[k] for k in d1.keys() & d2)
        else {**d1, **d2}
    )

If all you need to support is Python 3.9 or newer, you can use the | dictionary merge operator:

def merge_dicts(d1, d2):
    return (
        {} if any(d1[k] != d2[k] for k in d1.keys() & d2)
        else d1 | d2
    )
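On its own, `|` (like `dict.update()` and `{**d1, **d2}`) applies a last-wins precedence rule, which is exactly the behaviour the question wants to avoid, so the conflict check has to run first. A short sketch, assuming Python 3.9 or newer:

```python
d1 = {'x': 'a', 'y': 'b', 'z': 'c'}
d3 = {'z': 'd', 'w': 'r'}

# The right-hand operand wins for shared keys: 'z' silently becomes 'd'.
plain = d1 | d3
print(plain)  # {'x': 'a', 'y': 'b', 'z': 'd', 'w': 'r'}

def merge_dicts(d1, d2):
    return (
        {} if any(d1[k] != d2[k] for k in d1.keys() & d2)
        else d1 | d2
    )

# With the guard, the same conflict produces an empty dict instead.
print(merge_dicts(d1, d3))  # {}
```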

Demo:

>>> d1 = {'x': 'a', 'y': 'b', 'z': 'c'}
>>> d2 = {'z': 'c', 'w': 'r'}
>>> d3 = {'z': 'd', 'w': 'r'}
>>> merge_dicts(d1, d2)
{'x': 'a', 'y': 'b', 'z': 'c', 'w': 'r'}
>>> merge_dicts(d1, d3)
{}
>>> merge_dicts(d2, d3)
{}
Martijn Pieters
  • Since the goal of this question was to find the most Pythonic way, I appreciate the answers for both Python 2 and 3. What I need to dig into and learn here is the dict(d1,**d2) semantics. Thank you all for your help. – Michael Tuchman Jul 13 '15 at 17:29
  • As for `dict(d1, **d2)`, see [What does \*\* (double star) and \* (star) do for Python parameters?](http://stackoverflow.com/q/36901) and the [`dict()` function documentation](https://docs.python.org/2/library/functions.html#func-dict). `dict(d1)` creates a copy of `d1` only. `dict(**d2)` would create a copy of `d2`, in a roundabout way. `dict(d1, **d2)` creates a copy of `d1`, adding in the key-value pairs of `d2`. – Martijn Pieters Jul 13 '15 at 17:38
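A quick sketch of the three `dict()` calls described in the comment above, using the question's data (Python 3.7+ assumed, so the printed dict keeps insertion order):

```python
d1 = {'x': 'a', 'y': 'b', 'z': 'c'}
d2 = {'z': 'c', 'w': 'r'}

copy1 = dict(d1)          # shallow copy of d1
copy2 = dict(**d2)        # copy of d2 via keyword arguments (string keys only)
merged = dict(d1, **d2)   # copy of d1, then d2's pairs added on top

print(copy1 is d1, copy1 == d1)  # False True
print(copy2 == d2)               # True
print(merged)                    # {'x': 'a', 'y': 'b', 'z': 'c', 'w': 'r'}
```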

d1 = {'x': 'a', 'y': 'b', 'z': 'c'}
d2 = {'z': 'c', 'w': 'r'}
d3 = {'z': 'd', 'w': 'r'}

def dict_merge(d1, d2):
    """Merge d1 and d2, or return {} if a shared key has conflicting values."""
    # Python 2 only; on Python 3, use keys() and items() instead of viewkeys() and viewitems()
    if len(d1.viewkeys() & d2) != len(d1.viewitems() & d2.viewitems()):
        return {}
    else:
        result = dict(d1, **d2)
        return result

if __name__ == '__main__':
    print dict_merge(d1, d2)
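The comment in the code above suggests switching to keys() and items() on Python 3. A minimal sketch of that port, assuming Python 3.5+ and swapping in `{**d1, **d2}` for `dict(d1, **d2)` as suggested in the first answer. Like the original, it needs hashable values because the items intersection builds a set:

```python
def dict_merge(d1, d2):
    """Merge d1 and d2, or return {} if a shared key has conflicting values."""
    # A conflicting key is in the keys intersection but not in the
    # items intersection, so the two counts differ.
    if len(d1.keys() & d2.keys()) != len(d1.items() & d2.items()):
        return {}
    return {**d1, **d2}

d1 = {'x': 'a', 'y': 'b', 'z': 'c'}
d2 = {'z': 'c', 'w': 'r'}
d3 = {'z': 'd', 'w': 'r'}

print(dict_merge(d1, d2))  # {'x': 'a', 'y': 'b', 'z': 'c', 'w': 'r'}
print(dict_merge(d1, d3))  # {}
```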
vHalaharvi
  • This alters `d1` in place and limits the values to hashable objects only. – Martijn Pieters Jul 09 '15 at 22:33
  • Also, you can drop the `.keys()` calls as those are entirely redundant and inefficient. Add to that the fact that dictionary views are *already* set objects that support intersections without so many calls and you could rework this to a more efficient and cleaner version. – Martijn Pieters Jul 09 '15 at 22:51
  • thanks Martijn for your insightful comments, I have posted an edit – vHalaharvi Jul 10 '15 at 13:09
  • **This is certainly the fastest approach,** thanks to exclusively deferring to C-based set operations on `KeysView` and `ItemsView` containers. @MartijnPieters for the win... *yet again.* – Cecil Curry Apr 01 '21 at 06:41
  • @CecilCurry yes, which is why I also used them in my own answer. If you are using Python 3.9 or newer exclusively, also use `|` instead of `dict(d1, **d2)` for more readability, faster speed and without the limitation to string-only keys! – Martijn Pieters Apr 01 '21 at 08:00

A slightly different approach (pre-check):

d1={'x':'a','y':'b','z':'c'}
d2={'z':'c','w':'r'}
d3={'z':'d','w':'r'}

def merge(d1, d2):
    for (k1,v1) in d1.items():
        if k1 in d2 and v1 != d2[k1]:
            raise ValueError('conflicting values for key %r' % (k1,))
    ret = d1.copy()
    ret.update(d2)
    return ret

print(merge(d1,d2))
print(merge(d1,d3))
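Since `merge()` reports a conflict by raising `ValueError`, the second `print` call above ends with an uncaught exception. A sketch of how a caller might handle both outcomes; the error message and the `results` dictionary are illustrative additions, not part of the answer:

```python
d1 = {'x': 'a', 'y': 'b', 'z': 'c'}
d2 = {'z': 'c', 'w': 'r'}
d3 = {'z': 'd', 'w': 'r'}

def merge(d1, d2):
    # Pre-check: any shared key with a different value is a conflict.
    for k1, v1 in d1.items():
        if k1 in d2 and v1 != d2[k1]:
            raise ValueError('conflicting values for key %r' % (k1,))
    ret = d1.copy()
    ret.update(d2)
    return ret

results = {}
for name, other in [('d2', d2), ('d3', d3)]:
    try:
        results[name] = merge(d1, other)
    except ValueError as exc:
        results[name] = None  # record the failure instead of crashing
        print('merge(d1, %s) failed: %s' % (name, exc))
    else:
        print('merge(d1, %s) =' % name, results[name])
```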
hiro protagonist

Why not use a set?

#!/usr/bin/python

d1={'x':'a','y':'b','z':'c'}
d2={'w':'r'}
d3={'z':'d','w':'r'}
d4={'x':'a','y':'b','z':'c'}

def merge_dicts(d1, d2):
    dicts = d1.items() + d2.items()
    if len(dicts) != len(set(dicts)):
        raise ValueError
    else:
        return dict(set(dicts))
print merge_dicts(d1, d2)
print merge_dicts(d1, d3)
try:
    print merge_dicts(d1, d4)
except:
    print "Failed"

$ python foo.py
{'y': 'b', 'x': 'a', 'z': 'c', 'w': 'r'}
{'y': 'b', 'x': 'a', 'z': 'd', 'w': 'r'}
Failed
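The output above shows that this first version does the opposite of what the question asks: it merges d1 with d3 even though 'z' conflicts, and rejects d1 with d4, whose pairs are all identical. Comparing the length of the item list with the length of its set detects repeated pairs rather than conflicting keys. A set-based sketch of the intended check, assuming Python 3 and, like the original, hashable values:

```python
d1 = {'x': 'a', 'y': 'b', 'z': 'c'}
d2 = {'w': 'r'}
d3 = {'z': 'd', 'w': 'r'}
d4 = {'x': 'a', 'y': 'b', 'z': 'c'}

def merge_dicts(d1, d2):
    # Identical (key, value) pairs collapse into one set element, so
    # a key appears twice in `pairs` only if its two values differ.
    pairs = set(d1.items()) | set(d2.items())
    merged = {**d1, **d2}  # one entry per distinct key
    if len(pairs) != len(merged):
        raise ValueError('conflicting values for a shared key')
    return merged

for name, other in [('d2', d2), ('d3', d3), ('d4', d4)]:
    try:
        print(name, merge_dicts(d1, other))
    except ValueError:
        print(name, 'Failed')
```

Here d1 merges with d2 and with d4, while d1 with d3 reports Failed.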

Edit:

Indeed, this will not work with unhashable values; this one will:

#!/usr/bin/python
# coding: utf-8

d1={'x':'a','y':'b','z':'c'}
d2={'w':'r'}
d3={'z':'d','w':'r'}
d4={'x':'a','y':'b','z':'c'}

def merge_dicts(d1, d2):
    merged= d1.copy()
    for k, v in d2.iteritems():
        if k in merged:
            raise ValueError
        else:
            merged[k] = v 
    return merged

for one, two in [(d1, d2), (d1, d3), (d1, d4)]:
    try:
        print merge_dicts(one, two)
    except:
        print "Merge Failed for %s with %s" %(one, two)
wilfriedroset
  • This requires that the *values* are hashable. You cannot use this if the dictionaries contain other dictionaries or lists or sets, for example. – Martijn Pieters Jul 09 '15 at 16:58
  • Indeed, the values have to be hashable for the first one. – wilfriedroset Jul 09 '15 at 17:18
  • Your second version once again disallows shared keys with equal values. – Martijn Pieters Jul 09 '15 at 17:19
  • I didn't use set only because I wasn't sure whether the set operations would be duplicating other efforts to manage computation time. However, it would be my preference to use set operations because it would make the code more concise and expressive. – Michael Tuchman Jul 13 '15 at 17:24

This does what you want:

def merge_dicts(d1, d2):
    # Join dicts and get rid of non-conflicting dups
    elems = set(d1.items()) | set(d2.items())

    # Construct join dict
    res = {}
    for k, v in elems:
        if k in res:
            return dict()  # conflicting dup found
        res[k] = v

    return res
Yann
  • This limits you to immutable *values*; you cannot use this with lists or sets or dictionaries in either of the dictionaries. – Martijn Pieters Jul 09 '15 at 17:43
  • Besides, in Python 3 `dict.items()` is *already set like*. In Python 2 you can use `dict.viewitems()`, with the added advantage it operates like a set *without the values having to be immutable*. – Martijn Pieters Jul 09 '15 at 17:44
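Following the comment above, on Python 3 the `set()` calls in this answer can be dropped because `dict.items()` views already support `|`. A sketch under that assumption; note that the union still produces a regular set, so the values must be hashable here as well:

```python
def merge_dicts(d1, d2):
    # Union of the items views: identical (key, value) pairs appear once.
    elems = d1.items() | d2.items()

    res = {}
    for k, v in elems:
        if k in res:
            return {}  # same key with two different values: conflict
        res[k] = v
    return res

d1 = {'x': 'a', 'y': 'b', 'z': 'c'}
d2 = {'z': 'c', 'w': 'r'}
d3 = {'z': 'd', 'w': 'r'}

print(merge_dicts(d1, d2) == {'x': 'a', 'y': 'b', 'z': 'c', 'w': 'r'})  # True
print(merge_dicts(d1, d3))  # {}
```

Because the union is a set, the order of keys in the result is arbitrary; compare results by equality rather than by printed order.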