5

What will be the most efficient way to check if key-value pair of one dictionary is present in other dictionary as well.
Suppose if I have two dictionaries as dict1 and dict2 and these two dictionaries have some of the key-value pairs in common. I want to find those and print them. What would be the most efficient way to do this? Please suggest.

gliese581g
  • 433
  • 8
  • 21

4 Answers4

10

one way would be:

d_inter = dict([k, v for k, v in dict1.iteritems() if k in dict2 and dict2[k] == v])

the other:

d_inter = dict(set(d1.iteritems()).intersection(d2.iteritems()))

I'm not sure which one would be more efficient, so let's compare both of them:

1. Solution with iteration through dicts:

  • we parse all keys of dict1: for k,v in dict1.iteritems() -> O(n)
  • then we check whether the key is in dict2, if k in dict2 and dict2[k] == v -> O(m)

which makes it a global worst case complexity of O(n+m) -> O(n)

2. Solution with sets:

if we assume that converting a dict into a set is O(n):

  • we parse all items of d1 to create the first set set(d1.iteritems()) -> O(n)
  • we parse all items of d2 to create the second set set(d2.iteritems()) -> O(m)
  • we get the intersetion of both which is O(min(len(s), len(t)) on average or O(n * m) in worst case

which makes it a global worst case complexity of O(2n*n*m) which can be considered as O(n^3) for same sized dicts: then solution 1. is best

if we assume that converting a dict into a set is O(1) (constant time)

the average is O(min(n,m)) and the worst case is O(n*m), then solution #1 is best on worst case scenario, but solution #2 is best on average case scenario because O(n+m) > O(min(n,m)).

In conclusion, the solution you choose will depend on your dataset and the measurements you'll make! ;-)

N.B.: I took there the complexity of the set().

N.B.2: for the solution #1 make always the smallest dict as dict2 and for solution #2 the smallest dict as dict1.


N.B.2016: This solution was written for python2. Here's the changes needed to make it python3 ready:

  • replace iteritems() with items() ;
  • you could also use the newer dict comprehension syntax: {[k, v for … == v]} ;
  • as d.items() returns dict_items which is not hashable anymore, you'd have to use a frozenset() instead {frozenset(d1.items()).intersection(d2.items())}.
zmo
  • 24,463
  • 4
  • 54
  • 90
  • 1
    On my local tests, comparison: `2.86 us per loop` and set: `2.06 us per loop`. So `set` is better (at least for small dictionaries). +1 `set + iteritems` usage – Mp0int Feb 07 '14 at 16:04
  • @FallenAngel: Thank you for the statistics! – gliese581g Feb 07 '14 at 16:07
  • 1
    You can speed up your first approach by using `k in dict2` instead; in Python 2, `k in dict2.keys()` first creates a list and then has to scan through it. – DSM Feb 07 '14 at 16:19
  • ...which adds an unnecessary `O(m)` in the algorithm. @DSM good to know, I always though both were equivalent, and thus preferred to use the `.keys()` as "explicit is better than implicit" :-) – zmo Feb 07 '14 at 16:21
  • @zmo: This works absolutely fine. But whenever I have a key whose value is of type list, it throws an exception as : TypeError: unhashable type: 'list'. Please suggest what should I do in such situations. – gliese581g Feb 11 '14 at 08:13
  • 1
    it's because you can't use a list as a key! The lists are mutable per design, and thus it's impossible to compute a hash for key comparison. If you really need some kind of list as a key (which is a design choice you really need to think twice before implementing), you'd better use a `tuple()` instead. – zmo Feb 11 '14 at 12:35
2

What about...

matching_dict_values = {}
for key in dict1.keys():
    if key in dict2.keys():
        if dict1[key] == dict2[key]:
            matching_dict_values[key]=dict1[key]
Cam
  • 478
  • 1
  • 4
  • 13
  • which is the same thing as my solution #1, but not as a one liner. – zmo Feb 07 '14 at 16:22
  • do you know if one is more efficient than the other? – Cam Feb 07 '14 at 16:24
  • what do you mean? Yes, I do, and actually my solution #1 is better than your solution, because I'm using `.iteritems()` and I followed @DSM's advice not to use `.keys()`. cf my answer for all the details. – zmo Feb 07 '14 at 16:30
0

I don't see why you'd need anything fancier than this:

if all([testKey in dict1, testKey in dict2]) and dict1[testKey] == dict2[testKey]:

We don't have to worry about a KeyError because the boolean test will fail before the and (do a value which correlates to a key that isn't in one of them will never get tested)

So to get your full list common key-value pairs you could do this:

for testKey in set(dict1.keys() + dict2.keys()):
    if all([testKey in dict1, testKey in dict2]) and dict1[testKey] == dict2[testKey]:
        commonDict[testKey] = dict1[testKey]
wnnmaw
  • 5,444
  • 3
  • 38
  • 63
  • he wants the intersection of two dicts, not whether a given key is in both dicts. So to apply your solution to his problem, you'd tell him to do a `for testKey in dict1.keys()+dict2.keys(): ...`?! – zmo Feb 07 '14 at 15:52
  • @zmo, It seemed to me that the real problem here was finding if a key-value pair is common to both, and that everything after that would be fairly easy, but I'll update to get a full list of common pairs – wnnmaw Feb 07 '14 at 15:55
  • even though your problem answers the question, it is not efficient at all, as it implies a constant `O(n*m*(n+m))` which is `O(n3)` in worst case scenario. – zmo Feb 07 '14 at 16:13
0

Update to @zmo's answer

Solution 1:

d_inter = {k:v for k, v in dict1.items() if k in dict2 and dict2[k] == v}

Solution 2:

d_inter = dict(set(dict1.items()).intersection(dict2.items()))
murat yalçın
  • 709
  • 7
  • 10