4

I would like to find an easy and/or fast way to find all common couple (pair: value) given N dictionaries in python. (3.X would be best)

PROBLEM

Given a set of 3 dicts (but it could be any dict, it is just for the example)

n1 = {'a': 1, 'b': 2, 'c': 3}
n2 = {'a': 1, 'b': 4, 'c': 3, 'd': 4}
n3 = {'a': 1, 'b': 2, 'c': 3, 'd': 4}

The result for common (key: values) for n1, n2 and n3 should be:

({'a': 1, 'c': 3})

And for n2 and n3 it should be

({'a': 1, 'c': 3, 'd': 4})

I first though about using a brute force algorithm that will check every pair (key: value) for every dict

Here is a implementation using a recursive algorithm

SOLUTION A

list_dict = [n1, n2, n3]

def finding_uniquness(ls):

    def recursion(ls, result):
        if not ls:
            return result
        result = {k: v  for k, v in result.items()  for k1, v1 in ls[0].items() if k == k1 and v == v1}
        return recursion(ls[1:], result)

    return recursion(ls[1:], ls[0])


finding_uniquness(list_dict)
# {'c': 3, 'a': 1}

But it is not easily understandable and the complexity is high (I'm not sure how to calculate complexity; but since we compare all the elements on all dict, the complexity should be O(N²)?)

Then, I though about Sets. because it could naturally compare all the element

SOLUTION B

import functools

list_dict = [n1, n2, n3]
set_list = [set(n.items()) for n in list_dict]

functools.reduce(lambda x, y: x & y, set_list)
 # {('a', 1), ('c', 3)}

It is so much better than the previous solution, unfortunately, when one of the key have a list as values it throws an error:

>>> n = {'a': [], 'b': 2, 'c': 3}
>>> set(n.items()) 

TypeError: unhashable type: 'list'

My question is then double:

  • is there any better algorithm than SOLUTION A?
  • or is there a way to avoid the TypeError with SOLUTION B?

of course, any other remarks will be welcome.

Cœur
  • 37,241
  • 25
  • 195
  • 267
Kruupös
  • 5,097
  • 3
  • 27
  • 43
  • 1
    You say "unique" several times but actually it seems you're looking for the _common_ key:value pair? In other words, the one that all dictionaries share. – roganjosh Mar 01 '17 at 20:14
  • 1
    which types can you have in your dictionaries? flat lists? lists of lists? because you could convert to `tuple` beforehand so the hash issue is gone. – Jean-François Fabre Mar 01 '17 at 20:15
  • @roganjosh yes, sorry, common is what I am looking for, I'd be pleased if you edit my questions. (or I will do it) – Kruupös Mar 01 '17 at 20:16
  • @Jean-FrançoisFabre it is more of an abstract question, so eventually I could have _any_ types in my dictionnaries. Included nested `list`, `dict` etc... – Kruupös Mar 01 '17 at 20:18
  • A crude way of doing this would be to change Solution B to `set_list = [set(map(str, n.items())) for n in list_dict]` but then you'd probably need an OrderedDict. – roganjosh Mar 01 '17 at 20:28
  • 1
    @roganjosh that would make `6` and `"6"` equal for instance. Maybe safer with `repr` – Jean-François Fabre Mar 01 '17 at 20:29
  • 1
    **Pro tip**: In python3 `.items()`, `.values()` and `.keys()` already return set-like objects,hence there's no need to do `set(d.items())`! – Bakuriu Mar 01 '17 at 20:31
  • @Bakuriu `values` isn't set-like and `items` isn't always set-like. – Stefan Pochmann Mar 01 '17 at 20:38
  • 1
    BTW good question. shows research, and generate some nice answers (that's a sign :)) – Jean-François Fabre Mar 01 '17 at 20:39
  • @Jean-FrançoisFabre thx :) – Kruupös Mar 01 '17 at 20:41
  • @Bakuriu, I was thinking exactly the same but when I do `set(n) & set(n1)` (when `n`and `n1`) are `dict` it make the comparaison only over the `key` and not the values. Maybe I misanderstood what you were saying though – Kruupös Mar 01 '17 at 20:49

3 Answers3

7

Simpler and more efficient way:

>>> {k: v
     for k, v in list_dict[0].items()
     if all(k in d and d[k] == v
            for d in list_dict[1:])}
{'c': 3, 'a': 1}

Using an extra variable for list_dict[1:] might be beneficial, otherwise the short-circuiting of all somewhat goes to waste. Or if you don't need the list afterwards you could just pop the "master" dictionary:

>>> {k: v
     for k, v in list_dict.pop().items()
     if all(k in d and d[k] == v
            for d in list_dict)}
{'c': 3, 'a': 1}

Or using get with a default that cannot be in the dictionary as suggested by @Jean-FrançoisFabre:

>>> marker = object()
>>> {k: v
         for k, v in list_dict.pop().items()
         if all(d.get(k, marker) == v
                for d in list_dict)}
{'c': 3, 'a': 1}
Jean-François Fabre
  • 137,073
  • 23
  • 153
  • 219
Stefan Pochmann
  • 27,593
  • 8
  • 44
  • 107
  • that one I understand when I'm reading the code! (even at this late hour) – Jean-François Fabre Mar 01 '17 at 20:40
  • nitpick: `k in d and d[k] == v` could be `d.get(k,marker) == v` where `marker=object()`, to handle the case where keys could be `None`. That would probably be even faster. – Jean-François Fabre Mar 01 '17 at 20:41
  • @Jean-FrançoisFabre Ah, yeah, I thought about `get` but rejected it because of `None` and forgot about the trick you mention. – Stefan Pochmann Mar 01 '17 at 20:49
  • nice. edited a little bit more for clarity. Read this marker trick on SO a couple of weeks ago. – Jean-François Fabre Mar 01 '17 at 20:55
  • @Jean-FrançoisFabre do you happen to remember the question that raised it? I understand how it works in this context but I'm interested in whether the question/answer elaborates on it more. – roganjosh Mar 01 '17 at 21:30
  • since I remembered `marker` as a name, it took me at least 3 seconds to find it :) http://stackoverflow.com/questions/41811832/pass-class-attribute-as-parameter-of-a-function-of-that-class-in-python/41811883#41811883. It didn't use a dictionary and `get` though, so I made that up myself (hoping to reuse it sometime in an answer). Basically, if you instanciate the object at this point, it cannot be in the dict (mutability helps :)) – Jean-François Fabre Mar 01 '17 at 21:32
3

If unhashable values are a problem you can always compute the intersection of the keys up-front by using .keys() and then compare only the values associated with the keys that all dictionaries have in common:

import operator as op
from functools import reduce

common_keys = reduce(op.and_, (d.keys() for d in my_dicts))
common_items = {}
for key in common_keys:
    value = my_dicts[0][key]
    if all(d[key] == value for d in my_dicts):
        common_items[key] = value

This should be pretty faster than solution a, slower than solution b, but works on all inputs.

Bakuriu
  • 98,325
  • 22
  • 197
  • 231
1

A batteries-included version.

To handle unhashable types, we use pickling; replace it with dill or json or any other predictable serialization to taste.

import collections
import itertools
import pickle  

def findCommonPairs(dicts):
    all_pairs = itertools.chain(*[d.items() for d in dicts])
    cnt = collections.Counter(map(pickle.dumps, all_pairs))
    return [pickle.loads(pickled_pair)
            for pickled_pair, count in cnt.items()
            if count == len(dicts)]


>>> findCommonPairs([n1, n2, n3])
[('a', 1), ('c', 3)]

>>> findCommonPairs([{'a': [1,2], 'b': [2,3]}, {'a': [1,2]}])
[('a', [1, 2])]

Note that serialization only goes so far. To properly compare dicts if dicts, for instance, these dicts must be turned into (key, value) pairs and sorted before serialization. Any structures that reference each other may have issues (or not). Replace pickling with a custom predictable serializer if you care about these issues.

9000
  • 39,899
  • 9
  • 66
  • 104