Python - Subtract a list of dicts from another

Question

I'm interested in comparing multiple lists, taking the difference and iterating through that.

Both are list of dicts that contain the following keys: 'ssid' - str, 'bssid' - str, 'channel' - int, 'flags' - list, 'found' - bool

I've tried:

 list = list(set(networks_list).difference(missing_networks))

But I receive the error:

unhashable type 'dict'

My data structure looks like this:

list: [{'found': False, 'flags': ['WPA2-PSK-CCMP', 'WPS', 'ESS'], 'ssid': 'SOHO_BROADCAST', 'bssid': '30:46:9a:9d:11:1a', 'channel': 1}, {'found': False, 'flags': ['WPA-EAP-TKIP', 'WPA2-EAP-CCMP', 'ESS'], 'ssid': 'Cisco 2.4ghz', 'bssid': '40:f4:ec:7f:3c:5a', 'channel': 11}, {'found': False, 'flags': ['WPA-EAP-TKIP', 'WPA2-EAP-CCMP', 'ESS'], 'ssid': 'Cisco 5.0ghz', 'bssid': '40:f4:ec:7f:3c:54', 'channel': 149}]

Missing networks is initially empty.

Is there a simple way of doing this?

What exactly is your data structure? Do you have a `dict` of `list`s or a `list` of `dicts`? — Joel Cornett, Aug 23 '12 at 22:15
Easy way to reproduce: `set([dict(),dict()])` -- What does this mean about the requirements for keys to `set` (or `dict`)? It's covered in the manual as to why dictionaries are not hashable by default :) — , Aug 23 '12 at 22:21
@ParthG "Why does that code throw the exception?" It does so because dictionaries are not hashable. Why not? What does `set` require of the values it contains? That they are hashable. This is the same requirement as keys in a dictionary. — , Aug 23 '12 at 22:23
I'd think you'd want to create your set() from the .keys() or the .values() or the key/value pairs (.items()) of your dictionaries. Then you can use .difference() on that. — Jim Dennis, Aug 23 '12 at 22:48
When posting questions, please make it clear what object types or data structures are being used in the example code! — Jim Dennis, Aug 25 '12 at 21:09
Do you just want to compare by keys, or does the value matter, too? — kojiro, Aug 30 '12 at 00:46
You could try promoting your dictionaries to class instances. I believe classes are hashable. — Eric, Aug 30 '12 at 00:50

Doboy · Answer 1 · 2012-08-25T05:55:33.527

4

Instead of making them list of dicts make them a list of objects which implement __eq__ and __hash__ and the code you provide should work

edited Aug 25 '12 at 05:55

answered Aug 23 '12 at 22:21

Doboy

10,411
11
40
48

Look at the answers here: http://stackoverflow.com/questions/4005318/how-to-implement-a-good-hash-function-in-python – Blender Aug 23 '12 at 23:05

Adam Wagner · Accepted Answer · 2012-08-23T22:43:20.567

4

There are probably many pitfalls to a generic approach like this, but if your dictionaries are of mostly primitives, and not huge, you can do something like this:

Assuming your data looks something like this:

networks = [
        {'address': '192.168.1.1'},
        {'address': '127.0.0.1'},
    ]

missing = [
        {'address': '127.0.0.1'}
    ]

You can turn the lists of dictionaries into lists tuples (which are hashable)

def make_hashable(d):
    return (frozenset(x.iteritems()) for x in d)

networks_hashable = make_hashable(networks)
missing_hashable = make_hashable(missing)

Then subtract

diff = set(networks_hashable).difference(missing_hashable)

Now you have a list of tuples

print list(diff)

or, convert back to dictionaries

print [dict(x) for x in diff]

Update

I've changed the definition of make_hashable based on @gnibbler's comment.

edited Aug 23 '12 at 22:43

answered Aug 23 '12 at 22:24

Adam Wagner

15,469
7
52
66

`tuple(x.iteritems()` won't work reliably because dicts are unordered – John La Rooy Aug 23 '12 at 22:36
@gnibbler This dfn should work better, or am I missing something? `def make_hashable(d): return (frozenset(x.iteritems()) for x in d)` – Adam Wagner Aug 23 '12 at 22:40
I stil get the error `'dict' object has no attribute 'items'` – Parth Aug 23 '12 at 23:17
I meant I stil get the error `'list' object has no attribute 'items'` – Parth Aug 24 '12 at 19:28
@ParthG I'm not sure I can help you without more of your code... I'm not using `items` anywhere in my code... are you using python 3 or 2 (my examples are in 2)... I'm assuming your value for `networks` or `missing` is structured differently than mine, the code is expecting there to be a dictionary where there is a list. – Adam Wagner Aug 24 '12 at 20:16
I added the data structure with sample data – Parth Aug 28 '12 at 14:12

John La Rooy · Answer 3 · 2012-08-23T22:35:03.010

No, in general it's quite difficult to do efficiently. You don't have to solve the general case though, just for your particular data structure which you haven't elaborated to us.

For example, if your dict keys are all intor str it's considerably easier than if the keys are complex numbers etc.

EDIT: Since you've now told us your data structure, I can tell you that a simple way is to convert the dicts to nametuples.

Note: You can't just convert the dict to a tuple with tuple(dict.items()) because the order of the keys can be different from one dict to the next

>>> d = dict(ssid="ssid", bssid="bssid", channel=1, flags="flags", found="True")
>>> networks_list = [d, ]
>>> from collections import namedtuple
>>> NT = namedtuple("my_struct", d.keys())
>>> set(NT(**i) for i in networks_list)
set([my_struct(found='True', flags='flags', channel=1, bssid='bssid', ssid='ssid')])

score 2 · Answer 4 · answered Aug 23 '12 at 22:23

2

A dict is a mutable item. This means it has no constant hash value over the course of its life, and cannot be put into a set.

If you convert all the dicts to strings with the same function, they become hashable and you can use them in a set...

answered Aug 23 '12 at 22:23

Roland Smith

42,427
3
64
94

It isn't true that "a [mutable object] cannot be put into a set", although the mutability is why that dictionaries -- by design choice -- are not hashable by default. The issue isn't that an object *is mutable*, just that it *is mutated while in the set*: Python shifts the burden in this case and takes the easy way out :) – Aug 23 '12 at 22:25
converting the dictionaries to strings can lead to inconsistent behavior, because they have no ordering guarantees – Adam Wagner Aug 23 '12 at 22:25
@AdamWagner: if you're going to put them in a set (for the difference thing) ordering shouldn't matter because a set is unordered anyway. – Roland Smith Aug 23 '12 at 22:32
1

It will matter because `str({'a': 'b', 'c': 'd'})` is not deterministic. If it looks like two different strings, it'll end up giving a false answer. eg: `["{'a': 'b', 'c': 'd'}"] != ["{'c': 'd', 'a': 'b'}"]`... I hope this makes sense (not a lot of room to explain here) – Adam Wagner Aug 23 '12 at 22:38
I only bring it up because I've been bitten by this in the past. – Adam Wagner Aug 23 '12 at 22:38
1

@AdamWagner: That's why I said to use a function for it (that puts the data in order :) – Roland Smith Aug 23 '12 at 22:41
@RolandSmith gotcha, didn't see that. – Adam Wagner Aug 23 '12 at 22:44

score 2 · Answer 5 · answered Aug 23 '12 at 22:55

2

What if you try something as simple as:

 lst = list(set(networks_list.items()).difference(set(missing_networks.items())))

(BTW: I've changed your variable named to lst here; binding some results to the name "list" is probably a bad idea given that Python supports a list() function. It's not a keyword, so it won't throw an exception, but you might trip over it later when you write some code that tries to call the list() function later).

answered Aug 23 '12 at 22:55

Jim Dennis

17,054
13
68
116

when doing that, I get the error `'list' object has no attribute 'items' ` – Parth Aug 23 '12 at 23:15
Seriously!? Then one of those is a list. The point we're trying to make is that you need to call the methods that are supported by your objects (types). .difference() is a set method. .items() is a method on dictionaries. If missing_networks is a list then don't call "items()" on it ... just pass it as an argument to your set() function (constructor/converter). – Jim Dennis Aug 24 '12 at 23:49
I've added the actual data structure to my original post – Parth Aug 28 '12 at 14:12

sberry · Answer 6 · 2012-08-24T06:34:11.817

This approach works:

>>> import random
>>> items = [{'ssid': 'foo%s' % i, 'bssid': 'bar%s' % i, 'channel': i, 'flags': 'abc%s' % i, 'found': random.choice([True, False])} for i in range(1, 11)]
>>> items1 = random.sample(items, 7)
>>> items2 = random.sample(items, 5)
>>> print "\n".join(map(str, items1))
{'found': True, 'flags': 'abc9', 'ssid': 'foo9', 'bssid': 'bar9', 'channel': 9}
{'found': True, 'flags': 'abc7', 'ssid': 'foo7', 'bssid': 'bar7', 'channel': 7}
{'found': False, 'flags': 'abc10', 'ssid': 'foo10', 'bssid': 'bar10', 'channel': 10}
{'found': True, 'flags': 'abc5', 'ssid': 'foo5', 'bssid': 'bar5', 'channel': 5}
{'found': False, 'flags': 'abc4', 'ssid': 'foo4', 'bssid': 'bar4', 'channel': 4}
{'found': True, 'flags': 'abc3', 'ssid': 'foo3', 'bssid': 'bar3', 'channel': 3}
{'found': True, 'flags': 'abc2', 'ssid': 'foo2', 'bssid': 'bar2', 'channel': 2}
>>> print "\n".join(map(str, items2))
{'found': True, 'flags': 'abc3', 'ssid': 'foo3', 'bssid': 'bar3', 'channel': 3}
{'found': True, 'flags': 'abc9', 'ssid': 'foo9', 'bssid': 'bar9', 'channel': 9}
{'found': False, 'flags': 'abc1', 'ssid': 'foo1', 'bssid': 'bar1', 'channel': 1}
{'found': False, 'flags': 'abc8', 'ssid': 'foo8', 'bssid': 'bar8', 'channel': 8}
{'found': True, 'flags': 'abc5', 'ssid': 'foo5', 'bssid': 'bar5', 'channel': 5}
>>> print "\n".join(map(str, [dict(itemset) for itemset in set([tuple(sorted(grp.items())) for grp in items1]).difference([tuple(sorted(grp.items())) for grp in items2])]))
{'found': False, 'flags': 'abc10', 'ssid': 'foo10', 'bssid': 'bar10', 'channel': 10}
{'found': False, 'flags': 'abc4', 'ssid': 'foo4', 'bssid': 'bar4', 'channel': 4}
{'found': True, 'flags': 'abc7', 'ssid': 'foo7', 'bssid': 'bar7', 'channel': 7}
{'found': True, 'flags': 'abc2', 'ssid': 'foo2', 'bssid': 'bar2', 'channel': 2}

You are concealing a nasty bug. dicts are unordered, so dicts that are equal can be converted to different tuples. See Adam's answer http://stackoverflow.com/a/12100842/174728. — John La Rooy, Aug 24 '12 at 06:01
But 2 dicts with the same keys will have the `.keys()` appear in the same order regardless of their contents, correct? If that is the case, then multiple dicts with the same keys should create tuples in the same key order from `.items()`. — sberry, Aug 24 '12 at 06:12
no, the keys are unordered, you can't rely on the order. Try `{0:0, None:None}.keys() == {None:None, 0:0}.keys()` — John La Rooy, Aug 24 '12 at 06:16
Evaluates to `False` for me (Python 2.7.3 (default, Aug 1 2012, 05:16:07) [GCC 4.6.3] on linux2 ). Goes to show how sneaky and hard to find it could be if the bug gets buried in production code — John La Rooy, Aug 24 '12 at 06:30

score 1 · Answer 7 · answered Aug 23 '12 at 22:40

1

Use a list comprehension:

>>> l1 = [{1:1, 'a':2},{1:2, 'a':4},{1:5, 'a':'2'}]
>>> l2 = [{1:1, 'a':3},{1:2, 'a':4},{1:5, 'a':'t'}]
>>> l3 = [i for i in l1 if i not in l2]
>>> l3
[{'a': 2, 1: 1}, {'a': '2', 1: 5}]

answered Aug 23 '12 at 22:40

Brenden Brown

3,125
1
14
15

This is O(n*n), so will be very slow as the lists get longer – John La Rooy Aug 23 '12 at 22:42

vsh · Answer 8 · 2012-08-31T16:01:08.843

As mentioned earlier, dicts are mutable and therefor can't be operated by set() -- that's because there's no guarantee that once placed inside a set, a dict won't change and become equal to another existing element of the set, thus violating the set quality.

If you are only checking the dicts for equality, you can convert them to tuples, then use tuples in set() operations, then convert tuples in the resulting set back into dicts.

>>> d = {1:1, 2:2}
>>> t = tuple(d1.items())
>>> t
((1, 1), (2, 2))
>>> d_ = dict(t)
>>> d_
{1: 1, 2: 2}
>>> d == d_
True

Wrapping dicts into classes can be quite a bit more cumbersome, as you still have to solve to conversion from a dict to a immutable data type.

As you have lists inside your dicts, you have more work. The simplest is if you could just replace lists with tuples in the original dicts.

Assuming that's not feasible, your conversion process will have to be a function, as opposed to just calling tuple() and dict(), respectively. You'll need to conver lists to tuples first, then converts dicts with tuples instead of lists to tuples. For example:

>>> d = {'int1': 1, 'int2': 2, 'list1': ['a', 'b'], 'list2': ['x', 'y']}
>>> d_l = {}
>>> for key, value in d.iteritems():
...   if type(value) == list:
...     d_l[key] = tuple(value)
...   else:
...     d_l[key] = value
>>> d_l
{'int1': 1, 'int2': 2, 'list1': ('a', 'b'), 'list2': ('x', 'y')}
>>> d_ = tuple(d_l.iteritems())
>>> d_
(('int1', 1), ('int2', 2), ('list1', ('a', 'b')), ('list2', ('x', 'y')))

To convert back, you have two options. Either look at key values that you know correspond to lists (if your keys are known and don't change) or look at tuples where the second element is a tuple itself (of you don't store any tuples in the original dicts). If neither option applies, you have to write more complex conversion algorithms.

I tried this, but one of the items in my dict is a list and I get the error "unhashable type: list" — Parth, Aug 30 '12 at 14:43

score 0 · Answer 9 · answered Aug 30 '12 at 01:25

I'm going to go by Eric's answer here.

First, the problem at hand. Why is a dict unhashable? Simply put, because it's a mutable container. If you change the content of the dict, the hash changes. The same will happen for any other mutable container like lists. So, you have to use something that's immutable.

The simplest solution in my mind would be to use a wrapper class. Essentially, a class that has a single property being the dict you originally wanted. You can spice it up with whatever magic functions you want for comparisons.

So, if I have your original list of networks

network_list = [
{'found': False, 'flags': ['WPA2-PSK-CCMP', 'WPS', 'ESS'], 'ssid': 'SOHO_BROADCAST', 'bssid': '30:46:9a:9d:11:1a', 'channel': 1},
{'found': False, 'flags': ['WPA-EAP-TKIP', 'WPA2-EAP-CCMP', 'ESS'], 'ssid': 'Cisco 2.4ghz', 'bssid': '40:f4:ec:7f:3c:5a', 'channel': 11},
{'found': False, 'flags': ['WPA-EAP-TKIP', 'WPA2-EAP-CCMP', 'ESS'], 'ssid': 'Cisco 5.0ghz', 'bssid': '40:f4:ec:7f:3c:54', 'channel': 149}
]

I can easily apply a wrapper class.

class Wrapper(object):
    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)

wrapped_networks = [Wrapper(**{'net_dict': network}) for network in network_list]

That way, the dictionary is stored and accessible through

wrapped_networks[0].net_dict # etc...

or whatever else you could possibly want to name it. Also, because of the way the class is implemented, you can use it to wrap whatever you want, even if there are multiple things per Wrapper!

What this does, as you might be well aware, is make is so what's actually getting hashed is the object, according to a unique id assigned to it at runtime. A little refactoring for your difference functions to work with these wrappers, and you should be well on your way (unless you come up with a better solution =D )

Python - Subtract a list of dicts from another

9 Answers9

Linked

Related