Python - comparing lists of dictionaries using tuples - unexpected behaviour?

Question

I've been attempting to compare two lists of dictionaries to find the userids of new people who are in list2 but not in list1. For example, the first list:

list1 = [{"userid": "13451", "name": "james", "age": "24", "occupation": "doctor"}, {"userid": "94324", "name": "john", "age": "33", "occupation": "pilot"}]

and the second list:

list2 = [{"userid": "13451", "name": "james", "age": "24", "occupation": "doctor"}, {"userid": "94324", "name": "john", "age": "33", "occupation": "pilot"}, {"userid": "34892", "name": "daniel", "age": "64", "occupation": "chef"}]

the desired output:

newpeople = ['34892']

This is what I've managed to put together:

list1tuple = ((d["userid"]) for d in list1)
list2tuple = ((d["userid"]) for d in list2)

newpeople = [t for t in list2tuple if t not in list1tuple]

This actually seems to be pretty efficient, especially considering the lists I am using might contain over 50,000 dictionaries. However, here's the issue:

If it finds a userid in list2 that indeed isn't in list1, it adds it to newpeople (as desired), but then also adds every other userid that comes afterwards in list2 to newpeople as well.

So, say list2 contains 600 userids and the 500th userid in list2 isn't found anywhere in list1, the first item in newpeople will be the 500th userid (again, as desired), but then followed by the other 100 userids that came after the new one.

This is pretty perplexing to me - I'd greatly appreciate anyone helping me get to the bottom of why this is happening.

list1tuple is neither a tuple nor a list, it's a generator ... that's your problem — donkopotamus, Aug 15 '16 at 23:22
@donkopotamus funnily enough, I actually ran type(list1tuple) and got generator, which baffled me even more. Would you mind pointing me in the direction of how I might fix this/achieve my goal? thanks — dan martin, Aug 15 '16 at 23:23
http://stackoverflow.com/questions/3462143/get-difference-between-two-lists — Jeremy Kahan, Aug 15 '16 at 23:28

donkopotamus · Accepted Answer · 2016-08-15T23:28:23.540

Currently you have set list1tuple and list2tuple as:

list1tuple = ((d["userid"]) for d in list1)
list2tuple = ((d["userid"]) for d in list2)

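These are generators, not lists (or tuples), which means each one can only be iterated over once. Every "t not in list1tuple" test consumes items from list1tuple until it finds a match or runs out. Once a userid is missing, the test runs the generator to exhaustion, so every later userid in list2 is compared against nothing and is reported as new. That is the behaviour you are seeing.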
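To see the failure in miniature, here is a minimal sketch. The shortened dictionaries, the reordered three-entry list2, and the names list1gen and list2gen are illustrative assumptions, not the data from the question:

```python
# Shortened, illustrative data: "94324" is present in BOTH lists.
list1 = [{"userid": "13451"}, {"userid": "94324"}]
list2 = [{"userid": "13451"}, {"userid": "34892"}, {"userid": "94324"}]

# Generator expressions, as in the question.
list1gen = (d["userid"] for d in list1)
list2gen = (d["userid"] for d in list2)

# Each "not in" test advances list1gen; it never rewinds.
newpeople = [t for t in list2gen if t not in list1gen]

print(newpeople)       # ['34892', '94324']
print(list(list1gen))  # [] because the generator is exhausted
```

The search for "34892" runs list1gen to the end, so the later test for "94324" finds nothing to compare against and wrongly reports it as new.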

You could change them to be lists:

list1tuple = [d["userid"] for d in list1]
list2tuple = [d["userid"] for d in list2]

which would allow you to iterate over them as many times as you like. But a better solution would be to simply make them sets:

list1tuple = set(d["userid"] for d in list1)
list2tuple = set(d["userid"] for d in list2)

And then take the set difference:

newpeople = list2tuple - list1tuple
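Note that the result of the subtraction is a set, not a list, and sets are unordered. A minimal end-to-end sketch using the data from the question (the names ids1, ids2 and newpeople_list are illustrative, and sorting is just one assumed way to get a stable list):

```python
list1 = [
    {"userid": "13451", "name": "james", "age": "24", "occupation": "doctor"},
    {"userid": "94324", "name": "john", "age": "33", "occupation": "pilot"},
]
list2 = [
    {"userid": "13451", "name": "james", "age": "24", "occupation": "doctor"},
    {"userid": "94324", "name": "john", "age": "33", "occupation": "pilot"},
    {"userid": "34892", "name": "daniel", "age": "64", "occupation": "chef"},
]

# Build a set of userids from each list.
ids1 = set(d["userid"] for d in list1)
ids2 = set(d["userid"] for d in list2)

# Set difference: userids in list2 that are not in list1.
newpeople = ids2 - ids1
print(newpeople)  # {'34892'}

# Convert to a list if a list is required; sorted() gives a stable order.
newpeople_list = sorted(newpeople)
print(newpeople_list)  # ['34892']
```

If the order in which the new userids appear in list2 matters, a list comprehension that tests membership against ids1 keeps that order instead.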

That makes sense, thanks! Question: is that not an additional bracket in list1tuple/list2tuple? — dan martin, Aug 15 '16 at 23:27

score 1 · Answer 2 · answered Aug 15 '16 at 23:32

As can be seen from a Python console, list1tuple and list2tuple are generators:

>>> ((d["userid"]) for d in list1)
<generator object <genexpr> at 0x10a9936e0>

Although the second one can remain a generator (there is no need to expand the list), the first one should first be converted to a list, set or tuple, e.g.:

list1set = {d['userid'] for d in list1}
list2generator = (d['userid'] for d in list2)

You can now check for membership in the group:

>>> [t for t in list2generator if t not in list1set]
['34892']
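If the comparison is needed in more than one place, the approach above could be wrapped in a small helper. This is only a sketch: the function name new_userids and its key parameter are made up for illustration and do not come from either answer.

```python
def new_userids(old_records, new_records, key="userid"):
    """Return the key values found in new_records but not in old_records.

    Hypothetical helper: results keep the order of new_records.
    """
    # A set gives average constant-time membership tests.
    known = {d[key] for d in old_records}
    return [d[key] for d in new_records if d[key] not in known]


list1 = [
    {"userid": "13451", "name": "james"},
    {"userid": "94324", "name": "john"},
]
list2 = [
    {"userid": "13451", "name": "james"},
    {"userid": "94324", "name": "john"},
    {"userid": "34892", "name": "daniel"},
]

result = new_userids(list1, list2)
print(result)  # ['34892']
```

Because each membership test against a set takes constant time on average, the whole comparison is roughly linear in the combined size of the two lists, which matters for inputs of 50,000 or more dictionaries.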
