Find the difference between 2 lists in Python

Question

How do you find the difference between 2 lists of dicts by comparing one of each dict's values?

In this example, 2 dicts are considered equal if their phone values are the same.

a1 = {'name':'Tom', 'phone':'1234'}
a2 = {'name':'Dick', 'phone':'1111'}
a3 = {'name':'Harry', 'phone':'3333'}
a = [a1,a2,a3]

b1 = {'name':'Jane', 'phone':'1234'}
b2 = {'name':'Liz', 'phone':'2222'}
b3 = {'name':'Mary', 'phone':'4444'}
b = [b1,b2,b3]

def check(x, y):
    if(x['phone'] == y['phone']):
        return True
    else:
        return False

The desired results should be:

result_A_minus_B = [a2, a3]
result_B_minus_A = [b2, b3]

My attempt below raises TypeError: list indices must be integers, not str

[x for x in a if check(a,b)]

This is not a duplicate of the linked question. That question is about calculating the differences between successive elements of one list, while this question is about excluding elements from one list if a corresponding element appears anywhere else in a second list. — Peter DeGlopper, Nov 30 '13 at 22:27

Peter DeGlopper · Answer 1 · 2013-11-30T07:53:46.283

With the data structures as given, you'd have to repeatedly iterate through the items in your second list of dictionaries, which is relatively inefficient. All you care about is whether a given phone number already exists in the second list of dictionaries. The most efficient data structure for repeatedly testing whether a given value is present is a set (or a dict if you might need to index from phone numbers back to further information). So I would do it as follows:

a = [a1, a2, a3]
b = [b1, b2, b3]
a_phone_numbers_set = set(d['phone'] for d in a)
b_phone_numbers_set = set(d['phone'] for d in b)
result_A_minus_B = [d for d in a if d['phone'] not in b_phone_numbers_set]
result_B_minus_A = [d for d in b if d['phone'] not in a_phone_numbers_set]
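For comparison, the repeated-iteration approach can be sketched by reusing the question's check function. The original attempt fails because check(a, b) receives the two whole lists, so x['phone'] indexes a list with a string; passing individual dicts avoids that. This is only a sketch, and the name difference_by_check is hypothetical. It scans list2 once for every element of list1, which is the inefficiency the set-based version avoids.

```python
a1 = {'name': 'Tom', 'phone': '1234'}
a2 = {'name': 'Dick', 'phone': '1111'}
a3 = {'name': 'Harry', 'phone': '3333'}
a = [a1, a2, a3]

b1 = {'name': 'Jane', 'phone': '1234'}
b2 = {'name': 'Liz', 'phone': '2222'}
b3 = {'name': 'Mary', 'phone': '4444'}
b = [b1, b2, b3]


def check(x, y):
    # x and y are individual dicts, not the lists themselves.
    return x['phone'] == y['phone']


def difference_by_check(list1, list2):
    # Hypothetical helper: keep x only if no element of list2 matches it.
    return [x for x in list1 if not any(check(x, y) for y in list2)]


result_A_minus_B = difference_by_check(a, b)
result_B_minus_A = difference_by_check(b, a)
```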

Or, if I wanted to create a function:

def unmatched_entries(list1, list2):
    existing_entries = set(d['phone'] for d in list2)
    return [d for d in list1 if d['phone'] not in existing_entries]

Optionally, you could use an arbitrary key:

def unmatched_entries(list1, list2, matching_key):
    existing_entries = set(d[matching_key] for d in list2 if matching_key in d)
    return [d for d in list1 if matching_key in d and d[matching_key] not in existing_entries]

That version always skips entries from list1 that don't define the requested key - other behavior is possible.
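One such alternative, as a sketch: keep entries from list1 that lack the key, on the assumption that an entry without the key cannot match anything in list2. The function name and the sample data are made up for illustration.

```python
def unmatched_entries_keep_missing(list1, list2, matching_key):
    # Collect the values present in list2, ignoring entries without the key.
    existing_entries = set(d[matching_key] for d in list2 if matching_key in d)
    # Keep an entry if it lacks the key or if its value is not in list2.
    return [d for d in list1
            if matching_key not in d or d[matching_key] not in existing_entries]


# Illustrative data only; the second entry has no 'phone' key.
list1 = [
    {'name': 'Tom', 'phone': '1234'},
    {'name': 'Dick'},
    {'name': 'Harry', 'phone': '3333'},
]
list2 = [{'name': 'Jane', 'phone': '1234'}]

kept = unmatched_entries_keep_missing(list1, list2, 'phone')
```

Which behavior is right depends on what a missing key means in your data.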

To match on multiple keys as alluded to by a briefly appearing comment, I would use a set of tuples of the values:

a_match_elements = set((d['phone'], d['email']) for d in a)
result_B_minus_A = [d for d in b if (d['phone'], d['email']) not in a_match_elements]

Again, this could be generalized to handle a sequence of keys.
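A minimal sketch of that generalization, assuming every dict defines every requested key (a missing key raises KeyError here). The function name, the 'email' values, and the sample lists are hypothetical.

```python
def unmatched_entries_multi(list1, list2, matching_keys):
    def key_tuple(d):
        # Build a hashable tuple of the values to compare on.
        return tuple(d[k] for k in matching_keys)

    existing_entries = set(key_tuple(d) for d in list2)
    return [d for d in list1 if key_tuple(d) not in existing_entries]


# Illustrative data only.
people_a = [
    {'name': 'Tom', 'phone': '1234', 'email': 'tom@example.com'},
    {'name': 'Dick', 'phone': '1111', 'email': 'dick@example.com'},
]
people_b = [
    {'name': 'Jane', 'phone': '1234', 'email': 'tom@example.com'},
    {'name': 'Liz', 'phone': '1111', 'email': 'liz@example.com'},
]

# Liz shares a phone number with Dick but not an email, so she is unmatched.
b_minus_a = unmatched_entries_multi(people_b, people_a, ['phone', 'email'])
a_minus_b = unmatched_entries_multi(people_a, people_b, ['phone', 'email'])
```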

John1024 · Answer 2 · 2013-11-30T06:49:06.427

This function:

def minus(list1, list2):
    return [x for x in list1 if x['phone'] not in set(y['phone'] for y in list2)]

gives these results:

>>> minus(a, b)
[{'name': 'Dick', 'phone': '1111'}, {'name': 'Harry', 'phone': '3333'}]
>>> minus(b, a)
[{'name': 'Liz', 'phone': '2222'}, {'name': 'Mary', 'phone': '4444'}]

This is concise but inefficient - it's order len(list1)*len(list2). Fine for the example short lists, but not a good algorithm for larger data sets. There's no down side to collecting all phone values from `list2` outside the comprehension. – Peter DeGlopper Nov 30 '13 at 06:43

bcorso · Answer 3 · 2013-11-30T08:21:01.757

If you can change the data type, a dict might be better. You can use the phone number as the key to retrieve names.

a = {'1234':'Tom','1111':'Dick','3333':'Harry'}
b = {'1234':'Jane', '2222':'Liz','4444':'Mary'}

def minus(x, y):
    # Return the entries of x whose keys do not appear in y.
    return {z: x[z] for z in set(x.keys()) - set(y.keys())}

# {'1111':'Dick','3333':'Harry'}
a_minus_b = minus(a, b)

# {'2222':'Liz','4444':'Mary'}
b_minus_a = minus(b, a)

That's fine for a dict that indexes by phone number, but that's not the data structure the OP is starting with. Set difference will be faster than the `if not in` implementation, but for this to work you have to create both a reverse lookup dict and a set from that dict - have you done any profiling? I'm not necessarily saying this isn't faster, just that I'd want to see some numbers. – Peter DeGlopper Nov 30 '13 at 08:27
I'm just offering an alternative, which is why I started by saying 'if you can change the data type.' I'm not saying this is the best solution. In particular, if he plans to have multiple people with the same number in 'a' or 'b', a dict obviously wouldn't work. But if this meets his needs, it's certainly easy to use and understand. – bcorso Nov 30 '13 at 08:37
The example data structure is certainly less than ideal. – Peter DeGlopper Nov 30 '13 at 08:47
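To make the trade-off in these comments concrete, here is a rough sketch of going from the question's lists of dicts to a phone-indexed dict and then taking a set difference on the keys. It assumes Python 3, where dict key views support set operations, and the helper names are hypothetical. Entries sharing a phone number within one list collapse to the last one, and the result order is not guaranteed.

```python
def index_by_phone(entries):
    # Reverse lookup: phone number -> entry. A later entry with the same
    # phone number overwrites an earlier one.
    return {d['phone']: d for d in entries}


def minus_by_phone(list1, list2):
    index1 = index_by_phone(list1)
    index2 = index_by_phone(list2)
    # Key views behave like sets, so "-" yields the phones only in list1.
    return [index1[phone] for phone in index1.keys() - index2.keys()]


a = [
    {'name': 'Tom', 'phone': '1234'},
    {'name': 'Dick', 'phone': '1111'},
    {'name': 'Harry', 'phone': '3333'},
]
b = [
    {'name': 'Jane', 'phone': '1234'},
    {'name': 'Liz', 'phone': '2222'},
    {'name': 'Mary', 'phone': '4444'},
]

a_minus_b = minus_by_phone(a, b)
b_minus_a = minus_by_phone(b, a)
```

Whether this beats the plain set-membership test would need profiling on realistic data; no timings are claimed here.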
