What is the easiest way to compare two lists/sets and output the differences? Are there any built-in functions that will help me compare nested lists/sets?

Inputs:

First_list = [['Test.doc', '1a1a1a', 1111], 
              ['Test2.doc', '2b2b2b', 2222],  
              ['Test3.doc', '3c3c3c', 3333]
             ]  
Secnd_list = [['Test.doc', '1a1a1a', 1111], 
              ['Test2.doc', '2b2b2b', 2222], 
              ['Test3.doc', '8p8p8p', 9999], 
              ['Test4.doc', '4d4d4d', 4444]]  

Expected Output:

Differences = [['Test3.doc', '3c3c3c', 3333],
               ['Test3.doc', '8p8p8p', 9999], 
               ['Test4.doc', '4d4d4d', 4444]]
fredtantini
tang
    See the set-related documentation here: https://docs.python.org/3.8/library/stdtypes.html#set-types-set-frozenset – codeforester Oct 21 '20 at 05:37

8 Answers

44

So you want the difference between two lists of items.

first_list = [['Test.doc', '1a1a1a', 1111], 
              ['Test2.doc', '2b2b2b', 2222], 
              ['Test3.doc', '3c3c3c', 3333]]
secnd_list = [['Test.doc', '1a1a1a', 1111], 
              ['Test2.doc', '2b2b2b', 2222], 
              ['Test3.doc', '8p8p8p', 9999], 
              ['Test4.doc', '4d4d4d', 4444]]

First I'd turn each list of lists into a list of tuples. Tuples are hashable (lists are not), so you can then convert the list of tuples into a set of tuples:

first_tuple_list = [tuple(lst) for lst in first_list]
secnd_tuple_list = [tuple(lst) for lst in secnd_list]

Then you can make sets:

first_set = set(first_tuple_list)
secnd_set = set(secnd_tuple_list)

EDIT (suggested by sdolan): You could have done the last two steps for each list in a one-liner:

first_set = set(map(tuple, first_list))
secnd_set = set(map(tuple, secnd_list))

Note: map is a built-in function, borrowed from functional programming, that applies the function given as its first argument (here, the tuple function) to each item of its second argument (here, a list of lists).

Then find the symmetric difference between the sets:

>>> first_set.symmetric_difference(secnd_set) 
set([('Test3.doc', '3c3c3c', 3333),
     ('Test3.doc', '8p8p8p', 9999),
     ('Test4.doc', '4d4d4d', 4444)])

Note that first_set ^ secnd_set is equivalent to first_set.symmetric_difference(secnd_set).
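
Putting the steps above together for Python 3, a minimal sketch might look like the following. The helper name `symmetric_diff` is hypothetical (it is not part of the original answer), and the result is sorted only to make the output order predictable, which assumes the rows can be compared with each other.

```python
def symmetric_diff(first, second):
    """Return the rows that appear in exactly one of the two lists of lists."""
    # Lists are unhashable, so each row is converted to a tuple first
    first_set = set(map(tuple, first))
    second_set = set(map(tuple, second))
    # ^ is the symmetric difference; sort for a stable order, then convert back to lists
    return [list(row) for row in sorted(first_set ^ second_set)]


first_list = [['Test.doc', '1a1a1a', 1111],
              ['Test2.doc', '2b2b2b', 2222],
              ['Test3.doc', '3c3c3c', 3333]]
secnd_list = [['Test.doc', '1a1a1a', 1111],
              ['Test2.doc', '2b2b2b', 2222],
              ['Test3.doc', '8p8p8p', 9999],
              ['Test4.doc', '4d4d4d', 4444]]

differences = symmetric_diff(first_list, secnd_list)
print(differences)
```

For the inputs from the question, this should print the three rows listed under "Expected Output".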

Also, if you don't want to use sets (e.g., because you are on Python 2.2), it's quite straightforward to do without them, e.g., with list comprehensions:

>>> [x for x in first_list if x not in secnd_list] + [x for x in secnd_list if x not in first_list]
[['Test3.doc', '3c3c3c', 3333],
 ['Test3.doc', '8p8p8p', 9999],
 ['Test4.doc', '4d4d4d', 4444]]

or with the built-in filter function and lambda functions. (You have to test both ways and combine the results; the list() calls are needed in Python 3, where filter returns an iterator.)

>>> list(filter(lambda x: x not in secnd_list, first_list)) + list(filter(lambda x: x not in first_list, secnd_list))
[['Test3.doc', '3c3c3c', 3333],
 ['Test3.doc', '8p8p8p', 9999],
 ['Test4.doc', '4d4d4d', 4444]]
dr jimbob
    +1: But I think `map(tuple, first_list)` is cleaner for the tuple conversion. Also, `symmetric_difference` doesn't need a set for its first argument, so you can skip the set conversion in `secnd_set` (though it may do just that under the covers). – Sam Dolan May 24 '11 at 05:18
  • @sdolan: I agree map is cleaner. Also could have done something like `first_set = set(map(tuple, first_list))` skipping the intermediate tuple list. But I was trying to be pedagogical as tang seemed new to python (e.g., not putting quotes in his string), and personally I think list comprehension is more readable to novices than the more functional `map`. – dr jimbob May 24 '11 at 12:27
  • Hi! If you are online, can you give me an idea of how to compare a list of lists (if unordered)? I just linked your answer in [my one here](http://stackoverflow.com/questions/15855792/how-do-i-compare-2d-lists-for-equality-in-python/15855811#15855811). I am learning Python. I can do it using `sort()`, but that changes the original list :( .. – Grijesh Chauhan Apr 06 '13 at 21:18
3

Not sure if there is a nice function for this, but the "manual" way to do it isn't difficult:

differences = []

for list in firstList:
    if list not in secondList:
        differences.append(list)
Sam Magura
    Note that this wouldn't find lists that are in `secondList` but not in `firstList`, though you could always just check both ways like: `[x for x in first_list if x not in secnd_list] + [x for x in secnd_list if x not in first_list]`. Also, it's a good habit not to use the built-in type/function `list` as the name of a variable. Even after you are out of the for loop, the name stays shadowed, so you won't be able to call the built-in `list`. – dr jimbob May 24 '11 at 14:32
3
>>> First_list = [['Test.doc', '1a1a1a', '1111'], ['Test2.doc', '2b2b2b', '2222'], ['Test3.doc', '3c3c3c', '3333']] 
>>> Secnd_list = [['Test.doc', '1a1a1a', '1111'], ['Test2.doc', '2b2b2b', '2222'], ['Test3.doc', '3c3c3c', '3333'], ['Test4.doc', '4d4d4d', '4444']] 


>>> z = [tuple(y) for y in First_list]
>>> z
[('Test.doc', '1a1a1a', '1111'), ('Test2.doc', '2b2b2b', '2222'), ('Test3.doc', '3c3c3c', '3333')]
>>> x = [tuple(y) for y in Secnd_list]
>>> x
[('Test.doc', '1a1a1a', '1111'), ('Test2.doc', '2b2b2b', '2222'), ('Test3.doc', '3c3c3c', '3333'), ('Test4.doc', '4d4d4d', '4444')]


>>> set(x) - set(z)
set([('Test4.doc', '4d4d4d', '4444')])
pyfunc
    +1 Note that `set1 - set2` corresponds to the difference (elements in `set1` but not in `set2`), whereas I think he wanted the symmetric difference (`set1 ^ set2`) to find elements in `set1` or `set2` but not both, as he didn't specify which set to subtract elements from. – dr jimbob May 24 '11 at 14:47
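
To make the distinction in the comment above concrete, here is a small sketch with made-up two-element rows (the data is illustrative only, not taken from the question). It shows that the symmetric difference is the union of the two one-sided differences.

```python
set1 = {('Test.doc', '1a1a1a'), ('Test3.doc', '3c3c3c')}
set2 = {('Test.doc', '1a1a1a'), ('Test4.doc', '4d4d4d')}

only_in_1 = set1 - set2   # difference: rows in set1 that are missing from set2
only_in_2 = set2 - set1   # difference in the other direction
in_one_only = set1 ^ set2  # symmetric difference: rows in exactly one of the sets

print(only_in_1)
print(only_in_2)
print(in_one_only == only_in_1 | only_in_2)
```

The last line should print True, since `^` collects what both one-sided differences would find.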
2

By using set comprehensions, you can make it a one-liner. If you want a set of tuples:

Differences = {tuple(i) for i in First_list} ^ {tuple(i) for i in Secnd_list}

Or to get a list of tuples, then:

Differences = list({tuple(i) for i in First_list} ^ {tuple(i) for i in Secnd_list})

Or to get a list of lists (if you really want), then:

Differences = [list(j) for j in {tuple(i) for i in First_list} ^ {tuple(i) for i in Secnd_list}]

PS: I read here: https://stackoverflow.com/a/10973817/4900095 that the map() function is not considered a Pythonic way to do things.

Community
2

Old question but here's a solution I use for returning unique elements not found in both lists.

I use this for comparing the values returned from a database and the values generated by a directory crawler package. I didn't like the other solutions I found because many of them could not dynamically handle both flat lists and nested lists.

def differentiate(x, y):
    """
    Retrieve a list of unique elements that exist in only one of x and y.
    Capable of parsing one-dimensional (flat) and two-dimensional (lists of lists) lists.

    :param x: list #1
    :param y: list #2
    :return: list of unique values
    """
    # If either list is empty, every value in the other list is a difference
    # (this also covers the case where both are empty)
    if len(x) == 0:
        return y
    if len(y) == 0:
        return x

    # Dealing with a 2D dataset (list of lists or tuples)
    if isinstance(x[0], (list, tuple)):
        # Remember the row type to convert back to before returning
        input_type = type(x[0])
        # Hashable and unique - convert the rows to tuples and build sets
        first_set = set(map(tuple, x))
        secnd_set = set(map(tuple, y))
        # Symmetric difference: rows found in exactly one of the two sets
        return [input_type(i) for i in first_set ^ secnd_set]

    # Dealing with a 1D dataset (list of items): unique values only
    return list(set(x) ^ set(y))
Stephen Neal
1

I guess you'll have to convert your lists to sets:

>>> a = {('a', 'b'), ('c', 'd'), ('e', 'f')}
>>> b = {('a', 'b'), ('h', 'g')}
>>> a.symmetric_difference(b)
{('e', 'f'), ('h', 'g'), ('c', 'd')}
0

http://docs.python.org/library/difflib.html is a good starting place for what you are looking for.

If you apply it recursively to the deltas, you should be able to handle nested data structures. But it will take some work.
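
As a starting point, here is a minimal sketch of a row-level diff with `difflib.SequenceMatcher`. It is not the recursive approach described above, only the first (flat) pass, and it assumes each row can be turned into a hashable tuple. Unlike the set-based answers, this comparison is order-sensitive. The variable names `removed` and `added` are illustrative.

```python
import difflib

first_list = [['Test.doc', '1a1a1a', 1111],
              ['Test2.doc', '2b2b2b', 2222],
              ['Test3.doc', '3c3c3c', 3333]]
secnd_list = [['Test.doc', '1a1a1a', 1111],
              ['Test2.doc', '2b2b2b', 2222],
              ['Test3.doc', '8p8p8p', 9999],
              ['Test4.doc', '4d4d4d', 4444]]

# SequenceMatcher needs hashable elements, so rows become tuples
first_rows = [tuple(row) for row in first_list]
secnd_rows = [tuple(row) for row in secnd_list]

matcher = difflib.SequenceMatcher(None, first_rows, secnd_rows)

removed, added = [], []
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    # 'delete' and 'replace' mark rows that exist only in the first list
    if tag in ('delete', 'replace'):
        removed.extend(first_rows[i1:i2])
    # 'insert' and 'replace' mark rows that exist only in the second list
    if tag in ('insert', 'replace'):
        added.extend(secnd_rows[j1:j2])

print(removed)
print(added)
```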

btilly
0

Note that with this method you will lose the order:

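# S and T are the two lists of lists being compared
first_set = set(map(tuple, S))
second_set = set(map(tuple, T))
# Union minus intersection is the symmetric difference (same as first_set ^ second_set)
print(list(map(list, first_set.union(second_set) - (first_set & second_set))))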
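
If the original order matters, one possible workaround is to use the sets only for membership tests and walk the lists themselves. This is a sketch under that assumption; the function name `ordered_symmetric_diff` is hypothetical, and duplicate rows within one list are kept as they are.

```python
def ordered_symmetric_diff(first, second):
    """Rows found in only one list, in their original order (first list, then second)."""
    # Sets of tuples give fast membership tests
    first_set = set(map(tuple, first))
    second_set = set(map(tuple, second))
    # Iterate over the lists, not the sets, so the input order is preserved
    only_first = [row for row in first if tuple(row) not in second_set]
    only_second = [row for row in second if tuple(row) not in first_set]
    return only_first + only_second


S = [['Test.doc', '1a1a1a', 1111],
     ['Test2.doc', '2b2b2b', 2222],
     ['Test3.doc', '3c3c3c', 3333]]
T = [['Test.doc', '1a1a1a', 1111],
     ['Test2.doc', '2b2b2b', 2222],
     ['Test3.doc', '8p8p8p', 9999],
     ['Test4.doc', '4d4d4d', 4444]]

result = ordered_symmetric_diff(S, T)
print(result)
```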
Avo Asatryan