
So I'm taking an intro computer science course right now, and I was wondering how to check whether there are any duplicates across multiple lists. I've read up on these answers:

"How can I compare two lists in python and return matches" and "How to find common elements in list of lists?"

However, they're not quite what I'm looking for. Say for example I have this list of lists:

list_x = [[66,76], 
          [25,26,27], 
          [65,66,67,68], 
          [40,41,42,43,44], 
          [11,21,31,41,51,61]]

There are two duplicated values (66 and 41), although which ones they are doesn't really matter to me. Is there a way to find out whether any duplicates exist? What I'm looking for is a function that returns True if there are duplicates (or False, depending on what I want to do with the lists). I get the impression that I should use sets (which we haven't covered yet, so I looked them up online), use for loops, or write my own function. If I'll need to write my own function, please let me know, and I'll edit in an attempt later today!

  • I had a similar question a while back. See if this helps: http://stackoverflow.com/questions/19300096/comparing-contents-of-2-lists-of-lists – Minas Abovyan Apr 07 '14 at 21:07
  • Your wording is a little ambiguous to me. Do you want to get the numbers that are duplicated (i.e. a list of the duplicate items such as `[66, 41]`) or just see if any duplicates exist (i.e. a boolean value such as `True`)? –  Apr 07 '14 at 21:17
  • @user3440123 What if a single list contains repeated items? Is that considered a duplicate too, or do you just want to check whether duplicates exist across different lists? – Ashwini Chaudhary Apr 07 '14 at 21:17
  • iCodez, I'm trying to get a boolean value; I'll edit that in! Also, I'm assuming there are no duplicates within a single list. – user3440123 Apr 07 '14 at 21:19

3 Answers


A very simple solution is to flatten the list with a list comprehension and then use set and len together to test for duplicates:

>>> list_x = [[66,76],
...           [25,26,27],
...           [65,66,67,68],
...           [40,41,42,43,44],
...           [11,21,31,41,51,61]]
>>> flat = [y for x in list_x for y in x]
>>> flat # Just to demonstrate
[66, 76, 25, 26, 27, 65, 66, 67, 68, 40, 41, 42, 43, 44, 11, 21, 31, 41, 51, 61]
>>> len(flat) != len(set(flat)) # True because there are duplicates
True
>>>
>>> # This list has no duplicates...
... list_x = [[1, 2],
...           [3, 4, 5],
...           [6, 7, 8, 9],
...           [10, 11, 12, 13],
...           [14, 15, 16, 17, 18]]
>>> flat = [y for x in list_x for y in x]
>>> len(flat) != len(set(flat)) # ...so this is False
False
>>>
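The same flatten-then-compare check can be wrapped in a reusable function. This is only a sketch: the name `has_duplicates` is hypothetical, and `itertools.chain.from_iterable` is swapped in for the nested list comprehension as an equivalent way to flatten one level. Like the original, it assumes every value is hashable (numbers are).

```python
from itertools import chain


def has_duplicates(list_of_lists):
    """Return True if any value appears more than once across all sublists."""
    # chain.from_iterable flattens one level, like [y for x in list_x for y in x]
    flat = list(chain.from_iterable(list_of_lists))
    # A set keeps only unique values, so it is shorter whenever something repeats
    return len(flat) != len(set(flat))


list_x = [[66, 76],
          [25, 26, 27],
          [65, 66, 67, 68],
          [40, 41, 42, 43, 44],
          [11, 21, 31, 41, 51, 61]]
print(has_duplicates(list_x))               # True
print(has_duplicates([[1, 2], [3, 4, 5]]))  # False
```

Note that this check also counts a value repeated inside a single sublist, such as `[[1, 1], [2]]`, as a duplicate; per the comments on the question, that case is assumed not to occur.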

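Be warned, however, that this approach always builds the full flattened list and a set from it before comparing, even when a duplicate shows up early, which can be wasteful if list_x is large. If performance is a concern, you can use a lazy approach that combines a generator expression, any, and set.add and stops at the first duplicate it finds: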

>>> list_x = [[66,76],
...           [25,26,27],
...           [65,66,67,68],
...           [40,41,42,43,44],
...           [11,21,31,41,51,61]]
>>> seen = set()
>>> any(y in seen or seen.add(y) for x in list_x for y in x)
True
>>>
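To see that the generator version really is lazy, here is a sketch that feeds it an endless stream of sublists. Both function names are hypothetical, made up for this illustration. The flatten-and-compare version would never finish on such input, while `any()` returns as soon as the first repeated value appears.

```python
def has_duplicates_lazy(list_of_lists):
    """Return True at the first value that has already been seen."""
    seen = set()
    # New value: "y in seen" is False and seen.add(y) returns None (falsy),
    # so any() keeps going. Repeated value: "y in seen" is True and any() stops.
    return any(y in seen or seen.add(y) for x in list_of_lists for y in x)


def overlapping_pairs():
    """Hypothetical endless input: [0, 1], [1, 2], [2, 3], ..."""
    n = 0
    while True:
        yield [n, n + 1]
        n += 1


print(has_duplicates_lazy(overlapping_pairs()))  # True (1 repeats in the second pair)
print(has_duplicates_lazy([[1, 2], [3, 4]]))     # False
```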

Iterate and use a set to detect if there are duplicates:

seen = set()
dupes = [i for lst in list_x for i in lst if i in seen or seen.add(i)]

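This makes use of the fact that seen.add() returns None. For a value seen for the first time, i in seen is False, so seen.add(i) runs and the whole condition evaluates to None (falsy), which leaves the value out of the result. For a value already in the set, i in seen is True, so the value is kept. A set is an unordered collection of unique values, and the i in seen test is True if i is already part of the set.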

Demo:

>>> list_x = [[66,76], 
...           [25,26,27], 
...           [65,66,67,68], 
...           [40,41,42,43,44], 
...           [11,21,31,41,51,61]]
>>> seen = set()
>>> [i for lst in list_x for i in lst if i in seen or seen.add(i)]
[66, 41]
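The seen.add() trick is easier to follow when the filter expression is evaluated on its own. This small sketch prints what the condition produces for each value, and also shows a side effect worth knowing: a value that appears three times is reported once per repeat.

```python
# seen.add() mutates the set and always returns None
seen = set()
add_result = seen.add(1)
print(add_result)   # None

# The comprehension's filter "i in seen or seen.add(i)", evaluated per value:
#   first sighting  -> "i in seen" is False, seen.add(i) runs, result is None
#   repeat sighting -> "i in seen" is True, so "or" short-circuits to True
seen = set()
flags = [i in seen or seen.add(i) for i in [5, 7, 5]]
print(flags)        # [None, None, True]

# A value that appears three times shows up twice in the result
seen = set()
repeats = [i for i in [5, 5, 5] if i in seen or seen.add(i)]
print(repeats)      # [5, 5]
```

If you want each duplicated value listed only once, wrap the result in set(...).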
Martijn Pieters
  • Maybe it would be better to do it in a loop, to make it easier to understand? Besides, side effects in a comprehension are ugly... ;) – m.wasowski Apr 07 '14 at 21:12
  • Could you explain what's happening with set()? I've read about it a bit, but I don't think I completely understand what it does – user3440123 Apr 07 '14 at 21:13
  • @user3440123: It's an unordered collection of unique values; if `i` is already in the set `seen`, the test `i in seen` returns `True`; it'll do so very efficiently (it won't need to loop through all values in the set to test each and every one). – Martijn Pieters Apr 07 '14 at 21:21
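Following the first comment's suggestion, the comprehension can be rewritten as a plain loop with no side effects hidden inside an expression. This is a sketch; `find_duplicates` is a hypothetical name. It collects the same values in the same order as the comprehension, and wrapping the call in bool() gives the True/False answer the question asks for.

```python
def find_duplicates(list_of_lists):
    """Return values that were already seen earlier, in order of discovery."""
    seen = set()
    dupes = []
    for lst in list_of_lists:
        for i in lst:
            if i in seen:       # fast membership test on a set
                dupes.append(i)
            else:
                seen.add(i)     # remember the first sighting
    return dupes


list_x = [[66, 76],
          [25, 26, 27],
          [65, 66, 67, 68],
          [40, 41, 42, 43, 44],
          [11, 21, 31, 41, 51, 61]]
print(find_duplicates(list_x))        # [66, 41]
print(bool(find_duplicates(list_x)))  # True
```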

Here is a more straightforward solution with sets:

list_x = [[66,76], 
          [25,26,27], 
          [65,66,67,68], 
          [40,41,42,43,44], 
          [11,21,31,41,51,61]]
seen = set()
duplicated = set()
for lst in list_x:
    numbers = set(lst) # only unique
    # make intersection with seen and add to duplicated:
    duplicated |= numbers & seen 
    # add numbers to seen
    seen |= numbers

print(duplicated)

For information about set and its operations, see the docs: https://docs.python.org/2/library/stdtypes.html#set
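Because the question asks for a boolean, the same idea can return as soon as a sublist shares any value with the earlier ones. This sketch swaps the `&` intersection for `set.isdisjoint`, a standard set method that accepts any iterable; `has_cross_list_duplicates` is a hypothetical name. Like the loop above, which calls set(lst) first, it ignores repeats inside a single sublist, unlike the flatten-and-compare check from the first answer.

```python
def has_cross_list_duplicates(list_of_lists):
    """Return True if some value appears in two different sublists."""
    seen = set()
    for lst in list_of_lists:
        # isdisjoint() is True when the two collections share no element
        if not seen.isdisjoint(lst):
            return True        # stop at the first overlap
        seen.update(lst)       # same effect as seen |= set(lst)
    return False


list_x = [[66, 76],
          [25, 26, 27],
          [65, 66, 67, 68],
          [40, 41, 42, 43, 44],
          [11, 21, 31, 41, 51, 61]]
print(has_cross_list_duplicates(list_x))         # True
print(has_cross_list_duplicates([[1, 1], [2]]))  # False (repeat is within one list)
```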

m.wasowski
  • I've never seen the operator |=. What does it do? And thank you for the documentation; right now our professor is only teaching from a manual called Think Python. – user3440123 Apr 07 '14 at 21:33
  • `a |= b` is roughly the same as `a = a | b`, in case of sets it adds elements from `b` to `a`. It is in documentation! – m.wasowski Apr 07 '14 at 21:35
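The "roughly" matters in one case. For sets, `a |= b` updates the existing set in place (like `a.update(b)`), while `a = a | b` builds a new set and rebinds the name. The difference only shows up when another name refers to the same set, as this small sketch illustrates:

```python
a = {1, 2}
alias = a               # a second name for the same set object
a |= {2, 3}             # in-place union: mutates the existing set
print(sorted(alias))    # [1, 2, 3]  (alias sees the change)

b = {1, 2}
alias_b = b
b = b | {2, 3}          # builds a brand-new set and rebinds b
print(sorted(alias_b))  # [1, 2]  (the old object is untouched)
print(sorted(b))        # [1, 2, 3]
```

In the answer's loop nothing else refers to seen or duplicated, so either form gives the same result there.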