-2

How can I compare a list of lists with itself in python in order to:

  • identify identical sublists with the same items (not necessarily in the same item order)
  • delete these duplicate sublists

Example:

list = [ [1, 3, 5, 6], [7, 8], [10, 12], [9], [3, 1, 5, 6], [12, 10] ]

clean_list = [ [1, 3, 5, 6], [7, 8], [10, 12], [9] ]

Any help is greatly appreciated. I can't seem to figure this out.

martineau
  • 119,623
  • 25
  • 170
  • 301
St4rb0y
  • 317
  • 3
  • 5
  • 22
  • 2
    you should avoid built-ins names for your objects like `list` – Azat Ibrakov May 30 '17 at 14:57
  • Check out this question https://stackoverflow.com/questions/2213923/python-removing-duplicates-from-a-list-of-lists – Brandon Deo May 30 '17 at 15:03
  • @AzatIbrakov Thanks for pointing this out, although as stated above it's only a generic example, not an extract of the original script, with obscure variable names that no one would have understood out of context. – St4rb0y May 31 '17 at 15:58
  • @2ps I'm a Python beginner and didn't know where to begin to tackle the problem. I'm glad that so many people provided helpful answers that I can learn from. Thanks! – St4rb0y May 31 '17 at 15:58
  • @bdeo I must have overlooked that one. Sorry! – St4rb0y May 31 '17 at 15:58

4 Answers4

2

I would rebuild the "clean_list" in a list comprehension, checking that the sorted version of the sublist isn't already in the previous elements

the_list = [ [1, 3, 5, 6], [7, 8], [10, 12], [9], [3, 1, 5, 6], [12, 10] ]

clean_list = [l for i,l in enumerate(the_list) if all(sorted(l)!=sorted(the_list[j]) for j in range(0,i))]

print(clean_list)

of course, sorting the items for each iteration is time consuming, so you could prepare a sorted list of sublists:

the_sorted_list = [sorted(l) for l in the_list]

and use it:

clean_list = [the_list[i] for i,l in enumerate(the_sorted_list) if all(l!=the_sorted_list[j] for j in range(0,i))]

result (in both cases):

[[1, 3, 5, 6], [7, 8], [10, 12], [9]]

As many suggested, maybe a simple for loop (no list comprehension there) storing the already seen items in a set would be more performant for the lookup of the duplicates. That alternate solution could be necessary if the input list is really big to avoid the O(n) lookup of all.

An example of implementation could be:

test_set = set()
clean_list = []

for l in the_list:
    sl = sorted(l)
    tsl = tuple(sl)
    if not tsl in test_set:
        test_set.add(tsl)  # note it down to avoid inserting it next time
        clean_list.append(sl)
Jean-François Fabre
  • 137,073
  • 23
  • 153
  • 219
0

Create a set. Then for each list in the list, sort it, transform into tuple, then insert into set.

setOfLists = set()
for list in listOfLists:
    list.sort()
    setOfLists.add(tuple(list))

print setOfLists

You can retransform the tuples in the set into lists again.

Eric
  • 730
  • 4
  • 16
  • the only issue here: the `setOfLists` won't preserve the order of the sublists. otherwise, OK answer (same as what Chris Rands suggested, except that Chris suggested to keep the set as a marker, but still store the sublists in a list to preserve order) – Jean-François Fabre May 30 '17 at 15:17
  • also `list.sort() setOfLists.add(tuple(list))` => `setOfLists.add(tuple(sorted(list)))` – Jean-François Fabre May 30 '17 at 15:18
  • also: sorting the sublists may not be what the OP wants. Using an aux set with sorted items would be a good solution. – Jean-François Fabre May 30 '17 at 15:24
0

Simple for loops will work, but if your dataset is small, e.g. 1k or less, you can use this :

b = []
[b.append(i) for i in a if len([j for j in b if set(j) == set(i)])==0 ]

print b
WoooHaaaa
  • 19,732
  • 32
  • 90
  • 138
0

So heres my take on this.

I def a function that sorts each sublist and appends to a temp list. then I check if the sublist in temp_my_list is 'not' in temp_clean_list and if not then append to new list. this should work for any 2 sets of list. I added some extra list to show some kind of result other than an empty string.

my_list = [[1, 3, 5, 6], [7, 8], [10, 12], [9], [3, 1, 5, 6], [12, 10],[16]]
clean_list = [ [1, 3, 5, 6], [7, 8], [10, 12], [9],[18]]
new_list = []

def getNewList():
    temp_my_list = []
    temp_clean_list = []
    for sublist in my_list:
        sublist.sort()
        temp_my_list.append(msublist)
    for sublist in clean_list:
        sublist.sort()
        temp_clean_list.append(sublist)
    for sublist in temp_my_list:
        if sublist not in temp_clean_list:
            new_list.append(sublist)

getNewList()
print (new_list)

Resulit:

[[16]]
Mike - SMT
  • 14,784
  • 4
  • 35
  • 79