4

If I have a list like this one:

mylist = [[1,2,3], ['a', 'c'], [3,4,5],[1,2], [3,4,5], ['a', 'c'], [3,4,5], [1,2]]

What is best way to remove duplicate sub-lists?

Now I use this:

y, s = [ ], set( )
for t in mylist:
    w = tuple( sorted( t ) )
    if not w in s:
        y.append( t )
        s.add( w )

It works, but I wonder if there is better way? Something more python-like?

Community
  • 1
  • 1
MarioBross
  • 97
  • 1
  • 5

7 Answers7

10

Convert the elements to a tuple*, then convert it the whole thing to a set, then convert everything back to a list:

m = [[1,2,3], ['a', 'c'], [3,4,5],[1,2], [3,4,5], ['a', 'c'], [3,4,5], [1,2]]

print [list(i) for i in set(map(tuple, m))]

*we are converting to tuples because lists are non-hashable (and therefore we cannot use set on them

Secret
  • 3,291
  • 3
  • 32
  • 50
8

You can use OrderedDict.fromkeys to filter duplicates out of the list while still preserving order:

>>> from collections import OrderedDict
>>> mylist = [[1,2,3], ['a', 'c'], [3,4,5],[1,2], [3,4,5], ['a', 'c'], [3,4,5], [1,2]]
>>> map(list, OrderedDict.fromkeys(map(tuple, mylist)))
[[1, 2, 3], ['a', 'c'], [3, 4, 5], [1, 2]]
>>>

The map(tuple, mylist) is necessary because dictionary keys must be hashable (lists are not since you can add/remove items from them).

5

Well, since sets inherently dedupe things, your first instinct might be to do set(mylist). However, that doesn't quite work:

In [1]: mylist = [[1,2,3], ['a', 'c'], [3,4,5],[1,2], [3,4,5], ['a', 'c'], [3,4,5], [1,2]]

In [2]: set(mylist)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-b352bcae5975> in <module>()
----> 1 set(mylist)

TypeError: unhashable type: 'list'

This is because sets only work on iterables of hashable elements (and since lists are mutable, they are not hashable).

Instead, you can do this simply for the price of converting your sublists to subtuples:

In [3]: set([tuple(x) for x in mylist])
Out[3]: {(1, 2), (1, 2, 3), (3, 4, 5), ('a', 'c')}

Or, if you really need a list of lists again:

In [4]: [list(x) for x in set([tuple(x) for x in mylist])]
Out[4]: [[1, 2], [3, 4, 5], ['a', 'c'], [1, 2, 3]]
  • 1
    Anyone care to explain the downvote, since it demonstrably solves OP's problem as stated? –  Feb 26 '15 at 23:57
  • 1
    I upvoted you guys to cancel out the downvote. Seriously what was that bout? – Secret Feb 26 '15 at 23:58
4

Because you have sorted(t) in your question, I assume you consider [1,2] to be a duplicate of [2,1]

If this is true, I'd use frozenset for the inside lists (which are hashable) and will not care about the ordering of the sublists.

So something like:

set(frozenset(sublist) for sublist in mylist)
ComputerDruid
  • 16,897
  • 1
  • 19
  • 28
2

You don't need to sort, the sort in the code you copied is sorting for a different reason:

seen,out = set(), []

for ele in mylist:
    tp = tuple(ele)
    if tp not in seen:
        out.append(ele)
    seen.add(tp)
Padraic Cunningham
  • 176,452
  • 29
  • 245
  • 321
1

Well this will work for your case:

mylist2 = set(map(tuple, mylist))
print(mylist2) # ('a', 'c'), (3, 4, 5), (1, 2), (1, 2, 3)}

This works, because it changes your sublists to tuples, which in your case are hashable. So set can take them and make a unique.

And in case you really want the output to be a list of lists, you can do this:

print(list(map(list,mylist2))) # [['a', 'c'], [3, 4, 5], [1, 2], [1, 2, 3]]
Marcin
  • 215,873
  • 14
  • 235
  • 294
1

If order and structure (list of lists) don't matter, you can use

set(map(tuple, my_list))

if they do matter, you can use a list comprehension

[e for i,e in enumerate(my_list) if e not in my_list[:i]]

which keeps only the first duplicate of every element, thus keeping only one of each. It is marginally slower

In [16]: timeit.timeit('[e for i,e in enumerate(my_list) if e not in my_list[:i]]', setup="my_list = [[1,2,3], ['a', 'c'], [3,4,5],[1,2], [3,4,5], ['a', 'c'], [3,4,5], [1,2]]")
Out[16]: 1.9146944019994407

In [17]: timeit.timeit('set(map(tuple, my_list))', setup="my_list = [[1,2,3], ['a', 'c'], [3,4,5],[1,2], [3,4,5], ['a', 'c'], [3,4,5], [1,2]]")
Out[17]: 1.3857673469974543

but if you care about speed you should probably try a loopey approach.

Evpok
  • 4,273
  • 3
  • 34
  • 46