3

I have a list that contains tuples, as follows.

mylist = [['xxx', 879], ['yyy', 315], ['xxx', 879], ['zzz', 171], ['yyy', 315]]

I want to remove the duplicate tuples from mylist and get an output as follows.

mylist = [['xxx', 879], ['yyy', 315], ['zzz', 171]]

It seems that `set` in Python does not work for this.

mylist = list(set(mylist))

Is there a fast and easy way of doing this in Python (perhaps using libraries)?

cs95
J Cena
  • 4
    Possible duplicate of [How do you remove duplicates from a list in whilst preserving order?](https://stackoverflow.com/questions/480214/how-do-you-remove-duplicates-from-a-list-in-whilst-preserving-order) – jdehesa Jan 17 '18 at 11:54
  • Or if you don't need to preserve order check out [Removing duplicates in lists](https://stackoverflow.com/questions/7961363/removing-duplicates-in-lists). – jdehesa Jan 17 '18 at 11:54
  • 1
    I don't believe the question is a duplicate of that specific Q&A, though I'd guess there is a better one out there... – cs95 Jan 17 '18 at 11:58
  • 1
    The reason it's not working for you is that you have a list of lists, and a list cannot be added to a set because lists are not hashable. – John Joseph Fernandes Jan 17 '18 at 11:59

4 Answers

6

It seems like you want to preserve order. In that case, you can keep a set that tracks which lists have already been added.

Here is an example:

mylist = [['xxx', 879], ['yyy', 315], ['xxx', 879], ['zzz', 171], ['yyy', 315]]

# set that keeps track of what elements have been added
seen = set()

no_dups = []
for lst in mylist:

    # convert to hashable type
    current = tuple(lst)

    # If element not in seen, add it to both
    if current not in seen:
        no_dups.append(lst)
        seen.add(current)

print(no_dups)

Which outputs:

[['xxx', 879], ['yyy', 315], ['zzz', 171]]

Note: Since lists are not hashable, you can add tuples instead to the seen set.
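
If this is needed in more than one place, the loop above could be wrapped in a small helper. This is only a sketch; the function name `dedupe_preserving_order` is made up for illustration and is not from any library.

```python
def dedupe_preserving_order(rows):
    """Return rows without duplicates, keeping the first occurrence of each."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row)  # lists are unhashable, so track a tuple copy instead
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


mylist = [['xxx', 879], ['yyy', 315], ['xxx', 879], ['zzz', 171], ['yyy', 315]]
print(dedupe_preserving_order(mylist))
# [['xxx', 879], ['yyy', 315], ['zzz', 171]]
```

Note that `tuple(row)` only works if the items inside each sub-list are themselves hashable (strings, numbers, tuples and so on).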

RoadRunner
6

You can't do this because you have a list of lists, not a list of tuples, and lists are not hashable.

What you could do is:

mytuplelist = [tuple(item) for item in mylist]
mylist = list(set(mytuplelist))

or

mylist = list(set(map(tuple, mylist)))
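
Two things to be aware of with the `set` approach: the result contains tuples rather than lists, and a set does not guarantee any particular order. The sketch below shows one way to get lists back, plus a `dict.fromkeys` variant that keeps first-seen order, assuming Python 3.7 or later (where dicts preserve insertion order). The variable names are just for illustration.

```python
mylist = [['xxx', 879], ['yyy', 315], ['xxx', 879], ['zzz', 171], ['yyy', 315]]

# set() returns tuples in arbitrary order; convert back to lists and sort
# if a predictable result is needed
unordered = sorted(list(t) for t in set(map(tuple, mylist)))
print(unordered)  # [['xxx', 879], ['yyy', 315], ['zzz', 171]]

# dict.fromkeys() drops duplicate keys but keeps first-seen order (Python 3.7+)
ordered = [list(t) for t in dict.fromkeys(map(tuple, mylist))]
print(ordered)  # [['xxx', 879], ['yyy', 315], ['zzz', 171]]
```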
5

You need to write code that keeps one copy of each sub-list, dropping the rest. The simplest way to do this is to load mylist into a dict object and retrieve its key-value pairs as lists again.

>>> list(map(list, dict(mylist).items()))

Or, using a list comprehension -

>>> [list(v) for v in dict(mylist).items()]

[['zzz', 171], ['yyy', 315], ['xxx', 879]]

Note that this answer does not maintain order! Also, if your sub-lists can have more than 2 elements, an approach involving hashing the tuplized versions of your data, as @JohnJosephFernandes's answer shows, would be the best thing to do.
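
One more caveat, raised in the comments: `dict()` treats the first item of each pair as a key, so two sub-lists that share a first item but differ in the second are collapsed into one. A small sketch of the difference, using the example input from the comments (the variable names are only illustrative, and the second variant assumes Python 3.7+ for ordering):

```python
pairs = [['xxx', 879], ['xxx', 200]]

# dict() keys on the first item, so the later pair overwrites the earlier one
via_dict = [list(kv) for kv in dict(pairs).items()]
print(via_dict)  # [['xxx', 200]]

# hashing the whole sub-list as a tuple keeps both distinct pairs
via_tuples = [list(t) for t in dict.fromkeys(map(tuple, pairs))]
print(via_tuples)  # [['xxx', 879], ['xxx', 200]]
```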

cs95
  • Can you explain the logic behind the reversals? Also I think this fails for something like `mylist = [['xxx', 879], ['xxx', 200]]` – Chris_Rands Jan 17 '18 at 12:19
  • @Chris_Rands Sorry, they're part of an older solution I should have removed. They do nothing there. – cs95 Jan 17 '18 at 12:20
  • @Chris_Rands I have to confess that I did misread the question at first, thinking that the key (first sublist item) was the same, and OP wanted the first, dropping all the other duplicates. Because of that, I reversed the list and sent the entries into a dict, so that, when retrieving back, the last key-value pairs that were inserted, overwriting the previous, were the first pairs in the original list. I hope I made sense! – cs95 Jan 17 '18 at 12:23
  • Right well I arrived late so haven't followed the evolution of the question, but fact remains that `[list(v) for v in dict([['xxx', 879], ['xxx', 200]]).items()]` is not `list(set(tuple(item) for item in [['xxx', 879], ['xxx', 200]]))` and I think the latter (like John Joseph wrote) is what is wanted. But the OP accepted your answer so I may be wrong! Perhaps this situation never arises in their data anyway – Chris_Rands Jan 17 '18 at 12:28
  • 1
    @Chris_Rands Yup, I was rather surprised myself, I'm not afraid to admit I made a meal of answering! Well, OPs are fickle beasts, I've made the necessary edits and disclaimers, I hope that does for now. ;) – cs95 Jan 17 '18 at 12:32
2

Another option:

>>> mylist = [['xxx', 879], ['yyy', 315], ['xxx', 879], ['zzz', 171], ['yyy', 315]]
>>> y = []
>>> for x in mylist:
...     if x not in y:
...         y.append(x)
...
>>> y
[['xxx', 879], ['yyy', 315], ['zzz', 171]]
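
Because `in` on a list compares items with `==` rather than hashing them, this approach also works when the sub-lists contain unhashable items, where `tuple()` plus `set()` would fail. The trade-off is that every membership test scans the result list, so it gets slow on large inputs. A sketch with made-up data:

```python
rows = [['xxx', {'n': 879}], ['yyy', [315]], ['xxx', {'n': 879}]]

unique = []
for row in rows:
    # list membership uses ==, so dicts and nested lists are fine here
    if row not in unique:
        unique.append(row)

print(unique)  # [['xxx', {'n': 879}], ['yyy', [315]]]
```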
Jonathon McMurray