
I'm trying to remove duplicates from a nested list only if the first 2 elements are the same, ignoring the third.

List:

L = [['el1','el2','value1'], ['el3','el4','value2'], ['el1','el2','value2'], ['el1','el5','value3']]

Would return:

L = [['el3','el4','value2'], ['el1','el2','value2'], ['el1','el5','value3']]

I found a simple way to do something similar here:

dict((x[0], x) for x in L).values()

This only compares the first element rather than the first two, but otherwise it is exactly what I want.

mkrieger1
john

3 Answers


If the order doesn't matter, you can use the same method, but with a tuple of the first and second elements as the key:

{(x[0], x[1]): x for x in L}.values()

Or on Python versions older than 2.7:

dict(((x[0], x[1]), x) for x in L).values()

Instead of (x[0], x[1]) you can use tuple(x[:2]); use whichever you find more readable.
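As a rough illustration of how this behaves on the list from the question: rows that share the same first two elements collapse onto one dictionary key, and a later row overwrites an earlier one, so the last duplicate is the one that survives. The function name `dedupe_by_prefix` is made up for this sketch, and the `list()` call assumes Python 3, where `.values()` returns a view rather than a list.

```python
def dedupe_by_prefix(rows):
    """Keep one row per (first, second) pair; the last duplicate wins.

    Hypothetical helper for illustration, not part of any library.
    """
    # Lists are unhashable, so the two-element slice is converted to a tuple
    # before it is used as a dictionary key.
    return list({tuple(row[:2]): row for row in rows}.values())


L = [['el1', 'el2', 'value1'], ['el3', 'el4', 'value2'],
     ['el1', 'el2', 'value2'], ['el1', 'el5', 'value3']]

# Sorted only to make the printout independent of dictionary ordering.
print(sorted(dedupe_by_prefix(L)))
```

Note that `['el1', 'el2', 'value1']` is dropped in favour of `['el1', 'el2', 'value2']`, which matches the row kept in the question's expected output.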

mkrieger1
Andrew Clark

If order matters, use a set with only the first two elements of your nested lists:

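seen = set()
seen_add = seen.add  # cache the bound method to avoid repeated attribute lookups
# seen_add() returns None, so `not seen_add(...)` is always true and only records the key
return [x for x in seq if tuple(x[:2]) not in seen and not seen_add(tuple(x[:2]))]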

Or you could use a collections.OrderedDict() object to keep the order: use the x[:2] slices (converted to tuples) as keys, and extract the values:

from collections import OrderedDict

return OrderedDict((tuple(x[:2]), x) for x in seq).values()

In CPython 3.6 the standard dict type happens to retain insertion order as an implementation detail, and from Python 3.7 onwards this is guaranteed by the language:

return list({tuple(x[:2]): x for x in seq}.values())

The list() call is needed to convert the dictionary view object to a list.
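One difference between the approaches above is worth spelling out: the set-based version keeps the first row seen for each key, whereas the dictionary-based versions keep the last row (at the position where the key first appeared). The sketch below writes the set-based idea as a plain loop and adds a variant that keeps the last row at its own position. The function names and the `key_len` parameter are assumptions made for this example.

```python
def dedupe_keep_first(rows, key_len=2):
    """Order-preserving dedupe: the first row for each key is kept."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[:key_len])  # tuples are hashable, list slices are not
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


def dedupe_keep_last(rows, key_len=2):
    """Order-preserving dedupe: the last row for each key is kept in place."""
    # Walk the rows backwards so that "first seen" means "last in the input",
    # then reverse the result to restore the original direction.
    return dedupe_keep_first(reversed(rows), key_len)[::-1]


L = [['el1', 'el2', 'value1'], ['el3', 'el4', 'value2'],
     ['el1', 'el2', 'value2'], ['el1', 'el5', 'value3']]

print(dedupe_keep_first(L))
# [['el1', 'el2', 'value1'], ['el3', 'el4', 'value2'], ['el1', 'el5', 'value3']]
print(dedupe_keep_last(L))
# [['el3', 'el4', 'value2'], ['el1', 'el2', 'value2'], ['el1', 'el5', 'value3']]
```

The second function reproduces the exact output requested in the question, including its ordering.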

Martijn Pieters
  • I guess this is a good solution if you _really_ need it to be fast, but in most cases this is just hard to read and unpythonic. Say no to comprehensions with side effects. – Aran-Fey May 07 '18 at 14:12
  • @Aran-Fey: perhaps, but this specific pattern [used to be the fastest method for this specific use case](https://stackoverflow.com/questions/480214/how-do-you-remove-duplicates-from-a-list-whilst-preserving-order/480227#480227). If you need the performance, pragmatism beats purity. – Martijn Pieters May 07 '18 at 15:34

This should do it:

In [55]: dict((tuple(x[:2]), x) for x in L).values()
Out[55]: [['el1', 'el2', 'value2'], ['el1', 'el5', 'value3'], ['el3', 'el4', 'value2']]
Ashwini Chaudhary