Removing duplicates from nested list based on first 2 elements

Question

I'm trying to remove duplicates from a nested list only if the first 2 elements are the same, ignoring the third.

List:

L = [['el1','el2','value1'], ['el3','el4','value2'], ['el1','el2','value2'], ['el1','el5','value3']]

Would return:

L = [['el3','el4','value2'], ['el1','el2','value2'], ['el1','el5','value3']]

I found a simple way to do something similar here:

dict((x[0], x) for x in L).values()

but this keys only on the first element, not the first two. Otherwise it does exactly what I want.

score 4 · Accepted Answer · edited Jan 08 '22 at 17:37

If the order doesn't matter, you can use the same method with a tuple of the first and second elements as the key:

{(x[0], x[1]): x for x in L}.values()

Or on Python versions older than 2.7:

dict(((x[0], x[1]), x) for x in L).values()

Instead of (x[0], x[1]) you can use tuple(x[:2]); use whichever you find more readable.
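As a rough illustration of how the tuple key behaves, here is a minimal sketch run against the list from the question. The names `by_key` and `deduped` are only for illustration. Note that when two rows share a key, the later row overwrites the earlier one, so the last duplicate is the one kept.

```python
# Sample data from the question.
L = [['el1', 'el2', 'value1'], ['el3', 'el4', 'value2'],
     ['el1', 'el2', 'value2'], ['el1', 'el5', 'value3']]

# Key each row by its first two elements. A later row with the same key
# replaces the earlier one, so ['el1', 'el2', 'value1'] is dropped.
by_key = {tuple(x[:2]): x for x in L}
deduped = list(by_key.values())

# Sort before printing so the result does not depend on dict ordering,
# which this approach does not promise on older Python versions.
print(sorted(deduped))
# [['el1', 'el2', 'value2'], ['el1', 'el5', 'value3'], ['el3', 'el4', 'value2']]
```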

mkrieger1

answered Oct 15 '12 at 19:52

Andrew Clark

Martijn Pieters · Answer 2 · 2018-05-07T15:32:18.190

4

If order matters, use a set with only the first two elements of your nested lists:

seen = set()
seen_add = seen.add
return [x for x in seq if tuple(x[:2]) not in seen and not seen_add(tuple(x[:2]))]
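The same idea can be written as a plain loop, which may be easier to read than the one-liner. This is only a sketch: the function name `unique_by_first_two` is hypothetical, and it assumes every row has at least two hashable leading elements.

```python
def unique_by_first_two(seq):
    """Keep the first row seen for each (row[0], row[1]) pair, in order."""
    seen = set()
    result = []
    for row in seq:
        key = tuple(row[:2])  # lists are unhashable, so use a tuple as the key
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


# Sample data from the question.
L = [['el1', 'el2', 'value1'], ['el3', 'el4', 'value2'],
     ['el1', 'el2', 'value2'], ['el1', 'el5', 'value3']]

print(unique_by_first_two(L))
# [['el1', 'el2', 'value1'], ['el3', 'el4', 'value2'], ['el1', 'el5', 'value3']]
```

This variant keeps the first row for each key (here the 'value1' row), whereas the expected output in the question keeps the later 'value2' row.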

Or you could use a collections.OrderedDict() object to keep the order; use the x[:2] slices (as tuples) as the keys, and extract the values:

from collections import OrderedDict

return OrderedDict((tuple(x[:2]), x) for x in seq).values()

In Python 3.6 and up, the standard dict type happens to retain insertion order too (an implementation detail in 3.6, guaranteed by the language from 3.7):

return list({tuple(x[:2]): x for x in seq}.values())

The list() call is needed to convert the dictionary view object to a list.
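One detail worth spelling out: with an ordered dictionary, a duplicate key keeps its original position but takes the value of the last row seen. The sketch below shows that on the question's data, and also one possible way to reproduce the exact output asked for in the question (last duplicate kept, at its own position). Both function names are hypothetical, and the first function assumes Python 3.7+ for guaranteed dict ordering.

```python
def keep_last_at_first_position(seq):
    # The key stays where it was first inserted; its value is overwritten
    # by the last row that has the same first two elements.
    return list({tuple(x[:2]): x for x in seq}.values())


def keep_last_in_place(seq):
    # Walk the rows backwards so the last duplicate is seen first,
    # then reverse the result to restore the original order.
    seen = set()
    kept = []
    for row in reversed(seq):
        key = tuple(row[:2])
        if key not in seen:
            seen.add(key)
            kept.append(row)
    kept.reverse()
    return kept


# Sample data from the question.
L = [['el1', 'el2', 'value1'], ['el3', 'el4', 'value2'],
     ['el1', 'el2', 'value2'], ['el1', 'el5', 'value3']]

print(keep_last_at_first_position(L))
# [['el1', 'el2', 'value2'], ['el3', 'el4', 'value2'], ['el1', 'el5', 'value3']]

print(keep_last_in_place(L))
# [['el3', 'el4', 'value2'], ['el1', 'el2', 'value2'], ['el1', 'el5', 'value3']]
```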

edited May 07 '18 at 15:32

answered Oct 15 '12 at 19:53

Martijn Pieters

I guess this is a good solution if you _really_ need it to be fast, but in most cases this is just hard to read and unpythonic. Say no to comprehensions with side effects. – Aran-Fey May 07 '18 at 14:12
@Aran-Fey: perhaps, but this specific pattern [used to be the fastest method for this specific use case](https://stackoverflow.com/questions/480214/how-do-you-remove-duplicates-from-a-list-whilst-preserving-order/480227#480227). If you need the performance, pragmatism beats purity. – Martijn Pieters May 07 '18 at 15:34

score 2 · Answer 3 · answered Oct 15 '12 at 19:53

This should do it:

In [55]: dict((tuple(x[:2]), x) for x in L).values()
Out[55]: [['el1', 'el2', 'value2'], ['el1', 'el5', 'value3'], ['el3', 'el4', 'value2']]

Ashwini Chaudhary
