How to remove duplicates from a nested list based on the first element while prioritizing the length of the list?

Question

I have a list like this

[[7, 6, 8], [1, 10], [3, 10], [7, 8], [7, 4], [9, 4], [5, 8], [9, 8]]

And I want the output to look something like this:

[[7, 6, 8],[1, 10],[3, 10],[9, 4],[5, 8]]

The algorithm should remove duplicates based on the first element of each inner list (e.g. 7, 1, 3), while prioritizing the inner list's length, i.e. shorter lists should be removed first.

I found something similar here and here on how to do the first part of the question using this:

dict((x[0], x) for x in any_list).values()

but I don't know how to prioritize the length.

Do you really not care which inner list should be removed if duplicates have the same length and the order of the inner lists not important? If so, sort the outer list by length of inner list first: https://stackoverflow.com/questions/4735704/ordering-a-list-of-lists-by-lists-len and then remove duplicates as you mentioned. — StefanS, Jul 09 '16 at 13:34

score 2 · Accepted Answer · 2016-07-09T13:46:42.913


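You can sort your list by the length of the inner lists using sorted(any_list, key=len). Because later entries overwrite earlier ones when building the dict, the longest list for each first element is the one that is kept.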

Your code could look like this then:

dict((x[0], x) for x in sorted(any_list, key=len)).values()

If you want to have a list in the end, simply pass the result into list().
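As a rough sketch of how this behaves on the question's data (assuming Python 3.7+, where dicts keep insertion order; the name deduped is only illustrative):

```python
any_list = [[7, 6, 8], [1, 10], [3, 10], [7, 8], [7, 4], [9, 4], [5, 8], [9, 8]]

# sorted() is stable: shorter lists come first, and lists of equal
# length keep their original relative order.
# Later (key, value) pairs overwrite earlier ones in the dict, so the
# longest list for each first element is the one that survives.
deduped = list(dict((x[0], x) for x in sorted(any_list, key=len)).values())
print(deduped)
```

On this input the result should be [[1, 10], [3, 10], [7, 6, 8], [9, 8], [5, 8]]. Note that among duplicates of equal length the last one wins ([9, 8] rather than [9, 4]), and the order of the result follows the sorted list rather than the original one, so it does not exactly match the output shown in the question.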

edited Jul 09 '16 at 13:46

answered Jul 09 '16 at 13:41

score 1 · Answer 2 · answered Jul 09 '16 at 13:38

You can use collections.defaultdict() to group your lists by their first item, then choose the longest one in each group using the max() function with len() as its key:

>>> lst = [[7, 6, 8], [1, 10], [3, 10], [7, 8], [7, 4], [9, 4], [5, 8], [9, 8]]
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> 
>>> for i, *j in lst:
...     d[i].append(j)
... 
>>> d
defaultdict(<class 'list'>, {1: [[10]], 3: [[10]], 9: [[4], [8]], 5: [[8]], 7: [[6, 8], [8], [4]]})
>>> [[k] + max(v, key=len) for k, v in d.items()]
[[1, 10], [3, 10], [9, 4], [5, 8], [7, 6, 8]]

If you care about the order, you can use OrderedDict() instead:

>>> from collections import OrderedDict
>>> d = OrderedDict()
>>> 
>>> for i, *j in lst:
...     d.setdefault(i, []).append(j)
... 
>>> [[k] + max(v, key=len) for k, v in d.items()]
[[7, 6, 8], [1, 10], [3, 10], [9, 4], [5, 8]]
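For comparison, here is a minimal single-pass sketch that avoids both sorting and grouping. It is not taken from either answer: keep_longest_per_key is a hypothetical helper name, and it assumes Python 3.7+ (plain dicts preserve insertion order) and non-empty inner lists.

```python
def keep_longest_per_key(rows):
    """Keep, for each first element, the longest inner list seen.

    Ties are resolved in favour of the list that appears first, and the
    result follows the order in which each first element first appears.
    """
    best = {}
    for row in rows:
        key = row[0]
        # Replace the stored list only if this one is strictly longer.
        if key not in best or len(row) > len(best[key]):
            best[key] = row
    return list(best.values())


lst = [[7, 6, 8], [1, 10], [3, 10], [7, 8], [7, 4], [9, 4], [5, 8], [9, 8]]
print(keep_longest_per_key(lst))
```

On the question's input this should print [[7, 6, 8], [1, 10], [3, 10], [9, 4], [5, 8]], which matches the output requested in the question. The strict > comparison is what keeps [9, 4] over [9, 8]; using >= would prefer the last of the equal-length duplicates instead.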
